In this paper, we propose to use deep reinforcement learning (DRL) for the task of cooperative spectrum sensing (CSS) in a cognitive radio network. We select a recently proposed offline DRL method, conservative Q-learning (CQL), due to its ability to learn complex data distributions efficiently. The task of CSS is performed as follows. Each secondary user (SU) performs local sensing and, using the CQL algorithm, determines the presence of the licensed user for the current and k-1 future timeslots. These results are forwarded to the fusion centre, where another CQL agent generates a global decision for the current and k-1 future timeslots. The SUs then refrain from sensing for the next k-1 timeslots to save energy. The proposed CSS mechanism can significantly increase licensed-user detection accuracy and the data transmission opportunities of SUs. In addition, it reduces the overhead of transmitting sensing results. The proposed solution is tested with a stochastic traffic load model under different activity patterns. Our simulation results show that the proposed problem formulation using the CQL algorithm achieves detection accuracy comparable to other state-of-the-art CSS methods while significantly reducing the computation time.
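The sensing schedule described above can be sketched as follows. This is an illustrative simulation only, not the paper's implementation: the functions `local_predict` and `fuse` are hypothetical stand-ins for the trained per-SU and fusion-centre CQL policies (here replaced by random local reports and a simple per-slot majority vote), and the constants `K` and `NUM_SUS` are assumed parameters.

```python
import random

K = 4        # prediction horizon: current slot plus K-1 future slots (assumed value)
NUM_SUS = 5  # number of secondary users (assumed value)

def local_predict(su_id, t):
    """Stand-in for an SU's trained CQL policy: returns K binary decisions
    (1 = licensed user present) covering slots t .. t+K-1.
    Random decisions are used here purely for illustration."""
    return [random.randint(0, 1) for _ in range(K)]

def fuse(local_reports):
    """Stand-in for the fusion centre's CQL policy: a per-slot majority
    vote over all SU reports, yielding the global decision vector."""
    n = len(local_reports)
    return [1 if sum(r[i] for r in local_reports) > n // 2 else 0
            for i in range(K)]

def run(num_slots=20, seed=0):
    """Simulate the schedule: SUs sense once, the fusion centre emits a
    global decision for K slots, and sensing is skipped for K-1 slots."""
    random.seed(seed)
    schedule = []  # (sensing slot, global decision vector) pairs
    t = 0
    while t < num_slots:
        reports = [local_predict(su, t) for su in range(NUM_SUS)]
        schedule.append((t, fuse(reports)))
        t += K  # no sensing (and no report transmission) for K-1 slots
    return schedule
```

Because each sensing round covers K slots, the number of sensing events and report transmissions drops by roughly a factor of K, which is the source of the energy and overhead savings claimed above.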