Automatic speaker verification has been an active research area for more than four decades, and the technology has gradually matured for real-world application. In this paper, a hybrid convolutional neural network (CNN) model is proposed in which 3D and 2D CNNs are combined for text-independent speaker verification. This novel architecture is designed to capture speaker-specific information while simultaneously discarding non-speaker information. During training, the network learns to discriminate between speaker identities, which establishes the background model. Building the speaker model is one of the most important aspects of a verification system. Most conventional techniques employ the d-vector system, which creates a speaker model by averaging the features extracted from the speaker's utterances. Here, the hybrid CNN model is instead used in both the development and enrollment phases to build the speaker model. The proposed approach outperforms existing speaker verification methods.
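For orientation, the conventional d-vector baseline mentioned above can be sketched as follows. This is a minimal illustration of averaging-based enrollment and cosine scoring, not the proposed hybrid CNN model. The function names, the toy two-dimensional embeddings, and the decision threshold of 0.5 are all assumptions made for the example; in practice the embeddings would come from a trained network and the threshold would be tuned on held-out data.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def enroll_speaker(utterance_embeddings):
    """Build a d-vector-style speaker model by averaging per-utterance embeddings."""
    normalized = [l2_normalize(e) for e in utterance_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)

def verify(speaker_model, test_embedding, threshold=0.5):
    """Return (cosine score, accept decision) for a test utterance.

    The threshold is a hypothetical value chosen for illustration only.
    """
    test = l2_normalize(test_embedding)
    score = sum(a * b for a, b in zip(speaker_model, test))
    return score, score >= threshold

# Toy usage: two enrollment embeddings, then one matching and one mismatched test.
model = enroll_speaker([[1.0, 0.0], [0.0, 1.0]])
print(verify(model, [2.0, 2.0]))   # same direction as the model: accepted
print(verify(model, [1.0, -1.0]))  # orthogonal to the model: rejected
```

Because averaging treats every enrollment utterance equally and ignores structure across utterances, it is the step that the hybrid CNN model is intended to replace.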