Sound Source Localization (SSL) and gaze shifting toward the sound source are integral parts of a socially interactive humanoid robot's perception system. In noisy and reverberant environments, it is non-trivial to estimate the location of a sound source and accurately shift gaze in its direction. Previous SSL algorithms struggle to approximate the distance to audio sources and to accurately detect, interpret, and differentiate the actual sound from comparable sound sources in challenging acoustic environments. In this article, a learning-based model is presented to achieve noise-robust and reverberation-resistant sound source localization in real-world scenarios. The proposed system utilizes a multi-layered Generalized Cross-Correlation with Phase Transform (GCC-PHAT) signal processing technique as a baseline for a Generalized Cross-Correlation Convolutional Neural Network (GCC-CNN) model. The proposed model is integrated with an efficient rotation algorithm to predict and orient toward the sound source. The performance of the proposed method is compared with state-of-the-art deep-network-based sound source localization methods. The proposed method outperforms the existing neural-network-based approaches, achieving the highest accuracy of 96.21% for an active binaural auditory perception system.
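To illustrate the GCC-PHAT baseline referenced above, the following is a minimal sketch of time-delay-of-arrival (TDOA) estimation between two microphone channels, which is the quantity such a system would feed into a localization model. This is a generic textbook formulation, not the authors' implementation; the function name `gcc_phat` and its parameters are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the time delay of `sig` relative to `ref` using GCC-PHAT.

    The cross-power spectrum is whitened by its magnitude (the "phase
    transform"), which sharpens the correlation peak and makes the
    estimate more robust to reverberation than plain cross-correlation.
    This is a generic sketch, not the paper's implementation.
    """
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                    # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Re-center so negative and positive lags sit around the middle
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)         # delay in seconds

# Usage: recover a known 25-sample delay between two noise signals
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delay = 25
y = np.concatenate((np.zeros(delay), x))[: x.shape[0]]  # y lags x by 25 samples
tau = gcc_phat(y, x, fs)
```

In a binaural setup, the recovered delay `tau` maps to an azimuth angle via the inter-microphone distance and the speed of sound; a learned model such as the GCC-CNN described above instead consumes the GCC-PHAT correlation features directly rather than only the peak location.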