In today’s era of digitization, social media platforms play a significant role in networking and influencing the perception of the general population. Social network sites have recently been used to carry out harmful attacks against individuals, including political and theological figures, intellectuals, sports and movie stars, and other prominent dignitaries, which may or may not be intentional. However, the exchange of such information across the general population inevitably contributes to social-economic, socio-political turmoil, and even physical violence in society. By classifying the derogatory content of a social media post, this research work helps to eradicate and discourage the upsetting propagation of such hate campaigns. Social networking posts today often include the picture of Memes along with textual remarks and comments, which throw new challenges and opportunities to the research community while identifying the attacks. This article proposes a multimodal deep learning framework by utilizing ensembles of computer vision and natural language processing techniques to train an encapsulated transformer network for handling the classification problem. The proposed framework utilizes the fine-tuned state-of-the-art deep learning-based models (e.g., BERT, Electra) for multilingual text analysis along with face recognition and the optical character recognition model for Meme picture comprehension. For the study, a new Facebook meme-post dataset is created with recorded baseline results. The subject of the created dataset and context of the work is more geared toward multilingual Indian society. The findings demonstrate the efficacy of the proposed method in the identification of social media meme posts featuring derogatory content about a famous/recognized individual.