Denigrate Comment Detection in Low-Resource Hindi Language Using Attention-Based Residual Networks

Author(s):  
Saurabh R. Sangwan ◽  
M. P. S. Bhatia

Cyberspace has been recognized as a conducive environment for the use of various hostile, direct, and indirect behavioural tactics to target individuals or groups. Denigration is one of the most frequently used cyberbullying ploys to actively damage, humiliate, and disparage the online reputation of a target by sending, posting, or publishing cruel rumours, gossip, and untrue statements. Previous pertinent studies report detecting profane, vulgar, and offensive words primarily in the English language. This research puts forward a model to detect online denigration bullying in the low-resource Hindi language using attention residual networks. The proposed model, Hindi Denigrate Comment–Attention Residual Network (HDC-ARN), intends to uncover defamatory posts (denigrate comments) written in the Hindi language which defame and vilify a person or an entity in public. Data with 942 denigrate comments and 1499 non-denigrate comments were scraped using certain hashtags from two recent trending events in India: the Tablighi Jamaat Covid-19 spike (April 2020, Event 1) and the death of Sushant Singh Rajput (June 2020, Event 2). Only text-based features, that is, the actual content of the post, are considered, using pre-trained fastText word embeddings for Hindi. The model has three ResNet blocks followed by an attention layer that generates a post vector for a single input, which is passed through a sigmoid activation function to produce the final output: either denigrate (positive class) or non-denigrate (negative class). An F1-score of 0.642 is achieved on the dataset.
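The attention-pooling and sigmoid head the abstract describes can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the three ResNet blocks are stood in for by random token features, the 300-dimensional size merely mirrors fastText defaults, and all weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(H, w):
    """Collapse token representations H (T x d) into one post vector:
    score each token with the attention vector w, softmax-normalise
    the scores, and return the weighted sum of token vectors."""
    scores = H @ w                       # (T,)
    scores = scores - scores.max()       # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H                     # (d,)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-ins: 12 tokens with 300-d fastText-style embeddings,
# as if produced by the upstream residual blocks.
H = rng.standard_normal((12, 300))
w_att = rng.standard_normal(300)         # attention parameter (hypothetical)
w_out = rng.standard_normal(300)         # output-layer weights (hypothetical)
b_out = 0.0

post_vec = attention_pool(H, w_att)
p_denigrate = sigmoid(post_vec @ w_out + b_out)   # positive-class probability
label = "denigrate" if p_denigrate >= 0.5 else "non-denigrate"
```

In a trained model the weights would be learned end-to-end with the ResNet blocks; the thresholded sigmoid output gives the binary denigrate / non-denigrate decision.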

Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image, aiming to describe the salient parts of the given image. It is an important problem, as it involves both computer vision and natural language processing: computer vision for understanding images and natural language processing for language modeling. Much work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in the Hindi language; these attention mechanisms have not previously been applied to Hindi. The obtained results of the proposed model are compared with several baselines in terms of BLEU scores, and the results show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of the proposed approach. Availability of resources: the code is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html.
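A single step of the kind of attention the abstract refers to can be sketched as generic additive (Bahdanau-style) attention over image-region features. The dimensions and random weights below are illustrative placeholders, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def visual_attention(regions, h, W_r, W_h, v):
    """One decoding step of additive attention: the decoder state h
    scores every image-region feature, and the context vector is the
    attention-weighted sum of region features that conditions the
    next generated (Hindi) word."""
    scores = np.tanh(regions @ W_r + h @ W_h) @ v   # (R,)
    alpha = softmax(scores)                          # attention weights
    context = alpha @ regions                        # (d,)
    return context, alpha

# Toy sizes: 49 regions (a 7x7 CNN grid) of 256-d features, 128-d decoder state.
regions = rng.standard_normal((49, 256))
h = rng.standard_normal(128)
W_r = rng.standard_normal((256, 64))
W_h = rng.standard_normal((128, 64))
v = rng.standard_normal(64)

context, alpha = visual_attention(regions, h, W_r, W_h, v)
```

At each time step the decoder would consume `context` together with the previous word embedding, so different caption words can attend to different image regions.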


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Huaping Guo ◽  
Weimei Zhi ◽  
Hongbing Liu ◽  
Mingliang Xu

In recent years, the imbalanced learning problem has attracted more and more attention from both academia and industry; it concerns the performance of learning algorithms in the presence of data with severe class distribution skews. In this paper, we apply the well-known statistical model logistic discrimination to this problem and propose a novel method to improve its performance. To fully account for the class imbalance, we design a new cost function that takes into account the accuracies of both the positive and negative classes as well as the precision of the positive class. Unlike traditional logistic discrimination, the proposed method learns its parameters by maximizing the proposed cost function. Experimental results show that, compared with other state-of-the-art methods, the proposed one shows significantly better performance on measures of recall, g-mean, f-measure, AUC, and accuracy.
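The cost function described above combines three quantities: positive-class accuracy (recall), negative-class accuracy (specificity), and positive-class precision. The sketch below uses a simple product of the three as one plausible combination; the paper's exact functional form may differ, and the weight vector, data, and smoothing constant are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def imbalance_aware_cost(w, X, y, eps=1e-9):
    """Score a logistic-discrimination model by jointly rewarding
    positive-class accuracy (recall), negative-class accuracy
    (specificity), and positive-class precision. The product below is
    one illustrative way to combine the three terms; learning would
    choose w to maximise this score rather than plain likelihood."""
    pred = (sigmoid(X @ w) >= 0.5).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    recall = tp / (tp + fn + eps)         # positive-class accuracy
    specificity = tn / (tn + fp + eps)    # negative-class accuracy
    precision = tp / (tp + fp + eps)
    return recall * specificity * precision

# Toy separable data: a weight that classifies perfectly scores near 1.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1, 1, 0, 0])
score = imbalance_aware_cost(np.array([5.0]), X, y)
```

Because the score collapses towards zero whenever the minority (positive) class is poorly recalled or imprecisely predicted, maximising it resists the majority-class bias of ordinary accuracy.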


2017 ◽  
Vol 8 (4) ◽  
pp. 99-112 ◽  
Author(s):  
Rojalina Priyadarshini ◽  
Rabindra Kumar Barik ◽  
Nilamadhab Dash ◽  
Brojo Kishore Mishra ◽  
Rachita Misra

Much research has been carried out globally to design a machine classifier that could predict diabetes from physical and bio-medical parameters. In this work a hybrid machine learning classifier is proposed as an artificial predictor to correctly classify diabetic and non-diabetic people. The classifier is an amalgamation of the widely used K-means algorithm and the Gravitational Search Algorithm (GSA). GSA is used as an optimization tool to compute the best centroids from the two classes of training data: the positive class (diabetic) and the negative class (non-diabetic). In the K-means algorithm, instead of using random samples as the initial cluster heads, the optimized centroids from GSA are used as the cluster centers. The inherent problem with the K-means algorithm is the initial placement of cluster centers, which may delay convergence and thereby degrade overall performance; the combined GSA and K-means approach is designed to overcome this.
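The seeding idea is easy to sketch: run ordinary K-means, but start from externally supplied centroids rather than random samples. In the paper those centroids come from GSA; here a hand-picked pair stands in for them, and the two-dimensional toy data merely imitates diabetic / non-diabetic feature rows.

```python
import numpy as np

def kmeans_with_seed(X, init_centroids, n_iter=50):
    """Standard K-means, initialised with externally optimised
    centroids (GSA output in the paper) instead of random samples,
    avoiding the convergence delay of a bad initial placement."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# Two well-separated toy classes standing in for the training data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
seed = np.array([[0.2, 0.1], [4.8, 5.1]])  # pretend GSA produced these
labels, centroids = kmeans_with_seed(X, seed)
```

With the seed centroids already close to the class centres, the assignment step is correct from the first iteration, which is exactly the convergence benefit the hybrid targets.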


Author(s):  
Dedi Irwansyah

The emerging interest in using literature to teach English has not yet highlighted the significance of Islamic literature within the Indonesian educational context. This article presents the portrayal of Islamic literature in the English language teaching (ELT) study area and offers a possible conceptual model for integrating Islamic literature into ELT. Following a library research method, with a corpus of fourteen stories and one poem derived from fifteen books, the findings of this study show that: most works of Islamic literature are designed for fluent readers; the presentation of Islamic literature is dominated by Middle Eastern and Western writers; and the Western writers are not always sensitive to the symbols glorified by Muslim English learners in Indonesia. To address these findings, this study proposes a conceptual model consisting of input, process, and output elements. Not only does the proposed model strengthen the position of Islamic literature, but it also integrates Islamic literature into English language teaching so that it can reach both fluent and beginning readers. The output of the proposed model, abridged and unabridged texts of Islamic literature, can be utilized to teach vocabulary, grammar, the four basic language skills, and Islamic values.


2021 ◽  
Author(s):  
Maria Catherine Bernolo Otero ◽  
Lyre Anni E. Murao ◽  
Mary Antoinette G. Limen ◽  
Paul Lorenzo A. Gaite ◽  
Michael G. Bacus ◽  
...  

Background: Over 50 countries have used wastewater-based epidemiology (WBE) and whole-genome sequencing (WGS) of SARS-CoV-2 to monitor COVID-19 cases. COVID-19 surveillance in the Philippines relies on clinical monitoring and contact tracing, both of which have limited use in the early detection or prediction of community outbreaks. Complementary public health surveillance methods that can provide community-level infection data faster and with fewer resources must be explored. Objectives: This study piloted and assessed WBE and WGS as approaches for COVID-19 surveillance in low-resource and low-sanitation communities in Davao City, Philippines. Methods: Weekly wastewater samples were collected from six barangay community sewer pipes or creeks from November to December 2020. Samples were concentrated using a PEG-NaCl precipitation method and analyzed by RT-PCR to detect the SARS-CoV-2 N, RdRP, and E genes. In addition, SARS-CoV-2 RNA-positive samples were subjected to WGS for genomic mutation surveillance. Public data from clinical surveillance were also reviewed to interpret the WBE data. Results: Twenty-two of the 24 samples (91.7%) obtained from the six barangays tested positive for SARS-CoV-2 RNA. The cycle threshold (Ct) values were correlated with RNA concentration and attack rate. Thirty-two SARS-CoV-2 mutations were detected by WGS, including novel non-synonymous mutations or indels in seven SARS-CoV-2 genes and ten mutations previously reported in the Philippines. Discussion: SARS-CoV-2 RNA was detected in community wastewater from the six barangays of Davao City, even when the barangays were classified as having a low risk of COVID-19 transmission and no new cases were reported. Despite the fragmented genome sequences analyzed, our genomic surveillance in wastewater confirmed the presence of previously reported mutations while identifying mutations not yet registered in clinical surveillance.
The local context of a community must be considered when planning to adopt WBE and WGS as complementary COVID-19 surveillance methodologies, especially in low-sanitation and low-resource settings. Keywords: COVID-19, Philippines, SARS-CoV-2, Wastewater-Based Epidemiology, Whole Genome Sequencing.


Author(s):  
Arkadipta De ◽  
Dibyanayan Bandyopadhyay ◽  
Baban Gain ◽  
Asif Ekbal

Fake news classification is one of the most interesting problems that has attracted huge attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most current work on fake news detection is in the English language, which has limited its widespread usability, especially outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages is still a challenge due to the non-availability of annotated corpora and tools. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A wide variety of experiments, including language-specific and domain-specific settings, are conducted. The proposed model achieves high accuracy in domain-specific and domain-agnostic experiments, and it also outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, showing encouraging results. Cross-domain transfer experiments are also performed to assess the model's language-independent feature transfer. We also offer a multilingual multi-domain fake news detection dataset of five languages and seven different domains that could be useful for research and development in resource-scarce scenarios.


2021 ◽  
Vol 11 (1) ◽  
pp. 491-508
Author(s):  
Monika Lamba ◽  
Yogita Gigras ◽  
Anuradha Dhull

Detection of plant disease has a crucial role in better understanding the economy of India in terms of agricultural productivity. Early recognition and categorization of plant diseases are crucial, as diseases can adversely affect the growth and development of species. Numerous machine learning methods such as SVM (support vector machine), random forest, KNN (k-nearest neighbor), Naïve Bayes, and decision trees have been exploited for the recognition, discovery, and categorization of plant diseases; however, the advancement of machine learning by DL (deep learning) is expected to have tremendous potential for enhancing accuracy. This paper proposes a model comprising the Auto-Color Correlogram as an image filter and DL classifiers with different activation functions for plant disease detection. The proposed model is implemented on four different datasets to solve binary and multiclass subcategories of plant diseases. The proposed model achieves better results, obtaining 99.4% accuracy and 99.9% sensitivity for the binary class and 99.2% accuracy for the multiclass case. The proposed model is shown to outperform other approaches, namely LibSVM, SMO (sequential minimal optimization), and DL with the softmax and softsign activation functions, in terms of F-measure, recall, MCC (Matthews correlation coefficient), specificity, and sensitivity.
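The Auto-Color Correlogram filter mentioned above measures, for each quantised colour and spatial distance, the probability that a pixel near a colour-c pixel also has colour c. The sketch below is a heavily simplified greyscale version with tiny parameter choices; the feature used in the paper operates on colour images with more quantisation levels and distances.

```python
import numpy as np

def auto_color_correlogram(img, distances=(1, 3), n_colors=4):
    """Simplified auto-colour correlogram on a greyscale image: for
    each quantised colour c and distance d, estimate the probability
    that a pixel at axis-aligned distance d from a colour-c pixel
    also has colour c. Returns one probability per (d, c) pair."""
    q = (img * n_colors / 256).astype(int).clip(0, n_colors - 1)
    h, w = q.shape
    feats = []
    for d in distances:
        for c in range(n_colors):
            same = total = 0
            ys, xs = np.nonzero(q == c)
            for y, x in zip(ys, xs):
                for dy, dx in ((0, d), (0, -d), (d, 0), (-d, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += 1
                        same += int(q[ny, nx] == c)
            feats.append(same / total if total else 0.0)
    return np.array(feats)

# A uniform image: its own colour correlates perfectly at every distance.
feats = auto_color_correlogram(np.full((8, 8), 10))
```

Such correlogram features capture the spatial colour texture of leaf lesions and would be fed to the downstream deep classifier.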


Author(s):  
Muhammad Khaleel ◽  
Shankar Chelliah

This chapter discusses the significance of employee wellbeing at the workplace, with self-perceived English language proficiency as a predictor variable. The importance of employee wellbeing has been recognized all around the world. To generalize the findings of previous literature, this study examines the proposed model in the context of telecom MNCs in Pakistan. The chapter starts by asking what wellbeing at the workplace is and then moves to its significance in the context of developed and underdeveloped countries. Further, the chapter explains the empirical findings of the proposed model. The results reveal a strong correlation between self-perceived English language proficiency and the dimensions of employee wellbeing at the workplace. The chapter is important for both researchers and managers.


2022 ◽  
pp. 324-344
Author(s):  
Muhammad Khaleel ◽  
Shankar Chelliah



2020 ◽  
Vol 12 (2) ◽  
pp. 21-34
Author(s):  
Mostefai Abdelkader

In recent years, increasing attention has been paid to sentiment analysis on microblogging platforms such as Twitter. Sentiment analysis refers to the task of detecting whether a textual item (e.g., a tweet) contains an opinion about a topic. This paper proposes a probabilistic deep learning approach to sentiment analysis. The deep learning model used is a convolutional neural network (CNN). The main contribution of this approach is a new probabilistic representation of the text to be fed as input to the CNN. This representation is a matrix that stores, for each word composing the message, the probability that it belongs to the positive class and the probability that it belongs to the negative class. The proposed approach is evaluated on four well-known datasets: HCR, OMD, STS-gold, and a dataset provided by the SemEval-2017 workshop. The experiments show that the proposed approach competes with state-of-the-art sentiment analyzers and can detect sentiments from textual data effectively.
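The probabilistic input matrix described above can be sketched as follows. The abstract does not state how the per-word class probabilities are estimated, so Laplace-smoothed class-conditional word frequencies are used here as one plausible estimator, and the toy word counts stand in for a labelled training corpus.

```python
from collections import Counter
import numpy as np

def build_prob_matrix(tweet, pos_counts, neg_counts, eps=1.0):
    """Build the two-column probabilistic representation: row i holds,
    for word i of the tweet, a smoothed estimate of the probability
    that the word occurs in the positive class and in the negative
    class. The resulting T x 2 matrix is what the CNN would consume."""
    pos_total = sum(pos_counts.values()) + eps * len(pos_counts)
    neg_total = sum(neg_counts.values()) + eps * len(neg_counts)
    rows = []
    for w in tweet.split():
        p_pos = (pos_counts[w] + eps) / pos_total   # Laplace smoothing
        p_neg = (neg_counts[w] + eps) / neg_total
        rows.append([p_pos, p_neg])
    return np.array(rows)

# Toy word counts standing in for labelled positive / negative tweets.
pos_counts = Counter("great love great happy".split())
neg_counts = Counter("bad hate awful bad".split())

M = build_prob_matrix("great bad day", pos_counts, neg_counts)
```

Each row thus encodes how strongly a word is associated with each sentiment class, and the CNN's convolutions can then pick up sentiment-bearing word sequences from these two channels.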

