Enriching Conventional Ensemble Learner with Deep Contextual Semantics to Detect Fake News in Urdu

Ramsha Saeed ◽  
Hammad Afzal ◽  
Haider Abbas ◽  
Maheen Fatima

Increased connectivity has contributed greatly to facilitating rapid access to information and reliable communication. However, uncontrolled information dissemination has also resulted in the spread of fake news. Fake news may be spread by groups of people or organizations to serve ulterior motives such as political or financial gain, or to damage a country's public image. Given the importance of timely detection of fake news, the research area has attracted researchers from all over the world. Most work on detecting fake news focuses on the English language. However, automated detection of fake news is important irrespective of the language used to spread false information. Recognizing the importance of boosting research on fake news detection for low-resource languages, this work proposes a novel semantically enriched technique to effectively detect fake news in Urdu, a low-resource language. A model based on deep contextual semantics learned from a convolutional neural network is proposed. The features learned from the convolutional neural network are combined with other n-gram-based features and fed to a conventional majority-voting ensemble classifier fitted with three base learners: Adaptive Boosting, Gradient Boosting, and a Multi-Layer Perceptron. Experiments are performed with different models, and the results show that enriching the traditional ensemble learner with deep contextual semantics alongside other standard features yields the best performance and outperforms the state-of-the-art Urdu fake news detection model.
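The feature-fusion and majority-voting scheme described above can be sketched with scikit-learn; the feature dimensions, synthetic data, and hyperparameters below are illustrative stand-ins, not the paper's actual configuration:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-ins for the paper's inputs: 64 "deep contextual" dims (from a CNN)
# concatenated with 36 n-gram-based dims.
cnn_feats = rng.normal(size=(200, 64))
ngram_feats = rng.normal(size=(200, 36))
X = np.hstack([cnn_feats, ngram_feats])
y = (X[:, 0] + X[:, 70] > 0).astype(int)  # synthetic fake/real labels

# Majority voting over the three base learners named in the abstract.
ensemble = VotingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ],
    voting="hard",  # hard voting = majority vote over predicted labels
)
ensemble.fit(X, y)
preds = ensemble.predict(X)
```

`voting="hard"` matches the majority-voting setup; `voting="soft"` would average predicted probabilities instead.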

Ahmed Nasser ◽  
Huthaifa AL-Khazraji

Predictive maintenance (PdM) is a successful strategy used to reduce cost by minimizing breakdown stoppages and production loss. The massive amount of data that results from the integration between the physical and digital systems of the production process makes it possible for deep learning (DL) algorithms to be applied to fault prediction and diagnosis. This paper presents a hybrid convolutional neural network and long short-term memory network (CNN-LSTM) approach to a predictive maintenance problem. The proposed CNN-LSTM approach enhances predictive accuracy and also reduces the complexity of the model. To evaluate the proposed model, two comparisons were made with regular LSTM and gradient boosting decision tree (GBDT) methods using a freely available dataset. The PdM model based on the CNN-LSTM method demonstrates better prediction accuracy than the regular LSTM, with the average F-score increasing from 93.34% for the regular LSTM to 97.48% for the proposed CNN-LSTM. Compared to related works, the proposed hybrid CNN-LSTM PdM approach achieved better results in terms of accuracy.
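A minimal sketch of such a hybrid CNN-LSTM in PyTorch (layer sizes, sequence length, and sensor count are invented for illustration; the paper's exact architecture may differ):

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Hybrid CNN-LSTM sketch for windows of multivariate sensor readings."""
    def __init__(self, n_features=10, n_classes=2):
        super().__init__()
        # The 1-D convolution extracts local patterns and halves the sequence
        # length, which is one way such hybrids reduce model complexity
        # relative to running a plain LSTM over the raw sequence.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))       # -> (batch, 32, time/2)
        out, _ = self.lstm(z.transpose(1, 2))  # -> (batch, time/2, 64)
        return self.head(out[:, -1])           # classify from the last step

x = torch.randn(8, 50, 10)   # 8 windows, 50 time steps, 10 sensors
logits = CNNLSTM()(x)        # -> shape (8, 2): failure vs. normal logits
```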

Oguz Akbilgic ◽  
Liam Butler ◽  
Ibrahim Karabayir ◽  
Patricia P Chang ◽  
Dalane W Kitzman ◽  

Abstract Aims Heart failure (HF) is a leading cause of death. Early intervention is key to reducing HF-related morbidity and mortality. This study assesses the utility of electrocardiograms (ECGs) in HF risk prediction. Methods and results Data from the baseline visits (1987–89) of the Atherosclerosis Risk in Communities (ARIC) study were used. Incident hospitalized HF events were ascertained by ICD codes. Participants with good-quality baseline ECGs were included; participants with prevalent HF were excluded. An ECG-artificial intelligence (AI) model to predict HF was created as a deep residual convolutional neural network (CNN) utilizing the standard 12-lead ECG. The area under the receiver operating characteristic curve (AUC) was used to evaluate prediction models including the CNN, light gradient boosting machines (LGBM), and Cox proportional hazards regression. A total of 14 613 participants (45% male, 73% white, mean age ± standard deviation of 54 ± 5) were eligible. A total of 803 (5.5%) participants developed HF within 10 years of baseline. The CNN utilizing solely the ECG achieved an AUC of 0.756 (0.717–0.795) on the hold-out test data. The ARIC and Framingham Heart Study (FHS) HF risk calculators yielded AUCs of 0.802 (0.750–0.850) and 0.780 (0.740–0.830), respectively. The highest AUC of 0.818 (0.778–0.859) was obtained when the ECG-AI model output, age, gender, race, body mass index, smoking status, prevalent coronary heart disease, diabetes mellitus, systolic blood pressure, and heart rate were used as predictors of HF within LGBM. The ECG-AI model output was the most important predictor of HF. Conclusions The ECG-AI model based solely on information extracted from the ECG independently predicts HF with accuracy comparable to the existing FHS and ARIC risk calculators.

Arkadipta De ◽  
Dibyanayan Bandyopadhyay ◽  
Baban Gain ◽  
Asif Ekbal

Fake news classification is one of the most interesting problems and has attracted considerable attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most of the current work on fake news detection is in the English language, which has limited its widespread usability, especially outside the English-literate population. Although there has been a growth in multilingual web content, fake news classification in low-resource languages is still a challenge due to the non-availability of annotated corpora and tools. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A large variety of experiments, including language-specific and domain-specific settings, are conducted. The proposed model achieves high accuracy in domain-specific and domain-agnostic experiments, and it also outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, showing encouraging results. Cross-domain transfer experiments are also performed to assess language-independent feature transfer of the model. We also offer a multilingual multidomain fake news detection dataset of five languages and seven different domains that could be useful for research and development in resource-scarce scenarios.

Erfan Ghadery ◽  
Sajad Movahedi ◽  
Heshaam Faili ◽  
Azadeh Shakery

The advent of the Internet has caused a significant growth in the number of opinions expressed about products or services on e-commerce websites. Aspect category detection, one of the challenging subtasks of aspect-based sentiment analysis, deals with categorizing a given review sentence into a set of predefined categories. Most of the research efforts in this field are devoted to English-language reviews, while a large number of reviews in other languages are left unexplored. In this paper, we propose a multilingual method to perform aspect category detection on reviews in different languages, which makes use of a deep convolutional neural network with multilingual word embeddings. To the best of our knowledge, our method is the first attempt at performing aspect category detection on multiple languages simultaneously. Empirical results on the multilingual dataset provided by the SemEval workshop demonstrate the effectiveness of the proposed method.
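A sketch of this kind of text CNN over pretrained multilingual word embeddings, treating aspect category detection as multi-label classification (the embedding dimension, filter sizes, and category count are assumptions, not the authors' settings):

```python
import torch
import torch.nn as nn

class AspectCNN(nn.Module):
    """Text CNN over word embeddings for multi-label aspect category detection."""
    def __init__(self, emb_dim=300, n_categories=12):
        super().__init__()
        # Parallel convolutions over 2-, 3-, and 4-word windows, each followed
        # by max-over-time pooling, as in standard text-CNN designs.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, 100, kernel_size=k) for k in (2, 3, 4)
        )
        self.out = nn.Linear(3 * 100, n_categories)

    def forward(self, emb):                     # emb: (batch, words, emb_dim)
        z = emb.transpose(1, 2)                 # -> (batch, emb_dim, words)
        pooled = [c(z).relu().max(dim=2).values for c in self.convs]
        # Sigmoid rather than softmax: one sentence can belong to several
        # aspect categories at once.
        return torch.sigmoid(self.out(torch.cat(pooled, dim=1)))

# 4 sentences of 20 words, embedded with (multilingual) 300-d word vectors.
scores = AspectCNN()(torch.randn(4, 20, 300))  # -> per-category scores in [0, 1]
```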

Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 256 ◽  
Jiangyong An ◽  
Wanyi Li ◽  
Maosong Li ◽  
Sanrong Cui ◽  
Huanran Yue

Drought stress seriously affects crop growth, development, and grain production. Existing machine learning methods have achieved great progress in drought stress detection and diagnosis. However, such methods are based on a hand-crafted feature extraction process, and their accuracy has much room to improve. In this paper, we propose the use of a deep convolutional neural network (DCNN) to identify and classify maize drought stress. Field drought stress experiments were conducted in 2014. The experiment was divided into three treatments: optimum moisture, light drought, and moderate drought stress. Maize images were obtained every two hours throughout the whole day by digital cameras. To compare the accuracy of the DCNN, a comparative experiment was conducted using traditional machine learning on the same dataset. The experimental results demonstrated an impressive performance of the proposed method. For the total dataset, the accuracy of the identification and classification of drought stress was 98.14% and 95.95%, respectively. High accuracy was also achieved on the sub-datasets of the seedling and jointing stages. The identification and classification accuracy levels of the color images were higher than those of the gray images. Furthermore, the comparison experiments on the same dataset demonstrated that the DCNN achieved better performance than the traditional machine learning method (gradient boosting decision tree, GBDT). Overall, our proposed deep learning-based approach is a very promising method for field maize drought identification and classification based on digital images.
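The traditional-machine-learning baseline used in the comparison, a GBDT trained on pre-extracted features, can be sketched with scikit-learn; the synthetic features below merely stand in for hand-crafted image features and are not the paper's data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in: 20 hand-crafted features per image, 3 classes standing in for
# optimum moisture, light drought, and moderate drought stress.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

gbdt = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
acc = accuracy_score(yte, gbdt.predict(Xte))
```

The key contrast in the paper is that the DCNN learns its features directly from the raw images, whereas this baseline depends on the quality of the hand-crafted feature extraction step.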

2020 ◽  
Vol 61 ◽  
pp. 32-44 ◽  
Rohit Kumar Kaliyar ◽  
Anurag Goswami ◽  
Pratik Narang ◽  
Soumendu Sinha

2019 ◽  
Vol 11 (23) ◽  
pp. 2788 ◽  
Uwe Knauer ◽  
Cornelius Styp von Rekowski ◽  
Marianne Stecklina ◽  
Tilman Krokotsch ◽  
Tuan Pham Minh ◽  

In this paper, we evaluate different popular voting strategies for the fusion of classifier results. A convolutional neural network (CNN) and different variants of random forest (RF) classifiers were trained to discriminate between 15 tree species based on airborne hyperspectral imaging data. The spectral data were preprocessed with a multi-class linear discriminant analysis (MCLDA) as a means to reduce dimensionality and to obtain spatial–spectral features. The best individual classifier was a CNN with a classification accuracy of 0.73 ± 0.086. The classification performance increased to an accuracy of 0.78 ± 0.053 by using precision-weighted voting for a hybrid ensemble of the CNN and two RF classifiers. This voting strategy clearly outperformed majority voting (0.74), accuracy-weighted voting (0.75), and presidential voting (0.75).
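The contrast between majority voting and precision-weighted voting can be illustrated with a toy NumPy example (the probability and precision values are invented; real weights would come from per-class precision measured on a validation set):

```python
import numpy as np

# One sample, three classes, three classifiers (rows): each row is one
# classifier's class-probability vector for the sample.
probs = np.array([
    [0.6, 0.3, 0.1],   # CNN
    [0.2, 0.5, 0.3],   # RF variant 1
    [0.3, 0.4, 0.3],   # RF variant 2
])
# Per-classifier, per-class precision estimates (illustrative numbers).
precision = np.array([
    [0.9, 0.7, 0.6],
    [0.6, 0.8, 0.5],
    [0.7, 0.6, 0.8],
])

# Majority voting: each classifier casts one vote for its argmax class.
majority = np.bincount(probs.argmax(axis=1), minlength=3).argmax()   # -> 1

# Precision-weighted voting: probabilities are scaled by how precise each
# classifier is on each class before summing.
weighted = (precision * probs).sum(axis=0).argmax()                  # -> 0
```

Here two of three classifiers favor class 1, so majority voting picks it; but the CNN is both confident and highly precise on class 0, so precision weighting flips the fused decision.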

Feng Qian ◽  
Chengyue Gong ◽  
Karishma Sharma ◽  
Yan Liu

Fake news on social media is a major challenge, and studies have shown that fake news can propagate exponentially quickly in its early stages. We therefore focus on early detection of fake news and assume that only the news article text is available at the time of detection, since additional information such as user responses and propagation patterns can be obtained only after the news spreads. However, we find that historical user responses to previous articles are available and can be treated as soft semantic labels that enrich the binary label of an article by providing insights into why the article should be labeled as fake. We propose a novel Two-Level Convolutional Neural Network with User Response Generator (TCNN-URG), where the TCNN captures semantic information from article text by representing it at the sentence and word levels, and the URG learns a generative model of user response to article text from historical user responses, which it can use to generate responses to new articles in order to assist fake news detection. We conduct experiments on one available dataset and a larger dataset collected by ourselves. Experimental results show that TCNN-URG outperforms baselines based on prior approaches that detect fake news from article text alone.

Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 387 ◽  
Sardar Parhat ◽  
Mijit Ablimit ◽  
Askar Hamdulla

In this paper, based on a multilingual morphological analyzer, we study short text classification for the similar low-resource languages Uyghur and Kazakh. Generally, the online linguistic resources of these languages are noisy, so preprocessing is necessary and can significantly improve accuracy. Uyghur and Kazakh are languages with derivational morphology, in which words are coined by stems concatenated with suffixes. Usually, terms are used as the representation of text content, while functional parts are excluded as stop words in these languages. By extracting stems, we can collect the necessary terms and exclude stop words. A morpheme segmentation tool can split text into morphemes with 95% reliability. After preparing both word- and morpheme-based training text corpora, we apply a convolutional neural network (CNN) as a feature selection and text classification algorithm to perform text classification tasks. Experimental results show that the morpheme-based approach outperformed the word-based approach. The word embedding technique is frequently used in text representation, both in the framework of neural networks and as a value expression; it can map language units into a sequential vector space based on context and is a natural way to extract and predict out-of-vocabulary (OOV) words from context information. Multilingual morphological analysis has provided a convenient way to process tasks for low-resource languages like Uyghur and Kazakh.
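The stem-extraction step described above can be illustrated with a toy sketch; the `+` segmentation markers and the sample morphemes are invented for illustration, since real Uyghur/Kazakh segmentation comes from the morphological analyzer:

```python
# Hypothetical output of a morpheme segmenter: each word is a stem followed
# by suffixes, with suffixes marked by a leading "+" (marker is invented).
segmented = [
    ["kitab", "+lar", "+ni"],   # stem + plural + case suffixes (illustrative)
    ["oqu", "+di"],             # stem + tense suffix (illustrative)
]

def stems(morphemes):
    """Keep only stem morphemes, dropping suffixal (functional) parts."""
    return [m for m in morphemes if not m.startswith("+")]

# Stems become the terms used for text representation; suffixes are
# treated like stop words and excluded.
terms = [stems(word) for word in segmented]
print(terms)  # -> [['kitab'], ['oqu']]
```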
