scholarly journals Content Noise Detection Model Using Deep Learning in Web Forums

2020 ◽  
Vol 12 (12) ◽  
pp. 5074
Author(s):  
Jiyoung Woo ◽  
Jaeseok Yun

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. In this regard, as the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data to the opinion analysis. Here, in this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features from posted content using text mining techniques from the perspective of linguistics were extracted, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparison analysis on deep neural networks using the two different recurrent neural network (RNN) models of the simple RNN and long short-term memory (LSTM) network was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model affords significant improvements over the accuracy of conventional machine learning associated with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.

2021 ◽  
Vol 11 (17) ◽  
pp. 7940
Author(s):  
Mohammed Al-Sarem ◽  
Abdullah Alsaeedi ◽  
Faisal Saeed ◽  
Wadii Boulila ◽  
Omair AmeerBakhsh

Spreading rumors in social media is considered under cybercrimes that affect people, societies, and governments. For instance, some criminals create rumors and send them on the internet, then other people help them to spread it. Spreading rumors can be an example of cyber abuse, where rumors or lies about the victim are posted on the internet to send threatening messages or to share the victim’s personal information. During pandemics, a large amount of rumors spreads on social media very fast, which have dramatic effects on people’s health. Detecting these rumors manually by the authorities is very difficult in these open platforms. Therefore, several researchers conducted studies on utilizing intelligent methods for detecting such rumors. The detection methods can be classified mainly into machine learning-based and deep learning-based methods. The deep learning methods have comparative advantages against machine learning ones as they do not require preprocessing and feature engineering processes and their performance showed superior enhancements in many fields. Therefore, this paper aims to propose a Novel Hybrid Deep Learning Model for Detecting COVID-19-related Rumors on Social Media (LSTM–PCNN). The proposed model is based on a Long Short-Term Memory (LSTM) and Concatenated Parallel Convolutional Neural Networks (PCNN). The experiments were conducted on an ArCOV-19 dataset that included 3157 tweets; 1480 of them were rumors (46.87%) and 1677 tweets were non-rumors (53.12%). The findings of the proposed model showed a superior performance compared to other methods in terms of accuracy, recall, precision, and F-score.


2021 ◽  
Vol 22 (16) ◽  
pp. 9054
Author(s):  
Wei Du ◽  
Xuan Zhao ◽  
Yu Sun ◽  
Lei Zheng ◽  
Ying Li ◽  
...  

Identifying secretory proteins from blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and are highly dependent on the feature set from the protein. In this article, we propose a deep learning model based on the capsule network and transformer architecture, SecProCT, to predict secretory proteins using only amino acid sequences. The proposed model was validated using cross-validation and achieved 0.921 and 0.892 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively. Meanwhile, the proposed model was validated on an independent test set and achieved 0.917 and 0.905 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively, which are better than conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins. The results of this model are better than the those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on the annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.


2020 ◽  
Author(s):  
Zakhriya Alhassan ◽  
MATTHEW WATSON ◽  
David Budgen ◽  
Riyad Alshammari ◽  
Ali Alessan ◽  
...  

BACKGROUND Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems such as diabetes and cardiovascular diseases. Early preventive interventions based upon advanced predictive models using electronic health records (EHR) data for such patients can ultimately help provide better health outcomes. OBJECTIVE Our study investigates the performance of predictive models to forecast HbA1c elevation levels by employing machine learning approaches using data from current and previous visits in the EHR systems for patients who had not been previously diagnosed with any type of diabetes. METHODS This study employed one statistical model and three commonly used conventional machine learning models, as well as a deep learning model, to predict patients’ current levels of HbA1c. For the deep learning model, we also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and have an understanding of the reasons behind the models' decisions. All models were trained and tested using a large and naturally balanced dataset from Saudi Arabia with 18,844 unique patient records. RESULTS The machine learning models achieved the best results for predicting current HbA1c elevation risk. The deep learning model outperformed the statistical and conventional machine learning models with respect to all reported measures when employing time-series data. The best performing model was the multi-layer perceptron (MLP) which achieved an accuracy of 74.52% when used with historical data. CONCLUSIONS This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels. For deep learning in particular, utilizing the patient's longitudinal time-series data improved the performance and affected the relative importance for the predictors used. The models showed robust results that were consistent with comparable studies.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Bader Alouffi ◽  
Abdullah Alharbi ◽  
Radhya Sahal ◽  
Hager Saleh

Fake news is challenging to detect due to mixing accurate and inaccurate information from reliable and unreliable sources. Social media is a data source that is not trustworthy all the time, especially in the COVID-19 outbreak. During the COVID-19 epidemic, fake news is widely spread. The best way to deal with this is early detection. Accordingly, in this work, we have proposed a hybrid deep learning model that uses convolutional neural network (CNN) and long short-term memory (LSTM) to detect COVID-19 fake news. The proposed model consists of some layers: an embedding layer, a convolutional layer, a pooling layer, an LSTM layer, a flatten layer, a dense layer, and an output layer. For experimental results, three COVID-19 fake news datasets are used to evaluate six machine learning models, two deep learning models, and our proposed model. The machine learning models are DT, KNN, LR, RF, SVM, and NB, while the deep learning models are CNN and LSTM. Also, four matrices are used to validate the results: accuracy, precision, recall, and F1-measure. The conducted experiments show that the proposed model outperforms the six machine learning models and the two deep learning models. Consequently, the proposed system is capable of detecting the fake news of COVID-19 significantly.


2021 ◽  
Vol 53 (2) ◽  
Author(s):  
Sen Yang ◽  
Yaping Zhang ◽  
Siu-Yeung Cho ◽  
Ricardo Correia ◽  
Stephen P. Morgan

AbstractConventional blood pressure (BP) measurement methods have different drawbacks such as being invasive, cuff-based or requiring manual operations. There is significant interest in the development of non-invasive, cuff-less and continual BP measurement based on physiological measurement. However, in these methods, extracting features from signals is challenging in the presence of noise or signal distortion. When using machine learning, errors in feature extraction result in errors in BP estimation, therefore, this study explores the use of raw signals as a direct input to a deep learning model. To enable comparison with the traditional machine learning models which use features from the photoplethysmogram and electrocardiogram, a hybrid deep learning model that utilises both raw signals and physical characteristics (age, height, weight and gender) is developed. This hybrid model performs best in terms of both diastolic BP (DBP) and systolic BP (SBP) with the mean absolute error being 3.23 ± 4.75 mmHg and 4.43 ± 6.09 mmHg respectively. DBP and SBP meet the Grade A and Grade B performance requirements of the British Hypertension Society respectively.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Rajat Garg ◽  
Anil Kumar ◽  
Nikunj Bansal ◽  
Manish Prateek ◽  
Shashi Kumar

AbstractUrban area mapping is an important application of remote sensing which aims at both estimation and change in land cover under the urban area. A major challenge being faced while analyzing Synthetic Aperture Radar (SAR) based remote sensing data is that there is a lot of similarity between highly vegetated urban areas and oriented urban targets with that of actual vegetation. This similarity between some urban areas and vegetation leads to misclassification of the urban area into forest cover. The present work is a precursor study for the dual-frequency L and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into vegetation class with the help of deep learning. In this study, three machine learning algorithms Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM) have been implemented along with a deep learning model DeepLabv3+ for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model but in the field of SAR based remote sensing, a major issue is the unavailability of a large benchmark labeled dataset for the implementation of deep learning algorithms from scratch. In current work, it has been shown that a pre-trained deep learning model DeepLabv3+ outperforms the machine learning algorithms for land use and land cover (LULC) classification task even with a small dataset using transfer learning. The highest pixel accuracy of 87.78% and overall pixel accuracy of 85.65% have been achieved with DeepLabv3+ and Random Forest performs best among the machine learning algorithms with overall pixel accuracy of 77.91% while SVM and KNN trail with an overall accuracy of 77.01% and 76.47% respectively. The highest precision of 0.9228 is recorded for the urban class for semantic segmentation task with DeepLabv3+ while machine learning algorithms SVM and RF gave comparable results with a precision of 0.8977 and 0.8958 respectively.


Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 39
Author(s):  
Zhiyuan Xie ◽  
Shichang Du ◽  
Jun Lv ◽  
Yafei Deng ◽  
Shiyao Jia

Remaining Useful Life (RUL) prediction is significant in indicating the health status of the sophisticated equipment, and it requires historical data because of its complexity. The number and complexity of such environmental parameters as vibration and temperature can cause non-linear states of data, making prediction tremendously difficult. Conventional machine learning models such as support vector machine (SVM), random forest, and back propagation neural network (BPNN), however, have limited capacity to predict accurately. In this paper, a two-phase deep-learning-model attention-convolutional forget-gate recurrent network (AM-ConvFGRNET) for RUL prediction is proposed. The first phase, forget-gate convolutional recurrent network (ConvFGRNET) is proposed based on a one-dimensional analog long short-term memory (LSTM), which removes all the gates except the forget gate and uses chrono-initialized biases. The second phase is the attention mechanism, which ensures the model to extract more specific features for generating an output, compensating the drawbacks of the FGRNET that it is a black box model and improving the interpretability. The performance and effectiveness of AM-ConvFGRNET for RUL prediction is validated by comparing it with other machine learning methods and deep learning methods on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and a dataset of ball screw experiment.


2021 ◽  
Author(s):  
Lukman Ismael ◽  
Pejman Rasti ◽  
Florian Bernard ◽  
Philippe Menei ◽  
Aram Ter Minassian ◽  
...  

BACKGROUND The functional MRI (fMRI) is an essential tool for the presurgical planning of brain tumor removal, allowing the identification of functional brain networks in order to preserve the patient’s neurological functions. One fMRI technique used to identify the functional brain network is the resting-state-fMRI (rsfMRI). However, this technique is not routinely used because of the necessity to have a expert reviewer to identify manually each functional networks. OBJECTIVE We aimed to automatize the detection of brain functional networks in rsfMRI data using deep learning and machine learning algorithms METHODS We used the rsfMRI data of 82 healthy patients to test the diagnostic performance of our proposed end-to-end deep learning model to the reference functional networks identified manually by 2 expert reviewers. RESULTS Experiment results show the best performance of 86% correct recognition rate obtained from the proposed deep learning architecture which shows its superiority over other machine learning algorithms that were equally tested for this classification task. CONCLUSIONS The proposed end-to-end deep learning model was the most performant machine learning algorithm. The use of this model to automatize the functional networks detection in rsfMRI may allow to broaden the use of the rsfMRI, allowing the presurgical identification of these networks and thus help to preserve the patient’s neurological status. CLINICALTRIAL Comité de protection des personnes Ouest II, decision reference CPP 2012-25)


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

<div><div><div><p>We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.</p></div></div></div>


Sign in / Sign up

Export Citation Format

Share Document