A Two-Stage Machine Learning Classification Approach to Identify Extremism in Arabic Opinions

The growing use of the Internet and social networks has enabled people to express their views publicly, and these views have attracted increasing attention. Sentiment Analysis (SA) techniques are used to determine the polarity of information, either positive or negative, toward a given topic, including opinions. In this research, we introduce a machine learning approach based on Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers to find and classify extreme opinions in Arabic reviews. To this end, a dataset of 1500 Arabic reviews was collected from the Google Play Store, and a two-stage classification process was applied to the reviews. In the first stage, we built a binary classifier to separate positive from negative reviews. In the second stage, we applied a binary classification mechanism based on a set of proposed rules that distinguishes extreme positive from positive reviews and extreme negative from negative reviews. Four major experiments, comprising 10 sub-experiments in total, were conducted to carry out the two-stage process using different cross-validation schemes and the Term Frequency-Inverse Document Frequency (TF-IDF) feature selection method. The results indicate that, in the first stage, SVM performed best with 30% testing data and NB performed best with 20% testing data. In the second stage, SVM was better at identifying extreme positive reviews in the positive dataset, with an overall accuracy of 68.7%, while NB was better at identifying extreme negative reviews in the negative dataset, with an overall accuracy of 72.8%.
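The abstract names TF-IDF but does not say which variant was used. As a minimal sketch, assuming the plain textbook form (term frequency normalised by document length, multiplied by log(N / document frequency)), the weighting can be illustrated as follows. The function name `tfidf` and the toy English reviews are hypothetical and are not taken from the paper's Arabic dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Library implementations often add smoothing, so values may differ.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up reviews used only to exercise the function.
reviews = [
    "great app great design".split(),
    "bad app".split(),
    "great update".split(),
]
weights = tfidf(reviews)
```

Under this variant, a term that occurs in every document receives weight zero, which is why very common words contribute little to the resulting feature vectors.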

2021, pp. 1-11
Author(s): Tianhong Dai, Shijie Cong, Jianping Huang, Yanwen Zhang, Xinwang Huang, ...

In agricultural production, weeds inevitably compete with crops for nutrients, so weed removal is an important part of crop cultivation. Only by identifying and removing weeds can the quality of the harvest be guaranteed, which makes distinguishing weeds from crops particularly important. Recently, deep learning has been applied to botany with good results, and convolutional neural networks are widely used because of their excellent classification performance. The purpose of this article is to find a new method of plant seedling classification consisting of two stages: image segmentation and image classification. In the first stage, an improved U-Net segments the dataset; in the second stage, six classification networks classify the seedlings in the segmented dataset. The dataset used for the experiment contains 12 types of plants: 3 crops and 9 weeds. The model was evaluated by multi-class statistical analysis of accuracy, recall, precision, and F1-score. The results show that the two-stage method combining the improved U-Net segmentation network with a classification network is more conducive to plant seedling classification, reaching a classification accuracy of 97.7%.
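The four evaluation measures named above can all be derived from a multi-class confusion matrix. The sketch below assumes the usual one-vs-rest definitions per class; the abstract does not state how per-class values were averaged, so no averaging is shown. The helper names and the 3-class toy matrix are hypothetical and are not results from the paper.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_metrics(confusion):
    """Return a (precision, recall, f1) tuple for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Made-up 3-class confusion matrix (rows: true class, columns: predicted).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
toy_accuracy = accuracy(toy)
toy_metrics = per_class_metrics(toy)
```

Reporting per-class values alongside overall accuracy is useful when classes are imbalanced, since a high overall accuracy can hide poor recall on a rare class.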


Author(s): Ahmad Iwan Fadli, Selo Sulistyo, Sigit Wibowo

Traffic accidents are a very difficult problem to handle at a national scale. Indonesia is one of the most populous developing countries, and vehicles are its main means of transportation for daily activities. It also has the largest number of car users in Southeast Asia, so driving safety needs to be considered. Using a machine learning classification method to determine whether a driver is driving safely can help reduce the risk of accidents. We created a detection system that classifies driving as safe or unsafe using trip sensor data, which include gyroscope, acceleration, and GPS readings. The classification methods used in this study are Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), with data preprocessing improved through feature extraction and oversampling. The study shows that, with the proposed preprocessing stages, RF performs best, with 98% accuracy, 98% precision, and 97% sensitivity, compared with SVM and MLP.
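The abstract does not say which oversampling method was applied. As one possible illustration, the sketch below assumes simple random oversampling, in which samples from smaller classes are duplicated at random until every class matches the largest one. The function name, the two-value feature vectors, and the labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up feature vectors standing in for extracted trip features.
trips = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1], [0.1, 0.0], [0.9, 1.4]]
labels = ["safe", "safe", "safe", "safe", "unsafe"]
balanced_trips, balanced_labels = random_oversample(trips, labels)
class_counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.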


Sensors, 2021, Vol 21 (21), pp. 7417
Author(s): Alex J. Hope, Utkarsh Vashisth, Matthew J. Parker, Andreas B. Ralston, Joshua M. Roper, ...

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. 
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.


2020, Vol 14

Breast Cancer (BC) is among the most common causes of death in women throughout the world. Classification and data analysis tools are now widely used in the medical field for diagnosis, prognosis, and decision making to help lower the risk of people dying or suffering from disease. Advanced machine learning methods give hope to patients because they help doctors detect potentially fatal diseases such as Breast Cancer early and provide accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which determine the strength of the resulting machine learning model. In this paper, a performance comparison is conducted on the Wisconsin Breast Cancer dataset using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest, to spot the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated in terms of accuracy, F-measure, precision, and recall. Experimental results show that Random Forest achieves the highest accuracy, 99.26%, on this dataset and feature set, while SVM and KNN reach 97.78% and 97.04%, respectively. MLP shows the lowest accuracy, 94.07%. All experiments were conducted using RStudio as the data mining platform.


2018, Vol 61 (6), pp. 1831-1842
Author(s): Yuzhen Lu, Renfu Lu

Abstract. Machine vision technology coupled with uniform illumination is now widely used for automatic sorting and grading of apples and other fruits, but it still does not have satisfactory performance for defect detection because of the large variety of defects, some of which are difficult to detect under uniform illumination. Structured-illumination reflectance imaging (SIRI) offers a new modality for imaging by using sinusoidally modulated structured illumination to obtain two sets of independent images: direct component (DC), which corresponds to conventional uniform illumination, and amplitude component (AC), which is unique for structured illumination. The objective of this study was to develop machine learning classification algorithms using DC and AC images and their combinations for enhanced detection of surface and subsurface defects of apples. A multispectral SIRI system with two phase-shifted sinusoidal illumination patterns was used to acquire images of ‘Delicious’ and ‘Golden Delicious’ apples with various types of surface and subsurface defects. DC and AC images were extracted through demodulation of the acquired images and were then enhanced using fast bi-dimensional empirical mode decomposition and subsequent image reconstruction. Defect detection algorithms were developed using random forest (RF), support vector machine (SVM), and convolutional neural network (CNN), for DC, AC, and ratio (AC divided by DC) images and their combinations. Results showed that AC images were superior to DC images for detecting subsurface defects, DC images were overall better than AC images for detecting surface defects, and ratio images were comparable to, or better than, DC and AC images for defect detection. The ensemble of DC, AC, and ratio images resulted in significantly better detection accuracies over using them individually. 
Among the three classifiers, CNN performed the best, with 98% detection accuracies for both varieties of apples, followed by SVM and RF. This research demonstrated that SIRI, coupled with a machine learning algorithm, can be a new, versatile, and effective modality for fruit defect detection. Keywords: Apple, Defect, Bi-dimensional empirical mode decomposition, Machine learning, Structured illumination.


2019, Vol 58 (06), pp. 205-212
Author(s): Cirruse Salehnasab, Abbas Hajifathali, Farkhondeh Asadi, Elham Roshandel, Alireza Kazemi, ...

Abstract. Background: Acute graft-versus-host disease (aGvHD) is the most important cause of mortality in patients receiving allogeneic hematopoietic stem cell transplantation. Because it becomes apparent only at the stage of severe tissue damage, it is diagnosed late. With the advancement of machine learning (ML), promising real-time models to predict aGvHD have emerged. Objective: This article aims to synthesize the literature on ML classification algorithms for predicting aGvHD, highlighting the algorithms and important predictor variables used. Methods: A systematic review of ML classification algorithms used to predict aGvHD was performed by searching the PubMed, Embase, Web of Science, Scopus, Springer, and IEEE Xplore databases up to April 2019, following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement. Studies focusing on the use of ML classification algorithms to predict aGvHD were considered. Results: After applying the inclusion and exclusion criteria, 14 studies were selected for evaluation. The analysis showed that the algorithms used were Artificial Neural Network (79%), Support Vector Machine (50%), Naive Bayes (43%), k-Nearest Neighbors (29%), Regression (29%), and Decision Trees (14%). Because many predictor variables were used in these studies, we grouped them into more abstract categories: biomarkers, demographics, infections, clinical, genes, transplants, drugs, and other variables. Conclusion: Each of these ML algorithms has particular characteristics and different proposed predictors. These ML algorithms therefore appear to have high potential for predicting aGvHD if the modeling process is performed correctly.


PLoS ONE, 2018, Vol 13 (6), pp. e0199749
Author(s): Zhaopeng Deng, Maoyong Cao, Laxmisha Rai, Wei Gao

2019, Vol 18 (1)
Author(s): Philip Heraud, Patutong Chatchawal, Molin Wongwattanakul, Patcharaporn Tippayawat, Christian Doerig, ...

Abstract. Background: Widespread elimination of malaria requires an ultra-sensitive detection method that can detect the low parasitaemia levels seen in asymptomatic carriers, who act as reservoirs for further transmission of the disease, yet is inexpensive and easy to deploy in the field in low-income settings. It was hypothesized that a new method of malaria detection based on infrared spectroscopy, shown in the laboratory to have sensitivity similar to PCR-based detection, could prove effective in detecting malaria in a field setting using cheap portable units with data management systems that allow them to be used by users inexpert in spectroscopy. This study was designed to determine whether the methodology developed in the laboratory could be translated to the field to diagnose the presence of Plasmodium in the blood of patients presenting at hospital with symptoms of malaria, as a precursor to trials testing the sensitivity of the method for detecting asymptomatic carriers. Methods: The field study tested 318 patients presenting with suspected malaria at four regional clinics in Thailand. Two portable infrared spectrometers were employed, operated from a laptop computer or a mobile telephone with built-in software that guided the user through the simple measurement steps. Diagnostic modelling and validation testing using linear and machine learning approaches were performed against the gold standard, qPCR. Sample spectra from the 318 patients were used for building calibration models (112 positive and 110 negative samples according to PCR testing) and for independent validation testing (39 positive and 57 negative samples by PCR). Results: The machine learning classification (support vector machines; SVM) performed with 92% sensitivity (3 false negatives) and 97% specificity (2 false positives). The Area Under the Receiver Operating Characteristic curve (AUROC) for the SVM classification was 0.98.
These results may be better than stated, as one of the spectroscopy false positives was infected by a Plasmodium species other than Plasmodium falciparum or Plasmodium vivax, which was not detected by the PCR primers employed. Conclusions: It was demonstrated that ATR-FTIR spectroscopy could be used as an efficient and reliable malaria diagnostic tool and has the potential to be developed for use at point of care under tropical field conditions, with spectra analysed via a Cloud-based system and the diagnostic results returned to the user’s mobile telephone or computer. The combination of accessibility to mass screening, high sensitivity and selectivity, low logistics requirements, and portability makes this new approach a potentially outstanding tool in the context of malaria elimination programmes. The next step in the experimental programme now underway is to reduce the sample requirements to fingerprick volumes.
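The reported sensitivity and specificity can be cross-checked against the validation counts given in the abstract (39 PCR-positive and 57 PCR-negative samples, with 3 false negatives and 2 false positives). The sketch below uses the standard definitions; the function and variable names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Share of PCR-positive samples that the classifier flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of PCR-negative samples that the classifier flags as negative."""
    return true_neg / (true_neg + false_pos)


# Validation-set counts taken from the abstract.
pcr_positive, pcr_negative = 39, 57
false_negatives, false_positives = 3, 2

true_positives = pcr_positive - false_negatives   # 36
true_negatives = pcr_negative - false_positives   # 55

sens = sensitivity(true_positives, false_negatives)  # 36 / 39
spec = specificity(true_negatives, false_positives)  # 55 / 57
```

These counts give a sensitivity of about 0.923 and a specificity of about 0.965, roughly in line with the 92% and 97% reported.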


2018, Vol 25 (7), pp. 855-861
Author(s): Halil Kilicoglu, Graciela Rosemblat, Mario Malički, Gerben ter Riet

Abstract. Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results: Annotators had good agreement in labeling limitation sentences (Krippendorff’s α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.

