An Empirical Approach for Extreme Behavior Identification through Tweets Using Machine Learning

2019 ◽  
Vol 9 (18) ◽  
pp. 3723
Author(s):  
Sharif ◽  
Mumtaz ◽  
Shafiq ◽  
Riaz ◽  
Ali ◽  
...  

The rise of social media has led to an increasing online cyber-war via hate and violent comments or speeches, and even slick videos that promote extremism and radicalization. Sensing cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area, since such content is short, noisy, context-dependent, and dynamic in nature. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (having two sub-classes, pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (represented by Term Frequency–Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional data space to a low-dimensional (usually 2-D or 3-D) space. The PCA-based visualization showed better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA technique to textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K-Nearest Neighbors (KNN), random forest, Support Vector Machine (SVM), and ensemble classification methods (with bagging and boosting) were applied both with PCA-based reduced features and with the complete feature set (TF-IDF features extracted from n-gram terms in the tweets). The analysis showed that the SVM achieved the highest average accuracy, 84%, compared with the other classification models. It is pertinent to mention that this is the first reported research work applying machine learning methods to Twitter content analysis in the context of the Afghanistan war zone.
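The TF-IDF-plus-PCA visualization step described above can be sketched in a few lines of scikit-learn. The toy texts below are illustrative stand-ins, not the crawled Twitter data used in the study.

```python
# Illustrative sketch (not the authors' pipeline): project TF-IDF features of
# short texts into 2-D with PCA for exploratory visualization.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for labelled tweets.
tweets = [
    "support the government forces",
    "government troops secure the district",
    "praise for the taliban fighters",
    "taliban claim control of the valley",
    "weather in kabul is sunny today",
    "local market reopens after holiday",
]

# TF-IDF over unigrams and bigrams, mirroring the n-gram features in the paper.
X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(tweets)

# PCA expects a dense array; reduce the sparse TF-IDF matrix to 2-D.
coords = PCA(n_components=2).fit_transform(X.toarray())
print(coords.shape)  # one (x, y) point per tweet, ready for a scatter plot
```

A scatter plot of `coords` colored by class label is then the 2-D EDA view the abstract refers to.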

2019 ◽  
Vol 8 (4) ◽  
pp. 3226-3235

The segmentation and detection of brain pathologies in medical images is an indispensable step. It helps the radiologist to diagnose a variety of brain deformities and helps in planning a suitable treatment. Magnetic Resonance Imaging (MRI) plays a significant role in the research area of neuroscience. The proposed work studies and probes different classification techniques used for automated detection and segmentation of brain tumors from MRI in the field of machine learning. This paper presents feature extraction from raw MRI images and feeds the features to four classifiers: Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbors (KNN), and Artificial Neural Network (ANN). The mechanism proceeds in several stages of a Computer-Aided Detection system. In the preliminary stage, pre-processing and post-processing enhance the MR image, since the processed image is more suitable for analysis. Then k-means clustering is used to segment the MRI using the mean gray level method. In the subsequent stage, statistical feature analysis is performed; the features are computed using Haralick's equations based on the Gray Level Co-occurrence Matrix (GLCM). Features chosen based on tumor region, location, periphery, and color from the segmented image are then classified by applying the classification techniques. In the third stage, the SVM, DT, ANN, and KNN classifiers are used for diagnosis. The performances of the classifiers are tested and evaluated successfully.
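The GLCM step above can be illustrated with a minimal numpy sketch. The gray-level count, offset, and tiny "image" below are assumptions for illustration, not the paper's actual configuration.

```python
# Minimal numpy sketch (assumed parameters, not the paper's exact setup) of a
# Gray Level Co-occurrence Matrix and two Haralick features for one offset.
import numpy as np

def glcm(img, levels=4):
    """Co-occurrence counts for horizontally adjacent pixel pairs."""
    m = np.zeros((levels, levels))
    for row in img:
        for a, b in zip(row[:-1], row[1:]):
            m[a, b] += 1
    return m / m.sum()  # normalize counts to joint probabilities

def haralick_contrast(p):
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def haralick_homogeneity(p):
    i, j = np.indices(p.shape)
    return float((p / (1.0 + np.abs(i - j))).sum())

# Tiny quantized "image" with 4 gray levels.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
p = glcm(img)
print(haralick_contrast(p), haralick_homogeneity(p))
```

Each Haralick feature collapses the co-occurrence matrix to one scalar; a set of such scalars per region forms the statistical feature vector fed to the classifiers.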


2019 ◽  
Vol 8 (3) ◽  
pp. 8342-8348

In this paper, the research work investigates various spectral features, for example MFCC, pitch-chroma, skewness, and centroid, for emotion recognition. For the experimental setup, the emotions considered in this study are Fear, Anger, Neutral, and Happy. The system is evaluated for various combinations of spectral features. Finally, it is found that the combination of MFCC and skewness gave better recognition performance compared with the other combinations. The above-mentioned features are examined using Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs). To increase system performance and remove insignificant information from the newly produced robust features, this paper explores an approach in which Principal Component Analysis (PCA) is used to reduce the high-dimensional data. It was established that the recognition performance for the feature sets increased after applying PCA in both classification models, GMMs and SVMs. The overall system recognized 35% before PCA and 58.3% after PCA using GMMs, and 28% before PCA and 50.5% after PCA using SVMs. The database used in this study is the Telugu emotion speech corpus (IIT-KGP).
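The GMM-based recognition scheme with PCA-reduced features can be sketched as follows: fit one GMM per emotion and pick the emotion whose model scores a test vector highest. Synthetic vectors stand in for real MFCC/skewness features, and the PCA size and mixture count are illustrative assumptions.

```python
# Hedged sketch: per-class Gaussian Mixture Models as emotion classifiers on
# PCA-reduced features. Synthetic vectors stand in for real spectral features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
emotions = ["fear", "anger", "neutral", "happy"]

# 40 training vectors per class, 20-D, each class around a different mean.
train = {e: rng.normal(loc=i, scale=0.5, size=(40, 20))
         for i, e in enumerate(emotions)}

# Fit PCA on the pooled training data, then one GMM per emotion in PCA space.
pca = PCA(n_components=5).fit(np.vstack(list(train.values())))
models = {e: GaussianMixture(n_components=2, random_state=0)
             .fit(pca.transform(X)) for e, X in train.items()}

def classify(x):
    """Pick the emotion whose GMM gives the vector the highest likelihood."""
    z = pca.transform(x.reshape(1, -1))
    return max(emotions, key=lambda e: models[e].score(z))

print(classify(np.full(20, 1.0)))  # a vector near the "anger" class mean
```

Applying PCA before the GMM fit is the dimensionality-reduction step the abstract credits for the accuracy gains.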


Author(s):  
Abdullahi Adeleke ◽  
Noor Azah Samsudin ◽  
Mohd Hisyam Abdul Rahim ◽  
Shamsul Kamal Ahmad Khalid ◽  
Riswan Efendi

Machine learning involves the task of training systems to make decisions without being explicitly programmed. Important among machine learning tasks is classification, the process of training machines to make predictions from predefined labels. Classification is broadly categorized into three distinct groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automating the labeling of Quranic verses. MLC has been gaining attention in recent years, owing to the increasing amount of work on real-world classification problems involving multi-label data. In traditional classification problems, patterns are associated with a single label from a set of disjoint labels; in MLC, an instance of data is associated with a set of labels. In this paper, three standard MLC methods, binary relevance (BR), classifier chain (CC), and label powerset (LP), are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (k-NN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: hamming loss, accuracy, one error, micro-F1, macro-F1, and average precision. From the results, the classifiers effectively achieved above the 70% accuracy mark. Overall, SVM achieved the best results with the CC and LP algorithms.
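Of the three problem-transformation methods named above, binary relevance is the simplest: train one independent binary classifier per label. A minimal sketch with a linear SVM base classifier and a toy two-label dataset (not the Quranic-verse data) follows, along with the hamming loss metric the paper uses.

```python
# Minimal binary relevance (BR) sketch: one independent binary SVM per label.
# The toy feature matrix and label set are illustrative, not the paper's data.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# Two labels per instance: label 0 mirrors feature 1, label 1 mirrors feature 2.
Y = X.astype(int).copy()

# Binary relevance: train one classifier per label column, predict per label.
clfs = [LinearSVC().fit(X, Y[:, j]) for j in range(Y.shape[1])]
pred = np.column_stack([c.predict(X) for c in clfs])

# Hamming loss: fraction of label/instance cells predicted wrongly.
hamming = (pred != Y).mean()
print(hamming)
```

Classifier chains extend this by feeding each classifier's prediction into the next, and label powerset instead treats each distinct label set as one multi-class label.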


Complexity ◽  
2022 ◽  
Vol 2022 ◽  
pp. 1-20
Author(s):  
Nihad Brahimi ◽  
Huaping Zhang ◽  
Lin Dai ◽  
Jianzi Zhang

The car-sharing system is a popular rental model for cars in shared use. It has become particularly attractive due to its flexibility; that is, a car can be rented and returned anywhere within one of the authorized parking slots. The main objective of this research work is to predict car usage at parking stations and to investigate the factors that help to improve the prediction, so that new strategies can be designed to put more cars on the road and keep fewer in the parking stations. To achieve this, various machine learning models, namely vector autoregression (VAR), support vector regression (SVR), eXtreme gradient boosting (XGBoost), and k-nearest neighbors (kNN), and deep learning models, specifically long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), CNN-LSTM, and multilayer perceptron (MLP), were trained on different kinds of features. These features include past usage levels, Chongqing's environmental conditions, and temporal information. After comparing the obtained results using different metrics, we found that CNN-LSTM outperformed the other methods in predicting future car usage. Meanwhile, the model using all the feature categories yields more precise predictions than any model using a single feature category at a time.
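All of the models listed above consume "past usage levels" in the same supervised framing: each window of recent observations predicts the next value. A numpy sketch of that windowing step follows; the window length and toy series are assumptions, not the study's configuration.

```python
# Sketch of framing past usage levels as supervised examples: each window of
# the previous `lag` observations predicts the next value.
import numpy as np

def make_windows(series, lag):
    """Return (X, y) where X[i] holds lag past values and y[i] the next one."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = np.array(series[lag:])
    return X, y

usage = [5, 7, 6, 8, 9, 11, 10, 12]  # hypothetical hourly car-usage counts
X, y = make_windows(usage, lag=3)
print(X.shape, y.shape)  # 5 training windows of 3 past values each
```

Exogenous features such as weather and temporal information are then concatenated to each window before training.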


2019 ◽  
Vol 8 (4) ◽  
pp. 3244-3249

In today's fast-moving technological business sector, acquiring a new customer is a far more expensive and time-consuming process than adopting methods to hold and retain existing customers. The business sector therefore needs research on retaining existing customers using current technology, and retaining them with high reliability is a challenging task. With this view, we focus on predicting customer churn for a banking application. This paper uses the customer churn bank modeling dataset extracted from the UCI Machine Learning Repository. The Anaconda Navigator IDE along with Spyder is used for implementing the Python code. Our contribution is five-fold. First, the data preprocessing is done and the relationships between the attributes are identified. Second, the dataset is reduced with principal component analysis to form a 2-component feature-reduced dataset. Third, the raw dataset and the 2-component PCA-reduced dataset are fitted to various solvers of the logistic regression classifier, and the performance is analyzed with the confusion matrix. Fourth, the two datasets are fitted to various neighboring algorithms of the K-Nearest Neighbors classifier, and the performance is analyzed with the confusion matrix. Fifth, the two datasets are fitted to various kernels of the Support Vector Machine classifier, and the performance is analyzed with the confusion matrix. Experimental results show that the RBF kernel of the Support Vector Machine classifier is the most effective, with an accuracy of 85.8% before applying PCA and 80.9% after applying PCA, compared with the other classifiers.
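The PCA-then-classify-then-confusion-matrix pipeline described above can be sketched in scikit-learn. Synthetic data stands in for the UCI bank churn dataset, and logistic regression represents just one of the classifier families the paper compares.

```python
# Illustrative sketch: reduce features to 2 principal components, fit a
# classifier, and inspect the confusion matrix. Synthetic churn data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
# 100 "retained" and 100 "churned" customers in a 6-D feature space.
X = np.vstack([rng.normal(0, 1, (100, 6)), rng.normal(2, 1, (100, 6))])
y = np.array([0] * 100 + [1] * 100)

X2 = PCA(n_components=2).fit_transform(X)  # the 2-component reduced dataset
clf = LogisticRegression().fit(X2, y)

cm = confusion_matrix(y, clf.predict(X2))
print(cm)  # rows: true class, columns: predicted class
```

Swapping `LogisticRegression` for `KNeighborsClassifier` or `SVC` with different kernels reproduces the shape of the paper's other comparisons.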


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. This paper first introduces the animal epidemic situation and machine learning, and then briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naïve Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and MaxEnt. Through the discussion in this paper, readers gain a clearer concept of machine learning and an understanding of its application prospects in animal diseases.


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via the FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using the FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with a simple feature selection are effective for improving automated sleep stage classification.
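The high-dimensional FFT feature idea can be illustrated with plain numpy: the magnitude spectrum of each EEG epoch becomes the feature vector. The sampling rate, epoch length, and synthetic signal below are assumptions for illustration.

```python
# Minimal numpy sketch of FFT feature extraction from an EEG-like signal.
import numpy as np

fs = 100                      # assumed sampling rate, Hz
t = np.arange(0, 30, 1 / fs)  # one 30-second epoch
# Synthetic epoch: a 10 Hz alpha-band component plus noise.
epoch = (np.sin(2 * np.pi * 10 * t)
         + 0.1 * np.random.default_rng(0).normal(size=t.size))

spectrum = np.abs(np.fft.rfft(epoch))        # magnitude at each frequency bin
freqs = np.fft.rfftfreq(epoch.size, 1 / fs)  # bin-to-Hz mapping

peak = freqs[spectrum.argmax()]
print(spectrum.size, peak)  # 1501 FFT features; spectral peak near 10 Hz
```

Stacking such spectra across epochs yields the thousands of features per recording that the abstract describes, which a feature-selection step then prunes.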


Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic, both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without autism. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items common to our study and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT and supports the application of ML to create shorter and faster versions of the instrument that maintain high classification accuracy, to be used as quick, easy, and high-performance tools in primary-care settings.
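The SVM-RFE item-selection step can be sketched in scikit-learn: a linear SVM is refit repeatedly, each time dropping the feature with the weakest coefficient. The synthetic "questionnaire" below is illustrative, not Q-CHAT data.

```python
# Hedged sketch of SVM-recursive feature elimination (RFE). Synthetic data:
# the first 3 of 10 "items" carry signal about the label, the rest are noise.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 10))
X[:, :3] += y[:, None] * 2.0  # informative items

# A linear kernel is required so RFE can read per-feature coefficients.
selector = RFE(SVC(kernel="linear"), n_features_to_select=3).fit(X, y)
print(np.where(selector.support_)[0])  # indices of the retained items
```

In the study, the same procedure ranks Q-CHAT items so that short-form subsets (14 items, or 3 items) can be evaluated on their own.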


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Parvathaneni Rajendra Kumar ◽  
Suban Ravichandran ◽  
Satyala Narayana

Abstract. Objectives: This research work exclusively aims to develop a novel heart disease prediction framework including three major phases, namely proposed feature extraction, dimensionality reduction, and proposed ensemble-based classification. Methods: As the novelty, the training of the neural network (NN) is carried out by a new enhanced optimization algorithm referred to as Sea Lion with Canberra Distance (S-CDF) via tuning the optimal weights. The improved S-CDF algorithm is the extended version of the existing "Sea Lion Optimization (SLnO)". Initially, the statistical and higher-order statistical features are extracted, including central tendency, degree of dispersion, and qualitative variation, respectively. However, in this scenario, the "curse of dimensionality" is the greatest issue, such that dimensionality reduction of the extracted features is necessary. Hence, a principal component analysis (PCA)-based feature reduction approach is deployed here. Finally, the dimensionally concentrated features are fed as the input to the proposed ensemble technique with "Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN)" and the optimized NN as the final classifier. Results: An elaborative analysis as well as a discussion has been provided concerning parameters like evaluation metrics, year of publication, accuracy, implementation tool, and utilized datasets obtained by various techniques. Conclusions: From the experiment outcomes, it is proved that the accuracy of the proposed work with the proposed feature set is 5%, 42.85%, and 10% superior to the performance with the other feature sets, namely central tendency + dispersion, central tendency + qualitative variation, and dispersion + qualitative variation, respectively. Finally, the comparative evaluation shows that the presented work is appropriate for heart disease prediction as it has higher accuracy than the traditional works.
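The general shape of the SVM/RF/KNN ensemble can be sketched with a standard soft-voting scheme. Note that this is a generic stand-in: the paper's actual final classifier is an NN whose weights are tuned by the proposed S-CDF optimizer, which is not reproduced here, and the data below is synthetic.

```python
# Generic sketch of an SVM/RF/KNN ensemble via soft voting; NOT the paper's
# S-CDF-optimized neural network. Synthetic stand-in for heart-disease data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (80, 8)), rng.normal(1.5, 1, (80, 8))])
y = np.array([0] * 80 + [1] * 80)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",  # average predicted probabilities across the three models
).fit(X, y)

print(ensemble.score(X, y))  # training accuracy of the combined vote
```

In the paper's framework, the PCA-reduced features would replace `X` and the optimized NN would join the three base learners.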


Molecules ◽  
2019 ◽  
Vol 24 (13) ◽  
pp. 2506 ◽  
Author(s):  
Yunfeng Chen ◽  
Yue Chen ◽  
Xuping Feng ◽  
Xufeng Yang ◽  
Jinnuo Zhang ◽  
...  

The feasibility of using the Fourier transform infrared (FTIR) spectroscopic technique with a stacked sparse auto-encoder (SSAE) to identify orchid varieties was studied. Spectral data of 13 orchid varieties covering the spectral range of 4000–550 cm−1 were acquired to establish discriminant models and to select optimal spectral variables. K-nearest neighbors (KNN), support vector machine (SVM), and SSAE models were built using the full spectra. The SSAE model performed better than the KNN and SVM models, obtaining a classification accuracy of 99.4% in the calibration set and 97.9% in the prediction set. Then, three algorithms, principal component analysis loading (PCA-loading), competitive adaptive reweighted sampling (CARS), and stacked sparse auto-encoder guided backward (SSAE-GB), were used to select 39, 300, and 38 optimal wavenumbers, respectively. KNN and SVM models were then built on the optimal wavenumbers. Most of the optimal-wavenumber models performed slightly better than the all-wavenumber models. The performance of SSAE-GB was better than that of the other two algorithms, from the perspective of both the accuracy of the discriminant models and the number of optimal wavenumbers. The results of this study showed that the FTIR spectroscopic technique combined with the SSAE algorithm could be adopted for the identification of orchid varieties.
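The PCA-loading selection idea can be sketched with scikit-learn: rank wavenumbers by the magnitude of their loadings on the first principal component and keep the largest. The synthetic spectra, band count, and selection size below are illustrative assumptions, not the FTIR orchid data.

```python
# Hedged sketch of PCA-loading variable selection: keep the spectral bands
# with the largest first-component loadings, then classify with KNN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
n_bands = 50
y = rng.integers(0, 2, 120)
X = rng.normal(size=(120, n_bands))
X[:, 10:15] += y[:, None] * 2.0  # five informative "wavenumbers"

# Rank wavenumbers by |loading| on the first principal component.
pca = PCA(n_components=1).fit(X)
top = np.argsort(np.abs(pca.components_[0]))[::-1][:5]

knn = KNeighborsClassifier().fit(X[:, top], y)
print(sorted(top.tolist()), knn.score(X[:, top], y))
```

CARS and SSAE-GB replace the PCA-loading ranking with, respectively, a sampling-based and an auto-encoder-guided criterion, but the downstream KNN/SVM modeling on the selected wavenumbers is the same.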

