An evaluation of machine learning and latent semantic analysis in text sentiment classification

2020 ◽  
pp. 1-11
Author(s):  
Justyna Miazga ◽  
Tomasz Hachaj

In this paper, we compare the following machine learning methods as classifiers for sentiment analysis: k-nearest neighbours (kNN), artificial neural network (ANN), support vector machine (SVM), and random forest. We used a dataset containing 5,000 movie reviews, of which 2,500 were marked as positive and 2,500 as negative. We chose 5,189 words that influence sentence sentiment. The dataset was prepared using a term-document matrix (TDM) and classical multidimensional scaling (MDS). This is the first time that TDM and MDS have been used to choose the characteristics of text in sentiment analysis. We also examined different parameters of each classifier, such as the kernel type for SVM and the neighbour count for kNN. All calculations were performed in the R language, using RStudio v 3.5.2. Our work can be reproduced because all of our data sets and source code are public.
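As a rough illustration of the pipeline described above, the following sketch builds a term-document matrix from a few toy reviews and classifies a new review with kNN. It is written in Python rather than the authors' R, it skips the MDS step entirely, and the toy reviews, the cosine distance, and all function names are assumptions made for illustration, not the authors' code.

```python
from collections import Counter
import math


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine_distance(train_vecs[i], query))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (not the 5,000-review dataset from the paper).
reviews = [
    "great film loved it",
    "loved the great acting",
    "terrible film hated it",
    "hated the terrible plot",
]
labels = ["pos", "pos", "neg", "neg"]

vocab, tdm = term_document_matrix(reviews)
# Rows of the TDM are terms, so each document is a column.
doc_vectors = [list(col) for col in zip(*tdm)]


def vectorize(text):
    """Map new text onto the existing vocabulary (unknown words are dropped)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


prediction = knn_predict(doc_vectors, labels, vectorize("loved it great"), k=3)
print(prediction)
```

In the paper, the neighbour count `k` is one of the parameters that were varied; here it is fixed at 3 only to keep the example small.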

2020 ◽  
Vol 190 (3) ◽  
pp. 342-351
Author(s):  
Munir S Pathan ◽  
S M Pradhan ◽  
T Palani Selvam

In the present study, machine learning (ML) methods for the identification of abnormal glow curves (GC) of CaSO4:Dy-based thermoluminescence dosimeters in individual monitoring are presented. The classifier algorithms random forest (RF), artificial neural network (ANN) and support vector machine (SVM) are employed to identify not only the abnormal glow curve but also the type of abnormality. For the first time, a simple and computationally efficient algorithm based on RF is presented for GC classification. About 4000 GCs are used for the training and validation of the ML algorithms. The performance of all algorithms is compared using various parameters. Results show an accuracy of 99.05% for the classification of GCs by the RF algorithm, whereas accuracies of 96.7% and 96.1% are achieved using ANN and SVM, respectively. The RF-based classifier is recommended for GC classification as well as for assisting fault determination in the TLD reader system.


An electrocardiogram (ECG) records the electrical activity of the heart over a period of time. Detailed information about the condition of the heart is obtained by analyzing the ECG signal. Wavelet transform and fast Fourier transform are among the methods used to identify cardiac disease. The paper presents a survey of ECG signal analysis and related studies on arrhythmic and non-arrhythmic data. Here we discuss an efficient feature extraction process for the electrocardiogram, in which the six best P-QRS-T fragments are studied based on position and priority. This survey examines the outcome of the system using various machine learning classification algorithms for feature extraction and analysis of ECG signals. Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) are the most important algorithms used for this purpose. Several publicly available data sets are used for arrhythmia analysis, and among them the MIT-BIH ECG-ID database is the most widely used. The drawbacks and limitations are also discussed, and from them future challenges and concluding remarks are drawn.


2021 ◽  
Vol 8 (1) ◽  
pp. 147
Author(s):  
Primandani Arsi ◽  
Retno Waluyo

Today, social media is growing fast on the internet, and one of the most popular platforms is Twitter. Many topics are discussed on Twitter, such as economics, politics, society, culture, and law. One of the hot topics is the issue of relocating Indonesia's capital city. However, there is controversy between supporters and opponents, who have different views. This leads to debate on Twitter that reflects a collective concern about the public discourse. Sentiment analysis is the process of automatically extracting, understanding, and processing unstructured text data to obtain the sentiment information contained in an opinion sentence. Several machine learning methods are commonly used for sentiment analysis. In this study, the Support Vector Machine (SVM) method is proposed for sentiment classification of tweets on the topic of relocating Indonesia's capital city. Tweets are classified into two classes, positive and negative. Testing on 1,236 tweets about the relocation (404 positive and 832 negative) using SVM yielded accuracy = 96.68%, precision = 95.82%, recall = 94.04%, and AUC = 0.979.


Author(s):  
Hendri Murfi ◽  
Furida Lusi Siagian ◽  
Yudi Satria

Purpose: The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach: Given Indonesian tweets, sentiment analysis starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings: The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets concern sentiment toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracies than the topic features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value: The standard textual data representation for sentiment analysis using machine learning is bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 155
Author(s):  
Kristina Machova ◽  
Marian Mach ◽  
Matej Vasilko

The article addresses the problem of detecting suspicious reviewers in online discussions on social networks. We have concentrated on a special type of suspicious author, the troll. We have used machine learning methods to generate detection models that discriminate a troll reviewer from a common reviewer, and also sentiment analysis methods to recognize the sentiment typical of trolls' comments. Sentiment analysis can itself be performed using machine learning or a lexicon-based approach. We have used lexicon-based sentiment analysis for its better ability to detect the vocabulary typical of troll authors. We achieved Accuracy = 0.95 and F1 = 0.80 using sentiment analysis. The best results using machine learning methods were achieved by a support vector machine, with Accuracy = 0.986 and F1 = 0.988, using a dataset with the set of all selected attributes. We can conclude that the detection model based on machine learning is more successful than lexicon-based sentiment analysis, but the difference in accuracy is not as large as the difference in the F1 measure.


2017 ◽  
Vol 57 (8) ◽  
pp. 1012-1025 ◽  
Author(s):  
Andrei P. Kirilenko ◽  
Svetlana O. Stepchenkova ◽  
Hany Kim ◽  
Xiang (Robert) Li

Interest in applying Big Data to tourism is increasing, and automated sentiment analysis has been used to extract public opinion from various sources. This article evaluates the suitability of different types of automated classifiers for applications typical in tourism, hospitality, and marketing studies by comparing their performance to that of human raters. While the commonly used performance indices suggest that on easier-to-classify data sets machine learning methods demonstrate performance comparable to that by human raters, other performance measures such as Cohen’s kappa show that the results of machine learning are still inferior to manual processing. On more difficult and noisy data sets, automated analysis has poorer performance than human raters. The article discusses issues pertinent to selection of appropriate sentiment analysis software and offers a word of caution against using automated classifiers uncritically.
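The gap between accuracy and Cohen's kappa mentioned above can be made concrete with a small sketch. The labels below are hypothetical and chosen only to show the effect: a classifier that always predicts the majority class agrees with the human rater on 90% of items, yet its kappa is zero because that agreement is no better than chance. The function names are illustrative and do not come from the article.

```python
from collections import Counter


def accuracy(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same label."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement and p_e is the agreement expected by
    chance from each rater's label frequencies.
    """
    n = len(rater_a)
    p_o = accuracy(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters use one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels: the human rater marks 9 of 10 reviews as positive,
# and the automated classifier labels everything as positive.
human = ["pos"] * 9 + ["neg"]
classifier = ["pos"] * 10

print("accuracy:", accuracy(human, classifier))
print("kappa:", cohens_kappa(human, classifier))
```

On skewed data such as this, accuracy alone looks strong, which is one reason a chance-corrected measure can lead to a different conclusion about automated classifiers.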


2021 ◽  
Vol 7 ◽  
pp. e813
Author(s):  
Anandan Chinnalagu ◽  
Ashok Kumar Durairaj

Customer satisfaction and positive customer sentiment are among the goals of successful companies. However, analyzing customer reviews to predict sentiment accurately has proven to be challenging and time-consuming due to the high volume of data collected from various sources. Researchers have approached this with a range of algorithms, methods, and models. These include machine learning and deep learning (DL) methods, unigram and skip-gram based algorithms, as well as the Artificial Neural Network (ANN) and bag-of-words (BOW) regression model. Previous studies have revealed incoherence in polarity, model overfitting and performance issues, as well as high data processing costs. This experiment was conducted to address these issues by building a high-performance yet cost-effective model for predicting sentiment accurately from large datasets containing customer reviews. The model uses the fastText library from Facebook's AI Research (FAIR) lab, as well as the traditional Linear Support Vector Machine (LSVM), to classify text and word embedding. The model was also compared with the authors' custom multi-layer Sentiment Analysis (SA) Bi-directional Long Short-Term Memory (SA-BLSTM) model. Based on the results, the proposed fastText model obtains a higher accuracy of 90.71%, as well as a 20% gain in performance, compared to the LSVM and SA-BLSTM models.


Author(s):  
Jing Xu ◽  
Fuyi Li ◽  
André Leier ◽  
Dongxu Xiang ◽  
Hsin-Hui Shen ◽  
...  

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey of a variety of current approaches for AMP identification and point out the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform a 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice in multiple AMP prediction tools.
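The 5-fold cross-validation benchmark described above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the survey's benchmarking code: the data are hypothetical, the "model" is a trivial majority-class baseline standing in for random forest, SVM or gradient boosting, and the names `k_fold_indices` and `cross_validate` are invented here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Placeholder learner: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Hypothetical data: 20 samples with made-up labels.
X = list(range(20))
y = ["AMP"] * 12 + ["non-AMP"] * 8

scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
print(scores, sum(scores) / len(scores))
```

Running every method through the same `k_fold_indices` splits (same `seed`) is what makes the resulting scores comparable across methods.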


2020 ◽  
Vol 98 (6) ◽  
Author(s):  
Anderson Antonio Carvalho Alves ◽  
Rebeka Magalhães da Costa ◽  
Tiago Bresolin ◽  
Gerardo Alves Fernandes Júnior ◽  
Rafael Espigolan ◽  
...  

The aim of this study was to compare the predictive performance of the Genomic Best Linear Unbiased Predictor (GBLUP) and machine learning methods (Random Forest, RF; Support Vector Machine, SVM; Artificial Neural Network, ANN) in simulated populations presenting different levels of dominance effects. The simulated genome comprised 50k SNP and 300 QTL, both biallelic and randomly distributed across 29 autosomes. A total of six traits were simulated considering different values for the narrow and broad-sense heritability. In the purely additive scenario with low heritability (h2 = 0.10), the predictive ability obtained using GBLUP was slightly higher than that of the other methods, whereas ANN provided the highest accuracies for scenarios with moderate heritability (h2 = 0.30). The accuracies of dominance deviation predictions varied from 0.180 to 0.350 in GBLUP extended for dominance effects (GBLUP-D) and from 0.06 to 0.185 in RF, and they were null using the ANN and SVM methods. Although RF presented higher accuracies for total genetic effect predictions, the mean-squared error values in this model were worse than those observed for GBLUP-D in scenarios with large additive and dominance variances. When applied to prescreen important regions, the RF approach detected QTL with high additive and/or dominance effects. Among the machine learning methods, only RF was capable of implicitly capturing dominance effects without increasing the number of covariates in the model, resulting in higher accuracies for the total genetic and phenotypic values as the dominance ratio increases. Nevertheless, if the interest is to infer dominance effects directly, GBLUP-D could be a more suitable method.


Fibers ◽  
2018 ◽  
Vol 6 (4) ◽  
pp. 73 ◽  
Author(s):  
Mei-Ling Huang ◽  
Chien-Chang Fu

Textile pilling, a long-standing problem, causes an undesirable appearance on the surface of garments. In this study, textile grading of fleece based on pilling assessment was performed using image processing and machine learning methods. Two image processing methods were used. The first method involved using the discrete Fourier transform combined with Gaussian filtering, and the second method involved using the Daubechies wavelet. Furthermore, binarization was used to segment the textile pilling from the background. Morphological and topological image processing methods were applied to extract the essential characteristics of textile image information to establish a database for the textile. Finally, machine learning methods, namely the artificial neural network (ANN) and the support vector machine (SVM), were used to objectively solve the textile grading problem. When the Fourier-Gaussian method was used, the classification accuracies of the ANN and SVM were 96.6% and 95.3%, respectively; when the Daubechies wavelet method was used, they were 96.3% and 90.9%, respectively.

