Collaborative Classification Approach for Airline Tweets Using Sentiment Analysis

Author(s):  
M. Veera Kumari et al.

Many airlines offer a range of services to their customers, who may or may not be satisfied with them. Because customers cannot always give their comments immediately, airlines use Twitter to collect feedback on their services, and Twitter is increasingly used to improve service quality [4]. This paper applies different classification techniques to improve the accuracy of sentiment analysis. Tweets about the services are classified into three polarities: positive, negative and neutral. The classification methods are Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DTC), Extreme Gradient Boost (XGB), combinations of two, three and four of these techniques with a majority Voting Classifier, and AdaBoost. Accuracy was measured in the validation phase using 20-fold and 30-fold cross-validation. The paper also proposes a new ensemble Bagging approach for the different classifiers [10]. Precision, recall, F1-score, micro average, macro average and accuracy are reported for all of the above classification techniques. In addition, the average predictions of the classifiers and the accuracy of those averaged predictions were calculated, with the aim of improving service quality. The results show that bagging classifiers achieve better accuracy than non-bagging classifiers.
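
As a rough illustration of the majority-voting step mentioned in this abstract, the sketch below combines the label predictions of several classifiers by taking the most common label for each tweet. The function name `majority_vote`, the sample predictions and the tie-breaking rule (the label seen first wins) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equally long label lists, one per classifier.
    Ties go to the label that appears first among the votes (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict every sample")
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the first-seen label among equal counts
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on four tweets
rf = ["positive", "negative", "neutral", "negative"]
lr = ["positive", "neutral", "neutral", "positive"]
nb = ["negative", "negative", "neutral", "neutral"]
print(majority_vote([rf, lr, nb]))
```

With an even number of voters, or with three classes as here, ties are possible, so any real ensemble needs an explicit tie-breaking policy.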

DIELEKTRIKA ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 1
Author(s):  
Ari Satriadi

Asthma is a disease of the airways that causes increased airway hyperresponsiveness and produces wheezing (a whistling sound during breathing). Wheezing is one of the signs that a person has asthma. This study was carried out to build and test a system that can distinguish the sound features of wheezing in asthma patients from other breathing sounds using the k-Nearest Neighbors (k-NN) method. The sound features used are the mean and standard deviation of the signal in the time domain, the mean and standard deviation of the spectrum, the highest magnitude at a frequency of 0 Hz, and the frequencies with the first, second and third highest magnitudes. k-NN is a method that classifies an object based on the training data closest to it. Wheeze and non-wheeze breathing sounds were obtained by recording asthmatic and non-asthmatic subjects directly. All of the recorded sound data were then segmented to isolate the required breathing events, and features were extracted to obtain mathematical descriptors of each sound. 80% of the data were used for training with 10-fold cross-validation; the best classification performance was obtained at k=3 and k=5, both with a validity of 97.2%. In the final performance test of k-NN, the maximum classification performance was 86.6% for both k=3 and k=5.
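
The k-NN rule described in this abstract can be sketched in a few lines. The example below assumes Euclidean distance and a simple majority vote among the k nearest training vectors; the function name `knn_predict` and the two-feature toy data are hypothetical and do not come from the study.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    if k < 1 or k > len(train_features):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training vector
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-feature vectors (e.g. mean and standard deviation of a signal)
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
y = ["non-wheeze", "non-wheeze", "non-wheeze", "wheeze", "wheeze", "wheeze"]
print(knn_predict(X, y, (0.82, 0.8), k=3))
```

Odd values of k such as 3 and 5 avoid tied votes in a two-class problem, which is one common reason for choosing them.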


Land ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 174
Author(s):  
Desheng Wang ◽  
A-Xing Zhu

Digital soil mapping (DSM) is currently the primary framework for predicting the spatial variation of soil information (soil type or soil properties). Random forests and similarity-based methods have been used widely in DSM. However, the accuracy of the similarity-based approach is limited, and the performance of random forests is affected by the quality of the feature set. The objective of this study was to present a method for soil mapping by integrating the similarity-based approach and the random forests method. The Heshan area (Heilongjiang province, China) was selected as the case study for mapping soil subgroups. The results of the regular validation samples showed that the overall accuracy of the integrated method (71.79%) is higher than that of a similarity-based approach (58.97%) and random forests (66.67%). The results of the 5-fold cross-validation showed that the overall accuracy of the integrated method, similarity-based approach, and random forests range from 55% to 72.73%, 43.48% to 69.57%, and 54.17% to 70.83%, with an average accuracy of 66.61%, 57.39%, and 59.62%, respectively. These results suggest that the proposed method can produce a high-quality covariate set and achieve a better performance than either the random forests or similarity-based approach alone.
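
For readers unfamiliar with k-fold cross-validation, the sketch below shows one way to split sample indices into folds and average a per-fold accuracy. It is a minimal illustration, not the procedure used in the study: the helper names and the `evaluate_fold` callback are hypothetical, and the folds are contiguous, whereas samples are usually shuffled first in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_validated_accuracy(n_samples, k, evaluate_fold):
    """Average the accuracies returned by `evaluate_fold(train, test)`."""
    scores = [evaluate_fold(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(12, 5))
print([len(test) for _, test in folds])
```

Because each fold is scored separately, results are naturally reported both as a range across folds and as an average, as in the abstract above.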


2019 ◽  
Vol 6 (3) ◽  
pp. 321
Author(s):  
Nanang Fakhrur Rozi ◽  
Fandi Arianto ◽  
Dian Puspita Hapsari

The high demand for air travel is driven by people's increasing mobility and their need to travel from one city to another in a short time. However, not all airlines can provide satisfactory service to their consumers. The quality of service provided by an airline, in terms of safety, security, and convenience, is usually known through the opinions of other passengers. A large number of negative opinions about an airline indicates poor service quality, and vice versa. However, the growing number of opinions makes it difficult for consumers to assess an airline's quality quickly. Sentiment analysis is therefore needed to help consumers assess the quality of airline services faster. Hybrid Cuckoo Search (HCS) is one method that can be used for such analysis; it is able to group information quickly. This study aims to implement HCS for sentiment analysis on airline passenger opinion data. The results show that the average accuracy, precision, and recall on the opinion data of 7 airlines at 1,000 iterations are 69.24%, 70.88%, and 77.57%, respectively.
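
The three metrics reported here follow the standard definitions: accuracy is the share of correct predictions, precision is TP / (TP + FP), and recall is TP / (TP + FN). The sketch below computes them for a single class treated as positive; the function name `binary_metrics` and the five sample labels are assumptions for illustration, not data from the study.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels for five passenger opinions
truth = ["positive", "negative", "positive", "negative", "positive"]
guess = ["positive", "positive", "positive", "negative", "negative"]
print(binary_metrics(truth, guess))
```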


Author(s):  
Gede Aditra Pradnyana ◽  
I Komang Agus Suryantara ◽  
I Gede Mahendra Darmawiguna

An impression can be interpreted as a psychological feeling toward a product, and it plays an important role in decision making. Understanding data in the domain of impressions is therefore very useful. The objective of this research was to evaluate the performance of the K-Nearest Neighbors method in classifying the impressions of endek images, using K-Fold Cross Validation. The images were taken from 3 locations, namely CV. Artha Dharma, Agung Bali Collection, and Pengrajin Sri Rejeki. The image impressions were obtained by consulting an endek expert, Dr. D.A Tirta Ray, M.Si. Data mining was performed with the K-Nearest Neighbors method, which classifies new objects based on their attributes and on training samples that have already been classified. K-Fold Cross Validation testing gave an accuracy of 91% with K values of 3, 4, 7 and 8 in K-Nearest Neighbors.


Biology ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 33
Author(s):  
Si-Yuan Lu ◽  
Zheng Zhang ◽  
Yu-Dong Zhang ◽  
Shui-Hua Wang

Accurate and timely diagnosis of COVID-19 is indispensable to control its spread. This study proposes a novel explainable COVID-19 diagnosis system called CGENet based on graph embedding and an extreme learning machine for chest CT images. We put forward an optimal backbone selection algorithm to select the best backbone for the CGENet based on transfer learning. Then, we introduced graph theory into the ResNet-18 based on the k-nearest neighbors. Finally, an extreme learning machine was trained as the classifier of the CGENet. The proposed CGENet was evaluated on a large publicly-available COVID-19 dataset and produced an average accuracy of 97.78% based on 5-fold cross-validation. In addition, we utilized the Grad-CAM maps to present a visual explanation of the CGENet based on COVID-19 samples. In all, the proposed CGENet can be an effective and efficient tool to assist COVID-19 diagnosis.


2020 ◽  
Vol 8 (2) ◽  
Author(s):  
Yohana Tri Utami ◽
Dewi Asiah Shofiana ◽  
Yunda Heningtyas

Telecommunication companies face substantial problems with customer migration due to the large number of competing companies, dynamic circumstances, and the presence of many innovative and attractive offerings. The situation has resulted in a high level of customer migration, which reduces company revenue. Given that condition, customer churn analysis is one well-known approach that can help increase a company's revenue and reputation. To predict the reasons behind customer migration, this study proposed a data mining classification technique based on the C4.5 algorithm. The patterns generated by the model were validated using 10-fold cross-validation, resulting in a model with an accuracy of 87%, a precision of 87.5%, and a recall of 97%. Based on the good performance of the model, it can be stated that the C4.5 algorithm succeeded in discovering several causes of the migration of telecommunication users, among which price ranks first as the primary reason.


2020 ◽  
Author(s):  
Farshid Shirafkan ◽  
Sajjad Gharaghani ◽  
Karim Rahimian ◽  
Reza Sajedi ◽  
Javad Zahiri

Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them were discovered by chance. Introducing an appropriate computational approach therefore seems reasonable. Results: In this study, we present a model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We then present a scheme for detecting outlier proteins. To do so, 15 distinct feature vectors were used to study the effect of each on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each classifier was run 100 times with 10-fold cross-validation on each feature vector, and proteins that were misclassified 80 times or more were grouped. This process was applied to every feature vector and, in the end, the intersection of these groups was taken as the set of outlier proteins. The results of 10-fold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) show that the decision tree method on all feature vectors has the highest performance among all methods in this research and among other available methods. In addition, the study of outliers shows that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there are non-moonlighting proteins (such as P69797) that were misclassified by 8 different classification methods with 16 different feature types. Because these moonlighting proteins have been obtained by computational methods, the results of this study could reduce the likelihood of hypothesizing that these proteins are non-moonlighting. Conclusions: Moonlighting proteins are difficult to identify by experiments. Our method enables identification of novel moonlighting proteins using distinct feature vectors. It also indicates that a number of non-moonlighting proteins are likely to be moonlighting.
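
The outlier scheme in this abstract (count misclassifications over 100 runs, keep proteins misclassified 80 times or more, then intersect the groups across feature vectors) can be sketched as follows. The function names, the protein IDs and the counts are hypothetical; only the threshold of 80 out of 100 runs comes from the abstract.

```python
def frequently_misclassified(miss_counts, threshold=80):
    """Return IDs misclassified at least `threshold` times for one feature vector."""
    return {protein for protein, count in miss_counts.items() if count >= threshold}

def outlier_proteins(miss_counts_per_feature, threshold=80):
    """Intersect the frequently misclassified sets across all feature vectors."""
    groups = [frequently_misclassified(c, threshold) for c in miss_counts_per_feature]
    return set.intersection(*groups) if groups else set()

# Hypothetical misclassification counts out of 100 runs, one dict per feature vector
counts = [
    {"P1": 95, "P2": 81, "P3": 10},
    {"P1": 88, "P2": 40, "P3": 85},
    {"P1": 80, "P2": 90, "P3": 5},
]
print(outlier_proteins(counts))
```

Taking the intersection is a conservative choice: a protein is flagged only if it is persistently misclassified under every feature representation.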


Sentiment analysis probes public opinion in user-generated content on the Web, such as blogs, social media and e-commerce websites. Its results are getting much attention from marketers, who can use them to evaluate the success of an advertising campaign or people's attitude toward a new product launch. Business owners and advertising companies use sentiment analysis to start new business strategies and to identify opportunities for new product development. In this paper, tweets about the Samsung Galaxy mobile phone and the Apple iPhone were retrieved from Twitter with R programming for three countries, namely the USA, the UK and India, to create the dataset. The collected tweets were classified into positive, negative and neutral sentiments. Machine learning classifiers, namely Naïve Bayes, Support Vector Machine, Random Forest, Decision Tree, Artificial Neural Network and XGBoost, were applied to the dataset with K-fold cross-validation, and the results were tabulated to compare which classifier yields the best accuracy. Other performance metrics such as F-score, precision and recall were also calculated to compare classifier performance on sentiment analysis. The XGBoost method combined with K-fold cross-validation was found to produce the best prediction accuracy. We also applied the SentiStrength algorithm to find the intensity, or strength, of the positive and negative comments in each sentence. With these results, we were able to predict the brand of mobile phone preferred in each country.


2021 ◽  
Vol 10 (24) ◽  
pp. 5982
Author(s):  
Gaetano Zazzaro ◽  
Francesco Martone ◽  
Gianpaolo Romano ◽  
Luigi Pavone

Background: The aim of this study was to evaluate the performance of an automated COVID-19 detection method based on a transfer learning technique that makes use of chest computed tomography (CT) images. Method: In this study, we used a publicly available multiclass CT scan dataset containing 4171 CT scans of 210 different patients. In particular, we extracted features from the CT images using a set of convolutional neural networks (CNNs) that had been pretrained on the ImageNet dataset as feature extractors, and we then selected a subset of these features using the Information Gain filter. The resulting feature vectors were then used to train a set of k Nearest Neighbors classifiers with 10-fold cross validation to assess the classification performance of the features that had been extracted by each CNN. Finally, a majority voting approach was used to classify each image into two different classes: COVID-19 and NO COVID-19. Results: A total of 414 images of the test set (10% of the complete dataset) were correctly classified, and only 4 were misclassified, yielding a final classification accuracy of 99.04%. Conclusions: The high performance achieved by the method could make it a feasible option to assist radiologists in COVID-19 diagnosis through the use of CT images.
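
The reported accuracy can be checked directly from the counts given in the Results: 414 correct and 4 misclassified images make a test set of 418, which is about 10% of the 4171 scans. The short calculation below only reproduces that arithmetic; the variable names are illustrative.

```python
correct, wrong = 414, 4          # counts reported in the abstract
test_size = correct + wrong      # size of the held-out test set
accuracy = correct / test_size   # fraction of test images classified correctly
share_of_dataset = test_size / 4171

print(f"{test_size} test images ({share_of_dataset:.1%} of the dataset)")
print(f"accuracy = {accuracy:.2%}")
```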


2020 ◽  
Vol 2020 (9) ◽  
pp. 373-1-373-8
Author(s):  
Yi Yang ◽  
Utpal Sarkar ◽  
Isabel Borrell ◽  
Jan P. Allebach

Macro-uniformity is an important factor in the overall quality of prints from inkjet printers. The International Committee for Information Technology Standards (INCITS) defined macro-uniformity for prints, which covers several printing defects such as banding, streaks and mottle. Although we can quantitatively analyze a particular kind of defect, it is difficult to assess the overall perceptual quality when multiple defects appear simultaneously in a print. We used the macro-uniformity quality rulers designed by INCITS W1.1 as experimental references to conduct a psychophysical experiment that pooled subjects' perceptual assessments of our print samples. We then calculated features that describe the severity of defects in a test sample, and trained a predictive model using these data. The predictor automatically predicts the macro-uniformity score as judged by humans. Our results show that the predictor is accurate: the predicted scores are similar to the subjective visual scores (ground truth). We also used 6-fold cross-validation to confirm the efficacy of our predictor.

