Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor

Author(s):  
M. Rizzo Irfan ◽  
M. Ali Fauzi ◽  
Tibyani Tibyani ◽  
Nurul Dyah Mentari

The 2013 curriculum is a new curriculum in the Indonesian education system, enacted by the government to replace the KTSP curriculum. Its implementation over the last few years has sparked various opinions among students, teachers, and the general public, especially on the social media platform Twitter. In this study, a sentiment analysis of the 2013 curriculum is conducted. An ensemble of several feature sets was used for sentiment classification with the K-Nearest Neighbor method: Twitter-specific features, textual features, Part-of-Speech (POS) features, lexicon-based features, and Bag-of-Words (BOW) features. The experimental results showed that the ensemble features classify sentiment better than any individual feature set. The best accuracy with ensemble features is 96%, obtained with k=5.
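As a rough illustration of the idea in this abstract, the sketch below concatenates several per-tweet feature vectors into one ensemble vector and classifies a query by majority vote among its k=5 nearest neighbors. The feature values, the three toy feature groups, the Euclidean distance, and the function names `ensemble_vector` and `knn_predict` are all assumptions made for illustration; the abstract does not specify the authors' distance measure or feature encoding.

```python
import math
from collections import Counter

def ensemble_vector(feature_sets):
    """Concatenate several feature vectors for one tweet into a single vector."""
    combined = []
    for vec in feature_sets:
        combined.extend(vec)
    return combined

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=5):
    """Return the majority label among the k training vectors closest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: each tweet has three hypothetical feature groups
# (twitter-specific, lexicon-based, bag-of-words), values invented.
train = [
    (ensemble_vector([[1, 0], [3, 0], [1, 0, 1]]), "positive"),
    (ensemble_vector([[1, 1], [2, 0], [1, 0, 0]]), "positive"),
    (ensemble_vector([[2, 0], [3, 1], [1, 1, 1]]), "positive"),
    (ensemble_vector([[0, 1], [0, 3], [0, 1, 0]]), "negative"),
    (ensemble_vector([[0, 2], [1, 3], [0, 1, 1]]), "negative"),
    (ensemble_vector([[1, 1], [0, 2], [0, 0, 1]]), "negative"),
]
query = ensemble_vector([[1, 0], [2, 0], [1, 0, 1]])
prediction = knn_predict(train, query, k=5)
print(prediction)
```

With these toy values, the five nearest neighbors contain three positive and two negative tweets, so the vote is positive.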

2021 ◽  
Vol 13 (6) ◽  
pp. 3497
Author(s):  
Hassan Adamu ◽  
Syaheerah Lebai Lutfi ◽  
Nurul Hashimah Ahamed Hassain Malim ◽  
Rohail Hassan ◽  
Assunta Di Vaio ◽  
...  

Sustainable development plays a vital role in information and communication technology. In times of pandemics such as COVID-19, vulnerable people need help to survive. This help includes the government's distribution of relief packages and materials, with the primary objective of lessening the economic and psychological effects on citizens affected by disasters such as the COVID-19 pandemic. However, there has not been an efficient way to monitor the accountability and transparency of public funds, especially in developing countries such as Nigeria. The government's understanding of public emotions about the distributed palliatives is important, as it would indicate the reach and impact of the distribution exercise. Although several studies on English emotion classification have been conducted, they are not portable to the wider, more inclusive Nigerian case. This is because Informal Nigerian English (Pidgin), which Nigerians widely speak, has a quite different vocabulary from Standard English, which limits the applicability of machine learning models trained for Standard English emotion classification. An Informal Nigerian English (Pidgin English) emotions dataset is constructed, pre-processed, and annotated. The dataset is then used to classify five emotion classes (anger, sadness, joy, fear, and disgust) on the COVID-19 palliatives and relief aid distribution in Nigeria using standard machine learning (ML) algorithms. Six ML algorithms are used in this study, and a comparative analysis of their performance is conducted. The algorithms are Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Decision Tree (DT). The conducted experiments reveal that Support Vector Machine outperforms the remaining classifiers with the highest accuracy of 88%.
The “disgust” emotion class surpassed the other emotion classes (sadness, joy, fear, and anger), with the highest count in the classification conducted on the constructed dataset. Additionally, the correlation analysis shows a significant relationship between the emotion classes “Joy” and “Fear”, which implies that the public is excited about the palliatives’ distribution but afraid of inequality and a lack of transparency in the distribution process due to reasons such as corruption. In conclusion, the results of this experiment clearly show that public emotions about the distribution of COVID-19 support and relief aid packages in Nigeria were not satisfactory, considering that negative emotions outnumbered public happiness.


2018 ◽  
Vol 7 (3) ◽  
pp. 1372
Author(s):  
Soudamini Hota ◽  
Sudhir Pathak

‘Sentiment’ literally means ‘emotions’. Sentiment analysis, synonymous with opinion mining, is a type of data mining that analyzes data obtained from microblogging sites, social media updates, online news reports, user reviews, etc., in order to study people's sentiments towards an event, organization, product, brand, person, etc. In this work, sentiment is classified into multiple classes. The proposed methodology, based on the KNN classification algorithm, shows an improvement over an existing methodology based on the SVM classification algorithm. The data used for analysis was taken from Twitter, this being the most popular microblogging site, and was extracted using Python’s Tweepy. The N-gram modeling technique was used for feature extraction, and the supervised machine learning algorithm k-nearest neighbor was used for sentiment classification. The performance of the proposed and existing techniques is compared in terms of accuracy, precision, and recall. It is concluded that the proposed technique performs better on all of these standard evaluation parameters.
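The two technical ingredients named in this abstract, N-gram feature extraction and evaluation by accuracy, precision, and recall, can be sketched as follows. This is a minimal illustration only: whitespace tokenization, word-level bigrams, the per-class definition of precision and recall, and the function names are assumptions, and the labels are invented rather than taken from the paper.

```python
def word_ngrams(text, n=2):
    """Split text on whitespace and return its word-level n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

bigrams = word_ngrams("I love this phone", n=2)

# Invented labels for a three-class toy evaluation.
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred, positive="pos")
print(bigrams, acc, prec, rec)
```

In a multi-class setting such as the one described, the per-class values would typically be averaged across classes; the abstract does not say which averaging the authors used.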


2020 ◽  
Vol 9 (4) ◽  
pp. 1620-1630
Author(s):  
Edi Sutoyo ◽  
Ahmad Almaarif

Jakarta, the capital of Indonesia, is one of the world's major cities. Jakarta plays a central role in Indonesia's dynamics because it functions as the political and government center and as the business and economic center that drives the economy. Recently, the government's discourse on relocating the capital city has drawn various reactions from the community. Therefore, in this study, sentiment analysis of the relocation of the capital city was carried out. The analysis classifies public sentiment sourced from Twitter data into two classes, namely positive and negative sentiment. The algorithms used in this study are the Naïve Bayes classifier, logistic regression, support vector machine, and K-nearest neighbor. The performance evaluation showed that the support vector machine outperformed the other three algorithms, with Accuracy, Precision, Recall, and F-measure of 97.72%, 96.01%, 99.18%, and 97.57%, respectively. Sentiment analysis of the discourse on relocating the capital city is expected to give the government an overview of public opinion as seen in social media data.
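As a quick consistency check on the reported figures, the snippet below recomputes the F-measure from the stated precision and recall. It assumes that "F-measure" here means the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state the formula explicitly.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the support vector machine, in percent.
f1 = f_measure(96.01, 99.18)
print(round(f1, 2))  # 97.57
```

Under that assumption, the recomputed value agrees with the 97.57% reported in the abstract.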


Author(s):  
Diana Rahmawati ◽  
Mutiara Puspa Putri I ◽  
Miftachul Ulum ◽  
Koko Joni

Bacteria are a group of living organisms that do not have a nuclear membrane. Some bacteria are pathogenic. Being microscopic, many pathogenic bacteria are found in the surrounding environment and spread through food or through contact with nearby objects, causing diseases such as diarrhea, vomiting, and others. As a more effective way to help the government and society prevent disease caused by pathogenic bacteria, a system for identifying and classifying pathogenic bacteria using K-Nearest Neighbor was created. The system uses a biological microscope with a webcam mounted above the ocular lens to view and capture images of bacteria. The coarse adjustment knob rotates automatically (auto-focus) during image capture. Bacteria are classified and identified with the K-Nearest Neighbor method, which computes the nearest neighbors based on the level of similarity to the dataset. In this study, the bacteria Vibrio cholerae, Staphylococcus aureus, and Streptococcus m. were classified; the highest accuracy, 97.77%, was obtained with K = 9 using the Chebyshev method.
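The Chebyshev method mentioned above measures the distance between two feature vectors as the largest absolute difference over any single coordinate. The sketch below shows that distance inside a small nearest-neighbor vote. The two-dimensional feature values, the class names, and the function names are hypothetical; the abstract does not describe which image features the authors extracted, and the study's best result used K = 9 rather than the k=3 used for this tiny toy set.

```python
from collections import Counter

def chebyshev(a, b):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_chebyshev(train, query, k=9):
    """Majority label among the k nearest training samples under Chebyshev distance."""
    nearest = sorted(train, key=lambda item: chebyshev(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical image features for two invented classes.
train = [
    ([0.20, 0.80], "class_a"),
    ([0.25, 0.75], "class_a"),
    ([0.30, 0.90], "class_a"),
    ([0.80, 0.20], "class_b"),
    ([0.90, 0.30], "class_b"),
]
label = knn_chebyshev(train, [0.30, 0.80], k=3)
print(chebyshev([1, 5, 2], [4, 6, 0]), label)
```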


Author(s):  
Farid Fitriyadi ◽  
Muqorobin Muqorobin

Coronavirus is currently spreading very rapidly in many parts of Indonesia, including Central Java Province. According to the Central Java coronavirus database, as of 17 August 2021 the numbers of confirmed cases are: Confirmed in Treatment (Active Cases): 16,344; Confirmed Recovered: 408,697; and Confirmed Dead: 29,148. The total number of cases is therefore 454,189, the sum of those being treated, recovered, and dead. Coronaviruses are a group of viruses that can infect the respiratory system; infections are generally mild, such as the common cold, although some forms of disease, such as SARS, MERS, and COVID-19, are more deadly. In response, the government has introduced policies that include limiting activities outside the house and having school, work, and even religious activities done from home. The purpose of this study was to predict the likely level of new cases in one of the Central Java areas with confirmed cases of coronavirus, so that it can serve as information for the public to take early precautions. The research method consists of problem analysis and literature study, data collection, and implementation. The application of the K-Nearest Neighbor (KNN) method is expected to be able to predict the level of spread of COVID-19 in Central Java. The prediction system for the level of new cases was tested in the Sragen area. Testing was carried out on a sample of new cases, namely Kudu Regency/City, Confirmed: 17,599, Treated: 89, Recovered: 18,303, Died: 1,721, Suspected: 87 and Discarded Suspected: 1,711. The K-NN prediction gave the Condition: High.


Author(s):  
Chavid Syukri Fatoni ◽  
Ema Utami ◽  
Ferry Wahyu Wibowo

Diphtheria cases are of special concern to the Indonesian government and were recorded as an extraordinary event (KLB) in 2017. Diphtheria is an infectious disease that can cause dangerous and deadly complications if it is not treated immediately. Communities often underestimate its common symptoms, such as throat pain, flu, and fever. The similarity of diphtheria symptoms to those of common diseases, together with complications such as myocarditis, airway obstruction, and Acute Kidney Injury (AKI), makes diphtheria rather difficult to treat because the infection spreads quickly. Some complications of diphtheria can cause death if not treated immediately, so diphtheria must be identified early. An expert system is therefore needed to help the community and the government diagnose diphtheria. An expert system is an information system containing knowledge from experts in order to provide information for consultation. In this system, the expert knowledge is the basis the expert system uses to answer questions (consultation). The study used the K-Nearest Neighbor (KNN) method, which calculates the similarity value of diphtheria symptoms. As a result, it can provide an initial diagnosis of diphtheria before complications occur. The output of this study is a diagnosis of diphtheria based on symptoms, with an accuracy of 93.056%, providing an initial diagnosis so that diphtheria can be treated immediately.


2021 ◽  
Vol 15 ◽  
Author(s):  
Jingwen Feng ◽  
Bo Hu ◽  
Jingting Sun ◽  
Junpeng Zhang ◽  
Wen Wang ◽  
...  

Background: Daily use of social media could nurture a fragmented reading habit. However, little is known about whether fragmented reading (FR) affects cognition and what underlying electroencephalogram (EEG) alterations it may lead to.
Purpose: This study aimed to identify whether individuals have FR habits based on single-trial EEG spectral features using machine learning (ML), as well as to find out the potential cognitive impairment induced by FR.
Methods: Subjects were recruited through a questionnaire and divided into FR and noFR groups according to the time they spent on FR per day. 64-channel EEG was acquired during a Continuous Performance Task (CPT) and segmented into 0.5–1.5 s post-stimulus epochs under cue and background conditions. The sample sizes were as follows: FR in cue condition, 692 trials; noFR in cue condition, 688 trials; FR in background condition, 561 trials; noFR in background condition, 585 trials. For these single trials, the relative power (RP) of six frequency bands [delta (1–3 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta1 (14–20 Hz), beta2 (21–29 Hz), lower gamma (30–40 Hz)] was extracted as features. After feature selection, the most important feature sets were fed into three ML models, namely Support-Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes, to identify FR. The RP of each of the six frequency bands was also used as a feature set for classification.
Results: The classification accuracy reached up to 96.52% in the SVM model under the cue condition. Among the six frequency bands, the most important features were found in the alpha and gamma bands. Gamma achieved the highest classification accuracy (86.69% for cue, 86.45% for background). In both conditions, alpha RP in central sites was stronger in FR than in noFR (p < 0.001). Gamma RP in the frontal site was weaker in FR than in noFR in the background condition (p < 0.001), while alpha RP in parieto-occipital sites was stronger in FR than in noFR in the cue condition (p < 0.001).
Conclusion: Fragmented reading can be identified based on single-trial EEG evoked by CPT using ML, and the RP of alpha and gamma may reflect the impairment of attention and working memory by FR. FR might lead to cognitive impairment and is worth further exploration.
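To make the relative power (RP) feature concrete, the sketch below computes it for the six frequency bands listed in the Methods. It assumes that RP is a band's summed spectral power divided by the total power across all six bands, which is a common definition but is not spelled out in the abstract. The flat toy spectrum, the integer-frequency bins, and the function name `relative_power` are invented for illustration, and the spectral estimation step that would produce the power values from an EEG epoch is omitted.

```python
# Band edges in Hz, as listed in the abstract.
BANDS = {
    "delta": (1, 3),
    "theta": (4, 7),
    "alpha": (8, 13),
    "beta1": (14, 20),
    "beta2": (21, 29),
    "lower_gamma": (30, 40),
}

def relative_power(freqs, power):
    """Band power divided by total power over all six bands (assumed definition)."""
    band_power = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(band_power.values())
    return {name: bp / total for name, bp in band_power.items()}

# Toy spectrum: one bin per integer frequency from 1 to 40 Hz,
# flat except for doubled power in the alpha band.
freqs = list(range(1, 41))
power = [2.0 if 8 <= f <= 13 else 1.0 for f in freqs]
rp = relative_power(freqs, power)
print(rp["alpha"])
```

Each trial would then contribute one six-element RP vector per channel as input to a classifier such as SVM or KNN.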


2021 ◽  
Vol 7 ◽  
pp. e775
Author(s):  
Malik Daler Ali Awan ◽  
Nadeem Iqbal Kajla ◽  
Amnah Firdous ◽  
Mujtaba Husnain ◽  
Malik Muhammad Saad Missen

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events happening globally (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extraction and classification of events from multilingual data have become bottlenecks because of a lack of resources. In this research paper, we present the event classification task for Urdu-language text on social media and news channels using machine learning classifiers. The dataset contains more than 0.1 million (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) showed the best results as a feature vector for evaluating the performance of six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. The novelty lies in the fact that, to the best of our knowledge, the aforementioned features have not previously been applied to event extraction from text written in the Urdu language.
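The tf-idf feature vector mentioned in this abstract can be sketched as below. The example uses one common variant, term frequency as count divided by document length and inverse document frequency as log(N / df); the abstract does not say which variant or smoothing the authors used. The English toy tokens stand in for Urdu text, and the function name `tf_idf` is an assumption.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Invented, pre-tokenized headlines.
docs = [
    ["flood", "hits", "city"],
    ["protest", "in", "city"],
    ["city", "match", "today"],
]
vectors = tf_idf(docs)
print(vectors[0])
```

A term that appears in every document, such as "city" here, gets zero weight, while a term unique to one document gets the largest weight for its frequency.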


State-of-the-art person re-identification (PRID) models for ranking generally depend on labeled pairwise feature-set information to learn a task-dependent distance metric. Further, in the retrieval process, re-ranking is an important mechanism for improving accuracy. However, very limited work has been carried out on designing re-ranking methods, particularly automatic and unsupervised ones. The existing re-ranking-based PRID model is not efficient when multiple persons appear simultaneously in the second camera. This is because the existing model identifies a person in the second camera by matching feature sets against those in the first camera individually with respect to the other persons in the second camera. To overcome this research problem, this paper presents a robust and efficient PRID (REPRID) model. First, it presents a robust learning/ranking method using a k-nearest neighbor (KNN) graph. Then, it presents a re-ranking method that improves PRID accuracy by using information about co-occurring persons for matching and reorganizing the given rank lists. Experiments conducted on a standard dataset show the robustness and effectiveness of the proposed PRID method.

