Impact of the COVID-19 pandemic on the expression of emotions in social media

2020 ◽  
Vol 15 ◽  
pp. 23-35
Author(s):  
Debabrata Ghosh ◽  

In the age of social media, thousands of messages are exchanged every second. Analyzing such unstructured data to identify specific emotions is a challenging task. Emotion analysis involves evaluating and classifying text into emotion classes such as Happy, Sad, Anger, Disgust, Fear and Surprise, as defined by the dimensional emotion models described in the psychology literature (www 1; Russell, 2005). The main goal of this paper is to cover the COVID-19 pandemic situation in India and its impact on human emotions. As people very often express their state of mind through social media, analyzing and tracking their emotions can help government and local authorities take the required measures. We analyzed different machine learning classification models, namely Naïve Bayes, Support Vector Machine, Random Forest, Decision Tree and Logistic Regression, with 10-fold cross-validation to identify the best-performing models for emotion classification. After hyperparameter tuning, Logistic Regression emerged as the best-suited model, with 77% accuracy on the given datasets. We used an algorithm-based supervised ML technique to obtain the expected result. Although several earlier studies followed similar lines, none of them performed a comparative study among different ML techniques or used hyperparameter tuning to optimize the results. Moreover, this study was conducted on a dataset from the most recent COVID-19 pandemic situation, which is itself unique. We captured Twitter data for 45 days with the hashtags #COVID19India OR #COVID19 and analyzed the data using Logistic Regression to find out how emotions changed over time in response to certain social factors. Keywords: classification, COVID-19, emotion, emotion analysis, Naïve Bayes, pandemic, Random Forest, SVM.
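The approach described above can be illustrated with a brief, hedged sketch in Python using scikit-learn: TF-IDF features (an assumption; the paper does not specify its text representation), a logistic regression classifier, and 10-fold cross-validated hyperparameter tuning. The file and column names are hypothetical.

```python
# Hedged sketch, not the authors' code: TF-IDF features (assumed), logistic
# regression, and 10-fold cross-validated hyperparameter tuning.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical labeled dataset with "text" and "emotion" columns.
df = pd.read_csv("covid19_tweets_labeled.csv")

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the regularisation strength C with 10-fold cross-validation.
grid = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]},
                    cv=10, scoring="accuracy")
grid.fit(df["text"], df["emotion"])
print(grid.best_params_, grid.best_score_)
```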

Author(s):  
Elizabeth Ford ◽  
Philip Rooney ◽  
Seb Oliver ◽  
Richard Hoile ◽  
Peter Hurley ◽  
...  

Abstract Background: Identifying dementia early, using real-world data, is a public health challenge. As only two-thirds of people with dementia currently receive a formal diagnosis in United Kingdom health systems, and many receive it late in the disease process, there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or who are developing dementia but are currently undetected as having the condition by the GP. Methods: We used electronic patient records from the Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged >65 years with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination. Results: The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest performed very similarly, with an AUROC of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing. Conclusions: Our model could aid GPs or health service planners with the early detection of dementia. Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
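A minimal sketch of the classifier comparison described in the Methods, using synthetic binary feature vectors in place of the non-public CPRD records; model settings are illustrative, not the authors' configuration.

```python
# Illustrative sketch only: the CPRD records are not public, so synthetic
# binary clinical features stand in for the real data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 70))   # 70 binary clinical entities (synthetic)
y = rng.integers(0, 2, size=1000)         # case/control labels (synthetic)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": BernoulliNB(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "neural_network": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC = {auc:.3f}")
```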


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult financial situation are factors behind disapproval of relocating the national capital. Twitter, as one of the most popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the relocation of the national capital and to perform sentiment analysis of public opinion on the relocation using the Naive Bayes and Support Vector Machine algorithms with feature selection, in order to obtain the highest accuracy value. The sentiment analysis data were taken from Indonesian-language public opinion crawled from Twitter. The search terms used were #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through Twitter, labeling polarity, and preprocessing consisting of case transformation, cleansing, tokenizing, filtering and stemming. Feature selection was used to increase the accuracy value, and predetermined ratios were then applied to split the data into testing and training sets. The next step was a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period above, 24.26% positive sentiment and 75.74% negative sentiment related to the relocation of the new capital city were found. Using RapidMiner software, the best accuracy of Naive Bayes with feature selection was obtained at a 9:1 ratio, with an accuracy of 88.24%, while the best accuracy of the Support Vector Machine with feature selection was obtained at a 5:5 ratio, with an accuracy of 78.77%.
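The workflow above was built in RapidMiner; a rough scikit-learn equivalent is sketched below under assumptions: lower-casing and cleansing, TF-IDF tokenisation, chi-square feature selection standing in for the unspecified feature-selection operator, and a 90:10 split. Indonesian stemming (e.g. with the Sastrawi library) is omitted, and the file and column names are hypothetical.

```python
# Rough scikit-learn sketch under assumptions (the study used RapidMiner):
# lower-casing and cleansing, TF-IDF tokenisation, chi-square feature
# selection, then Naive Bayes vs. SVM at a 90:10 train/test split.
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("ibukotabaru_tweets.csv")   # hypothetical crawled, labeled tweets
df["text"] = df["text"].str.lower().map(lambda t: re.sub(r"[^a-z\s]", " ", t))

X_tr, X_te, y_tr, y_te = train_test_split(df["text"], df["sentiment"],
                                          test_size=0.1, random_state=42)
for name, clf in [("naive_bayes", MultinomialNB()), ("svm", LinearSVC())]:
    model = Pipeline([("tfidf", TfidfVectorizer()),
                      ("select", SelectKBest(chi2, k=500)),  # k is arbitrary here
                      ("clf", clf)])
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", model.score(X_te, y_te))
```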


2019 ◽  
Vol 15 (2) ◽  
pp. 247-254
Author(s):  
Heru Sukma Utama ◽  
Didi Rosiyadi ◽  
Dedi Aridarma ◽  
Bobby Suryo Prakoso

Analysis of sentiment toward the odd-even system on the Bekasi toll road using the Naïve Bayes algorithm is a process of understanding, extracting, and processing textual data automatically from social media. The purpose of this study was to determine the accuracy, recall and precision of opinion mining performed with the Naïve Bayes algorithm, in order to provide information on community sentiment toward the effectiveness of the odd-even system on the Bekasi toll road as expressed on social media. The research method used in this study was text mining of comments on posts about the odd-even system on the Bekasi toll road on Twitter, Instagram, YouTube and Facebook. The steps taken were preprocessing, transformation, data mining and evaluation, followed by information gain feature selection, select by weight, and application of the Naïve Bayes algorithm model. The results obtained from the study using the NB model are a confusion matrix with an accuracy of 79.55%, a precision of 80.51%, and a sensitivity (recall) of 80.91%. This study therefore concludes that the Naïve Bayes algorithm can be used to analyze odd-even sentiment on the Bekasi toll road.
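As a hedged illustration of this pipeline (not the authors' implementation), the sketch below pairs Naïve Bayes with mutual-information feature selection as a stand-in for information gain and reports accuracy, precision and recall; the dataset path and columns are invented.

```python
# Hedged sketch: mutual information stands in for information gain, and a plain
# bag-of-words for the unspecified text representation; data columns are invented.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_csv("bekasi_toll_comments.csv")   # hypothetical labeled comments
X_tr, X_te, y_tr, y_te = train_test_split(df["comment"], df["label"],
                                          test_size=0.2, random_state=0)
model = Pipeline([("bow", CountVectorizer()),
                  ("ig", SelectKBest(mutual_info_classif, k=500)),
                  ("nb", MultinomialNB())])
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred, average="weighted"))
print("recall:", recall_score(y_te, pred, average="weighted"))
```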


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people; it takes the form of hate messages sent through social media and email. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. We propose a possible solution to this problem: our project aims to detect cyberbullying in tweets using ML classification algorithms such as Naïve Bayes, KNN, Decision Tree, Random Forest, Support Vector Machine, etc., and we also apply NLTK (Natural Language Toolkit) unigram, bigram, trigram and n-gram features with Naïve Bayes to check its accuracy. Finally, we compare the results of the proposed and baseline features with other machine learning algorithms. The findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyberbullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit
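A small, hedged sketch of the n-gram comparison described above, substituting scikit-learn's CountVectorizer n-gram ranges for NLTK's n-gram extraction; the dataset path and column names are hypothetical.

```python
# Hedged sketch: CountVectorizer n-gram ranges replace NLTK's n-gram
# extraction; the dataset path and columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_csv("cyberbullying_tweets.csv")   # columns "tweet", "label" assumed
for name, ngram_range in [("unigram", (1, 1)), ("bigram", (2, 2)),
                          ("trigram", (3, 3)), ("uni+bi+tri", (1, 3))]:
    model = Pipeline([("vec", CountVectorizer(ngram_range=ngram_range)),
                      ("nb", MultinomialNB())])
    acc = cross_val_score(model, df["tweet"], df["label"], cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```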


Author(s):  
Anirudh Reddy Cingireddy ◽  
Robin Ghosh ◽  
Supratik Kar ◽  
Venkata Melapu ◽  
Sravanthi Joginipeli ◽  
...  

Frequent testing of the entire population would help to identify individuals with active COVID-19 and allow us to identify concealed carriers. Molecular tests, antigen tests, and antibody tests are being widely used to confirm COVID-19 in the population. Molecular tests such as the real-time reverse transcription-polymerase chain reaction (rRT-PCR) test take from a minimum of 3 hours to a maximum of 4 days to return results. The authors suggest using machine learning and data mining tools to filter large populations at a preliminary level to overcome this issue. The ML tools could reduce the testing population size by 20 to 30%. In this study, they used a subset of features from the full blood profiles of patients at the Israelita Albert Einstein hospital in Brazil. They used classification models, namely KNN, logistic regression, XGBoost, naive Bayes, decision tree, random forest, support vector machine, and multilayer perceptron, with k-fold cross-validation to validate the models. Naïve Bayes, KNN, and random forest stand out as the most predictive, with 88% accuracy each.
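The model comparison can be sketched as follows with scikit-learn and k-fold cross-validation; the file and column names are placeholders for the blood-profile features, and XGBoost is omitted here to avoid an extra dependency.

```python
# Illustrative sketch: blood-profile features with several classifiers and
# k-fold cross-validation; the file and column names are placeholders, and
# XGBoost is left out to avoid an extra dependency.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("blood_profile_subset.csv").dropna()
X, y = df.drop(columns=["covid_result"]), df["covid_result"]

models = {
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(),
    "svm": SVC(),
    "mlp": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```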


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis and binary logistic regression analyses were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Among the machine learning techniques, Gaussian Naive Bayes was determined to be the best algorithm for evaluating prognosis (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.
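A minimal, hypothetical sketch of the winning Gaussian Naive Bayes evaluation, reporting the same metrics as the abstract (accuracy, AUC, sensitivity, specificity); the clinical data are not public, so the file and column names are invented.

```python
# Hypothetical sketch: Gaussian Naive Bayes on tabular prognostic features,
# reporting accuracy, AUC, sensitivity and specificity; the cohort file and
# column names are invented (the clinical data are not public).
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("npc_cohort.csv")
X, y = df.drop(columns=["mortality"]), df["mortality"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=1)

model = GaussianNB().fit(X_tr, y_tr)
pred = model.predict(X_te)
prob = model.predict_proba(X_te)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("accuracy:", accuracy_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, prob))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```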


2016 ◽  
Vol 22 (4) ◽  
pp. 751-773 ◽  
Author(s):  
Carolina Gusmão Souza ◽  
Luis Carvalho ◽  
Polyanne Aguiar ◽  
Tássia Borges Arantes

Coffee growing is one of Brazil's main agricultural crops, and mapping and monitoring this crop is essential for understanding its spatial distribution. However, mapping these areas using remote sensing images is not an easy task. This work was therefore carried out with the objective of comparing the use of different variables and classification algorithms for mapping coffee-growing areas. The work was developed in three different areas that are highly significant for coffee production. Five machine learning algorithms and seven combinations of variables (spectral, textural and geometric) were used in the classification process. A total of 105 classifications were performed, 35 for each of the areas. The classifications that did not use spectral variables did not yield good accuracy indices. In all three areas, the algorithm with the best accuracies was the Support Vector Machine, with an overall accuracy of 85.33% in Araguari, 87% in Carmo de Minas and 88.33% in Três Pontas. The worst results were obtained with the Random Forest algorithm in Araguari, with an overall accuracy of 76.66%, and with Naive Bayes in Carmo de Minas and Três Pontas, with 76% and 82% accuracy. In all three areas, textural variables, when combined with spectral ones, improved the classification accuracy. The SVM showed the best performance in all three areas.
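The variable-combination experiment can be sketched, under assumptions, with a per-sample feature table rather than the authors' image-segmentation workflow; the column prefixes ("band_", "glcm_") and the file name are invented.

```python
# Simplified sketch, not the authors' remote-sensing workflow: compare SVM,
# Random Forest and Naive Bayes on spectral and textural variable combinations
# scored by overall accuracy. The per-sample table and column prefixes are invented.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

samples = pd.read_csv("coffee_training_samples.csv")
spectral = [c for c in samples.columns if c.startswith("band_")]
textural = [c for c in samples.columns if c.startswith("glcm_")]
y = samples["coffee"]   # 1 = coffee, 0 = non-coffee (assumed labels)

combos = {"spectral": spectral,
          "textural": textural,
          "spectral+textural": spectral + textural}
models = {"svm": SVC(), "random_forest": RandomForestClassifier(),
          "naive_bayes": GaussianNB()}
for combo_name, cols in combos.items():
    for model_name, model in models.items():
        acc = cross_val_score(model, samples[cols], y, cv=5).mean()
        print(f"{combo_name} / {model_name}: overall accuracy = {acc:.3f}")
```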


2020 ◽  
Vol 7 (3) ◽  
pp. 441-450
Author(s):  
Haliem Sunata

The high usage of ATMs creates opportunities for fraud by third parties, so PT. Bank Central Asia Tbk must keep its ATMs always ready for use by customers. The slow and difficult process of identifying ATM fraud is one of the obstacles faced by PT. Bank Central Asia Tbk. Given this problem, the researcher collected 5 datasets and pre-processed them so that they could be used for modeling and testing algorithms to address the issue. Seven algorithms were compared: decision tree, gradient boosted trees, logistic regression, naive Bayes (kernel), naive Bayes, random forest and random tree. After modeling and testing, the gradient boosted trees algorithm was found to be the best, with an accuracy of 99.85% and an AUC value of 1; this high performance is attributed to the fit between each tested attribute and the characteristics of gradient boosted trees, which stores and evaluates the existing results. The gradient boosted trees algorithm is therefore a solution to the problem faced by PT. Bank Central Asia Tbk.
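A hedged sketch of the gradient boosted trees evaluation, reporting accuracy and AUC as in the abstract; scikit-learn's GradientBoostingClassifier stands in for whatever implementation the study used, and the dataset and column names are invented.

```python
# Hedged sketch: scikit-learn's GradientBoostingClassifier stands in for the
# implementation used in the study; dataset and column names are invented.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("atm_fraud_dataset.csv")
X, y = df.drop(columns=["is_fraud"]), df["is_fraud"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=7)

gbt = GradientBoostingClassifier(n_estimators=200).fit(X_tr, y_tr)
pred = gbt.predict(X_te)
prob = gbt.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, prob))
```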

