Social Media Content Categorization Using Supervised Based Machine Learning Methods and Natural Language Processing in Bangla Language

Author(s):  
Md. Rejaul Alam ◽  
Afsana Akter ◽  
Minhajul Abedin Shafin ◽  
Md. Mehedi Hasan ◽  
Antara Mahmud
2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.


2018 ◽  
Vol 24 (2) ◽  
pp. 221-264 ◽  
Author(s):  
SABINE GRÜNDER-FAHRER ◽  
ANTJE SCHLAF ◽  
GREGOR WIEDEMANN ◽  
GERHARD HEYER

AbstractSocial media are an emerging new paradigm in interdisciplinary research in crisis informatics. They bring many opportunities as well as challenges to all fields of application and research involved in the project of using social media content for an improved disaster management. Using the Central European flooding 2013 as our case study, we optimize and apply methods from the field ofnatural language processingand unsupervised machine learning to investigate the thematic and temporal structure of German social media communication. By means of topic model analysis, we will investigate which kind of content was shared on social media during the event. On this basis, we will, furthermore, investigate the development of topics over time and apply temporal clustering techniques to automatically identify different characteristic phases of communication. From the results, we, first, want to reveal properties of social media content and show what potential social media have for improving disaster management in Germany. Second, we will be concerned with the methodological issue of finding and adapting natural language processing methods that are suitable for analysing social media data in order to obtain information relevant for disaster management. With respect to the first, application-oriented focal point, our study reveals high potential of social media content in the factual, organizational and psychological dimension of the disaster and during all stages of the disaster management life cycle. Interestingly, there appear to be systematic differences in thematic profile between the different platforms Facebook and Twitter and between different stages of the event. In context of our methodological investigation, we claim that if topic model analysis is combined with appropriate optimization techniques, it shows high applicability for thematic and temporal social media analysis in disaster management.


2019 ◽  
Vol 38 ◽  
pp. 100958 ◽  
Author(s):  
Arjan S. Gosal ◽  
Ilse R. Geijzendorffer ◽  
Tomáš Václavík ◽  
Brigitte Poulin ◽  
Guy Ziv

Author(s):  
Mitta Roja

Abstract: Cyberbullying is a major problem encountered on internet that affects teenagers and also adults. It has lead to mishappenings like suicide and depression. Regulation of content on Social media platorms has become a growing need. The following study uses data from two different forms of cyberbullying, hate speech tweets from Twittter and comments based on personal attacks from Wikipedia forums to build a model based on detection of Cyberbullying in text data using Natural Language Processing and Machine learning. Threemethods for Feature extraction and four classifiers are studied to outline the best approach. For Tweet data the model provides accuracies above 90% and for Wikipedia data it givesaccuracies above 80%. Keywords: Cyberbullying, Hate speech, Personal attacks,Machine learning, Feature extraction, Twitter, Wikipedia


Sign in / Sign up

Export Citation Format

Share Document