Design and implementation of natural language processing with syntax and semantic analysis for extract traffic conditions from social media data

Author(s):  
Mochamad Vicky Ghani Aziz ◽  
Ary Setijadi Prihatmanto ◽  
Diotra Henriyan ◽  
Rifki Wijaya


2021 ◽  
Author(s):  
Joo Yun Lee

This study used ontology-based natural language processing to analyze social media data collected in South Korea that contained keywords related to “pregnancy”. Of the 504,725 documents, those containing concepts related to “maternal emotion” were the most frequent, followed by those related to “family support”. Social media were used as a means of exchanging information and expressing emotions.


10.2196/29768 ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. e29768
Author(s):  
Vishal Dey ◽  
Peter Krasniak ◽  
Minh Nguyen ◽  
Clara Lee ◽  
Xia Ning

Background A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. Objective The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. Methods We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. Results Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. Conclusions Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.
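
The abstract names the clinical Text Analysis and Knowledge Extraction System (cTAKES) for mention extraction and latent Dirichlet allocation (LDA) for topic modeling, but it does not describe the implementation. The following is a minimal, dependency-free sketch of the second half of such a pipeline, assuming mentions have already been extracted: a hypothetical `MENTION_TO_CONCEPT` dictionary stands in for cTAKES concept normalization, and LDA is fitted with collapsed Gibbs sampling. The concept labels and hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions, not the study's values.

```python
import random

# Hypothetical stand-in for cTAKES concept normalization: surface mention -> concept label.
MENTION_TO_CONCEPT = {
    "tired": "fatigue",
    "exhausted": "fatigue",
    "fatigue": "fatigue",
    "joint pain": "pain",
    "pain": "pain",
    "leak": "rupture",
    "rupture": "rupture",
    "infection": "infection",
}


def map_mentions(posts):
    """Turn each post's extracted mentions into a list of normalized concepts."""
    return [[MENTION_TO_CONCEPT[m] for m in post if m in MENTION_TO_CONCEPT] for post in posts]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over documents of concept labels.

    Returns (doc_topic, topic_word): per-document topic distributions and
    per-topic distributions over the concept vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_index[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_index[w], assignments[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[t][word_index[w]] + beta) / (n_k[t] + V * beta) for w in vocab}
        for t in range(K)
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    posts = [["tired", "joint pain"], ["leak", "infection"], ["exhausted", "pain"], ["rupture", "infection"]]
    doc_topic, topic_word = lda_gibbs(map_mentions(posts), num_topics=2, seed=1)
    for t, dist in enumerate(topic_word):
        top = sorted(dist, key=dist.get, reverse=True)[:2]
        print(f"topic {t}: {top}")
```

Fitting topics on normalized concepts rather than raw words means that synonymous mentions such as "tired" and "exhausted" reinforce the same topic, which is presumably why the mapping step precedes LDA.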


2020 ◽  
Author(s):  
Oladapo Oyebode ◽  
Chinenye Ndulue ◽  
Ashfaq Adib ◽  
Dinesh Mulchandani ◽  
Banuchitra Suruliraj ◽  
...  

BACKGROUND The COVID-19 pandemic has caused a global health crisis that affects many aspects of human life. In the absence of vaccines and antivirals, several behavioural change and policy initiatives, such as physical distancing, have been implemented to control the spread of the coronavirus. Social media data can reveal public perceptions of how governments and health agencies across the globe are handling the pandemic, the impact of the disease on people regardless of their geographic location, and the factors that hinder or facilitate efforts to control the spread of the pandemic globally. OBJECTIVE This paper aims to investigate the impact of the COVID-19 pandemic on people globally using social media data. METHODS We apply natural language processing (NLP) and thematic analysis to understand public opinions, experiences, and issues with respect to the COVID-19 pandemic using social media data. First, we collect over 47 million COVID-19-related comments from Twitter, Facebook, YouTube, and three online discussion forums. Second, we perform data preprocessing, which involves applying NLP techniques to clean and prepare the data for automated theme extraction. Third, we apply a context-aware NLP approach to extract meaningful keyphrases or themes from over 1 million randomly selected comments, compute a sentiment score for each theme, and assign sentiment polarity (i.e., positive, negative, or neutral) based on the scores using a lexicon-based technique. Fourth, we categorize related themes into broader themes. RESULTS A total of 34 negative themes emerged, 15 of which are health-related, psychosocial, and social issues related to the COVID-19 pandemic from the public perspective. Health-related issues include increased mortality, health concerns, struggling health systems, and fitness issues, while psychosocial issues include frustration due to life disruptions, panic shopping, and expressions of fear. Social issues include harassment, domestic violence, and wrong societal attitudes. In addition, 20 positive themes emerged from our results, including public awareness, encouragement, gratitude, cleaner environment, online learning, charity, spiritual support, and innovative research. CONCLUSIONS We uncover various negative and positive themes representing public perceptions of the COVID-19 pandemic and recommend interventions that can help address the health, psychosocial, and social issues based on the positive themes and other remedial ideas rooted in research. These interventions will help governments, health professionals and agencies, institutions, and individuals in their efforts to curb the spread of COVID-19 and minimize its impact, as well as in reacting to any future pandemics.
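
The abstract says each theme receives a lexicon-based sentiment score and a polarity derived from that score, without giving the formula. One simple scheme is sketched below under stated assumptions: a toy `LEXICON` with hypothetical valences, a per-comment score equal to the mean valence of matched words, a per-theme score equal to the mean over that theme's comments, and an assumed neutral band of ±0.05. Established lexicons such as VADER also handle negation and intensifiers, which this sketch omits.

```python
import re
from statistics import mean

# Hypothetical toy valence lexicon (word -> score); not the lexicon used in the study.
LEXICON = {
    "grateful": 2.0,
    "thankful": 2.0,
    "support": 1.5,
    "cleaner": 1.0,
    "fear": -2.0,
    "panic": -2.0,
    "frustrated": -1.5,
    "death": -2.5,
}


def comment_score(text, lexicon=LEXICON):
    """Mean valence of the lexicon words found in a comment; 0.0 if none match."""
    hits = [lexicon[tok] for tok in re.findall(r"[a-z']+", text.lower()) if tok in lexicon]
    return mean(hits) if hits else 0.0


def polarity(score, threshold=0.05):
    """Map a score to a polarity label using an assumed neutral band [-threshold, threshold]."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def score_themes(theme_comments, lexicon=LEXICON):
    """Map {theme: [comments]} to {theme: (mean comment score, polarity)}."""
    result = {}
    for theme, comments in theme_comments.items():
        score = mean(comment_score(c, lexicon) for c in comments) if comments else 0.0
        result[theme] = (score, polarity(score))
    return result


if __name__ == "__main__":
    themes = {
        "gratitude": ["So grateful for the nurses"],
        "panic shopping": ["panic buying everywhere", "I fear the worst"],
    }
    for theme, (score, label) in score_themes(themes).items():
        print(theme, score, label)
```

Averaging per comment before averaging per theme keeps one long, emotionally loaded comment from dominating a theme; summing raw word scores instead would be an equally plausible design with different behaviour.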


10.2196/18767 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e18767
Author(s):  
Jooyun Lee ◽  
Hyeoun-Ae Park ◽  
Seul Ki Park ◽  
Tae-Min Song

Background Analysis of posts on social media is effective in investigating health information needs for disease management and identifying people’s emotional status related to disease. An ontology is needed for semantic analysis of social media data. Objective This study was performed to develop a cancer ontology with terminology containing consumer terms and to analyze social media data to identify health information needs and emotions related to cancer. Methods A cancer ontology was developed using social media data, collected with a crawler, from online communities and blogs between January 1, 2014 and June 30, 2017 in South Korea. The relative frequencies of posts containing ontology concepts were counted and compared by cancer type. Results The ontology had 9 superclasses, 213 class concepts, and 4061 synonyms. Ontology-driven natural language processing was performed on the text from 754,744 cancer-related posts. Colon, breast, stomach, cervical, lung, liver, pancreatic, and prostate cancer; brain tumors; and leukemia appeared most in these posts. At the superclass level, risk factor was the most frequent, followed by emotions, symptoms, treatments, and dealing with cancer. Conclusions Information needs and emotions differed according to cancer type. The observations of this study could be used to provide tailored information to consumers according to cancer type and care process. Attention should be paid to provision of cancer-related information to not only patients but also their families and the general public seeking information on cancer.
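
The abstract describes counting, per cancer type, the relative frequency of posts that contain each ontology concept, where many consumer synonyms map to one class concept and each class belongs to a superclass. A minimal sketch of that counting step follows, assuming a hypothetical toy `ONTOLOGY` that maps single-word English synonyms to a (superclass, class) pair. Real consumer terms are often multi-word, and Korean text needs morphological analysis, so the token matching here is a simplification. Each concept is counted at most once per post, so the result is the share of posts mentioning it.

```python
import re
from collections import Counter

# Hypothetical toy ontology: consumer synonym -> (superclass, class concept).
ONTOLOGY = {
    "smoking": ("risk factor", "smoking"),
    "cigarette": ("risk factor", "smoking"),
    "anxious": ("emotion", "anxiety"),
    "worried": ("emotion", "anxiety"),
    "chemo": ("treatment", "chemotherapy"),
    "chemotherapy": ("treatment", "chemotherapy"),
}


def concepts_in_post(text, ontology=ONTOLOGY):
    """Set of (superclass, class) pairs whose synonyms occur in the post."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {ontology[t] for t in tokens if t in ontology}


def relative_frequencies(posts_by_type, ontology=ONTOLOGY, level="superclass"):
    """For each cancer type, the fraction of its posts that mention each concept."""
    result = {}
    for cancer_type, posts in posts_by_type.items():
        counts = Counter()
        for post in posts:
            found = concepts_in_post(post, ontology)
            # Collapse to the requested level; a set ensures one count per post.
            labels = {sup if level == "superclass" else cls for sup, cls in found}
            counts.update(labels)
        n = len(posts)
        result[cancer_type] = {label: c / n for label, c in counts.items()} if n else {}
    return result


if __name__ == "__main__":
    posts = {
        "lung": ["Quit smoking after diagnosis", "so worried about chemo", "cigarette and chemo"],
        "breast": ["feeling anxious today"],
    }
    print(relative_frequencies(posts))
    print(relative_frequencies(posts, level="class"))
```

Using relative rather than absolute frequencies is what makes cancer types with very different post volumes comparable.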


2021 ◽  
Vol 10 (6) ◽  
pp. 389
Author(s):  
Jian Liu ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Bin Tian ◽  
...  

Social media data provide powerful support for revealing the spatiotemporal characteristics and mechanisms of human activity, as they integrate rich spatiotemporal and textual semantic information. However, because of technical and algorithmic limitations, previous research has not fully utilized this semantic and spatiotemporal information, and the deep mining of textual semantic resources has been inefficient. In this research, a multi-class text classification model based on natural language processing technology and the Bidirectional Encoder Representations from Transformers (BERT) framework is constructed. The residents’ activities in Beijing were then classified using Sina Weibo data from 2019. The results showed that the classification accuracy exceeded 90%. The types and distribution of residents’ activities were closely related to the characteristics of the activities and to holiday arrangements. On a short timescale, the activity rhythm on weekends was delayed by one hour compared with that on weekdays. Residents’ activities showed significant agglomeration in a spatial co-location cluster pattern, but the proportion of balanced co-location cluster areas was small. The research demonstrated that location conditions, especially the microlocation condition (the distance to the nearest subway station), were the driving factors that affected the resident activity cluster patterns. The proposed framework integrates textual semantic analysis, statistical methods, and spatial techniques, broadens the application areas of social media data, especially text data, and provides a new paradigm for research on residents’ activities and spatiotemporal behavior.
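
The abstract identifies the distance to the nearest subway station as the key microlocation variable but does not say how that distance was measured. A plausible minimal sketch, assuming WGS84 latitude/longitude for geotagged posts and stations and a spherical Earth, computes great-circle (haversine) distance and takes the minimum over all stations by linear scan. The station coordinates in the example are hypothetical, not real Beijing stations.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station_distance(point, stations):
    """Distance in metres from a post's (lat, lon) to the closest station (linear scan)."""
    lat, lon = point
    return min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)


if __name__ == "__main__":
    # Hypothetical station coordinates, for illustration only.
    stations = [(39.90, 116.40), (39.95, 116.45)]
    post_location = (39.91, 116.40)
    print(round(nearest_station_distance(post_location, stations)))  # about 1112 m
```

The linear scan costs one distance computation per post-station pair; for city-scale data, a spatial index such as a k-d tree over projected coordinates would be the usual replacement.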

