Classifying patient and professional voice in social media health posts

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Beatrice Alex ◽  
Donald Whyte ◽  
Daniel Duma ◽  
Roma English Owen ◽  
Elizabeth A. L. Fairley

Abstract Background: Patient-based analysis of social media is a growing research field with the aim of delivering precision medicine, but it requires accurate classification of posts relating to patients’ experiences. We motivate the need for this type of classification as a pre-processing step for further analysis of social media data in the context of related work in this area. In this paper we present experiments for a three-way document classification by patient voice, professional voice or other. We present results for a convolutional neural network classifier trained on English data from two different data sources (Reddit and Twitter) and two domains (cardiovascular and skin diseases). Results: We found that document classification by patient voice, professional voice or other can be done consistently by hand (0.92 accuracy). Annotators agreed roughly equally for each domain (cardiovascular and skin), but they agreed more when annotating Reddit posts than Twitter posts. The best classification performance was obtained when training a separate classifier for each data source, one for Reddit and one for Twitter posts. Evaluated on in-source test data, with both test sets combined, this gave an overall accuracy of 0.95 (macro-average F1 of 0.92) and an F1-score of 0.95 for patient voice only. Conclusion: The main conclusion of this work is that combining social media data from platforms with different characteristics to train a patient and professional voice classifier does not result in the best possible performance. We showed that it is best to train separate models per data source (Reddit and Twitter) instead of a model using the combined training data from both sources. We also found that it is preferable to train separate models per domain (cardiovascular and skin), although the difference to the combined model is only minor (0.01 accuracy). Our highest overall F1-score (0.95), obtained for classifying posts as patient voice, is a very good starting point for further analysis of social media data reflecting the experience of patients.
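The abstract reports both an overall accuracy and a macro-average F1 for the three-way task. As a rough illustration of why the two can differ, the following minimal sketch computes both on a small invented label set. The toy data and the function names are assumptions made for illustration; they are not the authors' data or code.

```python
# Toy illustration of accuracy vs. macro-average F1 for a three-way task.
# The labels mirror the classes named in the abstract; the data are invented.
LABELS = ("patient", "professional", "other")


def per_class_f1(gold, pred, label):
    """F1 for one class, treating that class as the positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(gold, pred):
    """Fraction of posts whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)


# Ten invented posts: one "professional" post is misclassified as "other".
gold = ["patient"] * 6 + ["professional"] * 2 + ["other"] * 2
pred = ["patient"] * 6 + ["professional", "other"] + ["other", "other"]

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

Because macro-averaging weights every class equally, a single error in a small class lowers macro F1 more than it lowers accuracy. That is one plausible reading of why a macro-average F1 can sit below overall accuracy.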

2021 ◽  
Author(s):  
Beatrice Alex ◽  
Donald Whyte ◽  
Daniel Duma ◽  
Roma English Owen ◽  
Elizabeth A.L. Fairley

Abstract Background: Patient-based analysis of social media is a growing research field with the aim of delivering precision medicine but it requires accurate classification of posts relating to patients’ experiences. We motivate the need for this type of classification as a pre-processing step for further analysis of social media data in the context of related work in this area. In this paper we present experiments for a three-way document classification by patient voice, professional voice or other. We present results for a Convolutional Neural Network classifier trained on English data from two different data sources (Reddit and Twitter) and two domains (cardiovascular and skin diseases). Results: We found that document classification by patient voice, professional voice or other can be done consistently manually (0.92 accuracy). Annotators agreed roughly equally for each domain (cardiovascular and skin) but they agreed more when annotating Reddit posts compared to Twitter posts. Best classification performance was obtained when training two separate classifiers for each data source, one for Reddit and one for Twitter posts, when evaluating on in-source test data for both test sets combined with an overall accuracy of 0.95 (and macro-average F1 of 0.92) and an F1-score of 0.95 for patient voice only. Conclusion: The main conclusion resulting from this work is that using more data for training a classifier does not necessarily result in best possible performance. In the context of classifying social media posts by patient and professional voice, we showed that it is best to train separate models per data source (Reddit and Twitter) instead of a model using the combined training data from both sources. We also found that it is preferable to train separate models per domain (cardiovascular and skin) while showing that the difference to the combined model is only minor (0.01 accuracy). Our highest overall F1-score (0.95) obtained for classifying posts as patient voice is a very good starting point for further analysis of social media data reflecting the experience of patients.


Author(s):  
Mohamad Hasan

This paper presents a model to collect, store, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria) and to map the areas in Syria most affected by ISIS according to those data. The mapping process is the automated compilation of a density map of the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) are recognized as a dynamic and easily accessible resource that can serve as a data source for spatial analysis and geographical information systems. Social media data can be represented as topic data and geocoding data, based on the text mined from social media and processed with Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and texts. NLP makes it possible to identify the words that the developed geocoding algorithm uses as its input. In this study, the required words were identified using two corpora. The first corpus contained the names of populated places in Syria. The second corpus was compiled from a statistical analysis of the tweets by picking the words that carry a location meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps Geocoding API to obtain the coordinates for the posts.
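The pipeline described above (match location words against a corpus, geocode the matches, then compile a density map) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The gazetteer entries and their approximate coordinates are hypothetical, a local dictionary lookup stands in for the Google Maps Geocoding API call, and the density map is reduced to counts per one-degree grid cell.

```python
import math
import re
from collections import Counter

# Hypothetical mini-gazetteer with approximate coordinates (lat, lon).
# It stands in for the paper's corpus of populated places in Syria.
GAZETTEER = {
    "aleppo": (36.20, 37.13),
    "raqqa": (35.95, 39.01),
    "palmyra": (34.56, 38.28),
}


def find_places(text, gazetteer=GAZETTEER):
    """Return the gazetteer names mentioned in a post, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok in gazetteer]


def geocode(place, gazetteer=GAZETTEER):
    """Look up coordinates locally; a real system would call a geocoding service."""
    return gazetteer[place]


def density_grid(posts, cell_deg=1.0):
    """Count geocoded mentions per grid cell of size cell_deg degrees."""
    counts = Counter()
    for text in posts:
        for place in find_places(text):
            lat, lon = geocode(place)
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    return counts


posts = [
    "Clashes reported near Raqqa today",
    "Raqqa and Aleppo roads closed",
    "No location here",
]
grid = density_grid(posts)
for cell, n in sorted(grid.items()):
    print(cell, n)
```

Exact token matching is the simplest possible choice here. Handling transliteration variants or multi-word place names would need a more careful matcher.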


Author(s):  
F. O. Ostermann ◽  
H. Huang ◽  
G. Andrienko ◽  
N. Andrienko ◽  
C. Capineri ◽  
...  

Increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially the geotagged ones, contain information about the perception of and experiences in various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities and differences between different Geo-Social Media in their description of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another data set (Flickr). Based on that, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.


10.2196/26119 ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. e26119
Author(s):  
Guanghui Fu ◽  
Changwei Song ◽  
Jianqiang Li ◽  
Yue Ma ◽  
Pan Chen ◽  
...  

Background Web-based social media provides common people with a platform to express their emotions conveniently and anonymously. There have been nearly 2 million messages in a particular Chinese social media data source, and several thousands more are generated each day. Therefore, it has become impossible to analyze these messages manually. However, these messages have been identified as an important data source for the prevention of suicide related to depression disorder. Objective We proposed in this paper a distant supervision approach to developing a system that can automatically identify textual comments that are indicative of a high suicide risk. Methods To avoid expensive manual data annotations, we used a knowledge graph method to produce approximate annotations for distant supervision, which provided a basis for a deep learning architecture that was built and refined by interactions with psychology experts. There were three annotation levels, as follows: free annotations (zero cost), easy annotations (by psychology students), and hard annotations (by psychology experts). Results Our system was evaluated accordingly and showed that its performance at each level was promising. By combining our system with several important psychology features from user blogs, we obtained a precision of 80.75%, a recall of 75.41%, and an F1 score of 77.98% for the hardest test data. Conclusions In this paper, we proposed a distant supervision approach to develop an automatic system that can classify high and low suicide risk based on social media comments. The model can therefore provide volunteers with early warnings to prevent social media users from committing suicide.
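The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name is an illustrative choice, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(80.75, 75.41)
print(f"F1 = {f1:.2f}%")
```

Computed from the rounded figures, this gives roughly 77.99%, in line with the reported 77.98%. The small gap is presumably due to rounding of the published precision and recall.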


2021 ◽  
Author(s):  
Nick Boettcher

BACKGROUND The study of depression and anxiety using publicly available social media data is a research activity that has grown considerably over the last decade. The discussion platform Reddit has become a popular social media data source in this nascent area of study, in part because of the unique ways in which the platform facilitates research. To date, no work has been done to synthesize existing studies of depression and anxiety using Reddit. OBJECTIVE The objective of this review is to understand the scope and nature of research using Reddit as a primary data source for studying depression and anxiety. METHODS A scoping review was conducted using the Arksey and O’Malley framework. The academic databases searched were MEDLINE/PubMed, EMBASE, CINAHL, PsycINFO, PsycARTICLES, Scopus, ScienceDirect, IEEE Xplore, and the ACM database. Inclusion criteria were developed using the Participants/Concept/Context framework outlined by the Joanna Briggs Institute Scoping Review Methodology Group. Eligible studies featured a methodological focus on analyzing depression and/or anxiety using naturalistic written expressions from Reddit users as the primary data source. RESULTS A total of 54 studies were included for review. Tables and corresponding analysis delineate key methodological features, including a comparatively larger focus on depression than on anxiety, an even split between original and premade datasets, a favored analytic focus on classifying the mental health states of Reddit users, and practical implications that often recommend new methods of professionally driven mental health monitoring and outreach for Reddit users. CONCLUSIONS Studies of depression and anxiety using Reddit data are currently driven by a prevailing methodology that favors a technical, solution-based orientation. Researchers interested in advancing this research area will benefit from further consideration of conceptual issues surrounding the interpretation of Reddit data with the medical model of mental health. Further efforts are also needed to locate accountability and autonomy within practice implications suggesting new forms of engagement with Reddit users.


Aksara ◽  
2021 ◽  
Vol 32 (2) ◽  
pp. 323-338
Author(s):  
Hari Kusmanto ◽  
Nadia Puji Ayu ◽  
Harun Joko Prayitno ◽  
Laili Etika Rahmawati ◽  
Dini Restiyanti Pratiwi ◽  
...  

Abstract This study aims to describe the forms of politeness in communication between students and lecturers on the social media platform WhatsApp. The study is qualitative. The data are polite sentences in academic discourse on social media, and the data source is the utterances of that discourse. Data were collected using the documentation and observation (simak) methods, followed by a note-taking technique. Data were analyzed using the intralingual and pragmatic matching (padan) methods, reinforced by Brown and Levinson’s politeness analysis from a humanist perspective. The results show that positive politeness acts include: (1) expressing thanks as a sign of respect to the interlocutor, 48%; (2) asking questions as a form of attention to the interlocutor, 8%; (3) providing information to the interlocutor as a form of care, 18%; (4) showing optimism so that the interlocutor is motivated, 4%; (5) giving gifts to the interlocutor in the form of support, 4%; (6) greeting the interlocutor as a way of wishing them well, 8%; and (7) using identity markers to build solidarity between speaker and interlocutor, 10%. This shows that students hold lecturers in high respect, expressed through positively toned communication. Expressing thanks, providing the information the interlocutor needs, showing self-confidence, and greeting are forms of communication with a humanist perspective, that is, one that upholds human values. This research is useful for building learning communication oriented towards linguistic politeness that dignifies human values in learning. Keywords: positive politeness, academic, social media, humanist

Original title: Realisasi Tindak Kesantunan Positif dalam Wacana Akademik di Media Sosial Berperspektif Humanitas (Hari Kusmanto, Nadia P. Ayu, Harun J. Prayitno, Laili E. Rahmawati, Dini R. Pratiwi, and Tri Santoso). Aksara, Vol. 32, No. 2, December 2020, pp. 323-338. ISSN 0854-3283 (Print), ISSN 2580-0353 (Online).


2015 ◽  
Author(s):  
Evika Karamagioli

Background: As the use of social media creates huge amounts of data, a need arises for big data analysis that can synthesize the information and determine which actions to take. Online communication channels such as Facebook, Twitter, and Instagram provide a wealth of passively collected data that may be mined for public health purposes such as health surveillance, health crisis management, and, last but not least, health promotion and education. Objective: We explore the international literature on the potential role and prospective use of social media as a big data source for public health purposes. Method: Systematic literature review. Data extraction and synthesis were performed using thematic analysis. Results: Examples of those currently collecting and analyzing big data from user-generated social content include scientists working with the Centers for Disease Control and Prevention to track the spread of flu by analyzing what users search for, and the World Health Organization, which is working on disaster management relief. But what exactly do we do with this big social media data? We can track real-time trends and understand them more quickly through the platforms and processing services. By processing this big social media data, it is possible to determine specific patterns in conversation topics, user behaviors, overall trends and influencers, sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs. Conclusion: The key to fostering the convergence of big data and social media is to process and analyze the right data that may be mined for public health purposes, so as to provide strategic insights for the planning, execution, and measurement of effective and efficient public health interventions. In this effort, political, economic, and legal obstacles need to be seriously considered.


2021 ◽  
Vol 10 (8) ◽  
pp. 524
Author(s):  
Xiang Feng ◽  
Peipei Wu ◽  
Wei Shen ◽  
Qian Huang

This paper measures the cultural consumption patterns of expatriates in Shanghai by applying a geo-information approach to data derived from social media. In order to reveal the geographical characteristics, the paper zooms in on the level of city districts and presents a typology based on the degree of spatial and functional aggregation of cultural venues. Three major contextual parameters underlying the typology are discerned: the geographies of the Shanghai space-economy, the imprint of Shanghai’s spatio-political strategies, and the overall policy approach toward this community. We discuss how this study can be used as the starting point for further comparative studies on cultural patterns of expatriates in other geographical contexts.


Author(s):  
T. Moyo ◽  
W. Musakwa

The study of commuters’ origins and destinations (O-D) promises to assist transportation planners with prediction models to inform decision making. Conventionally, O-D surveys are undertaken through travel surveys and traffic counts; however, data collection for these surveys has historically proven time consuming and a strain on human resources, so a need for an alternative data source arises. This study combines social media data and geographic information systems to create a model for origin and destination surveys. The model tests the potential of using big data from Echo echo software, which contains Twitter and Facebook data obtained from social media users in Gauteng. These data contain geo-location information, which is used to determine the origins and destinations as well as the concentration levels of Gautrain commuters. A kriging analysis was performed on the data to determine the O-D and concentration levels of Gautrain users. The results reveal the concentration of Gautrain commuters at various points of interest, that is, where they work, live or socialise. The results also highlight which nodes attract the most commuters and possible locations for the expansion of Gautrain. Lastly, the study highlights some weaknesses of crowdsourced data for informing transportation planning.


2020 ◽  
Author(s):  
Mahmoud Arafat

In response to the Coronavirus disease (COVID-19) outbreak and the Transportation Research Board’s (TRB) urgent need for work related to transportation and pandemics, this paper contributes with a sense of urgency and provides a starting point for research on the topic. The main goal of this paper is to support transportation researchers and the TRB community during this COVID-19 pandemic by reviewing the performance of software models used for extracting large-scale data from Twitter streams related to COVID-19. The study extends the previous research efforts in social media data mining by providing a review of contemporary tools, including their computing maturity and their potential usefulness. The paper also includes an open repository for the processed data frames to facilitate the quick development of new transportation research studies. The output of this work is recommended to be used by the TRB community when deciding to further investigate topics related to COVID-19 and social media data mining tools.

