Overview of the NLPCC 2018 Shared Task: Social Media User Modeling

Author(s):  
Fuzheng Zhang ◽  
Xing Xie
Author(s):  
Lingfei Qian ◽  
Anran Wang ◽  
Yan Wang ◽  
Yuhang Huang ◽  
Jian Wang ◽  
...  

2020 ◽  
pp. 1-1
Author(s):  
Xiaolin Chen ◽  
Xuemeng Song ◽  
Siwei Cui ◽  
Tian Gan ◽  
Zhiyong Cheng ◽  
...  

Author(s):  
Andre F. Ribeiro

Abstract: We present an approach for the prediction of user authorship and feedback behavior with shared content. We consider that users use models of other users and their feedback to choose what to publish next. We look at the problem as a game between authors and audiences and relate it to current content-based user modeling solutions with no prior strategic models. As applications, we consider the large-scale authorship of Wikipedia pages, movies and food recipes. We demonstrate analytic properties, authorship and feedback prediction results, and an overall framework to study content authorship regularities in social media.


Author(s):  
ABEED SARKER ◽  
AZADEH NIKFARJAM ◽  
GRACIELA GONZALEZ

Author(s):  
Roberto Napoli ◽  
Ali Mert Ertugrul ◽  
Alessandro Bozzon ◽  
Marco Brambilla

2018 ◽  
Vol 25 (10) ◽  
pp. 1274-1283
Author(s):  
Abeed Sarker ◽  
Maksim Belousov ◽  
Jasper Friedrichs ◽  
Kai Hakala ◽  
Svetlana Kiritchenko ◽  
...  

Abstract:

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.

Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.

Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.

Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).

Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
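The metrics named in this abstract (a single-class F1-score, a micro-averaged F1-score over a subset of classes, and ensembles of system outputs) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the function names, the integer label encoding, the toy data, and the use of simple majority voting as the combination rule. The abstract does not specify how the organizers combined systems or implemented their scoring.

```python
from collections import Counter


def pooled_f1(gold, pred, targets):
    """F1 from true/false positive and false negative counts pooled over `targets`.

    With a single target class this is the ordinary class F1-score; with
    several target classes it is the micro-averaged F1-score over them.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in targets and p == g:
            tp += 1                      # correct prediction of a scored class
        else:
            if p in targets:
                fp += 1                  # predicted a scored class wrongly
            if g in targets:
                fn += 1                  # missed a gold scored class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(system_outputs):
    """Combine per-system label lists by majority vote (hypothetical ensemble rule)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*system_outputs)]


# Toy binary data (1 = positive class, 0 = negative class); not shared-task data.
gold = [1, 0, 1, 1, 0, 0]
sys_a = [1, 0, 0, 1, 1, 0]
sys_b = [1, 1, 1, 0, 0, 0]
sys_c = [0, 0, 1, 1, 0, 1]

single_f1 = pooled_f1(gold, sys_a, {1})
ensemble = majority_vote([sys_a, sys_b, sys_c])
ensemble_f1 = pooled_f1(gold, ensemble, {1})

# Toy three-class data where only classes 1 and 2 are scored.
gold3 = [1, 2, 3, 3, 1]
pred3 = [1, 3, 3, 2, 1]
micro = pooled_f1(gold3, pred3, {1, 2})
```

In this toy setup each system makes different errors, so the majority vote recovers the gold labels even though every individual system scores lower. That is the general intuition behind ensemble gains, not a reproduction of the reported scores. Using an odd number of systems avoids ties in the binary case.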

