An ontological artifact for classifying social media: Text mining analysis for financial data

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. However, existing tools for Chinese word segmentation are not well suited to processing social media text in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million internet texts. We compared the word segmentation performance of CyberCan with that of existing Mandarin and Cantonese lexicons. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
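
To illustrate how a lexicon can drive word segmentation, the sketch below uses greedy forward maximum matching, a common baseline technique. The abstract does not state which segmentation algorithm was used with CyberCan, so this is an assumption chosen for illustration only; the function name `segment`, the `max_len` parameter, and the three-entry toy lexicon are hypothetical and are not taken from CyberCan.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching over a set of known words.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, NOT entries taken from CyberCan.
toy_lexicon = {"香港", "地鐵", "好方便"}
print(segment("香港地鐵好方便", toy_lexicon))
```

With a richer lexicon, more substrings match as whole words and fewer fall back to single characters, which is one reason lexicon coverage matters for segmentation quality.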

Psychometric and Validity Issues in Machine Learning Approaches to Personality Assessment: A Focus on Social Media Text Mining

European Journal of Personality ◽

10.1002/per.2290 ◽

2020 ◽

Vol 34 (5) ◽

pp. 826-844 ◽

Cited By ~ 1

Author(s):

Louis Tay ◽

Sang Eun Woo ◽

Louis Hickman ◽

Rachel M. Saef

Keyword(s):

Machine Learning ◽

Social Media ◽

Text Mining ◽

Personality Assessment ◽

Ground Truth ◽

Psychometric Validation ◽

Learning Approaches ◽

Text Data ◽

Personality Psychology ◽

Social Media Text

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology

Low Resource Social Media Text Mining

10.1007/978-981-16-5625-5 ◽

2021 ◽

Author(s):

Shriphani Palakodety ◽

Ashiqur R. KhudaBukhsh ◽

Guha Jayachandran

Keyword(s):

Social Media ◽

Text Mining ◽

Low Resource ◽

Social Media Text

Trade the tweet: Social media text mining and sparse matrix factorization for stock market prediction

International Review of Financial Analysis ◽

10.1016/j.irfa.2016.10.009 ◽

2016 ◽

Vol 48 ◽

pp. 272-281 ◽

Cited By ~ 33

Author(s):

Andrew Sun ◽

Michael Lachanski ◽

Frank J. Fabozzi

Keyword(s):

Social Media ◽

Text Mining ◽

Stock Market ◽

Matrix Factorization ◽

Sparse Matrix ◽

Stock Market Prediction ◽

Social Media Text ◽

Sparse Matrix Factorization

HealthMine: A Tool for Social Media Text Mining in Health

2020 3rd International Conference on Emerging Trends in Electrical, Electronic and Communications Engineering (ELECOM) ◽

10.1109/elecom49001.2020.9297002 ◽

2020 ◽

Author(s):

Somendra Jeelall ◽

Sudha Cheerkoot-Jalim

Keyword(s):

Social Media ◽

Text Mining ◽

Social Media Text

Causality Patterns for Detecting Adverse Drug Reactions From Social Media: Text Mining Approach (Preprint)

10.2196/preprints.8214 ◽

2017 ◽

Cited By ~ 1

Author(s):

Danushka Bollegala ◽

Simon Maskell ◽

Richard Sloane ◽

Joanna Hajne ◽

Munir Pirmohamed

Keyword(s):

Social Media ◽

Text Mining ◽

Adverse Drug Reactions ◽

Drug Reactions ◽

Social Media Text

Who is leading China's family planning policy discourse in Weibo? A social media text mining analysis

10.26686/wgtn.14925384 ◽

2021 ◽

Author(s):

Wen Deng ◽

Jia‐Huey Hsu ◽

Karl Löfgren ◽

Wonhyuk Cho

Keyword(s):

Social Media ◽

Family Planning ◽

Text Mining ◽

Policy Discourse ◽

Family Planning Policy ◽

Planning Policy ◽

Social Media Text

Public attention and sentiment of recycled water: Evidence from social media text mining in China

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.126814 ◽

2021 ◽

pp. 126814

Author(s):

Li Li ◽

Xiaojun Liu ◽

Xinyue Zhang

Keyword(s):

Social Media ◽

Text Mining ◽

Recycled Water ◽

Public Attention ◽

Social Media Text

Proactive Personality Measurement Using Item Response Theory and Social Media Text Mining

Frontiers in Psychology ◽

10.3389/fpsyg.2021.705005 ◽

2021 ◽

Vol 12 ◽

Author(s):

Gancheng Zhu ◽

Yuci Zhou ◽

Fengfeng Zhou ◽

Min Wu ◽

Xiangping Zhan ◽

...

Keyword(s):

Social Media ◽

Item Response Theory ◽

Text Mining ◽

Item Response ◽

Evaluation Model ◽

Proactive Personality ◽

Combined Method ◽

Response Data ◽

Essay Question ◽

Social Media Text

This prospective study proposes a novel method for assessing proactive personality more efficiently by combining text mining with Item Response Theory (IRT). We collected freely expressed texts (an essay question text dataset and a social media text dataset) and item response data on proactive personality from 901 college students. To enhance validity and reliability, we employed three approaches. In Method 1, we used the item response data to develop a proactive personality evaluation model based on IRT. In Method 2, we used the freely expressed texts to develop an evaluation model based on text mining. In Method 3, we used the text mining results as prior information for the IRT estimation and built an evaluation model combining text mining and IRT. Finally, we evaluated the three approaches using confusion matrix indicators. The main results were that (1) the combined method using essay question text, micro-blog text, and pre-estimated IRT parameters achieved the highest accuracy (0.849); (2) the combined method using essay question text and pre-estimated IRT parameters achieved the highest sensitivity (0.821); (3) the text classification method based on essay question text achieved the highest specificity (0.959); and (4) considering all indicators together, the combined method using essay question text, micro-blog text, and pre-estimated IRT parameters performed best. We therefore concluded that the novel combined method was significantly better than the two traditional methods based on IRT alone and on text mining alone.
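
The confusion matrix indicators named in the abstract (accuracy, sensitivity, specificity) can be computed as in the minimal sketch below. It assumes binary labels coded as 1 (positive) and 0 (negative); the function name `confusion_indicators` and the sample labels are hypothetical and are not the study's data.

```python
def confusion_indicators(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels (1/0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of actual positives detected
        "specificity": ratio(tn, tn + fp),  # share of actual negatives detected
    }


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(confusion_indicators(y_true, y_pred))
```

Because the three indicators reward different kinds of correct decisions, different models can rank first on each of them, as the results reported in the abstract show.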
