CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

Mapping Intimacies ◽

10.31235/osf.io/tyjr7 ◽

2021 ◽

Author(s):

Fei Shen ◽

Wenting Yu ◽

Chen Min ◽

Qianying Ye ◽

Chuanli Xia ◽

...

Keyword(s):

Social Media ◽

Text Mining ◽

Word Segmentation ◽

Unstructured Data ◽

Text Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation ◽

Text Data ◽

Social Media Text

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. But existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text. We compared CyberCan with existing Mandarin and Cantonese lexicons on word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
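
To illustrate why the choice of lexicon matters for word segmentation, here is a minimal sketch of dictionary-based segmentation using forward maximum matching. This is a generic textbook technique, not necessarily the method used to build or evaluate CyberCan; the function name `segment`, the `max_len` parameter, and the three-word toy lexicon are all assumptions made for this example.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest
    substring (up to max_len characters) found in the lexicon.
    Characters not covered by any entry become single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink toward one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real lexicon would hold many thousands of entries.
toy_lexicon = {"我哋", "鍾意", "飲茶"}
tokens = segment("我哋鍾意飲茶", toy_lexicon)
fallback = segment("我哋鍾意飲茶", set())
```

With the toy lexicon the sentence splits into three words; with an empty lexicon it falls back to single characters, which is roughly what happens when a Mandarin-oriented dictionary lacks Cantonese vocabulary.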


Sentiment Analysis of Brand Personality Positioning Through Text Mining

Journal of Information Technology Research ◽

10.4018/jitr.2019070106 ◽

2019 ◽

Vol 12 (3) ◽

pp. 93-103

Author(s):

Ruei-Shan Lu ◽

Hsiu-Yuan Tsao ◽

Hao-Chaing Koong Lin ◽

Yu-Chun Ma ◽

Cheng-Tung Chuang

Keyword(s):

Social Media ◽

Text Mining ◽

Brand Personality ◽

Internet Marketing ◽

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation ◽

Key Factors ◽

Factors Affecting ◽

Processing Group

This article uses text mining and a Chinese word segmentation program developed by the Chinese Knowledge and Information Processing Group at Taiwan's Academia Sinica to analyze Facebook posts from 14 e-commerce companies. In addition, a list of keywords representing brand personalities is analyzed to reveal the key factors that determine which social media posts attract consumers' attention. The research pairs statistical analysis with an efficient, computer-based (nonmanual) questionnaire to provide a reference for businesses that operate Facebook fan pages and engage in internet marketing.
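
The keyword-list step described above can be sketched as a simple count over already-segmented posts. This is only an illustration under stated assumptions: the trait names, the keyword sets, and the function `count_trait_keywords` are hypothetical and do not come from the article, which does not publish its keyword list here.

```python
from collections import Counter


def count_trait_keywords(tokens, trait_keywords):
    """Count how many tokens of a segmented post fall under each trait's
    keyword set. `trait_keywords` maps a trait name to a set of words."""
    counts = Counter()
    for token in tokens:
        for trait, words in trait_keywords.items():
            if token in words:
                counts[trait] += 1
    return counts


# Hypothetical keyword sets and a hypothetical segmented post.
trait_keywords = {
    "sincerity": {"誠實", "真心"},
    "excitement": {"驚喜", "新奇"},
}
post_tokens = ["真心", "推薦", "驚喜", "優惠", "驚喜"]
counts = count_trait_keywords(post_tokens, trait_keywords)
```

Per-post counts like these could then be related to engagement figures in a statistical model, which is the kind of analysis the abstract alludes to.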


Psychometric and Validity Issues in Machine Learning Approaches to Personality Assessment: A Focus on Social Media Text Mining

European Journal of Personality ◽

10.1002/per.2290 ◽

2020 ◽

Vol 34 (5) ◽

pp. 826-844 ◽

Cited By ~ 1

Author(s):

Louis Tay ◽

Sang Eun Woo ◽

Louis Hickman ◽

Rachel M. Saef

Keyword(s):

Machine Learning ◽

Social Media ◽

Text Mining ◽

Personality Assessment ◽

Ground Truth ◽

Psychometric Validation ◽

Learning Approaches ◽

Text Data ◽

Personality Psychology ◽

Social Media Text

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology
