CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

2021
Author(s): Fei Shen, Wenting Yu, Chen Min, Qianying Ye, Chuanli Xia, ...

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. However, existing tools for Chinese word segmentation are not ideal for processing social media text in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text. We compared CyberCan's word segmentation performance with that of existing Mandarin and Cantonese lexicons. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
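To illustrate why the lexicon matters for segmentation, here is a minimal Python sketch of dictionary-based segmentation and span-level F1 scoring. Everything in it is an assumption made for illustration: the abstract does not say which segmenter or metric the authors used, forward maximum matching is swapped in as a simple stand-in, and the two tiny lexicons are hypothetical rather than taken from CyberCan or any published resource.

```python
from typing import List, Set, Tuple


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position, take the longest lexicon word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def spans(tokens: List[str]) -> Set[Tuple[int, int]]:
    """Convert a token list into (start, end) character spans."""
    out: Set[Tuple[int, int]] = set()
    start = 0
    for tok in tokens:
        out.add((start, start + len(tok)))
        start += len(tok)
    return out


def segmentation_f1(predicted: List[str], gold: List[str]) -> float:
    """F1 over word spans: a word counts as correct only if both boundaries match."""
    p, g = spans(predicted), spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not drawn from CyberCan.
sentence = "我哋而家去食飯"                      # "We are going to eat now"
gold = ["我哋", "而家", "去", "食飯"]            # hand-written reference segmentation
cantonese_lexicon = {"我哋", "而家", "食飯"}     # contains the Cantonese words
mandarin_lexicon = {"我們", "現在", "吃飯"}      # Mandarin equivalents only

cantonese_tokens = fmm_segment(sentence, cantonese_lexicon)
mandarin_tokens = fmm_segment(sentence, mandarin_lexicon)

print(cantonese_tokens, segmentation_f1(cantonese_tokens, gold))
print(mandarin_tokens, segmentation_f1(mandarin_tokens, gold))
```

With the toy Cantonese lexicon the sentence is split into the reference words, while the Mandarin-only lexicon matches nothing and falls back to single characters, so its F1 is much lower. The real comparison in the paper involves full lexicons and a proper test set; this sketch only shows the mechanism.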

2019, Vol 12 (3), pp. 93-103
Author(s): Ruei-Shan Lu, Hsiu-Yuan Tsao, Hao-Chaing Koong Lin, Yu-Chun Ma, Cheng-Tung Chuang

This article uses text mining and a Chinese word segmentation program developed by the Chinese Knowledge and Information Processing Group at Taiwan's Academia Sinica to analyze Facebook posts from 14 e-commerce companies. In addition, a list of keywords representing brand personalities is analyzed to reveal the key factors that determine which social media posts attract consumers' attention. The research combines statistical analysis with an efficient, computer-based (non-manual) questionnaire approach to provide a reference for businesses that operate Facebook fan pages and engage in internet marketing.


2020, Vol 34 (5), pp. 826-844
Author(s): Louis Tay, Sang Eun Woo, Louis Hickman, Rachel M. Saef

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology


2015
Author(s): Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Xuanjing Huang
