Joint Chinese word segmentation and punctuation prediction using deep recurrent neural network for social media data

Background: Social media data can be explored as a tool to detect sleep deprivation. First-year undergraduate students in their first quarter were invited to wear sleep-tracking devices (Basis; Intel), allow us to follow them on Twitter, and complete weekly surveys regarding their sleep.

Objective: This study aimed to determine whether social media data can be used to monitor sleep deprivation.

Methods: The sleep data obtained from the device were used to create a tiredness model that aided in labeling each tweet as sleep deprived or not at the time of posting. The labeled data were used to train and test a gated recurrent unit (GRU) neural network that classifies whether study participants were sleep deprived at the time of posting.

Results: Results from the GRU neural network suggest that it is possible to classify the sleep-deprivation status of a tweet’s author with an average area under the curve of 0.68.

Conclusions: It is feasible to use social media to identify students’ sleep deprivation. The results add to the body of research suggesting that social media data should be further explored as a potential source for monitoring health.
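
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' evaluation code: the function name `roc_auc` and the labels and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = "sleep deprived", 0 = "not sleep deprived".
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported 0.68 indicates a classifier that ranks better than chance but far from perfectly.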

Application of MPSO-Based Neural Network Model in Chinese Word Segmentation

2009 Second International Conference on Intelligent Computation Technology and Automation
DOI: 10.1109/icicta.2009.78
Year: 2009

Author(s): Xiaorong Cheng, Dong Wang, Kun Xie

Keyword(s): Neural Network, Network Model, Neural Network Model, Word Segmentation, Chinese Word, Chinese Word Segmentation

Research of Chinese word segmentation based on neural network and particle swarm optimization

The 2010 International Conference on Apperceiving Computing and Intelligence Analysis Proceeding
DOI: 10.1109/icacia.2010.5709850
Year: 2010

Author(s): Jia He, Guan-Hong Li

Keyword(s): Neural Network, Particle Swarm Optimization, Particle Swarm, Word Segmentation, Chinese Word, Chinese Word Segmentation, Swarm Optimization

Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts

Computational Intelligence and Neuroscience, Vol 2016, pp. 1-13
DOI: 10.1155/2016/1638936
Year: 2016
Cited By ~ 10

Author(s): Helena Gómez-Adorno, Ilia Markov, Grigori Sidorov, Juan-Pablo Posadas-Durán, Miguel A. Sanchez-Perez, ...

Keyword(s): Neural Network, Social Media, Data Preprocessing, Feature Representation, Social Media Data, Lexical Resource, Media Texts, Author Profiling, Media Data

We introduce a lexical resource for preprocessing social media data. We show that a neural network-based feature representation is enhanced by using this resource. We conducted experiments on the PAN 2015 and PAN 2016 author profiling corpora and obtained better results when performing the data preprocessing using the developed lexical resource. The resource includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media. Each of the dictionaries was built for the English, Spanish, Dutch, and Italian languages. The resource is freely available.
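
As an illustration of the kind of dictionary-based preprocessing described above, here is a minimal sketch. The dictionary entries, the placeholder tags such as `<smile>`, and the `normalize` function are all hypothetical; they are not taken from the published resource, whose format and contents are not given in the abstract.

```python
# Hypothetical miniature dictionaries; the real resource is far larger
# and covers English, Spanish, Dutch, and Italian.
SLANG = {"u": "you", "gr8": "great"}
CONTRACTIONS = {"can't": "cannot", "i'm": "i am"}
ABBREVIATIONS = {"idk": "i do not know"}
EMOTICONS = {":)": "<smile>", ":(": "<sad>"}


def normalize(text):
    """Lowercase the text, then replace whitespace-separated tokens that
    appear in any of the dictionaries. Trailing punctuation is kept."""
    lookup = {**SLANG, **CONTRACTIONS, **ABBREVIATIONS}
    out = []
    for token in text.lower().split():
        # Check emoticons first so their punctuation is not stripped.
        if token in EMOTICONS:
            out.append(EMOTICONS[token])
            continue
        core = token.rstrip(".,!?")
        tail = token[len(core):]
        out.append(lookup.get(core, core) + tail)
    return " ".join(out)


print(normalize("IDK if u can't come :)"))
```

Normalizing such tokens before feature extraction reduces the number of distinct surface forms a downstream model has to learn, which is one plausible reason a preprocessing step of this kind could help.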

CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

DOI: 10.31235/osf.io/tyjr7
Year: 2021

Author(s): Fei Shen, Wenting Yu, Chen Min, Qianying Ye, Chuanli Xia, ...

Keyword(s): Social Media, Text Mining, Word Segmentation, Unstructured Data, Text Segmentation, Chinese Word, Chinese Word Segmentation, Text Data, Social Media Text

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. But existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts. We compared the performance of CyberCan with existing Mandarin and Cantonese lexicons in terms of their word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
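
To show why lexicon coverage matters for segmentation, the sketch below uses forward maximum matching, a simple lexicon-driven method. The abstract does not say which segmentation algorithm was used in the comparison, so this technique is an assumption chosen for illustration, and the tiny lexicons and the `max_match` function are hypothetical rather than drawn from CyberCan.

```python
def max_match(text, lexicon, max_len=None):
    """Forward maximum matching: at each position take the longest
    lexicon entry that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicons of different coverage.
small_lexicon = {"香港", "學生"}
large_lexicon = {"香港", "大學", "學生", "香港大學"}

print(max_match("香港大學學生", small_lexicon))
print(max_match("香港大學學生", large_lexicon))
```

With the smaller lexicon, the unknown span falls apart into single characters, while the larger lexicon recovers whole words. This is the general mechanism by which a lexicon built from the target variety of text can improve segmentation.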
