Cross-platform comparison of framed topics in Twitter and Weibo: machine learning approaches to social media text mining

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology

Rapid Assessment of Customer Marketplace in Disaster Settings through Machine Learning, Geospatial Information, and Social Media Text Mining: An Abstract

Developments in Marketing Science: Proceedings of the Academy of Marketing Science - Finding New Ways to Engage and Satisfy Global Customers, 2019, pp. 479-480. DOI: 10.1007/978-3-030-02568-7_133

Author(s): Rajiv Garg, Patrick Brockett, Linda L. Golden, Yuxin Zhang

Keyword(s): Machine Learning, Social Media, Text Mining, Rapid Assessment, Geospatial Information, Social Media Text

Suspicious Tweet Identification Using Machine Learning Approaches for Improving Social Media Marketing Analysis

International Journal of Business Intelligence and Data Mining, 2022, Vol 1 (1), pp. 1. DOI: 10.1504/ijbidm.2022.10040478

Author(s): Thamaraiselvan Natarajan, Senthil Arasu Balasubramanian, Jonath BackiaSeelan

Keyword(s): Machine Learning, Social Media, Social Media Marketing, Learning Approaches, Marketing Analysis

Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

2022, pp. 171-195. DOI: 10.4018/978-1-7998-8413-2.ch008

Author(s): Jale Bektaş

Keyword(s): Machine Learning, Text Mining, Text Classification, Integration Method, Classification Problem, Feature Representation, Learning Approaches, Machine Learning Methods, Linguistic Approach, Turkish Language

NLP for Turkish is considerably harder than for other Latin-script languages such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated, following a linguistic approach, on 7,731 tweets shared by 13 well-known economists in Turkey and retrieved from Twitter. Classification results are then compared across four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM). The TF-IDF features are tested with different n-gram settings. The findings show that the success of a text classification task depends on the feature representation method, and that SVM outperforms the other individual ML methods with unigram feature representation. The best results are obtained with the integration of SVM and LR, with an accuracy of 82.9%. These results suggest that these methodologies are satisfactory for the Turkish language.
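
To make the feature representation step concrete, the following is a minimal sketch of TF-IDF weighting over word n-grams, written in plain Python. It is not the study's pipeline: the helper names (`word_ngrams`, `tfidf`), the toy English documents, the whitespace tokenizer, and the unsmoothed `ln(N / df)` form of IDF are all assumptions made for illustration, and the linguistic pre-processing for Turkish described in the abstract is not reproduced.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=1):
    """Return word n-grams of length n_min..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=1):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = ln(N / df), the unsmoothed textbook form (an assumption here)
    """
    grams_per_doc = [word_ngrams(d, n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Hypothetical toy documents standing in for tweets.
docs = ["inflation rises again", "interest rates rise", "the match ended in a draw"]
unigram = tfidf(docs)          # unigram features only
bigram = tfidf(docs, 1, 2)     # unigrams plus bigrams
```

The resulting dictionaries can be turned into fixed-length vectors and passed to any classifier; switching `n_min` and `n_max` is the only change needed to compare n-gram settings.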

An ontological artifact for classifying social media: Text mining analysis for financial data

International Journal of Accounting Information Systems, 2020, Vol 38, pp. 100469. DOI: 10.1016/j.accinf.2020.100469

Author(s): Zamil Alzamil, Deniz Appelbaum, Robert Nehmer

Keyword(s): Social Media, Text Mining, Financial Data, Social Media Text

Spoiler alert: Machine learning approaches to detect social media posts with revelatory information

Proceedings of the American Society for Information Science and Technology, 2013, Vol 50 (1), pp. 1-9. DOI: 10.1002/meet.14505001073. Cited By ~ 13

Author(s): Jordan Boyd-Graber, Kimberly Glasgow, Jackie Sauter Zajac

Keyword(s): Machine Learning, Social Media, Learning Approaches

Schizophrenia Detection Using Machine Learning Approach from Social Media Content

Sensors, 2021, Vol 21 (17), pp. 5924. DOI: 10.3390/s21175924

Author(s): Yi Ji Bae, Midan Shim, Won Hee Lee

Keyword(s): Mental Health, Machine Learning, Social Media, Mental Health Problems, Negative Emotion, Supervised Machine Learning, Control Group, Learning Approaches, Linguistic Features, Media Texts

Schizophrenia is a severe mental disorder that ranks among the leading causes of disability worldwide. However, many cases of schizophrenia remain untreated due to failure to diagnose, self-denial, and social stigma. With the advent of social media, individuals suffering from schizophrenia share their mental health problems and seek support and treatment options. Machine learning approaches are increasingly used for detecting schizophrenia from social media posts. This study aims to determine whether machine learning could be effectively used to detect signs of schizophrenia in social media users by analyzing their social media texts. To this end, we collected posts from the social media platform Reddit focusing on schizophrenia, along with non-mental health related posts (fitness, jokes, meditation, parenting, relationships, and teaching) for the control group. We extracted linguistic features and content topics from the posts. Using supervised machine learning, we classified posts belonging to schizophrenia and interpreted important features to identify linguistic markers of schizophrenia. We applied unsupervised clustering to the features to uncover a coherent semantic representation of words in schizophrenia. We identified significant differences in linguistic features and topics including increased use of third person plural pronouns and negative emotion words and symptom-related topics. We distinguished schizophrenic from control posts with an accuracy of 96%. Finally, we found that coherent semantic groups of words were the key to detecting schizophrenia. Our findings suggest that machine learning approaches could help us understand the linguistic characteristics of schizophrenia and identify schizophrenia or otherwise at-risk individuals using social media texts.
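
As a rough illustration of what extracting linguistic features from a post can look like, the sketch below computes two per-post rates: the share of third-person plural pronouns and the share of negative emotion words. The function name `linguistic_features`, the regular-expression tokenizer, and especially the tiny `NEGATIVE_WORDS` list are hypothetical; the abstract does not say which lexicon or tooling the authors used.

```python
import re

# Third-person plural pronouns in English.
THIRD_PERSON_PLURAL = {"they", "them", "their", "theirs", "themselves"}

# Hypothetical toy list; a real study would rely on a validated lexicon.
NEGATIVE_WORDS = {"afraid", "angry", "hate", "sad", "scared"}


def linguistic_features(post):
    """Return per-token rates of two word categories for one post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    total = len(tokens)
    if total == 0:
        # An empty post has no tokens, so both rates are defined as zero.
        return {"they_rate": 0.0, "neg_rate": 0.0}
    return {
        "they_rate": sum(t in THIRD_PERSON_PLURAL for t in tokens) / total,
        "neg_rate": sum(t in NEGATIVE_WORDS for t in tokens) / total,
    }


example = linguistic_features("They said their plan makes me sad")
```

Rates rather than raw counts are used so that long and short posts are comparable; feature vectors of this kind are what a supervised classifier would then be trained on.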

A Review on Sentiment Analysis of Social Media Data Using Text Mining and Machine Learning

International Journal of Advanced Research, 2016, Vol 4 (5), pp. 772-775. DOI: 10.21474/ijar01/526

Author(s): Gurpreet Kaur, Manoj Kumar

Keyword(s): Machine Learning, Social Media, Text Mining, Sentiment Analysis, Social Media Data, Media Data

CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

2021. DOI: 10.31235/osf.io/tyjr7

Author(s): Fei Shen, Wenting Yu, Chen Min, Qianying Ye, Chuanli Xia, ...

Keyword(s): Social Media, Text Mining, Word Segmentation, Unstructured Data, Text Segmentation, Chinese Word, Chinese Word Segmentation, Text Data, Social Media Text

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. But existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts. We compared the performance of CyberCan with existing Mandarin and Cantonese lexicons in terms of their word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
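
To show why the lexicon matters for segmentation quality, here is a small sketch of forward maximum matching, a simple dictionary-based segmentation technique. This is an illustrative stand-in, not the segmenter evaluated in the project: the function `forward_max_match` and both toy lexicons are hypothetical, and nothing here reflects the actual contents or file format of CyberCan.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Segment text greedily, preferring the longest lexicon entry at each position.

    Characters not covered by any entry are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicons: the second contains one extra, longer entry.
small_lexicon = {"香港", "大學", "學生"}
large_lexicon = small_lexicon | {"香港大學"}

with_small = forward_max_match("香港大學學生", small_lexicon)
with_large = forward_max_match("香港大學學生", large_lexicon)
```

The same input is split differently depending on which entries the lexicon contains, which is the kind of effect a lexicon comparison measures.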

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

2015. DOI: 10.7287/peerj.preprints.1460

Author(s): Jeffrey A Thompson, Jie Tan, Casey S Greene

Keyword(s): Gene Expression, Machine Learning, Machine Learning Algorithms, Quantile Normalization, Learning Approaches, RNA-Seq, Distribution Matching, Machine Learning Applications, Cross Platform, R Programming

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
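
The two baseline transformations named in the abstract can be sketched in a few lines of plain Python. This does not implement TDM, whose details are not given here, and it is separate from the authors' R package. The function names, the `+ 1` pseudocount in the log transform, the arbitrary handling of ties, and the toy values are assumptions for illustration.

```python
import math


def log2_transform(values):
    """Apply log2(x + 1); the pseudocount of 1 is an assumption to handle zeros."""
    return [math.log2(v + 1) for v in values]


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    samples: list of equal-length lists, one per sample, genes in the same order.
    Each value is replaced by the mean, across samples, of the values sharing
    its rank. Ties are broken by position rather than averaged.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy expression values for two samples and three genes.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
logged = log2_transform([0, 1, 3])
```

After quantile normalization both samples contain exactly the same set of values, differing only in which gene holds which value, so a model trained on one platform's value distribution sees a comparable distribution on the other.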
