Pattern Mining Approaches Used in Social Media Data

Jyotismita Chaki; Nilanjan Dey; B. K. Panigrahi; Fuqian Shi; Simon James Fong; R. Simon Sherratt

doi:10.1142/s021848852040019x

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s021848852040019x ◽

2020 ◽

Vol 28 (Supp02) ◽

pp. 123-152

Author(s):

Jyotismita Chaki ◽

Nilanjan Dey ◽

B. K. Panigrahi ◽

Fuqian Shi ◽

Simon James Fong ◽

...

Keyword(s):

Social Media ◽

Pattern Mining ◽

Strong Dependence ◽

Future Research ◽

Data Generation ◽

Grammatical Structure ◽

Social Media Data ◽

Social Media Networks ◽

Research Guidelines ◽

Media Data

Social media provides an accessible platform for users to share information. Its pervasive use has produced remarkable volumes of social data. Social media data arrives in both structured and unstructured, formal and informal forms, because users are not concerned with exact grammatical structure and spelling when interacting with each other through social networking websites (Twitter, Facebook, YouTube, LinkedIn, etc.). People are increasingly involved in and dependent on social media networks for data, news, and the opinions of other users on a variety of topics. This strong dependence on social media network sites contributes to enormous data generation characterized by three issues: scale, noise, and variety. These issues make it impractical to evaluate social network data manually, which calls for the appropriate use of statistical analytical methods. Mining social media data can extract significant patterns that can be advantageous for consumers, users, and businesses. Pattern mining offers a wide variety of methods to detect valuable knowledge in huge datasets, such as patterns, trends, and rules. In this work, data comprising users’ opinions and sentiments was collected and then processed using a significant number of pattern mining methods. The results were then analyzed further to obtain meaningful information. The aim of this paper is to deliver a summary and a set of strategies for using the ubiquitous pattern mining approaches, and to identify the challenges and future research guidelines for dealing with social media data.
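As a rough illustration of what detecting patterns in noisy posts can mean in practice, the sketch below counts term sets that co-occur across several posts. It is a brute-force frequent term-set count, not any specific method surveyed in the paper; the function name `frequent_itemsets`, its parameters, and the toy posts are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(posts, min_support=2, max_size=2):
    """Count term sets that appear together in at least `min_support` posts."""
    counts = Counter()
    for post in posts:
        # Lowercase and deduplicate so each post contributes once per term set.
        terms = sorted(set(post.lower().split()))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {items: n for items, n in counts.items() if n >= min_support}


# Hypothetical toy posts, invented for illustration only.
posts = [
    "love the new phone camera",
    "new phone battery is bad",
    "the camera on this phone is great",
]
patterns = frequent_itemsets(posts, min_support=2)
```

Exhaustive counting like this grows quickly with post length, which is why practical pattern mining methods prune candidate sets instead of enumerating all of them.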

Tweeting back: predicting new cases of back pain with mass social media data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv168 ◽

2015 ◽

Vol 23 (3) ◽

pp. 644-648 ◽

Cited By ~ 10

Author(s):

Hopin Lee ◽

James H McAuley ◽

Markus Hübscher ◽

Heidi G Allen ◽

Steven J Kamper ◽

...

Keyword(s):

Risk Factors ◽

Social Media ◽

Back Pain ◽

Health Interventions ◽

Future Research ◽

Preventive Interventions ◽

Public Health Interventions ◽

Social Media Data ◽

Media Data ◽

Global Health Problem

Background: Back pain is a global health problem. Recent research has shown that risk factors that are proximal to the onset of back pain might be important targets for preventive interventions. Rapid communication through social media might be useful for delivering timely interventions that target proximal risk factors. Identifying individuals who are likely to discuss back pain on Twitter could provide useful information to guide online interventions. Methods: We used a case-crossover study design for a sample of 742 028 tweets about back pain to quantify the risks associated with a new tweet about back pain. Results: The odds of tweeting about back pain just after tweeting about selected physical, psychological, and general health factors were 1.83 (95% confidence interval [CI], 1.80-1.85), 1.85 (95% CI, 1.83-1.88), and 1.29 (95% CI, 1.27-1.30), respectively. Conclusion: These findings give directions for future research that could use social media for innovative public health interventions.
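The abstract does not state which estimator produced these odds ratios. As a hedged sketch of how an odds ratio with a 95% CI can be obtained in a matched design such as case-crossover, the example below uses the standard discordant-pair estimate with a Wald interval on the log scale. The function name `matched_odds_ratio` and the counts are hypothetical and are not taken from the study.

```python
import math


def matched_odds_ratio(exposed_case_only, exposed_control_only, z=1.96):
    """Odds ratio and Wald CI from discordant counts in a matched design.

    exposed_case_only: pairs exposed in the case window but not the control window.
    exposed_control_only: pairs exposed in the control window but not the case window.
    """
    if exposed_case_only <= 0 or exposed_control_only <= 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = exposed_case_only / exposed_control_only
    # Standard error of log(OR) for matched pairs.
    se_log = math.sqrt(1 / exposed_case_only + 1 / exposed_control_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical discordant counts, invented for illustration only.
estimate, ci_low, ci_high = matched_odds_ratio(180, 100)
```

With very large samples, as in the study above, the standard error shrinks and the interval becomes narrow, which is consistent with the tight intervals reported.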

Automated Discovery of Lead Users and Latent Product Features by Mining Large Scale Social Media Networks

Journal of Mechanical Design ◽

10.1115/1.4030049 ◽

2015 ◽

Vol 137 (7) ◽

Cited By ~ 39

Author(s):

Suppawong Tuarob ◽

Conrad S. Tucker

Keyword(s):

Social Media ◽

Large Scale ◽

User Preferences ◽

Social Media Data ◽

Social Media Networks ◽

Lead User ◽

Product Features ◽

Latent Features ◽

Lead Users ◽

Media Data

Lead users play a vital role in next generation product development, as they help designers discover relevant product feature preferences months or even years before they are desired by the general customer base. Existing design methodologies proposed to extract lead user preferences are typically constrained by temporal, geographic, size, and heterogeneity limitations. To mitigate these challenges, the authors of this work propose a set of mathematical models that mine social media networks for lead users and the product features that they express relating to specific products. The authors hypothesize that: (i) lead users are discoverable from large scale social media networks and (ii) product feature preferences, mined from lead user social media data, represent product features that do not currently exist in product offerings but will be desired in future product launches. An automated approach to lead user product feature identification is proposed to identify latent features (product features unknown to the public) from social media data. These latent features then serve as the key to discovering innovative users from the ever increasing pool of social media users. The authors collect 2.1 × 10^9 social media messages in the United States during a period of 31 months (from March 2011 to September 2013) in order to determine whether lead user preferences are discoverable and relevant to next generation cell phone designs.

Who is Tweeting? A Scoping Review of Methods to Establish Race and Ethnicity from Twitter Datasets (Preprint)

10.2196/preprints.35788 ◽

2021 ◽

Author(s):

Su Golder ◽

Robin Stevens ◽

Karen O'Connor ◽

Richard James ◽

Graciela Gonzalez-Hernandez

Keyword(s):

Social Media ◽

Scoping Review ◽

Race And Ethnicity ◽

Best Practice ◽

English Language ◽

Census Data ◽

Future Research ◽

Social Media Data ◽

Lower Accuracy ◽

Media Data

Background: A growing amount of health research uses social media data. Those critical of social media research often cite that it may be unrepresentative of the population, but the suitability of social media data in digital epidemiology is more nuanced. Identifying the demographics of social media users can help establish representativeness. Objectives: We sought to identify the different approaches or combination of approaches to extract race or ethnicity from social media and report on the challenges of using these methods. Methods: We present a scoping review to identify the methods used to extract race or ethnicity from Twitter datasets. We searched 17 electronic databases and carried out reference checking and handsearching in order to identify relevant articles. Sifting of each record was undertaken independently by at least two researchers with any disagreement discussed. The included studies could be categorized by the methods the authors applied to extract race or ethnicity. Results: From 1249 records we identified 67 that met our inclusion criteria. The majority focus on US based users and English language tweets. A range of types of data were used including Twitter profile pictures or information from bios (such as names or self-declarations), or location and/or content in the tweets themselves. A range of methodologies were used including manual inference, linkage to census data, commercial software, language/dialect recognition and machine learning. Not all studies evaluated their methods. Those that did found accuracy to vary from 45% to 93% with significantly lower accuracy identifying non-white race categories. The inference of race/ethnicity raises important ethical questions which can be exacerbated by the data and methods used. The comparative accuracy of different methods is also largely unknown.
Conclusion: There is no standard accepted approach or current guidelines for extracting or inferring race or ethnicity of Twitter users. Social media researchers must use careful interpretation of race or ethnicity and not over-promise what can be achieved, as even manual screening is a subjective, imperfect method. Future research should establish the accuracy of methods to inform evidence-based best practice guidelines for social media researchers, and be guided by concerns of equity and social justice.
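The review reports that accuracy differs across categories. One simple way to surface such gaps, sketched below, is to report accuracy within each reference category alongside the overall figure. This is an illustrative assumption about how an evaluation could be laid out, not the procedure of any reviewed study; the function name `accuracy_by_group` and the placeholder labels are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(true_labels, predicted_labels):
    """Return overall accuracy and accuracy within each true category."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        hits[truth] += int(truth == guess)
    per_group = {group: hits[group] / totals[group] for group in totals}
    overall = sum(hits.values()) / len(true_labels)
    return overall, per_group


# Hypothetical placeholder labels, invented for illustration only.
truth = ["group_a", "group_a", "group_a", "group_a", "group_b", "group_b"]
guess = ["group_a", "group_a", "group_a", "group_a", "group_b", "group_a"]
overall, per_group = accuracy_by_group(truth, guess)
```

In this toy case the overall accuracy looks high while the smaller category is classified correctly only half the time, which is the kind of disparity a single aggregate number can hide.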

A mobile visualization platform for exploring social media data

Global Journal of Information Technology Emerging Technologies ◽

10.18844/gjit.v6i1.386 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Yonghong Tong ◽

Muhammet Bakan

Keyword(s):

Social Media ◽

Data Visualization ◽

Mobile Device ◽

Large Scale ◽

Future Research ◽

Social Media Data ◽

Human Behaviors ◽

Big Picture ◽

Analysis Platform ◽

Media Data

With the increasing use of mobile devices and social media, a large amount of continuous information about human behaviors is available. Data visualization provides an insightful presentation of large-scale social media datasets. The focus of this paper is on the development of a mobile-device-based visualization and analysis platform for social media data, for the purpose of retrieving and visualizing visitors’ information for a specific region. The developed platform allows users to view the “big picture” of visitors’ location information. The results show that the developed platform 1) performs satisfactory data collection and data visualization on a mobile device, 2) helps users understand the variety of human behaviors while visiting a place, and 3) offers a feasible means of imaging immediate information from social media, which can inform further policy-making in related sectors and areas. Future research opportunities and challenges for social media data visualization are discussed.

Keywords: Social media, data visualization, mobile device

From FAIR data to fair data use: Methodological data fairness in health-related social media research

Big Data & Society ◽

10.1177/20539517211010310 ◽

2021 ◽

Vol 8 (1) ◽

pp. 205395172110103

Author(s):

Sabina Leonelli ◽

Rebecca Lovell ◽

Benedict W Wheeler ◽

Lora Fleming ◽

Hywel Williams

Keyword(s):

Social Media ◽

Well Being ◽

Research Process ◽

Future Research ◽

Social Media Data ◽

Data Practices ◽

Health Related Research ◽

Health Related ◽

Use Of Social Media ◽

Media Data

The paper problematises the reliability and ethics of using social media data, such as data sourced from Twitter or Instagram, to carry out health-related research. As in many other domains, the opportunity to mine social media for information has been hailed as transformative for research on well-being and disease. Considerations around the fairness, responsibilities and accountabilities relating to using such data have often been set aside, on the understanding that as long as data were anonymised, no real ethical or scientific issue would arise. We first counter this perception by emphasising that the use of social media data in health research can yield problematic and unethical results. We then provide a conceptualisation of methodological data fairness that can complement data management principles such as FAIR by enhancing the actionability of social media data for future research. We highlight the forms that methodological data fairness can take at different stages of the research process and identify practical steps through which researchers can ensure that their practices and outcomes are scientifically sound as well as fair to society at large. We conclude that making research data fair as well as FAIR is inextricably linked to concerns around the adequacy of data practices. The failure to act on those concerns raises serious ethical, methodological and epistemic issues with the knowledge and evidence that are being produced.

Synthetic Social Media Data Generation

IEEE Transactions on Computational Social Systems ◽

10.1109/tcss.2018.2854668 ◽

2018 ◽

Vol 5 (3) ◽

pp. 605-620 ◽

Cited By ~ 2

Author(s):

Yalin E. Sagduyu ◽

Alexander Grushin ◽

Yi Shi

Keyword(s):

Social Media ◽

Data Generation ◽

Social Media Data ◽

Media Data

Social Media Data Mining: An Analysis & Overview of Social Media Networks and Political Landscape

International Journal of Database Theory and Application ◽

10.14257/ijdta.2016.9.7.25 ◽

2016 ◽

Vol 9 (7) ◽

pp. 291-296 ◽

Cited By ~ 1

Author(s):

Sethunya R Joseph ◽

Keletso Letsholo ◽

Hlomani Hlomani

Keyword(s):

Data Mining ◽

Social Media ◽

Social Media Data ◽

Social Media Networks ◽

Political Landscape ◽

Media Data

Who is Tweeting? A Scoping Review of Methods to Establish Race and Ethnicity from Twitter Datasets

10.31235/osf.io/wru5q ◽

2021 ◽

Author(s):

Su Golder ◽

Robin Stevens ◽

Karen O'Connor ◽

Richard James ◽

Graciela Gonzalez-Hernandez

Keyword(s):

Social Media ◽

Scoping Review ◽

Race And Ethnicity ◽

Best Practice ◽

Census Data ◽

Future Research ◽

Social Media Data ◽

Lower Accuracy ◽

Reference Checking ◽

Media Data

Background: A growing amount of health research uses social media data. Those critical of social media research often cite that it may be unrepresentative of the population. Identifying the demographics of social media users enables us to measure the representativeness. Extracting race or ethnicity from social media data can be difficult and researchers may choose from a multitude of different approaches. Methods: We present a scoping review to identify the methods used to extract race or ethnicity from Twitter datasets. We searched 16 electronic databases and carried out reference checking in order to identify relevant articles. Sifting of each record was undertaken independently by at least two researchers with any disagreement discussed. The research could be grouped by the methods applied to extract race or ethnicity. Results: From 1093 records we identified 56 that met our inclusion criteria. The majority focus on Twitter users based in the US. A range of types of data were used including Twitter profile pictures, bios, and/or location, and the content in the tweets themselves. The methods used were wide ranging and included manual inference, linkage to census data, commercial software, language/dialect recognition and machine learning. Not all studies evaluated their methods. Those that did found accuracy to vary from 45% to 93% with significantly lower accuracy identifying non-white race categories. There may be some ethical questions over some of the methods used, particularly using photos or dialect, as well as questions surrounding accuracy. Conclusion: There is no standard approach or guidelines for extracting race or ethnicity from Twitter or other social media. Social media researchers must use careful interpretation of race or ethnicity and not over-promise what can be achieved, as even manual screening is a subjective, imperfect method. Future research should establish the accuracy of methods to inform evidence-based best practice guidelines for social media researchers, and be guided by concerns of equity and social justice.

Dashboarding The Online Strategic Communications of Anti-slavery Organizations During COVID-19

10.31235/osf.io/e9rbh ◽

2020 ◽

Author(s):

Benjamin Lucas ◽

Liana Bravo-Balsa ◽

Vicky Brotherton ◽

Nicola Wright ◽

Todd Landman

Keyword(s):

Social Media ◽

Preliminary Evidence ◽

Future Research ◽

Strategic Communications ◽

Social Media Data ◽

Working Paper ◽

Modern Slavery ◽

Use Of Social Media ◽

High Level ◽

Media Data

In this working paper, we investigate high-level changes in the online strategic communications of organizations engaged with SDG 8.7 (ending modern slavery) during the COVID-19 crisis. We present preliminary evidence of important semantic and thematic shifts based on data from Twitter during this time, with an emphasis on developing the SOLACE (Social Listening and Communications Engagement) dashboard, and with recommendations for important future research involving the use of social media data as a basis for distilling organizational-agenda proxies based on digital campaigns and activism during times of crisis.

The Relationship Between Social Media Data and Crime Rates in the United States

Social Media + Society ◽

10.1177/2056305119834585 ◽

2019 ◽

Vol 5 (1) ◽

pp. 205630511983458

Author(s):

Yan Wang ◽

Wenchao Yu ◽

Sam Liu ◽

Sean D. Young

Keyword(s):

Public Health ◽

United States ◽

Social Media ◽

The United States ◽

Initial Study ◽

Future Research ◽

Social Media Data ◽

Crime Data ◽

Targeted Interventions ◽

Media Data

Crime monitoring tools are needed for public health and law enforcement officials to deploy appropriate resources and develop targeted interventions. Social media, such as Twitter, has been shown to be a feasible tool for monitoring and predicting public health events such as disease outbreaks. Social media might also serve as a feasible tool for crime surveillance. In this study, we collected Twitter data between May and December 2012 and crime data for the years 2012 and 2013 in the United States. We examined the association between crime data and drug-related tweets. We found that tweets from 2012 were strongly associated with county-level crime data in both 2012 and 2013. This study presents preliminary evidence that social media data can be used to help predict future crimes. We discuss how future research can build upon this initial study to further examine the feasibility and effectiveness of this approach.
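The abstract does not say how the association between tweets and crime data was measured. As one hedged possibility, the sketch below computes a Pearson correlation between per-county tweet counts and crime rates. The function name `pearson_r` and all numbers are hypothetical, and the study may well have used a different statistic or model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-county values, invented for illustration only.
drug_tweets_per_county = [12, 30, 45, 70]
crime_rate_per_county = [1.1, 2.0, 2.4, 3.9]
r = pearson_r(drug_tweets_per_county, crime_rate_per_county)
```

A strong correlation of this kind shows association only; it does not by itself establish that tweet activity predicts or causes later crime.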