A Product Feature Inference Model for Mining Implicit Customer Preferences Within Large Scale Social Media Networks

Author(s):  
Suppawong Tuarob ◽  
Conrad S. Tucker

The acquisition and mining of product feature data from online sources such as customer review websites and large scale social media networks is an emerging area of research. In many existing design methodologies that acquire product feature preferences from online sources, the underlying assumption is that product features expressed by customers are explicitly stated and readily observable to be mined using product feature extraction tools. In many scenarios, however, product feature preferences expressed by customers are implicit in nature and do not directly map to engineering design targets. For example, a customer may implicitly state “wow I have to squint to read this on the screen”, when the explicit product feature may be a larger screen. The authors of this work propose an inference model that automatically assigns the most probable explicit product feature desired by a customer, given an implicit preference expressed. The algorithm iteratively refines its inference model by presenting a hypothesis and, using ground truth data, determining its statistical validity. A case study involving smartphone product features expressed through Twitter networks is presented to demonstrate the effectiveness of the proposed methodology.
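
The idea of assigning the most probable explicit feature to an implicit statement can be illustrated with a toy classifier. The sketch below is not the authors' inference model, which the abstract does not specify; it assumes a plain naive Bayes scorer with add-one smoothing, and the labeled examples, the feature names, and the functions `train` and `infer` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per explicit feature from (text, feature) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def infer(text, model):
    """Return the explicit feature with the highest posterior log score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration only.
examples = [
    ("have to squint to read this on the screen", "larger screen"),
    ("text is tiny on this screen", "larger screen"),
    ("phone died before lunch again", "longer battery life"),
    ("always hunting for a charger", "longer battery life"),
]
model = train(examples)
print(infer("squint at the screen", model))  # prints "larger screen"
```

A real system would need far more labeled data and a validation step against ground truth, as the abstract describes.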

2015 ◽  
Vol 137 (7) ◽  
Author(s):  
Suppawong Tuarob ◽  
Conrad S. Tucker

Lead users play a vital role in next generation product development, as they help designers discover relevant product feature preferences months or even years before they are desired by the general customer base. Existing design methodologies proposed to extract lead user preferences are typically constrained by temporal, geographic, size, and heterogeneity limitations. To mitigate these challenges, the authors of this work propose a set of mathematical models that mine social media networks for lead users and the product features that they express relating to specific products. The authors hypothesize that: (i) lead users are discoverable from large scale social media networks and (ii) product feature preferences, mined from lead user social media data, represent product features that do not currently exist in product offerings but will be desired in future product launches. An automated approach to lead user product feature identification is proposed to identify latent features (product features unknown to the public) from social media data. These latent features then serve as the key to discovering innovative users from the ever increasing pool of social media users. The authors collect 2.1 × 10⁹ social media messages in the United States during a period of 31 months (from March 2011 to September 2013) in order to determine whether lead user preferences are discoverable and relevant to next generation cell phone designs.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract: This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised learning manner.
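
The entropy the abstract refers to can be made concrete with a small calculation. The sketch below assumes plain Shannon entropy over a list of observed interactions; it does not reproduce the paper's graphical model, and the function name and the sample user IDs are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of events."""
    counts = Counter(events)
    total = sum(counts.values())
    # p * log2(1/p) summed over distinct events
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Likes on one media item, recorded by the ID of the user who liked it.
concentrated = ["u1", "u1", "u1", "u1"]  # one user accounts for every like
spread_out = ["u1", "u2", "u3", "u4"]    # four users, one like each

print(shannon_entropy(concentrated))  # 0.0
print(shannon_entropy(spread_out))    # 2.0
```

Low entropy means the interactions are concentrated on a few sources, and high entropy means they are spread evenly. How such values map to fake versus authentic media is the paper's finding and is not implied by this calculation alone.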


10.2196/14986 ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. e14986 ◽  
Author(s):  
Ashlynn R Daughton ◽  
Rumi Chunara ◽  
Michael J Paul

Background Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.
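
For readers unfamiliar with the AUC values reported above, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a generic illustration with made-up labels and scores, not the study's evaluation code; the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g., symptoms reported), 0 for negative.
    scores: model outputs, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four positive/negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance, which is why values such as 0.51 and 0.53 indicate little or no usable signal.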


Author(s):  
Shimei Pan ◽  
Tao Ding

Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advances in learning to represent social media users in low-dimensional embeddings. The technology is critical for creating high-performance social media-based human traits and behavior models since the ground truth for assessing latent human traits and behavior is often expensive to acquire at a large scale. In this survey, we review typical methods for learning unified user embeddings from heterogeneous user data (e.g., combining social media texts with images to learn a unified user representation). Finally, we point out some current issues and future directions.


2021 ◽  
Author(s):  
Koustuv Saha ◽  
Asra Yousuf ◽  
Ryan L. Boyd ◽  
James W. Pennebaker ◽  
Munmun Choudhury

Abstract: The mental health of college students is a growing concern, and the mental health needs of college students are difficult to assess in real time and at scale. While social media has shown potential as a viable “passive sensor” of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessing the mental health of college students using social media data corresponds with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011 and 2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models of forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during months with high mental health consultations consisted of discussions on academics and career, whereas months of low mental health consultations saliently show expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.
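
The SMAPE figure quoted above is a symmetric mean absolute percentage error. Several variants of this metric exist and the abstract does not say which one was used; the sketch below assumes the common form that divides each absolute error by the mean of the absolute actual and forecast values and reports the result on a 0 to 200 percentage scale. The function name and the sample numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, as a percentage.

    Assumed definition: 100/n * sum(|F - A| / ((|A| + |F|) / 2)).
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Made-up monthly consultation counts and forecasts.
print(round(smape([100, 200], [110, 180]), 2))  # 10.03
```

Lower values indicate forecasts closer to the observed series; a perfect forecast gives 0.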


2020 ◽  
Author(s):  
Vishrawas Gopalakrishnan ◽  
Sayali Pethe ◽  
Sarah Kefayati ◽  
Raman Srinivasan ◽  
Paul Hake ◽  
...  

Abstract: Multiple efforts to model the epidemiology of SARS-CoV-2 have recently been launched in support of public health response at the national, state, and county levels. While the pandemic is global, the dynamics of this infectious disease vary with geography, local policies, and local variations in demographics. An underlying assumption of most infectious disease compartment modeling is that of a well mixed population at the resolution of the areas being modeled. The implicit need to model at fine spatial resolution is impeded by the quality of ground truth data for fine scale administrative subdivisions. To understand the trade-offs and benefits of such modeling as a function of scale, we compare the predictive performance of SARS-CoV-2 modeling at the county, county cluster, and state level for the entire United States. Our results demonstrate that accurate prediction at the county level requires hyper-local modeling with county resolution. State level modeling does not accurately predict community spread in smaller sub-regions because state populations are not well mixed, resulting in large prediction errors. As an important use case, leveraging high resolution modeling with public health data and admissions data from Hillsborough County, Florida, we performed weekly forecasts of both hospital admission and ICU bed demand for the county. The repeated forecasts between March and August 2020 were used to develop accurate resource allocation plans for Tampa General Hospital. 2010 MSC: 92-D30, 91-C20
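
The well mixed population assumption can be seen directly in the simplest compartment model. The sketch below is a textbook discrete-time SIR model, not the model used in the paper (which the abstract does not specify); the function name and all parameter values are made up for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Daily-step SIR model for a single well mixed population.

    beta:  transmission rate per day (assumed value, not from the paper)
    gamma: recovery rate per day (assumed value, not from the paper)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # Well mixed assumption: every susceptible person is equally likely
        # to meet any infected person, hence the s * i / population term.
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.3, gamma=0.1, days=60)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak infections on day", peak_day)
```

Running one such model per county versus one per state is the kind of resolution trade-off the abstract discusses: a single state-level `s * i / population` term treats residents of distant counties as if they mixed freely.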


Author(s):  
Marian Muste ◽  
Ton Hoitink

With a continuous global increase in flood frequency and intensity, there is an immediate need for new science-based solutions for flood mitigation, resilience, and adaptation that can be quickly deployed in any flood-prone area. An integral part of these solutions is the availability of river discharge measurements delivered in real time with high spatiotemporal density and over large-scale areas. Stream stages and the associated discharges are the most perceivable variables of the water cycle and the ones that eventually determine the levels of hazard during floods. Consequently, the availability of discharge records (a.k.a. streamflows) is paramount for flood-risk management because they provide actionable information for organizing the activities before, during, and after floods, and they supply the data for planning and designing floodplain infrastructure. Moreover, the discharge records represent the ground-truth data for developing and continuously improving the accuracy of the hydrologic models used for forecasting streamflows. Acquiring discharge data for streams is critically important not only for flood forecasting and monitoring but also for many other practical uses, such as monitoring water abstractions for supporting decisions in various socioeconomic activities (from agriculture to industry, transportation, and recreation) and for ensuring healthy ecological flows. All these activities require knowledge of past, current, and future flows in rivers and streams. Given its importance, an ability to measure the flow in channels has preoccupied water users for millennia. Starting with the simplest volumetric methods to estimate flows, the measurement of discharge has evolved through continued innovation to sophisticated methods so that today we can continuously acquire and communicate the data in real time. There is no essential difference between the instruments and methods used to acquire streamflow data during normal conditions versus during floods. 
The measurements during floods are, however, complex, hazardous, and of limited accuracy compared with those acquired during normal flows. The essential differences in the configuration and operation of the instruments and methods for discharge estimation stem from the type of measurements they acquire—that is, discrete and autonomous measurements (i.e., measurements that can be taken any time any place) and those acquired continuously (i.e., estimates based on indirect methods developed for fixed locations). Regardless of the measurement situation and approach, the main concern of the data providers for flooding (as well as for other areas of water resource management) is the timely delivery of accurate discharge data at flood-prone locations across river basins.


Author(s):  
Yael Levaot ◽  
Talya Greene ◽  
Yuval Palgi

ABSTRACT Objectives: Social media provides an opportunity to engage in social contact and to give and receive help by means of online social networks. Social support following trauma exposure, even in a virtual community, may reduce feelings of helplessness and isolation, and, therefore, reduce posttraumatic stress symptoms (PTS), and increase posttraumatic growth (PTG). The current study aimed to assess whether giving and/or receiving offers of help by means of social media following large community fires predicted PTS and/or PTG. Methods: A convenience sample of 212 adults living in communities that were affected by large-scale community fires in Israel (November 2016) completed questionnaires on giving and receiving offers of help by means of social media within 1 mo of the fire (W1), and the PTSD checklist for DSM-5 (PCL-5) and PTG questionnaire (PTGI-SF), 4 mo after the fire (W2). Results: Regression analyses showed that, after controlling for age, gender, and distance from fire, offering help by means of social media predicted higher PTG (β = 0.22; t = 3.18; P < 0.01), as did receiving offers of help by means of social media (β = 0.18; t = 2.64; P < 0.01). There were no significant associations between giving and/or receiving offers of help and PTS. Conclusions: Connecting people to social media networks may help in promoting posttraumatic growth, although it might not affect posttraumatic stress symptoms. This is one of the first studies to highlight empirically the advantages of social media in the aftermath of trauma exposure.


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Zhecheng Qiang ◽  
Eduardo L. Pasiliao ◽  
Qipeng P. Zheng

Abstract: Social networks have become widely used platforms for their users to share information. Learning the information diffusion process is essential for successful applications of viral marketing and cyber security in social media networks. This paper proposes two learning models that are aimed at learning person-to-person influence in information diffusion from historical cascades based on the threshold propagation model. The first model is based on the linear threshold propagation model. In addition, by considering multi-step information propagation in one time period, this paper proposes a learning model for multi-step diffusion influence between pairs of users based on the idea of random walk. Mixed integer programs (MIP) have been used to learn these models by minimizing the prediction errors, where decision variables are estimations of the diffusion influence between pairs of users. For large-scale networks, this paper develops approximate methods for those learning models by using artificial neural networks to learn the pairwise influence. Extensive computational experiments using both synthetic data and real data have been conducted to demonstrate the effectiveness of the proposed models and methods.
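
The linear threshold propagation model that the learning models build on can be sketched as a forward simulation. The code below only propagates activation given known pairwise influence weights; it does not implement the paper's MIP or neural network learning step, and the function name, the graph, the weights, and the thresholds are hypothetical.

```python
def linear_threshold(weights, thresholds, seeds, max_steps=100):
    """Run the linear threshold model until no new user activates.

    weights[v][u]: influence of user u on user v (assumed known here;
                   in the paper these are the quantities being learned).
    thresholds[v]: total incoming influence needed to activate v.
    seeds:         users who are active at time zero.
    """
    active = set(seeds)
    for _ in range(max_steps):
        newly_active = {
            v
            for v, incoming in weights.items()
            if v not in active
            and sum(w for u, w in incoming.items() if u in active) >= thresholds[v]
        }
        if not newly_active:
            break
        active |= newly_active
    return active


# Made-up four-user network: a influences b and c, b influences c, c influences d.
weights = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.2},
}
thresholds = {"b": 0.5, "c": 0.5, "d": 0.5}
print(sorted(linear_threshold(weights, thresholds, seeds={"a"})))  # ['a', 'b', 'c']
```

In this example, c activates only after b does, because neither a nor b alone exceeds c's threshold, while d never activates. Learning from historical cascades amounts to choosing weights so that simulated activations like these match the observed ones.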

