Automated Domain Bias Correction and Its Application in Text-Based Personality Analysis

Personality prediction based on textual data has recently gained attention for its potential in large-scale personalized applications such as social media-based marketing. However, when this technology is applied in real-world applications, users often encounter situations in which the personality traits derived from different sources (e.g., social media posts versus emails) are inconsistent. Varying results for the same individual render the technology ineffective and untrustworthy. In this paper, we demonstrate the impact of domain differences in automated text-based personality prediction. We also propose different approaches for domain error correction to meet different needs: (a) single or multi-domain correction and (b) outcome-based or input feature-based error correction. We conduct comprehensive experiments to evaluate the effectiveness of these methods. Our findings demonstrate a significant improvement in prediction accuracy with the proposed methods (e.g., 20–30% relative error reduction using outcome-based error correction, or a 48% increase in F1 score using feature-based error correction).
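
To make "outcome-based error correction" and "relative error reduction" concrete, the sketch below fits a simple linear map from scores predicted in one domain to scores observed in another, then reports how much the error shrinks. This is only an illustration under stated assumptions: the abstract does not describe the authors' actual correction model, the function names are hypothetical, and the numbers are made-up toy values rather than results from the paper.

```python
def fit_linear_correction(source_scores, target_scores):
    """Least-squares fit of target ~= a * source + b on paired scores.

    Assumption: a linear map stands in for the correction model here;
    the paper's own method is not specified in the abstract.
    """
    n = len(source_scores)
    mean_s = sum(source_scores) / n
    mean_t = sum(target_scores) / n
    var_s = sum((s - mean_s) ** 2 for s in source_scores)
    cov = sum((s - mean_s) * (t - mean_t)
              for s, t in zip(source_scores, target_scores))
    a = cov / var_s
    b = mean_t - a * mean_s
    return a, b


def mean_absolute_error(predicted, truth):
    """Average absolute difference between predictions and reference scores."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)


def relative_error_reduction(error_before, error_after):
    """Fraction of the original error removed by the correction."""
    return (error_before - error_after) / error_before


if __name__ == "__main__":
    # Toy trait scores for the same five people (hypothetical values).
    email_scores = [2.0, 2.5, 3.0, 3.5, 4.0]    # predicted from emails
    social_scores = [2.9, 3.2, 3.8, 4.1, 4.7]   # reference, social media

    a, b = fit_linear_correction(email_scores, social_scores)
    corrected = [a * s + b for s in email_scores]

    before = mean_absolute_error(email_scores, social_scores)
    after = mean_absolute_error(corrected, social_scores)
    print(f"relative error reduction: {relative_error_reduction(before, after):.2%}")
```

In practice the correction would be fitted on one set of users and evaluated on another; fitting and scoring on the same five points, as above, only demonstrates the arithmetic.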

The Impact of Social vs. Nonsocial Referring Channels on Online News Consumption

Management Science ◽

10.1287/mnsc.2020.3637 ◽

2020 ◽

Author(s):

Sagit Bar-Gill ◽

Yael Inbar ◽

Shachar Reichman

Keyword(s):

Social Media ◽

Large Scale ◽

Online News ◽

Clickstream Data ◽

News Consumption ◽

Online Newspaper ◽

News Website ◽

The Impact ◽

News Sharing ◽

Sharing Behavior

The digitization of news markets has created a key role for online referring channels. This research combines field and laboratory experiments and analysis of large-scale clickstream data to study the effects of social versus nonsocial referral sources on news consumption in a referred news website visit. We theorize that referrer-specific browsing modes and referrer-induced news consumption thresholds interact to impact news consumption in referred visits to an online newspaper and that news sharing motivations invoked by the referral source impact sharing behavior in these referred visits. We find that social media referrals promote directed news consumption (visits with fewer articles and shorter durations, yet higher reading completion rates) compared with nonsocial referrals. Furthermore, social referrals invoke weaker informational sharing motivations than nonsocial referrals, thus leading to a lower news sharing propensity. The results highlight how news consumption changes when an increasing amount of traffic is referred by social media, provide insights applicable to news outlets’ strategies, and speak to ongoing debates regarding biases arising from social media’s growing importance as an avenue for news consumption. This paper was accepted by Anandhi Bharadwaj, information systems.

Social Media and Journalism Ethics in Nigeria: A Study of Journalists in Kwara State of Nigeria

International Journal of Social Science and Human Research ◽

10.47191/ijsshr/v4-i3-20 ◽

2021 ◽

Vol 04 (03) ◽

Author(s):

Bernice Titilola Gbadeyan ◽

Keyword(s):

Social Media ◽

New Media ◽

Large Scale ◽

Code Of Ethics ◽

Survey Method ◽

Journalism Ethics ◽

Broadcast Media ◽

The North ◽

Use Of Social Media ◽

The Impact

Journalism is the practice of gathering and reporting news, whether through print media (newspapers and magazines) or broadcast media (television and radio). More recently, journalism has spread throughout the world through the unrestricted use of social media, where news is gathered and disseminated without restraint. An important aspect of journalism, however, is the ethics that underpin the profession: any information disseminated via any medium should meet ethical standards. New media have, on a large scale, given a great number of people the opportunity to practice journalism without knowing the ethics that guide the profession, which is affecting the dynamics of the profession. This study therefore assesses the impact of a new communication system on journalism: whether social media promote the ethics of the journalism profession, and whether social media journalists comply with the journalism code of ethics when disseminating news and information. The survey method was adopted, and Kwara State, in the north-central geopolitical zone, was selected for the study.

‘Digital Citizens’: Political Participation of Russian Youth Online

Russian Foundation for Basic Research Journal Humanities and social sciences ◽

10.22204/2587-8956-2020-102-05-143-155 ◽

2021 ◽

pp. 143-155

Author(s):

Roman Pyrma

Keyword(s):

Social Media ◽

Political Participation ◽

Large Scale ◽

Online Survey ◽

Social Activity ◽

Political Activity ◽

Political Aspect ◽

The Status ◽

Combination Of Methods ◽

The Impact

The study contributes to defining the impact of digital communication on civic and political participation, explaining how social media mediate public activism. Based on the concept of ‘digital citizenship’, the paper reveals the political aspect of the public activism of Russian youth online. The empirical model combines methods and procedures of applied research in order to reveal the details of the civil and political participation and protest activism of youth online. The research model includes an analysis of social media and a large-scale online survey of the younger audience. Based on the analysis of social media information flows, the paper states the prevalence of the youth’s civic participation over political participation, as well as the fact that the dynamics of social activity depend on events and the current agenda. The authors describe the level of civic and political activity of youth online based on sociological data. They also divide the audience of the protest theatre according to the following models: leaders, activists, followers, and spectators. In general, the study reveals the status and details of the younger generation’s communication activity online, where communities form and implications of linking actions appear.

New Approach of Measuring Human Personality Traits Using Ontology-Based Model from Social Media Data

Information ◽

10.3390/info12100413 ◽

2021 ◽

Vol 12 (10) ◽

pp. 413

Author(s):

Andry Alamsyah ◽

Nidya Dudija ◽

Sri Widiyanesti

Keyword(s):

Social Media ◽

Large Scale ◽

Public Involvement ◽

Real Life ◽

Big Five Personality ◽

Sorting Algorithm ◽

Personality Measurement ◽

Human Personality ◽

Textual Data ◽

N Gram

Human online activities leave digital traces that provide a perfect opportunity to better understand human behavior. Social media is an excellent place to spark conversations or state opinions, and thus generates large-scale textual data. In this paper, we harness those data to support the effort of personality measurement. Our first contribution is a model based on the Big Five personality traits that detects human personalities from textual data in the Indonesian language. The model uses an ontology approach instead of the more widely used machine learning approach, because the former better captures the meaning and intention of phrases and words in the domain of human personality. The traditional and more thorough ways to assess personality are interviews and questionnaires. Still, there are many real-life applications that need an alternative method, cheaper and faster than the traditional methodology, for selecting individuals based on their personality. The second contribution is to support the model implementation by building a personality measurement platform. We use two distinct features for the model: an n-gram sorting algorithm to parse the textual data and a crowdsourcing mechanism that facilitates public involvement in adding to and filtering the ontology corpus.

Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining

Frontiers in Big Data ◽

10.3389/fdata.2020.594302 ◽

2021 ◽

Vol 3 ◽

Author(s):

Kijung Shin ◽

Bryan Hooi ◽

Jisu Kim ◽

Christos Faloutsos

Keyword(s):

Social Media ◽

Real World ◽

Large Scale ◽

Detection Method ◽

Tensor Decomposition ◽

Main Memory ◽

Rating Data ◽

Network Attacks ◽

Real World Applications ◽

Memory Efficient

How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, TCP dumps, etc.) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods suffer from low accuracy, or they assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and the web. To overcome these limitations, we propose D-Cube, a disk-based dense-subtensor detection method, which can also run in a distributed manner across multiple machines. Compared to state-of-the-art methods, D-Cube is (1) Memory Efficient: requires up to 1,561× less memory and handles 1,000× larger data (2.6TB), (2) Fast: up to 7× faster due to its near-linear scalability, (3) Provably Accurate: gives a guarantee on the densities of the detected subtensors, and (4) Effective: spotted network attacks from TCP dumps and synchronized behavior in rating data most accurately.
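
As a rough illustration of what "dense subtensor" means, the sketch below scores a block of a sparse tensor by its arithmetic average mass, i.e., the sum of the entries in the block divided by the average number of indices selected per mode. This is one density measure used in this line of work; the abstract does not say which measure D-Cube optimizes. The code is not D-Cube itself: it holds everything in memory, only evaluates a given block rather than searching for one, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Index = Tuple[int, ...]  # one coordinate per mode, e.g. (user, item, hour)


def block_mass(tensor: Dict[Index, float], block: List[Set[int]]) -> float:
    """Sum of the entries whose index in every mode lies inside the block."""
    return sum(
        value
        for index, value in tensor.items()
        if all(i in allowed for i, allowed in zip(index, block))
    )


def arithmetic_density(tensor: Dict[Index, float], block: List[Set[int]]) -> float:
    """Block mass divided by the average number of selected indices per mode."""
    average_size = sum(len(allowed) for allowed in block) / len(block)
    if average_size == 0:
        return 0.0
    return block_mass(tensor, block) / average_size


if __name__ == "__main__":
    # Toy (user, item, hour) tensor: two users hit two items heavily in hour 0,
    # plus two unrelated low-volume entries. All values are made up.
    tensor = {
        (0, 0, 0): 5.0, (0, 1, 0): 5.0,
        (1, 0, 0): 5.0, (1, 1, 0): 5.0,
        (2, 2, 1): 1.0, (3, 0, 2): 1.0,
    }
    suspicious = [{0, 1}, {0, 1}, {0}]
    everything = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1, 2}]

    print("suspicious block:", arithmetic_density(tensor, suspicious))
    print("whole tensor:    ", arithmetic_density(tensor, everything))
```

The lockstep block scores higher than the tensor as a whole, which is the signal a dense-subtensor detector searches for.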

Visualising the Social Media Conversations of a National Information Technology Professional Association

International Journal of Human Capital and Information Technology Professionals ◽

10.4018/ijhcitp.2019010103 ◽

2019 ◽

Vol 10 (1) ◽

pp. 38-54

Author(s):

Stuart Palmer

Keyword(s):

Social Media ◽

Large Scale ◽

Professional Associations ◽

Media System ◽

The Social ◽

Technology Professional ◽

Information And Communication ◽

Media Systems ◽

National Information ◽

The Impact

Social media systems are important for professional associations (PAs), providing new ways for them to interact with their members and stakeholders. Evaluation of the impact of social media is not straightforward. Here text analytics, specifically multidimensional scaling visualisation, is proposed as an approach for the characterisation of the large-scale ‘conversations’ occurring between an information and communication technology PA and its stakeholders via the Twitter social media system. In the case presented, there was found to be a significant level of congruence between the corresponding visualisations of tweets from the PA, and tweets to/about the PA, although differences were also observed. The new method proposed and piloted here offers a way for organisations to conceptualise, identify, capture and visualise the large-scale, ephemeral, text conversations about themselves on Twitter, and to assist them with key strategic uses of social media.

Potential follow-up increases private contributions to public goods

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1524899113 ◽

2016 ◽

Vol 113 (19) ◽

pp. 5218-5220 ◽

Cited By ~ 8

Author(s):

Todd Rogers ◽

John Ternovski ◽

Erez Yoeli

Keyword(s):

Social Media ◽

Public Goods ◽

Field Experiment ◽

Large Scale ◽

Scale Field ◽

Large Scale Field ◽

The Impact ◽

Made In ◽

Get Out The Vote

People contribute more to public goods when their contributions are made more observable to others. We report an intervention that subtly increases the observability of public goods contributions when people are solicited privately and impersonally (e.g., mail, email, social media). This intervention is tested in a large-scale field experiment (n = 770,946) in which people are encouraged to vote through get-out-the-vote letters. We vary whether the letters include the message, “We may call you after the election to ask about your voting experience.” Increasing the perceived observability of whether people vote by including that message increased the impact of the get-out-the-vote letters by more than the entire effect of a typical get-out-the-vote letter. This technique for increasing perceived observability can be replicated whenever public goods solicitations are made in private.

Assessing the Impact of the COVID-19 Pandemic in Spain: Large-Scale, Online, Self-Reported Population Survey

Journal of Medical Internet Research ◽

10.2196/21319 ◽

2020 ◽

Vol 22 (9) ◽

pp. e21319 ◽

Cited By ~ 2

Author(s):

Nuria Oliver ◽

Xavier Barber ◽

Kirsten Roomp ◽

Kristof Roomp

Keyword(s):

Social Media ◽

Social Behavior ◽

Economic Impact ◽

Large Scale ◽

Online Survey ◽

Spanish Population ◽

Survey Method ◽

Economic Damage ◽

Large Sample ◽

The Impact

Background: Spain has been one of the countries most impacted by the COVID-19 pandemic. Since the first confirmed case was reported on January 31, 2020, there have been over 405,000 cases and 28,000 deaths in Spain. The economic and social impact is without precedent. Thus, it is important to quickly assess the situation and perception of the population. Large-scale online surveys have been shown to be an effective tool for this purpose. Objective: We aim to assess the situation and perception of the Spanish population in four key areas related to the COVID-19 pandemic: social contact behavior during confinement, personal economic impact, labor situation, and health status. Methods: We obtained a large sample using an online survey with 24 questions related to COVID-19 in the week of March 28-April 2, 2020, during the peak of the first wave of COVID-19 in Spain. The self-selection online survey method of nonprobability sampling was used to recruit 156,614 participants via social media posts that targeted the general adult population (age >18 years). Given such a large sample, the 95% CI was ±0.843 for all reported proportions. Results: Regarding social behavior during confinement, participants mainly left their homes to satisfy basic needs. We found several statistically significant differences in social behavior across genders and age groups. The population’s willingness to comply with the confinement measures is evident. From the survey answers, we identified a significant adverse economic impact of the pandemic on those working in small businesses and a negative correlation between economic damage and willingness to stay in confinement. The survey revealed that close contacts play an important role in the transmission of the disease, and 28% of the participants lacked the necessary resources to properly isolate themselves. We also identified a significant lack of testing, with only 1% of the population tested and 6% of respondents unable to be tested despite their doctor’s recommendation. We developed a generalized linear model to identify the variables that were correlated with a positive SARS-CoV-2 test result. Using this model, we estimated an average of 5% for SARS-CoV-2 prevalence in the Spanish population during the time of the study. A seroprevalence study carried out later by the Spanish Ministry of Health reported a similar level of disease prevalence (5%). Conclusions: Large-scale online population surveys, distributed via social media and online messaging platforms, can be an effective, cheap, and fast tool to assess the impact and prevalence of an infectious disease in the context of a pandemic, particularly when there is a scarcity of official data and limited testing capacity.

Assessing the Impact of the COVID-19 Pandemic in Spain: Large-Scale, Online, Self-Reported Population Survey (Preprint)

10.2196/preprints.21319 ◽

2020 ◽

Author(s):

Nuria Oliver ◽

Xavier Barber ◽

Kirsten Roomp ◽

Kristof Roomp

Keyword(s):

Social Media ◽

Social Behavior ◽

Economic Impact ◽

Large Scale ◽

Online Survey ◽

Spanish Population ◽

Survey Method ◽

Economic Damage ◽

Large Sample ◽

The Impact


Non-Spatial Data towards Spatially Located News about COVID-19: A Semi-Automated Aggregator of Pandemic Data from (Social) Media within the Olomouc Region, Czechia

Data ◽

10.3390/data5030076 ◽

2020 ◽

Vol 5 (3) ◽

pp. 76

Author(s):

Jakub Konicek ◽

Rostislav Netek ◽

Tomas Burian ◽

Tereza Novakova ◽

Jakub Kaplan

Keyword(s):

Social Media ◽

Czech Republic ◽

Spatial Data ◽

Ad Hoc ◽

The Czech Republic ◽

Web Map ◽

Map Solution ◽

The Impact ◽

Official Sources ◽

Different Sources

The article describes the process of aggregating media-based data about the coronavirus pandemic in the Olomouc region, the Czech Republic. News items that originally had no spatial reference, drawn from different sources and various platforms (government, social media, news portals), were automatically aggregated into a centralized database. The application “COVID-map” is an interactive web map solution that visualizes records from the database spatially. The COVID-map was developed within the Ad hoc online hackathon as an academic project at the Department of Geoinformatics, Palacký University Olomouc, Czech Republic. Alongside spatially localized data, the map application collects statistical data from official sources, e.g., from the governmental crisis management office. The impact of the application was immediate. Within a few days of the launch, tens of thousands of users per day visited the COVID-map. It has been covered by regional and national media. The COVID-map solution can be considered a suitable application of appropriate cartographic methods to the example of the coronavirus pandemic.
