The Big Duplicity of Big Data

2015 ◽  
Vol 8 (4) ◽  
pp. 509-515 ◽  
Author(s):  
Thomas J. Whelan ◽  
Amy M. DuVernet

As discussed in Guzzo, Fink, King, Tonidandel, and Landis's (2015) focal article, big data is more than a passing trend in business analytics. The plethora of information available presents a host of interesting challenges and opportunities for industrial and organizational (I-O) psychology. When big data sources are used to make organizational decisions, our field has a considerable amount to offer: advice on how big data metrics are derived and used, and on the potential threats to validity that their use presents. We’ve all heard the axiom “garbage in, garbage out,” and it applies regardless of whether the scale is a small wastebasket or a dump truck.

2021 ◽  
Author(s):  
Heinrich Peters ◽  
Zachariah Marrero ◽  
Samuel D. Gosling

As human interactions have shifted to virtual spaces and as sensing systems have become more affordable, an increasing share of people’s everyday lives can be captured in real time. The availability of such fine-grained behavioral data from billions of people has the potential to enable great leaps in our understanding of human behavior. However, such data also pose challenges to engineers and behavioral scientists alike, requiring a specialized set of tools and methodologies to generate psychologically relevant insights.

In particular, researchers may need to utilize machine learning techniques to extract information from unstructured or semi-structured data, reduce high-dimensional data to a smaller number of variables, and efficiently deal with extremely large sample sizes. Such procedures can be computationally expensive, requiring researchers to balance computation time with processing power and memory capacity. Whereas modeling procedures on small datasets usually take mere moments to execute, applying them to big data can take much longer, with typical execution times spanning hours, days, or even weeks depending on the complexity of the problem and the resources available. Seemingly subtle decisions about preprocessing and analytic strategy can have a huge impact on whether analyses can be executed within a reasonable timeframe. Consequently, researchers must anticipate potential pitfalls arising from the interplay of their analytic strategy with memory and computational constraints.

Many researchers who are interested in using “big data” report having problems learning about new analytic methods or software, finding collaborators with the right skills and knowledge, and getting access to commercial or proprietary data for their research (Metzler et al., 2016). This chapter aims to serve as a practical introduction for psychologists who want to use large datasets and datasets from non-traditional data sources in their research (i.e., data not generated in the lab or through conventional surveys). First, we discuss the concept of big data and review some of the theoretical challenges and opportunities that arise with the availability of ever larger amounts of data. Second, we discuss practical implications and best practices with respect to data collection, data storage, data processing, and data modeling for psychological research in the age of big data.
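As a concrete illustration of the memory-conscious workflow the chapter describes, the sketch below uses scikit-learn’s IncrementalPCA to reduce a high-dimensional behavioral dataset chunk by chunk rather than loading the full matrix into memory. The chunk size, feature count, and number of components are hypothetical choices for illustration, not recommendations from the chapter.

```python
# A minimal sketch of memory-conscious dimensionality reduction, assuming
# scikit-learn and NumPy are available. Shapes and parameters are
# illustrative only; random data stands in for streamed behavioral records.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
n_features = 1_000   # e.g., one column per sensor or app-usage feature
chunk_size = 5_000   # rows processed per pass; tune to available RAM

ipca = IncrementalPCA(n_components=50)

# First pass: fit the decomposition one chunk at a time. In practice each
# chunk would be streamed from disk or a database rather than simulated.
for _ in range(10):
    chunk = rng.standard_normal((chunk_size, n_features))
    ipca.partial_fit(chunk)

# Second pass: project a chunk into the 50-dimensional space; the much
# smaller result can be persisted for downstream modeling.
reduced = ipca.transform(rng.standard_normal((chunk_size, n_features)))
print(reduced.shape)  # (5000, 50)
```

The same pattern (partial fit over chunks, then transform) trades a second pass over the data for a memory footprint bounded by the chunk size, which is exactly the computation-versus-memory balance the chapter highlights.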


2018 ◽  
Vol 26 (3) ◽  
pp. 382-405 ◽  
Author(s):  
Ceylan Onay ◽  
Elif Öztürk

Purpose: This paper aims to survey the credit scoring literature of the past 41 years (1976-2017) and presents a research agenda that addresses the challenges and opportunities Big Data brings to credit scoring.

Design/methodology/approach: Content analysis methodology is used to analyze 258 peer-reviewed academic papers from 147 journals, drawn from two comprehensive academic research databases, to identify their research themes and to detect trends and changes in the credit scoring literature according to content characteristics.

Findings: The authors find that credit scoring is going through a quantitative transformation, in which data-centric underwriting approaches, the use of non-traditional data sources in credit scoring, and their regulatory aspects are the upcoming avenues for further research.

Practical implications: The paper’s findings highlight the perils and benefits of using Big Data in credit scoring algorithms for corporates, governments, and non-profit actors who develop and use new technologies in credit scoring.

Originality/value: This paper presents greater insight into how Big Data challenges traditional credit scoring models and addresses the need to develop new credit models that identify new and secure data sources and convert them into useful insights that comply with regulations.
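To make the data-centric underwriting the survey points to concrete, here is a minimal, hypothetical sketch of a scorecard-style model: a logistic regression over a mix of traditional and non-traditional features. The feature names and synthetic data are invented for illustration and do not come from the surveyed literature.

```python
# A hypothetical credit-scoring sketch: logistic regression over traditional
# features plus a stand-in "non-traditional" signal. Data and feature names
# are invented; this is not a model from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5_000

# Columns: debt-to-income ratio, years of credit history, and a
# non-traditional signal such as utility-payment regularity.
X = np.column_stack([
    rng.uniform(0.0, 1.5, n),    # debt_to_income
    rng.uniform(0.0, 30.0, n),   # credit_history_years
    rng.uniform(0.0, 1.0, n),    # utility_payment_regularity
])
# Synthetic default labels loosely tied to the features.
logits = 2.0 * X[:, 0] - 0.05 * X[:, 1] - 1.5 * X[:, 2]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# AUC is a common discrimination metric for scorecards.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```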


2020 ◽  
Author(s):  
Bankole Olatosi ◽  
Jiajia Zhang ◽  
Sharon Weissman ◽  
Zhenlong Li ◽  
Jianjun Hu ◽  
...  

BACKGROUND: The coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), remains a serious global pandemic. Currently, all age groups are at risk for infection, but the elderly and persons with underlying health conditions are at higher risk of severe complications. In the United States (US), the pandemic curve is rapidly changing, with over 6,786,352 cases and 199,024 deaths reported. As of 9/21/2020, South Carolina (SC) reported 138,624 cases and 3,212 deaths across the state.

OBJECTIVE: The growing availability of COVID-19 data provides a basis for deploying Big Data science to leverage multitudinal and multimodal data sources for incremental learning. Doing so requires the acquisition and collation of multiple data sources at the individual and county levels.

METHODS: The population for the comprehensive database comes from statewide COVID-19 testing surveillance data (March 2020 to present) for all SC COVID-19 patients (N≈140,000). This project will (1) connect multiple partner data sources for prediction and intelligence gathering and (2) build a REDCap database that links de-identified multitudinal and multimodal data sources useful for machine learning and deep learning algorithms, enabling further studies. Additional data will include hospital-based COVID-19 patient registries, Health Sciences South Carolina (HSSC) data, data from the Office of Revenue and Fiscal Affairs (RFA), and Area Health Resource Files (AHRF).

RESULTS: The project was funded in June 2020 by the National Institutes of Health.

CONCLUSIONS: The development of such a linked and integrated database will allow for the identification of important predictors of short- and long-term clinical outcomes for SC COVID-19 patients using data science.
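As a sketch of the record-linkage step such a de-identified, linked database implies, the hypothetical example below joins two sources on a salted-hash pseudonym. The field names, salt handling, and data are illustrative only and are not part of the project’s actual design.

```python
# A hypothetical sketch of privacy-preserving record linkage across two
# data sources. Field names, the salt, and the data are illustrative;
# real deployments would follow the project's governance and REDCap setup.
import hashlib
import pandas as pd

SALT = "replace-with-secret-salt"  # would be managed securely in practice

def pseudonym(identifier: str) -> str:
    """Derive a stable, de-identified key from a direct identifier."""
    return hashlib.sha256((SALT + identifier).encode()).hexdigest()

surveillance = pd.DataFrame({
    "patient_id": ["A1", "A2", "A3"],
    "test_date": ["2020-03-15", "2020-04-02", "2020-05-20"],
    "result": ["positive", "positive", "negative"],
})
registry = pd.DataFrame({
    "patient_id": ["A1", "A3"],
    "hospitalized": [True, False],
})

# Replace direct identifiers with pseudonyms before sources are shared,
# then link on the pseudonym alone.
for df in (surveillance, registry):
    df["pid_hash"] = df.pop("patient_id").map(pseudonym)

linked = surveillance.merge(registry, on="pid_hash", how="left")
print(linked)
```

Because the same salted hash is derived independently at each source, records can be joined for analysis without any source ever exchanging direct identifiers.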


Author(s):  
Philip Habel ◽  
Yannis Theocharis

In the last decade, big data, and social media in particular, have seen increased popularity among citizens, organizations, politicians, and other elites, which in turn has created new and promising avenues for scholars studying long-standing questions of communication flows and influence. Studies of social media play a prominent role in our evolving understanding of the supply and demand sides of the political process, including the novel strategies adopted by elites to persuade and mobilize publics, as well as the ways in which citizens react, interact with elites and others, and utilize platforms to persuade audiences. While recognizing some challenges, this chapter speaks to the myriad opportunities that social media data afford for evaluating questions of mobilization and persuasion, ultimately bringing us closer to a more complete understanding of Lasswell’s (1948) famous maxim: “who, says what, in which channel, to whom, [and] with what effect.”


Author(s):  
Marco Angrisani ◽  
Anya Samek ◽  
Arie Kapteyn

The number of data sources available for academic research on retirement economics and policy has increased rapidly in the past two decades. Data quality and comparability across studies have also improved considerably, with survey questionnaires progressively converging toward common ways of eliciting the same measurable concepts. Probability-based Internet panels have become an increasingly accepted and recognized tool for obtaining research data, allowing for fast, flexible, and cost-effective data collection compared with more traditional modes such as in-person and phone interviews. In an era of big data, academic research has also increasingly been able to access administrative records (e.g., Kostøl and Mogstad, 2014; Cesarini et al., 2016), private-sector financial records (e.g., Gelman et al., 2014), and administrative data linked with surveys (Ameriks et al., 2020) to answer questions that could not be successfully tackled otherwise.


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda: to improve how data are produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems, and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle: compiling and making available official statistics that honor citizens’ entitlement to public information.


Omega ◽  
2021 ◽  
pp. 102479
Author(s):  
Zhongbao Zhou ◽  
Meng Gao ◽  
Helu Xiao ◽  
Rui Wang ◽  
Wenbin Liu
