The Big Data Toolkit for Psychologists: Data Sources and Methodologies

2021 ◽  
Author(s):  
Heinrich Peters ◽  
Zachariah Marrero ◽  
Samuel D. Gosling

As human interactions have shifted to virtual spaces and as sensing systems have become more affordable, an increasing share of people's everyday lives can be captured in real time. The availability of such fine-grained behavioral data from billions of people has the potential to enable great leaps in our understanding of human behavior. However, such data also pose challenges to engineers and behavioral scientists alike, requiring a specialized set of tools and methodologies to generate psychologically relevant insights.

In particular, researchers may need to utilize machine learning techniques to extract information from unstructured or semi-structured data, reduce high-dimensional data to a smaller number of variables, and efficiently deal with extremely large sample sizes. Such procedures can be computationally expensive, requiring researchers to balance computation time with processing power and memory capacity. Whereas modeling procedures on small datasets will usually take mere moments to execute, applying modeling procedures to big data can take much longer, with typical execution times spanning hours, days, or even weeks depending on the complexity of the problem and the resources available. Seemingly subtle decisions regarding preprocessing and analytic strategy can have a substantial impact on the viability of executing analyses within a reasonable timeframe. Consequently, researchers must anticipate potential pitfalls regarding the interplay of their analytic strategy with memory and computational constraints.

Many researchers who are interested in using "big data" report having problems learning about new analytic methods or software, finding collaborators with the right skills and knowledge, and getting access to commercial or proprietary data for their research (Metzler et al. 2016). This chapter aims to serve as a practical introduction for psychologists who want to use large datasets and datasets from non-traditional sources in their research (i.e., data not generated in the lab or through conventional surveys). First, we discuss the concept of big data and review some of the theoretical challenges and opportunities that arise with the availability of ever larger amounts of data. Second, we discuss practical implications and best practices with respect to data collection, data storage, data processing, and data modeling for psychological research in the age of big data.
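To make the chapter's point about memory constraints concrete, here is a minimal sketch of out-of-core dimensionality reduction using scikit-learn's IncrementalPCA, which fits principal components chunk by chunk instead of loading the full matrix into memory. The dataset dimensions and the random chunks are illustrative stand-ins, not values from the chapter.

```python
# Minimal sketch: out-of-core dimensionality reduction with scikit-learn.
# Dataset shape and chunk size are illustrative assumptions.
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_samples, n_features = 100_000, 1_000  # too large to decompose comfortably at once
chunk_size = 10_000                     # rows held in memory at a time

ipca = IncrementalPCA(n_components=50)

# First pass: fit the components chunk by chunk (in practice, read from disk).
for start in range(0, n_samples, chunk_size):
    chunk = np.random.rand(chunk_size, n_features)  # stand-in for a real reader
    ipca.partial_fit(chunk)

# Second pass: project each chunk into the reduced 50-dimensional space.
reduced_chunks = []
for start in range(0, n_samples, chunk_size):
    chunk = np.random.rand(chunk_size, n_features)
    reduced_chunks.append(ipca.transform(chunk))
```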

Computers ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 73 ◽  
Author(s):  
Rossi ◽  
Rubattino ◽  
Viscusi

Big data and analytics have received great attention from practitioners and academics, nowadays representing a key resource for the renewed interest in artificial intelligence, especially machine learning techniques. In this article we explore the use of big data and analytics by different types of organizations from various countries and industries, including those with limited size and capabilities compared to corporations or new ventures. In particular, we are interested in organizations where the exploitation of big data and analytics may have social value in terms of, e.g., public and personal safety. Hence, this article discusses the results of two multi-industry and multi-country surveys carried out on a sample of public and private organizations. The results show a low rate of utilization of the data collected, due to, among other issues, privacy and security concerns, as well as a lack of staff trained in data analysis. The two surveys also reveal difficulties in reaching an appropriate level of effectiveness in the use of big data and analytics, owing to a shortage of the right tools and, again, of capabilities, often tied to a low rate of digital transformation.


Transformation is the second step in the ETL (extract, transform, load) process responsible for moving data into a data warehouse. The role of transformation is to apply a series of operations that clean, format, and unify the types and data coming from multiple, heterogeneous data sources. The goal is to make the data conform to the schema of the data warehouse so as to avoid ambiguity problems during data storage and analytical operations. Transforming data from structured, semi-structured, and unstructured sources requires two levels of treatment: the first is schema-to-schema transformation, to obtain a unified schema for all selected data sources; the second is data-to-data transformation, to unify all of the types and data gathered. To implement these steps, we propose in this paper a process for switching from one database schema to another, as part of the schema-to-schema transformation, and a meta-model based on the MDA (Model-Driven Architecture) approach to describe the main data-to-data transformation operations. The results of our transformations allow the data to be loaded into one of the four NoSQL schema types (key-value, document, column-oriented, or graph) to best meet the constraints and requirements of big data.
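As a rough illustration of the two treatment levels described above, the sketch below unifies records from two hypothetical sources into one target schema: a field map handles the schema-to-schema step, and a type-coercion function handles the data-to-data step. All field names and type rules are invented for this example and are not the paper's meta-model.

```python
# Illustrative two-level transformation; field names, type rules, and the
# target schema are assumptions made up for this example.
from datetime import datetime

# Level 1 -- schema to schema: map each source's field names onto one
# unified schema.
FIELD_MAP = {
    "crm":  {"cust_id": "customer_id", "dob": "birth_date", "amt": "amount"},
    "logs": {"user": "customer_id", "birthdate": "birth_date", "total": "amount"},
}

# Level 2 -- data to data: coerce each value into the unified type.
def unify_value(field, value):
    if field == "customer_id":
        return str(value)
    if field == "birth_date":
        return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    if field == "amount":
        return float(value)
    return value

def transform(record, source):
    """Rename fields (schema to schema), then coerce values (data to data)."""
    out = {}
    for src_field, value in record.items():
        field = FIELD_MAP[source].get(src_field, src_field)
        out[field] = unify_value(field, value)
    return out  # ready to load into, e.g., a document-oriented NoSQL store

print(transform({"cust_id": 42, "dob": "1990-05-01", "amt": "19.9"}, "crm"))
print(transform({"user": "42", "birthdate": "1990-05-01", "total": 19.9}, "logs"))
```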


2014 ◽  
Vol 23 (01) ◽  
pp. 42-47 ◽  
Author(s):  
J. H. Holmes ◽  
J. Sun ◽  
N. Peek

Summary
Objectives: To review technical and methodological challenges for big data research in biomedicine and health.
Methods: We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data.
Results: The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data, such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets.
Conclusions: The major challenge for the near future is to transform the analytical methods used in the biomedical and health domain to fit the distributed storage and processing model required to handle big data, while ensuring confidentiality of the data being analyzed.
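One standard guard against the false-positive risk the authors highlight in high-dimensional datasets is false discovery rate control. The sketch below implements the Benjamini-Hochberg procedure over simulated p-values; the data are synthetic and the procedure is a generic illustration, not the review's own method.

```python
# Benjamini-Hochberg FDR control over simulated p-values (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
p_values = rng.uniform(size=10_000)        # stand-in for 10,000 test results
p_values[:20] = rng.uniform(0, 1e-4, 20)   # a few genuinely strong signals

def benjamini_hochberg(p, alpha=0.05):
    """Return a boolean mask of discoveries at false discovery rate alpha."""
    m = len(p)
    order = np.argsort(p)
    # Find the largest k such that p_(k) <= (k/m) * alpha.
    thresholds = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

discoveries = benjamini_hochberg(p_values)
print(f"{discoveries.sum()} of {len(p_values)} tests survive FDR control")
```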


2018 ◽  
Vol 63 (5) ◽  
pp. 560-583 ◽  
Author(s):  
Arielle Hesse ◽  
Leland Glenna ◽  
Clare Hinrichs ◽  
Robert Chiles ◽  
Carolyn Sachs

This article examines the developments that have motivated this special issue on Qualitative Research Ethics in the Big Data Era. The article offers a broad overview of many pressing challenges and opportunities that the Big Data era raises particularly for qualitative research. Big Data has introduced to the social sciences new data sources, new research methods, new researchers, and new forms of data storage that have immediate and potential effects on the ethics and practice of qualitative research. Drawing from a literature review and insights gathered at a National Science Foundation-funded workshop in 2016, we present five principles for qualitative researchers and their institutions to consider in navigating these emerging research landscapes. These principles include (a) valuing methodological diversity; (b) encouraging research that accounts for and retains context, specificity, and marginalized and overlooked populations; (c) pushing beyond legal concerns to address often messy ethical dilemmas; (d) attending to regional and disciplinary differences; and (e) considering the entire lifecycle of research, including the data afterlife in archives or in open-data facilities.


2015 ◽  
Vol 27 (6) ◽  
pp. 515-528 ◽  
Author(s):  
Ivana Šemanjski

Travel time forecasting is an important topic for many ITS (intelligent transportation systems) services. The increased availability of data collection sensors increases the availability of predictor variables but also raises processing issues related to such big data. In this paper we analyse the potential of big data and supervised machine learning techniques for effectively forecasting travel times. For this purpose we used fused data from three data sources (Global Positioning System vehicle tracks, road network infrastructure data, and meteorological data) and four machine learning techniques (k-nearest neighbours, support vector machines, boosted trees, and random forests). To evaluate the forecasting results we compared them across road classes, both in terms of absolute error, measured in minutes, and the mean squared percentage error. For road classes with high average speeds and long road segments, the machine learning techniques forecasted travel times with small relative error, while for road classes with low average speeds and short segments this proved a more demanding task. All three data sources had a high impact on travel time forecast accuracy, and the best results across all road classes were achieved with the k-nearest neighbours and random forest techniques.
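For readers who want to see the shape of such a pipeline, the following sketch fits two of the four techniques (k-nearest neighbours and a random forest) with scikit-learn on synthetic stand-ins for the fused GPS, road network, and meteorological variables. All feature definitions and the target formula are assumptions made for the example, not the study's data.

```python
# Illustrative travel-time regression with two of the paper's techniques;
# features and target are synthetic stand-ins, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n = 5_000
segment_km  = rng.uniform(0.2, 10.0, n)    # road-network feature
free_speed  = rng.uniform(30.0, 120.0, n)  # road-class speed, km/h
rain_mm     = rng.exponential(1.0, n)      # meteorological feature
hour_of_day = rng.integers(0, 24, n)       # from GPS timestamps

# Synthetic travel time in minutes: slower in rain and at peak hours.
congestion = 1 + 0.5 * np.isin(hour_of_day, [7, 8, 16, 17]) + 0.05 * rain_mm
y = segment_km / free_speed * 60 * congestion + rng.normal(0, 0.3, n)

X = np.column_stack([segment_km, free_speed, rain_mm, hour_of_day])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("k-NN", KNeighborsRegressor(n_neighbors=10)),
                    ("random forest", RandomForestRegressor(n_estimators=200,
                                                            random_state=0))]:
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name}: mean absolute error = {mae:.2f} min")
```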


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks, and LBS (location-based service) applications, the variety and volume of data to be processed continue to increase rapidly. Effective management and processing of large-scale data pose an interesting but critical challenge. Recently, big data has attracted considerable attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the viewpoint of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms, and distributed applications. Next, following the MapReduce parallel processing framework, we introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges and explore future research directions for big data processing in cloud computing environments.
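The MapReduce pattern at the center of this survey can be demonstrated on a single machine. The sketch below runs the canonical word count: map emits key-value pairs, shuffle groups them by key, and reduce aggregates each group independently. A cluster framework such as Hadoop distributes these same phases across machines; the documents here are toy inputs for illustration.

```python
# Single-machine sketch of the MapReduce pattern: map, shuffle, reduce.
from collections import defaultdict

documents = ["big data in cloud computing",
             "cloud data storage",
             "big data processing in the cloud"]

# Map phase: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group all emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate each key's values independently (hence parallelizable).
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # e.g., {'big': 2, 'data': 3, 'cloud': 3, ...}
```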


2015 ◽  
Vol 8 (4) ◽  
pp. 509-515 ◽  
Author(s):  
Thomas J. Whelan ◽  
Amy M. DuVernet

As discussed in Guzzo, Fink, King, Tonidandel, and Landis's (2015) focal article, big data is more than a passing trend in business analytics. The plethora of information available presents a host of interesting challenges and opportunities for industrial and organizational (I-O) psychology. When utilizing big data sources to make organizational decisions, our field has a considerable amount to offer in the form of advice on how big data metrics are derived and used and on the potential threats to validity that their use presents. We’ve all heard the axiom, “garbage in, garbage out,” and that applies regardless of whether the scale is a small wastebasket or a dump truck.


2018 ◽  
Vol 14 (2) ◽  
pp. 127-138 ◽
Author(s):  
Asif Banka ◽  
Roohie Mir

The advancements in modern-day computing and architectures focus on harnessing parallelism to achieve high-performance computing, resulting in the generation of massive amounts of data. The information produced needs to be represented and analyzed to address various challenges in technology and business domains. The radical expansion and integration of digital devices, networking, data storage, and computation systems are generating more data than ever. Because these datasets are massive and complex, traditional learning methods fall short, which has in turn driven the adoption of machine learning techniques to mine the information hidden in unseen data. Interestingly, deep learning finds its place in big data applications. One of the major advantages of deep learning is that its feature representations are not human-engineered. In this paper, we look at various machine learning algorithms that have already been applied to big data problems with promising results. We also look at deep learning as a solution to big data issues that are not efficiently addressed by traditional methods. Deep learning is finding its place in most applications where the critical and dominating 5Vs of big data (volume, velocity, variety, veracity, and value) come into play, and it is expected to perform better there.
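To illustrate the point that deep learning does not rely on human-engineered features, here is a minimal PyTorch sketch in which a small network learns its own representation of raw inputs via mini-batch gradient descent. The architecture, synthetic data, and training settings are arbitrary choices for the example, not drawn from the paper.

```python
# Minimal sketch: a small network learns its own feature representation
# from raw inputs; all settings are illustrative assumptions (PyTorch).
import torch
from torch import nn

X = torch.rand(1024, 20)                       # raw, unengineered inputs
y = (X.sum(dim=1, keepdim=True) > 10).float()  # synthetic binary target

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),  # hidden layer learns the representation
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(100):                       # mini-batch gradient descent
    for i in range(0, len(X), 128):
        batch_X, batch_y = X[i:i + 128], y[i:i + 128]
        loss = loss_fn(model(batch_X), batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print(f"final loss: {loss.item():.3f}")
```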

