Crud (Re)Defined

2020 ◽  
Vol 3 (2) ◽  
pp. 238-247 ◽  
Author(s):  
Amy Orben ◽  
Daniël Lakens

The idea that in behavioral research everything correlates with everything else was a niche area of the scientific literature for more than half a century. With the increasing availability of large data sets in psychology, the “crud” factor has, however, become more relevant than ever before. When referenced in empirical work, it is often used by researchers to discount minute—but statistically significant—effects that are deemed too small to be considered meaningful. This review tracks the history of the crud factor and examines how its use in the psychological- and behavioral-science literature has developed to this day. We highlight a common and deep-seated lack of understanding about what the crud factor is and discuss whether it can be proven to exist or estimated and how it should be interpreted. This lack of understanding makes the crud factor a convenient tool for psychologists to use to disregard unwanted results, even though the presence of a crud factor should be a large inconvenience for the discipline. To inspire a concerted effort to take the crud factor more seriously, we clarify the definitions of important concepts, highlight current pitfalls, and pose questions that need to be addressed to ultimately improve understanding of the crud factor. Such work will be necessary to develop the crud factor into a useful concept encouraging improved psychological research.

Author(s):  
Helen K. Black ◽  
John T. Groce ◽  
Charles E. Harmon

Chapter One offers a brief history of the rise in awareness of the vast numbers of informal, family caregivers caring for aged, demented, and impaired loved ones in the home. The importance of informal caregivers to the healthcare system, both financially and emotionally, emerged in studies exploring the numbers of home caregivers and the nature of their care work. Early studies also focused on the sense of burden caregivers experienced due to caregiving. Since the 1980s, caregiving studies have been a constant in research, and have become increasingly complex in the use of large data sets and advanced technology to study the number of caregivers, their characteristics and labors, and the effects of caregiving on their emotional and physical health. Few studies have focused solely on the experience of caregiving among African-American elder male caregivers in the way we do here.


2015 ◽  
Vol 105 (5) ◽  
pp. 481-485 ◽  
Author(s):  
Patrick Bajari ◽  
Denis Nekipelov ◽  
Stephen P. Ryan ◽  
Miaoyu Yang

We survey and apply several techniques from the statistical and computer science literature to the problem of demand estimation. To improve out-of-sample prediction accuracy, we propose a method of combining the underlying models via linear regression. Our method is robust to a large number of regressors; scales easily to very large data sets; combines model selection and estimation; and can flexibly approximate arbitrary non-linear functions. We illustrate our method using a standard scanner panel data set and find that our estimates are considerably more accurate in out-of-sample predictions of demand than some commonly used alternatives.
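
As a rough illustration of the combination step (not the authors' exact specification), the sketch below regresses the outcome on held-out predictions from a few underlying models; the learners, simulated data, and sample split are placeholder assumptions.

```python
# Sketch: combine several demand models by regressing the target on their
# held-out predictions (linear stacking). The learners and data are
# illustrative stand-ins, not the paper's scanner-panel specification.
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                        # stand-in regressors
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=5000)    # stand-in for demand

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)

base_models = [LassoCV(cv=5), RandomForestRegressor(n_estimators=200, random_state=0)]
for m in base_models:
    m.fit(X_train, y_train)

# Combination step: weights come from a regression of y on held-out predictions.
P_hold = np.column_stack([m.predict(X_hold) for m in base_models])
combiner = LinearRegression().fit(P_hold, y_hold)
print("combination weights:", combiner.coef_)
```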


2019 ◽  
Vol 14 (1) ◽  
pp. 66-92 ◽  
Author(s):  
PETER DE BOLLA ◽  
EWAN JONES ◽  
PAUL NULTY ◽  
GABRIEL RECCHIA ◽  
JOHN REGAN

This article proposes a novel computational method for discerning the structure and history of concepts. Based on the analysis of co-occurrence data in large data sets, the method creates a measure of “binding” that enables the construction of verbal constellations that comprise the larger units, “concepts,” that change over time. In contrast to investigations of semantic networks, our method seeks to uncover structures of conceptual operation that are not simply semantic. These larger units of lexical operation, visualized as interconnected networks, may have underlying rules of formation and operation that have an as yet unexamined—perhaps tangential—connection to meaning as such. The article is thus exploratory and intended to open the history of concepts to some new avenues of investigation.
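
The article's binding measure is not specified here; as a generic stand-in, the sketch below scores word pairs by positive pointwise mutual information computed from windowed co-occurrence counts on toy data.

```python
# Sketch: a PPMI association score over windowed co-occurrence counts, used
# here as a generic stand-in for a "binding" measure between words.
from collections import Counter
import math

def ppmi_scores(tokenised_docs, window=10):
    """Positive PMI for word pairs co-occurring within a sliding window."""
    word_counts, pair_counts, total_tokens = Counter(), Counter(), 0
    for doc in tokenised_docs:
        for i, w in enumerate(doc):
            word_counts[w] += 1
            total_tokens += 1
            for v in doc[i + 1:i + window]:
                if v != w:
                    pair_counts[frozenset((w, v))] += 1
    total_pairs = sum(pair_counts.values())
    scores = {}
    for pair, n_wv in pair_counts.items():
        w, v = tuple(pair)
        p_wv = n_wv / total_pairs
        p_w, p_v = word_counts[w] / total_tokens, word_counts[v] / total_tokens
        scores[(w, v)] = max(math.log(p_wv / (p_w * p_v)), 0.0)
    return scores

docs = [["liberty", "law", "right", "civil", "law", "liberty"]]
print(ppmi_scores(docs))
```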


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nena Bollen ◽  
Maria Artesi ◽  
Keith Durkin ◽  
Samuel L. Hong ◽  
Barney Potter ◽  
...  

At the end of 2020, several new variants of SARS-CoV-2—designated variants of concern—were detected and quickly suspected to be associated with higher transmissibility and possible escape from vaccine-induced immunity. In Belgium, this discovery has motivated the initiation of a more ambitious genomic surveillance program, which is drastically increasing the number of SARS-CoV-2 genomes to analyse for monitoring the circulation of viral lineages and variants of concern. In order to efficiently analyse the massive collection of genomic data that results from such increased sequencing efforts, streamlined analytical strategies are crucial. In this study, we illustrate how to efficiently map the spatio-temporal dispersal of target mutations at a regional level. As a proof of concept, we focus on the Belgian province of Liège, which has been consistently sampled throughout 2020 but was also one of the main epicenters of the second European epidemic wave. Specifically, we employ a recently developed phylogeographic workflow to infer the regional dispersal history of viral lineages associated with three specific mutations on the spike protein (S98F, A222V and S477N) and to quantify their relative importance through time. Our analytical pipeline enables the analysis of large data sets and has the potential to be quickly applied and updated to track target mutations in space and time throughout the course of an epidemic.


Author(s):  
Raymond Greenlaw ◽  
Sanpawat Kantabutra

This chapter provides the reader with an introduction to clustering algorithms and applications. A number of important well-known clustering methods are surveyed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field. Algorithms are described for top-down and bottom-up hierarchical clustering, as are algorithms for K-Means clustering and for K-Medians clustering. The technique of representative points is also presented. Given the large data sets involved, the need to apply parallel computing to clustering arises, so issues related to parallel clustering are discussed as well. Throughout the chapter, references are provided to works that contain a large number of experimental results. A comparison of the various clustering methods is given in tabular format. The authors conclude the chapter with a summary and an extensive list of references.
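
To make one of the surveyed methods concrete, here is a minimal NumPy sketch of the Lloyd iteration behind K-Means clustering; the data, initialization, and convergence test are illustrative and not taken from the chapter. K-Medians differs only in replacing the mean update with a coordinate-wise median.

```python
# Sketch: Lloyd's algorithm for K-Means, one of the clustering methods the
# chapter surveys. Initialization and convergence handling are kept minimal.
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        # (K-Medians would use the coordinate-wise median instead.)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

data = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5])
labels, centers = kmeans(data, k=2)
print(centers)
```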


2019 ◽  
Author(s):  
Amy Orben ◽  
Daniel Lakens

NOW PUBLISHED: https://doi.org/10.1177%2F2515245920917961. The idea that in behavioural research everything correlates with everything else was a niche area of the scientific literature for over half a century. With the increasing availability of large datasets in psychology, and the heightened interest in falsifiability and null results, the ‘crud’ factor has however become more relevant than ever before. It is often referenced by researchers to discount minute – but statistically significant – effects that are deemed too small to be considered meaningful. This review tracks the history of the crud factor and examines how its use in the scientific literature has developed to this day. It highlights a common and deep-seated lack of understanding about what the crud factor is, whether it can be proven to exist or estimated, and how it should be interpreted. This makes the crud factor a convenient tool for psychologists to disregard unwanted results, even though the presence of a crud factor should be a large inconvenience for the discipline. To inspire a concerted effort to take the crud factor more seriously, this review clarifies the definitions of important concepts, highlights current pitfalls and poses questions that need to be addressed to ultimately improve our understanding of the crud factor. Such work will be necessary to develop the crud factor into a useful concept encouraging improved psychological research and theory corroboration practices.


Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets that are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10 at.% Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra, offset in energy by 1 eV, were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39–89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
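
As a minimal sketch of the two per-pixel processing routes just described, the pair of energy-offset spectra can either be subtracted directly or numerically realigned and summed. The variable names are illustrative, and the wrap-around shift stands in for a proper shift with edge handling.

```python
# Sketch: the two per-pixel routes described above, applied to a pair of
# 1024-channel spectra recorded with a 1 eV offset (20 channels at
# 20 channels/eV). Variable names and data are illustrative.
import numpy as np

CHANNELS_PER_EV = 20
OFFSET_CHANNELS = 1 * CHANNELS_PER_EV   # 1 eV offset

def difference_spectrum(spec_a, spec_b):
    """Artifact-corrected difference spectrum: subtract the offset pair."""
    return spec_a - spec_b

def normal_spectrum(spec_a, spec_b):
    """Numerically remove the energy offset, then add the aligned spectra."""
    # np.roll wraps at the edges; a real analysis would crop or pad instead.
    aligned_b = np.roll(spec_b, OFFSET_CHANNELS)
    return spec_a + aligned_b

spec_a = np.random.poisson(100, size=1024).astype(float)
spec_b = np.roll(spec_a, -OFFSET_CHANNELS) + np.random.normal(0, 1, 1024)
print(difference_spectrum(spec_a, spec_b)[:5])
print(normal_spectrum(spec_a, spec_b)[:5])
```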


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea salt ageing, and halogen chemistry.

Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.
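
As a minimal sketch of the PCA step on a particles-by-elements intensity matrix (element labels and data are illustrative, not the cruise measurements), standardizing first addresses the disparity in variable ranges noted above.

```python
# Sketch: PCA of a particles-by-elements matrix, standardizing first because
# the variables span very different ranges. Labels and data are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

elements = ["Na", "Mg", "Al", "Si", "S", "Cl", "K", "Ca", "Fe"]
X = np.abs(np.random.default_rng(0).normal(size=(10000, len(elements))))  # EDS intensities

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
scores = pca.fit_transform(X_std)          # per-particle component scores
print(pca.explained_variance_ratio_)       # variance captured by each component
```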


Author(s):  
Mykhajlo Klymash ◽  
Olena Hordiichuk-Bublivska ◽  
Ihor Tchaikovskyi ◽  
Oksana Urikova

This article investigates the processing of large arrays of information in distributed systems. A singular value decomposition (SVD) method is used to reduce the amount of data processed by eliminating redundancy. Dependencies of computational efficiency for distributed systems were obtained using the MPI message-passing protocol and the MapReduce model of node interaction. The efficiency of each technology was analyzed for different data sizes: non-distributed systems are inefficient for large volumes of information because of their low computing performance. It is proposed to use distributed systems that apply singular value decomposition, which reduces the amount of information processed. The study of systems using the MPI protocol and the MapReduce model yielded the dependence of calculation time on the number of processes, which testifies to the expediency of using distributed computing when processing large data sets. It was also found that distributed systems using the MapReduce model work much more efficiently than those using MPI, especially with large amounts of data, whereas MPI performs calculations more efficiently for small amounts of information. As data sets grow, it is advisable to use the MapReduce model.
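
To make the data-reduction idea concrete, the following is a minimal NumPy sketch (an assumed implementation, since the article's code is not given): a truncated SVD keeps only the leading singular triplets, so a node stores and processes a much smaller factorized representation of the data.

```python
# Sketch: reduce a large data matrix with a truncated SVD, keeping only the
# leading k singular triplets so downstream (distributed) processing handles
# far less data. The matrix and retained rank are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5000, 400))          # stand-in for a large data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 20                                    # retained rank
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k reconstruction

stored = U[:, :k].size + k + Vt[:k, :].size
print(f"stored values: {stored} vs original {A.size}")
print("relative reconstruction error:", np.linalg.norm(A - A_k) / np.linalg.norm(A))
```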

