Multiverse analyses in fear conditioning research

2021 ◽  
Author(s):  
Tina B Lonsdorf ◽  
Anna Gerlicher ◽  
Maren Klingelhöfer-Jens ◽  
Angelos-Miltiadis Krypotos

There is heterogeneity in, and a lack of consensus on, the preferred statistical analyses for analyzing fear conditioning effects, in light of a multitude of potentially equally justifiable statistical approaches. Here, we introduce the concept of multiverse analysis for fear conditioning research. We also present a model multiverse approach specifically tailored to fear conditioning research and introduce the novel and easy-to-use R package ‘multifear’, which allows all the models to be run through a single line of code. Model specifications and data reduction approaches employed in the ‘multifear’ package were identified through a representative systematic literature search. The statistical models identified included Bayesian ANOVAs and t-tests as well as frequentist ANOVAs, t-tests, and mixed models, with a variety of data reduction approaches (i.e., number of trials, trial blocks, averages) as input. We illustrate the power of a multiverse analysis for fear conditioning data based on two pre-existing data sets with partial (data set 1) and 100% reinforcement rate (data set 2), using CS discrimination in skin conductance responses (SCRs) during fear acquisition and extinction training as case examples. Both the effect size and the direction of the effect were impacted by the choice of model and data reduction technique. We anticipate that an increase in multiverse-type studies in the field of fear conditioning research, and their extension to other outcome measures as well as to data and design multiverse analyses, will aid the development of formal theories through the accumulation of empirical evidence. This may contribute to facilitated and more successful clinical translation.
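To make the multiverse idea concrete, the following is a minimal Python sketch that crosses a few data reduction choices with a few test choices and records the CS discrimination effect in every resulting "universe". It is illustrative only: the ‘multifear’ package itself is an R package with its own model set, and all variable names, reduction rules, and simulated SCR values here are assumptions.

```python
# Minimal multiverse sketch: cross data-reduction choices with test choices and
# record the CS+/CS- discrimination effect under every combination.
# Illustrative only -- the 'multifear' R package defines its own model set.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_trials = 40, 12
cs_plus = rng.normal(0.45, 0.3, (n_subjects, n_trials))   # simulated SCRs to CS+
cs_minus = rng.normal(0.30, 0.3, (n_subjects, n_trials))  # simulated SCRs to CS-

reductions = {
    "all_trials": lambda x: x.mean(axis=1),
    "last_block": lambda x: x[:, -4:].mean(axis=1),
    "omit_first_trial": lambda x: x[:, 1:].mean(axis=1),
}
tests = {
    "paired_t": lambda a, b: stats.ttest_rel(a, b).pvalue,
    "wilcoxon": lambda a, b: stats.wilcoxon(a, b).pvalue,
}

results = []
for r_name, reduce_ in reductions.items():
    a, b = reduce_(cs_plus), reduce_(cs_minus)
    diff = a - b
    d = diff.mean() / diff.std(ddof=1)          # within-subject Cohen's d
    for t_name, test in tests.items():
        results.append((r_name, t_name, round(d, 3), round(test(a, b), 4)))

for row in results:
    print(row)   # one "universe" per row: reduction, test, effect size, p value
```

Summarizing the full table of universes, rather than a single cell of it, is the point of the multiverse approach.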

2018 ◽  
Vol 17 ◽  
pp. 117693511877108 ◽  
Author(s):  
Min Wang ◽  
Steven M Kornblau ◽  
Kevin R Coombes

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
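As a rough illustration of what an automated rule for the number of significant PCs looks like, here is a short Python sketch of the broken-stick criterion. This is not the Bayesian procedure that PCDimension automates (that package is in R); the criterion, the simulated rank-3 data, and the function name are assumptions made for the example.

```python
# Sketch of one automated rule for choosing the number of significant PCs
# (the broken-stick criterion); PCDimension implements richer methods in R,
# including an automated graphical Bayesian procedure.
import numpy as np

def broken_stick_components(X):
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (X.shape[0] - 1)
    prop = eigvals / eigvals.sum()
    p = len(eigvals)
    # expected variance proportion of the k-th piece of a randomly broken stick
    expected = np.array([sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    keep = 0
    for obs, exp in zip(prop, expected):
        if obs > exp:
            keep += 1
        else:
            break
    return keep

rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))   # rank-3 structure
X = signal + rng.normal(scale=0.2, size=(50, 20))
print(broken_stick_components(X))   # should report roughly 3 components
```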


2019 ◽  
Vol 2 (2) ◽  
pp. 169-187 ◽  
Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely properly indexed in search engines. This means that rich data sets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes and just three clicks. Over time, its use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both its users and the scientific community as a whole.
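To show the kind of machine-readable summary such a codebook contains, here is a small pandas sketch that tabulates type, missingness, and descriptives per variable. It only mimics the output shape; the codebook R package works from richer metadata (labels, scales, reliabilities), and the column and variable names below are illustrative.

```python
# Minimal sketch of the kind of table a generated codebook contains:
# per-variable type, missingness, and descriptive statistics.
# Variable and column names are illustrative.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, np.nan, 44, 29],
    "bfi_extra_1": [3, 4, 5, np.nan, 2],   # example Likert item
    "condition": ["a", "b", "a", "b", "a"],
})

def simple_codebook(data: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in data.columns:
        s = data[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            "variable": col,
            "dtype": str(s.dtype),
            "n_missing": int(s.isna().sum()),
            "mean": round(s.mean(), 2) if numeric else None,
            "sd": round(s.std(), 2) if numeric else None,
        })
    return pd.DataFrame(rows)

print(simple_codebook(df).to_string(index=False))
```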


2010 ◽  
Vol 66 (6) ◽  
pp. 733-740 ◽  
Author(s):  
Kay Diederichs

An indicator which is calculated after the data reduction of a test data set may be used to estimate the (systematic) instrument error at a macromolecular X-ray source. The numerical value of the indicator is the highest signal-to-noise [I/σ(I)] value that the experimental setup can produce, and its reciprocal is related to the lower limit of the merging R factor. In the context of this study, the stability of the experimental setup is influenced and characterized by the properties of the X-ray beam, shutter, goniometer, cryostream and detector, and also by the exposure time and spindle speed. Typical values of the indicator are given for data sets from the JCSG archive. Some sources of error are explored with the help of test calculations using SIM_MX [Diederichs (2009), Acta Cryst. D65, 535–542]. One conclusion is that the accuracy of data at low resolution is usually limited by the experimental setup rather than by the crystal. It is also shown that the influence of vibrations and fluctuations may be mitigated by a reduction in spindle speed accompanied by stronger attenuation.
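The sketch below shows how such an asymptotic signal-to-noise ceiling can be estimated numerically, assuming an inflated error model of the form σ²(I) = a·(σ₀²(I) + b·I²), under which I/σ(I) cannot exceed 1/√(ab) at large intensity. The specific error-model form, the fit parameters, and the simulated intensities are assumptions made for illustration, not a reproduction of the cited procedure.

```python
# Sketch of an asymptotic signal-to-noise estimate, assuming the error model
# sigma^2(I) = a * (sigma0^2(I) + b * I^2); with that form, I/sigma(I) is
# bounded above by 1/sqrt(a*b), the kind of ceiling the indicator describes.
import numpy as np
from scipy.optimize import curve_fit

def error_model(X, a, b):
    I, sigma0 = X
    return a * (sigma0**2 + b * I**2)

rng = np.random.default_rng(2)
I = rng.uniform(10, 5000, 500)                 # simulated merged intensities
sigma0 = np.sqrt(I) + rng.uniform(1, 5, 500)   # counting-statistics error part
true_a, true_b = 1.3, 4e-4
var_obs = error_model((I, sigma0), true_a, true_b) * rng.normal(1, 0.05, 500)

(a_hat, b_hat), _ = curve_fit(error_model, (I, sigma0), var_obs, p0=(1.0, 1e-3))
print("asymptotic I/sigma(I):", 1.0 / np.sqrt(a_hat * b_hat))
```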


2019 ◽  
Vol 8 (3) ◽  
pp. 4373-4378

Data belonging to different domains are being stored rapidly in various repositories across the globe. Extracting useful information from huge volumes of data is always difficult due to the dynamic nature of the data being stored. Data mining is a knowledge discovery process used to extract, in the form of patterns, the hidden information from data stored in various repositories, termed warehouses. One of the popular tasks of data mining is classification, which deals with assigning every instance of a data set to one of the predefined class labels. The banking system is a real-world domain that collects large amounts of client data on a daily basis. In this work, we collected two variants of the bank marketing data set pertaining to a Portuguese financial institution, consisting of 41188 and 45211 instances, and performed classification on them using two data reduction techniques. Attribute subset selection was performed on the first data set, and the training data with the selected features were used in classification. Principal component analysis was performed on the second data set, and the training data with the extracted features were used in classification. A deep neural network classification algorithm based on backpropagation was developed to perform classification on both data sets. Finally, the performance of each deep neural network classifier was compared with that of four standard classifiers, namely decision trees, naïve Bayes, support vector machines, and k-nearest neighbors. It was found that the deep neural network classifier outperforms the existing classifiers in terms of accuracy.
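A minimal scikit-learn sketch of the second pipeline (PCA-based feature extraction feeding a backpropagation-trained neural network) is shown below. The bank marketing files, the number of retained components, and the network architecture used in the paper are not reproduced here; synthetic data and illustrative parameters stand in.

```python
# Sketch of a PCA -> neural-network pipeline on synthetic stand-in data;
# the bank marketing data and the paper's exact architecture are not reproduced.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),                        # data reduction by feature extraction
    MLPClassifier(hidden_layer_sizes=(64, 32),   # backpropagation-trained network
                  max_iter=500, random_state=0),
)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

The attribute-subset-selection variant would simply swap the PCA step for a feature selector applied to the training data.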


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Tina B Lonsdorf ◽  
Maren Klingelhöfer-Jens ◽  
Marta Andreatta ◽  
Tom Beckers ◽  
Anastasia Chalkia ◽  
...  

In this report, we illustrate the considerable impact of researcher degrees of freedom with respect to exclusion of participants in paradigms with a learning element. We illustrate this empirically through case examples from human fear conditioning research, in which the exclusion of ‘non-learners’ and ‘non-responders’ is common – despite a lack of consensus on how to define these groups. We illustrate the substantial heterogeneity in exclusion criteria identified in a systematic literature search and highlight the potential problems and pitfalls of different definitions through case examples based on re-analyses of existing data sets. On the basis of these studies, we propose a consensus on evidence-based rather than idiosyncratic criteria, including clear guidelines on reporting details. Taken together, we illustrate how flexibility in data collection and analysis can be avoided, which will benefit the robustness and replicability of research findings and can be expected to be applicable to other fields of research that involve a learning element.
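The core problem can be made tangible with a short sketch: different "non-learner" cut-offs exclude different participants and shift the estimated CS discrimination effect. The cut-off values and simulated SCR data below are purely illustrative, not criteria proposed by the authors.

```python
# Illustration of how different 'non-learner' exclusion cut-offs change the
# estimated CS+/CS- discrimination effect; cut-offs and data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
cs_plus = rng.normal(0.40, 0.25, n)    # mean SCR to CS+ per participant
cs_minus = rng.normal(0.30, 0.25, n)   # mean SCR to CS- per participant
discrimination = cs_plus - cs_minus

for cutoff in (None, 0.0, 0.05, 0.1):  # None = keep everyone
    keep = np.ones(n, bool) if cutoff is None else discrimination > cutoff
    d = discrimination[keep].mean() / discrimination[keep].std(ddof=1)
    p = stats.ttest_1samp(discrimination[keep], 0.0).pvalue
    print(f"cutoff={cutoff}: n={keep.sum()}, d={d:.2f}, p={p:.4f}")
```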


2021 ◽  
Author(s):  
Huan Chen ◽  
Brian Caffo ◽  
Genevieve Stein-O’Brien ◽  
Jinrui Liu ◽  
Ben Langmead ◽  
...  

Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful in making inference from publicly available collections of genetic, transcriptomic and epigenetic data sets which are designed to study shared biological processes, but which vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets with biological and technological relationships that can be structured into the decomposition. The consistency of the proposed method is established and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets. The code to conduct 2s-LCA has been compiled into an R package, “PJD”, which is available at https://github.com/CHuanSite/PJD.
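To convey the flavour of a two-stage decomposition, here is a conceptual numpy sketch: each data set is first reduced separately by SVD, and the concatenated low-rank representations are then decomposed jointly to recover gene-level patterns shared across studies. This mirrors the general idea only; the ranks, the simulated data, and the helper code are assumptions, and the actual 2s-LCA method is implemented in the PJD R package.

```python
# Conceptual two-stage sketch: per-data-set SVD, then a joint SVD over the
# concatenated low-rank representations to pull out shared gene patterns.
# This mirrors the idea only; PJD (R) implements the actual 2s-LCA method.
import numpy as np

rng = np.random.default_rng(4)
shared = rng.normal(size=(100, 3))                    # genes x shared patterns
datasets = [shared @ rng.normal(size=(3, n)) + rng.normal(scale=0.5, size=(100, n))
            for n in (30, 45, 25)]                    # genes x samples, per study

# Stage 1: reduce each data set to its top singular subspace.
stage1 = []
for X in datasets:
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 5
    stage1.append(U[:, :k] * s[:k])                   # gene-space scores per study

# Stage 2: joint decomposition of the concatenated scores.
joint = np.concatenate(stage1, axis=1)
U_joint, s_joint, _ = np.linalg.svd(joint, full_matrices=False)
shared_patterns = U_joint[:, :3]
print(shared_patterns.shape)                          # (100, 3) shared gene loadings
```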


2019 ◽  
Author(s):  
Pavlin G. Poličar ◽  
Martin Stražar ◽  
Blaž Zupan

Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When working with multiple data sets, a straightforward application of these methods often fails; instead of revealing underlying classes, the resulting visualizations expose data set-specific clusters. To circumvent these batch effects, we propose an embedding procedure that takes a t-SNE visualization constructed on a reference data set and uses it as a scaffold for embedding new data. The new, secondary data are embedded one data point at a time. This prevents any interactions between instances in the secondary data and implicitly mitigates batch effects. We demonstrate the utility of this approach with an analysis of six recently published single-cell gene expression data sets containing up to tens of thousands of cells and thousands of genes. In these data sets, the batch effects are particularly strong, as the data come from different institutions and were obtained using different experimental protocols. The visualizations constructed by our proposed approach are cleared of batch effects, and the cells from secondary data sets correctly co-cluster with cells from the primary data sharing the same cell type.
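A simplified numpy sketch of the core mechanic follows: each new sample is placed independently, guided by its nearest neighbours in the reference data, so the reference layout never moves. The authors' actual procedure optimises a t-SNE objective per point; the weighting scheme, the random stand-in embedding, and the function name here are assumptions for illustration.

```python
# Simplified sketch of embedding new points into a fixed reference embedding:
# each new sample is placed at a similarity-weighted average of its nearest
# reference points, one point at a time, so the reference layout stays fixed.
# The authors' method optimises a t-SNE objective per point; this is the idea only.
import numpy as np

def embed_into_reference(X_ref, Y_ref, X_new, k=10):
    """X_ref: reference data, Y_ref: its 2-D embedding, X_new: new samples."""
    Y_new = np.empty((X_new.shape[0], Y_ref.shape[1]))
    for i, x in enumerate(X_new):                        # one data point at a time
        d = np.linalg.norm(X_ref - x, axis=1)
        nn = np.argsort(d)[:k]
        w = np.exp(-d[nn] ** 2 / (d[nn].mean() ** 2 + 1e-12))
        Y_new[i] = (w[:, None] * Y_ref[nn]).sum(axis=0) / w.sum()
    return Y_new

rng = np.random.default_rng(5)
X_ref = rng.normal(size=(200, 50))
Y_ref = rng.normal(size=(200, 2))       # stands in for a precomputed t-SNE layout
X_new = rng.normal(size=(20, 50))
print(embed_into_reference(X_ref, Y_ref, X_new).shape)   # (20, 2)
```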


2017 ◽  
Vol 10 (13) ◽  
pp. 355 ◽  
Author(s):  
Reshma Remesh ◽  
Pattabiraman. V

Dimensionality reduction techniques are used to reduce the complexity of the analysis of high-dimensional data sets. The raw input data set may have many dimensions, and analysis might be slow and lead to wrong predictions if unnecessary data attributes are considered. Using dimensionality reduction techniques, one can reduce the dimensions of the input data towards accurate prediction at lower cost. In this paper, the different machine learning approaches used for dimensionality reduction, such as PCA, SVD, LDA, kernel principal component analysis, and artificial neural networks, have been studied.
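For reference, three of the surveyed techniques can be applied to the same data set in a few lines of scikit-learn; the kernel choice and component counts below are illustrative, not tuned settings from the paper.

```python
# Applying three of the surveyed reducers to one data set for comparison;
# the RBF kernel and the component counts are illustrative choices.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

X_pca = PCA(n_components=2).fit_transform(X)
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised

for name, Z in [("PCA", X_pca), ("Kernel PCA", X_kpca), ("LDA", X_lda)]:
    print(name, Z.shape)
```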


Author(s):  
Roeland Kindt

Premise of the study: Standardization of plant names is a critical step in various fields of biology, including biodiversity, biogeography and vegetation research. WorldFlora matches lists of plant names with a static copy of World Flora Online (WFO), an ongoing global effort to complete an online flora of all known vascular plants and bryophytes by 2020. Methods and results: Based on direct and fuzzy matching, WorldFlora inserts matching cases from the WFO into a submitted data set of taxa. Results of selecting the expected best single matches are presented for four data sets, including a working list of commercial timber tree species, a subset from GlobalTreeSearch, and two data sets used in previous comparisons of software tools for correcting taxon names. The success rate of credible matches varied from 94.7% (568 taxa) to 99.9% (1740 taxa). Conclusions: WorldFlora offers a straightforward pipeline for semi-automatic plant name checking.
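The direct-then-fuzzy matching idea can be sketched with Python's standard library, as below. WorldFlora itself is an R package that matches against the WFO taxonomic backbone; the backbone list, the submitted names, and the similarity cutoff here are made-up examples.

```python
# Direct-then-fuzzy name-matching sketch using the standard library;
# WorldFlora's actual matching runs in R against the WFO backbone.
import difflib

backbone = ["Tectona grandis", "Swietenia macrophylla", "Khaya senegalensis"]
submitted = ["Tectona grandis", "Swietenia macrophyla", "Kaya senegalensis"]

for name in submitted:
    if name in backbone:                                  # direct match
        print(f"{name!r:28} -> exact: {name}")
        continue
    close = difflib.get_close_matches(name, backbone, n=1, cutoff=0.8)  # fuzzy match
    print(f"{name!r:28} -> fuzzy: {close[0] if close else 'no credible match'}")
```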


2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyze large microarray data sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data sets, a subset can first be taken to represent the larger data set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a subset of variation in a data set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological data sets. New methods of analysis using PCA are also suggested, with tentative results outlined.

