Simphony: simulating large-scale, rhythmic data

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6985 ◽  
Author(s):  
Jordan M. Singer ◽  
Darwin Y. Fu ◽  
Jacob J. Hughey

Simulated data are invaluable for assessing a computational method’s ability to distinguish signal from noise. Although many biological systems show rhythmicity, there is no general-purpose tool to simulate large-scale, rhythmic data. Here we present Simphony, an R package for simulating data from experiments in which the abundances of rhythmic and non-rhythmic features (e.g., genes) are measured at multiple time points in multiple conditions. Simphony has parameters for specifying experimental design and each feature’s rhythmic properties (e.g., amplitude and phase). In addition, Simphony can sample measurements from Gaussian and negative binomial distributions, the latter of which approximates read counts from RNA-seq data. We show an example of using Simphony to evaluate the accuracy of rhythm detection. Our results suggest that Simphony will aid experimental design and computational method development. Simphony is thoroughly documented and freely available at https://github.com/hugheylab/simphony.
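The sampling scheme the abstract describes (cosine rhythms in log space, measurements drawn from a negative binomial whose mean approximates RNA-seq read counts) can be sketched in a few lines. This is a minimal numpy illustration, not the simphony package itself; the function name, parameter names, and defaults below are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rhythmic_counts(n_genes=100, timepoints=np.arange(0, 48, 2),
                             frac_rhythmic=0.5, period=24.0,
                             log2_mean=8.0, log2_amp=1.0, dispersion=0.1):
    """Simulate a genes x samples matrix of negative binomial counts.

    Rhythmic genes follow a cosine in log2 space with a random phase;
    dispersion controls NB overdispersion (variance = mu + dispersion*mu^2).
    """
    n_rhythmic = int(n_genes * frac_rhythmic)
    phases = rng.uniform(0, period, size=n_genes)
    amps = np.where(np.arange(n_genes) < n_rhythmic, log2_amp, 0.0)
    # expected log2 abundance for each gene at each time point
    log2_mu = log2_mean + amps[:, None] * np.cos(
        2 * np.pi * (timepoints[None, :] - phases[:, None]) / period)
    mu = 2.0 ** log2_mu
    # NB parameterization: size n = 1/dispersion, success prob p = n/(n+mu)
    n = 1.0 / dispersion
    return rng.negative_binomial(n, n / (n + mu))

counts = simulate_rhythmic_counts()  # 100 genes x 24 time points
```

Setting `dispersion` near zero recovers approximately Poisson counts; a Gaussian variant would simply sample `rng.normal(log2_mu, sd)` instead.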

2018 ◽  
Author(s):  
Jordan M Singer ◽  
Darwin Y Fu ◽  
Jacob J Hughey

Simulated data are invaluable for assessing a computational method's ability to distinguish signal from noise. Although many biological systems show rhythmicity, there is no general-purpose tool to simulate large-scale, rhythmic data. Here we present Simphony, an R package for simulating data from experiments in which the abundances of rhythmic and non-rhythmic features (e.g., genes) are measured at multiple time points in multiple conditions. Simphony has parameters for specifying experimental design and each feature's rhythmic properties (e.g., shape, amplitude, and phase). In addition, Simphony can sample measurements from Gaussian and negative binomial distributions, the latter of which approximates read counts from next-generation sequencing data. We show an example of using Simphony to benchmark a method for detecting rhythms. Our results suggest that Simphony can aid experimental design and computational method development. Simphony is thoroughly documented and freely available at https://github.com/hugheylab/simphony.


2018 ◽  
Author(s):  
Wei Vivian Li ◽  
Jingyi Jessica Li

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths, and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. Results Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and six different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experiment design based on specific research goals and compares various scRNA-seq computational methods. Availability We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. Contact [email protected]


2019 ◽  
Vol 35 (14) ◽  
pp. i41-i50 ◽  
Author(s):  
Wei Vivian Li ◽  
Jingyi Jessica Li

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. Results Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and 6 different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experimental design and comparison of scRNA–seq computational methods based on specific research goals. Availability and implementation We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. Supplementary information Supplementary data are available at Bioinformatics online.
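The core idea of generating synthetic scRNA-seq data under a chosen sequencing depth and cell number can be illustrated with a toy generative model. This is a generic sketch, not scDesign's fitted statistical framework: it uses gamma-distributed relative expression, multinomial sampling to a target depth per cell, and a mean-dependent dropout step; all names and defaults are invented here.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_scrnaseq(n_genes=2000, n_cells=300, depth=50_000):
    """Toy scRNA-seq count simulator (NOT scDesign's model).

    Gamma-distributed relative expression shared across cells,
    multinomial sampling to a fixed depth per cell, then dropout
    that zeroes low-expression genes more often.
    """
    rel_expr = rng.gamma(shape=0.5, scale=1.0, size=n_genes)
    p = rel_expr / rel_expr.sum()
    counts = rng.multinomial(depth, p, size=n_cells).T  # genes x cells
    # logistic-style dropout: higher-mean genes are kept more often
    mean_log = np.log1p(counts.mean(axis=1))
    keep_prob = 1.0 - np.exp(-mean_log)
    mask = rng.random(counts.shape) < keep_prob[:, None]
    return counts * mask

sc_counts = simulate_scrnaseq()  # 2000 genes x 300 cells
```

Varying `depth` and `n_cells` in such a simulator is what lets one ask the paper's design question: at a fixed budget, is it better to sequence more cells shallowly or fewer cells deeply?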


Author(s):  
Alicia L. Jurek ◽  
Matthew C. Matusiak ◽  
Randa Embry Matusiak

Purpose The current research explores the structural elaboration of municipal American police organizations, specifically, the structural complexity of police organizations and its relationship to time. The purpose of this paper is to describe and test essential elements of the structural elaboration hypothesis. Design/methodology/approach The authors explore the structural elaboration hypothesis utilizing a sample of 219 large police departments across the USA. Data are drawn from multiple waves of the Law Enforcement Management and Administrative Statistics survey and are analyzed using tobit and OLS regression techniques. Findings While there is some evidence that police departments are becoming more elaborate, little evidence for the structural elaboration hypothesis as a function of time is found. Originality/value This project is the first to specifically explore the structural elaboration hypothesis across multiple time points. Additionally, results highlight structural trends across a panel of large American police organizations and provide potential explanations for changes. Suggestions for large-scale policing data collection are also provided.


2019 ◽  
Author(s):  
Tuong-Van Vu ◽  
Martijn Meeter ◽  
Abe Dirk Hofman ◽  
Brenda Jansen ◽  
Lucía Magis-Weinberg ◽  
...  

The present research investigates the relations between motivational beliefs, motivational behaviors, and academic achievement in educational contexts. The first main question is whether motivational beliefs are reciprocally related to achievement and whether a cyclic loop of motivation and achievement is formed over time. The second objective is to study the mediating pathway between motivation and achievement by measuring actual effort spent on learning (i.e. motivational behaviors). Third, we examine the causality of these relations by investigating how they are affected when achievement is experimentally manipulated. We design an intensive longitudinal experiment in which participants will learn new English vocabulary and their motivational beliefs, effort, and achievement are measured at multiple time points. In the second half of the experiment, participants receive rigged feedback that their achievement has dropped, which is expected to influence their subsequent motivation, effort, and actual achievement. To study these dynamics, the changes in one construct are related to changes in other constructs over time, and will be analyzed within a latent change score modeling framework. Planned analyses, expected (narrative) results, and a simulated data set are provided.


2018 ◽  
Vol 35 (11) ◽  
pp. 1901-1906 ◽  
Author(s):  
Mary D Fortune ◽  
Chris Wallace

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
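The key shortcut described above, simulating summary statistics directly about their expected values rather than via individual genotypes, can be sketched compactly. This is an illustration of the idea, not the simGWAS API: under standard assumptions, the expected Z-score vector is the LD matrix times the causal variants' expected Z-scores, and simulated Z-scores are multivariate normal about that expectation with covariance equal to the LD matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_summary_z(R, causal_idx, causal_z, n_sims=1000):
    """Simulate GWAS Z-scores directly (sketch, not the simGWAS API).

    R          : SNP-SNP correlation (LD) matrix
    causal_idx : indices of causal variants
    causal_z   : expected Z-scores at the causal variants
    Returns an (n_sims, n_snps) array of simulated Z-scores.
    """
    gamma = np.zeros(R.shape[0])
    gamma[causal_idx] = causal_z
    expected = R @ gamma  # LD propagates the signal to tag SNPs
    return rng.multivariate_normal(expected, R, size=n_sims)

# toy AR(1) LD structure over 10 SNPs, one causal variant
idx = np.arange(10)
R = 0.8 ** np.abs(idx[:, None] - idx[None, :])
zs = simulate_summary_z(R, causal_idx=[4], causal_z=[5.0], n_sims=2000)
```

Because no per-individual genotypes are generated, the cost is independent of sample size, which is the source of the speed gain the abstract reports.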


2018 ◽  
Author(s):  
Mary D. Fortune ◽  
Chris Wallace

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratised the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some “truth” is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and Implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. Contact [email protected] Supplementary Information Supplementary Information is appended.


2020 ◽  
Author(s):  
Anne Pelikan ◽  
Hanspeter Herzel ◽  
Achim Kramer ◽  
Bharath Ananthasubramaniam

Abstract The circadian clock modulates key physiological processes in many organisms. This widespread role of circadian rhythms is typically characterized at the molecular level by profiling the transcriptome at multiple time points. Subsequent analysis identifies transcripts with altered rhythms between control and perturbed conditions, i.e., transcripts that are differentially rhythmic (DiffR). Commonly, Venn Diagram analysis (VDA) compares lists of rhythmic transcripts to catalog transcripts that have rhythms in both conditions or that have gained or lost rhythms. However, unavoidable errors in rhythmicity detection propagate to the final DiffR classification, resulting in overestimated DiffR. We show, using artificial experiments constructed from biological data, that VDA indeed produces excessive false DiffR hits both in the presence and absence of true DiffR transcripts. We present a hypothesis-testing approach and a model-selection approach in an R package, compareRhythms, that instead compare the circadian amplitude and phase of transcripts between the two conditions. These methods identify transcripts with ‘gain’, ‘loss’, ‘change’ or the ‘same’ rhythms; the third category is missed by VDA. We reanalyzed three studies on the interplay between metabolism and the clock in the mouse liver that used VDA. We found not only fewer DiffR transcripts than originally reported, but also that VDA overlooked many relevant DiffR transcripts. Our analyses confirmed some and contradicted other conclusions in the original studies and also generated novel hypotheses. Our insights also generalize easily to studies using other -omics technologies. We trust that avoiding Venn Diagrams and using our R package will contribute to improved reproducibility in chronobiology.
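The gain/loss/change/same classification built on amplitude and phase comparison can be illustrated with harmonic (cosinor) regression. This toy classifier uses simple amplitude and phase cutoffs, NOT the formal hypothesis-testing or model-selection procedures in compareRhythms; function names and thresholds are invented for the sketch.

```python
import numpy as np

def cosinor_fit(t, y, period=24.0):
    """Least-squares harmonic regression; returns (mesor, amplitude, phase)."""
    X = np.column_stack([np.ones_like(t),
                         np.cos(2 * np.pi * t / period),
                         np.sin(2 * np.pi * t / period)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    amp = np.hypot(b[1], b[2])
    phase = np.arctan2(b[2], b[1]) % (2 * np.pi)
    return b[0], amp, phase

def classify_rhythm_change(t, y_ctrl, y_pert, amp_cut=0.5, period=24.0):
    """Toy DiffR classifier comparing fitted amplitude and phase."""
    _, a1, p1 = cosinor_fit(t, y_ctrl, period)
    _, a2, p2 = cosinor_fit(t, y_pert, period)
    r1, r2 = a1 >= amp_cut, a2 >= amp_cut
    if not r1 and not r2:
        return "arrhythmic"
    if r1 and not r2:
        return "loss"
    if not r1 and r2:
        return "gain"
    # wrapped phase difference in (-pi, pi]
    dphi = np.angle(np.exp(1j * (p1 - p2)))
    if abs(a1 - a2) > amp_cut or abs(dphi) > np.pi / 6:
        return "change"
    return "same"
```

The 'change' branch is exactly what a Venn diagram of two rhythmic-transcript lists cannot represent: a transcript rhythmic in both conditions but with altered amplitude or phase.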


2020 ◽  
Vol 36 (8) ◽  
pp. 2352-2358
Author(s):  
Guodong Yang ◽  
Aiqun Ma ◽  
Zhaohui S Qin ◽  
Li Chen

Abstract Motivation The availability of thousands of genome-wide chromatin immunoprecipitation sequencing (ChIP-Seq) datasets across hundreds of transcription factors (TFs) and cell lines provides an unprecedented opportunity to jointly analyze large-scale TF binding in vivo, making possible the discovery of potential interaction and cooperation among different TFs. These interacting and cooperating TFs can potentially form a transcriptional regulatory module (TRM) (e.g. co-binding TFs), which helps decipher the combinatorial regulatory mechanisms. Results We develop a computational method, tfLDA, that applies state-of-the-art topic models to multiple ChIP-Seq datasets to decipher the combinatorial binding events of multiple TFs. tfLDA is able to learn high-order combinatorial binding patterns of TFs from multiple ChIP-Seq profiles and to interpret and visualize these patterns. We apply tfLDA to two cell lines with a rich collection of TFs and identify combinatorial binding patterns that show well-known TRMs and related TF co-binding events. Availability and implementation A software R package tfLDA is freely available at https://github.com/lichen-lab/tfLDA. Supplementary information Supplementary data are available at Bioinformatics online.
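The topic-model framing treats genomic sites as "documents" and TFs as "words", so co-binding modules emerge as topics. A minimal sketch using scikit-learn's standard LDA implementation on a synthetic binding matrix (this is the general technique, not the tfLDA package; the planted-module data below is invented for illustration):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(3)

# Toy binding matrix: rows = genomic sites, columns = TFs; entries are
# binding-signal counts. Two planted co-binding modules (TRMs):
# TFs 0-2 co-bind the first 50 sites, TFs 3-5 the remaining 50.
X = np.zeros((100, 6), dtype=int)
X[:50, :3] = rng.poisson(5, size=(50, 3))
X[50:, 3:] = rng.poisson(5, size=(50, 3))
X += rng.poisson(0.2, size=X.shape)  # sparse background binding

lda = LatentDirichletAllocation(n_components=2, random_state=0)
site_topics = lda.fit_transform(X)  # per-site topic mixture (rows sum to 1)
tf_loadings = lda.components_       # per-topic TF weights
```

Inspecting the top-weighted TFs in each row of `tf_loadings` recovers the planted modules, which is the analogue of reading off candidate TRMs from real ChIP-Seq profiles.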


2020 ◽  
Vol 8 (1) ◽  
pp. 3 ◽  
Author(s):  
Timo von Oertzen ◽  
Florian Schmiedek ◽  
Manuel C. Voelkle

Properties of psychological variables at the mean or variance level can differ between persons and within persons across multiple time points. For example, cross-sectional findings between persons of different ages do not necessarily reflect the development of a single person over time. Recently, there has been an increased interest in the difference between covariance structures, expressed by covariance matrices, that evolve between persons and within a single person over multiple time points. If these structures are identical at the population level, the structure is called ergodic. However, recent data confirm that ergodicity is not generally given, particularly not for cognitive variables. For example, the g factor that is dominant for cognitive abilities between persons seems to explain far less variance when concentrating on a single person’s data. However, other subdimensions of cognitive abilities seem to appear both between and within persons; that is, there seems to be a lower-dimensional subspace of cognitive abilities in which cognitive abilities are in fact ergodic. In this article, we present ergodic subspace analysis (ESA), a mathematical method to identify, for a given set of variables, which subspace is most important within persons, which is most important between persons, and which is ergodic. Similar to the common spatial patterns method, the ESA method first whitens a joint distribution from both the between and the within variance structure and then performs a principal component analysis (PCA) on the between distribution, which then automatically acts as an inverse PCA on the within distribution. The difference of the eigenvalues allows a separation of the rotated dimensions into the three subspaces corresponding to within, between, and ergodic substructures. We apply the method to simulated data and to data from the COGITO study to exemplify its usage.
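The whiten-then-PCA recipe described above (borrowed from common spatial patterns) can be sketched directly from the two covariance matrices. This is an illustrative implementation of that recipe under the stated assumptions, not the authors' exact code; the function name is invented here. After whitening by the pooled covariance, eigenvalues of the whitened between-person covariance near 1 mark between-person dimensions, near 0 within-person dimensions, and near 0.5 ergodic ones (since the whitened within-person eigenvalues are automatically 1 minus the between-person ones).

```python
import numpy as np

def ergodic_subspace_analysis(sigma_between, sigma_within):
    """ESA sketch: whiten by pooled covariance, then eigendecompose
    the whitened between-person covariance. Returns eigenvalues
    (descending) and the corresponding directions."""
    pooled = sigma_between + sigma_within
    # symmetric whitening transform W such that W @ pooled @ W.T = I
    evals, evecs = np.linalg.eigh(pooled)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    sb_white = W @ sigma_between @ W.T
    lam, V = np.linalg.eigh(sb_white)
    order = np.argsort(lam)[::-1]
    return lam[order], (W.T @ V)[:, order]

# Toy example: variable 1 varies only between persons, variable 2 only
# within persons, variable 3 equally in both (ergodic).
lam, V = ergodic_subspace_analysis(np.diag([1.0, 0.0, 1.0]),
                                   np.diag([0.0, 1.0, 1.0]))
```

In the toy example the eigenvalues come out as 1, 0.5, and 0, cleanly assigning the three variables to the between, ergodic, and within subspaces respectively.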

