A statistical model for describing and simulating microbial community profiles

2021 · Vol 17 (9) · pp. e1008913
Author(s): Siyuan Ma, Boyu Ren, Himel Mallick, Yo Sup Moon, Emma Schwager, ...

Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA’s model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. “taxa”) or between features and “phenotypes” to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA’s performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA’s utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
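The core building block described above is a zero-inflated log-normal marginal for each feature. As a rough illustration of that marginal alone (a minimal sketch, not the actual SparseDOSSA implementation, which additionally models absolute counts, read generation, and feature covariance; all parameters below are invented):

```python
import numpy as np

rng = np.random.default_rng(42)

def zero_inflated_lognormal(pi0, mu, sigma, rng):
    """One synthetic sample: each feature is absent with probability pi0,
    otherwise its abundance is drawn from a log-normal distribution."""
    present = rng.random(pi0.shape) >= pi0
    return np.where(present, rng.lognormal(mu, sigma), 0.0)

# invented parameters for five synthetic "taxa"
pi0   = np.array([0.7, 0.4, 0.1, 0.5, 0.9])  # zero-inflation probabilities
mu    = np.array([0.0, 1.0, 2.0, 0.5, 1.5])  # log-scale means
sigma = np.array([1.0, 0.8, 0.5, 1.2, 0.7])  # log-scale standard deviations

abundances = zero_inflated_lognormal(pi0, mu, sigma, rng)
total = abundances.sum()
relative = abundances / total if total > 0 else abundances  # compositional view
print(relative)
```

Normalizing to relative abundances in the last step mimics the compositional nature of sequencing-based profiles noted in the abstract.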



2017 · Vol 262 · pp. 135-138
Author(s): Carlos L. Aspiazu, Paulina Aguirre, Sabrina Hedrich, Axel Schippers

In a mine owned by the company Orenas S.A. (Ecuador), a biooxidation process for gold recovery has been developed. Refractory gold ore was crushed and milled, and 500 tonnes of flotation concentrate were agglomerated by coating onto support rock. This material was piled onto a liner, and the biooxidation process in the 35 × 25 × 6 m heap was run for approximately 150 days. The oxidized material was subsequently removed for further processing. An outcrop allowed depth-dependent sampling, yielding 36 samples in total from three sites over the complete depth of 6 m. The fine fraction was separated from the host rock and sent to the laboratory for analysis of the microbial community. The pH ranged between 2.2 and 2.9. Total cell counts, determined by counting under a fluorescence microscope after SYBR Green staining, indicated high microbial colonization of the heap at all depths (10⁶ to 10⁹ cells per g concentrate); however, the highest cell numbers were found mainly in the upper 50 cm. Most-probable-number (MPN) determination of living, acidophilic iron(II)-oxidizers at one site likewise revealed a decrease in cell numbers with depth (10⁴ to 10⁸ cells per g concentrate). Further molecular analyses of the community composition, based on extracted DNA and 16S rRNA gene analyses by T-RFLP and qPCR, revealed a complex archaeal and bacterial community within the heap. An active community of acidophiles thus drives the biooxidation process in all sampled parts of the heap.
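MPN values like those reported above are maximum-likelihood estimates from a dilution series: at each dilution, a tube is positive with probability 1 − exp(−λv), where λ is the concentration and v the inoculum amount. A minimal Python sketch of the generic calculation (not this study's specific protocol; the tube counts and amounts below are invented):

```python
import math

def mpn_score(lam, tubes):
    """Derivative of the MPN log-likelihood with respect to lambda.
    tubes: list of (n_tubes, n_positive, inoculum_amount) per dilution."""
    s = 0.0
    for n, p, v in tubes:
        if p > 0:
            e = math.exp(-lam * v)
            s += p * v * e / (1.0 - e)   # contribution of positive tubes
        s -= (n - p) * v                 # contribution of negative tubes
    return s

def mpn_estimate(tubes, lo=1e-9, hi=1e9, iters=200):
    """Geometric bisection on the monotone score function."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mpn_score(mid, tubes) > 0:
            lo = mid                     # lambda too small
        else:
            hi = mid                     # lambda too large
    return math.sqrt(lo * hi)

# hypothetical 3-tube series at 0.1, 0.01, 0.001 g of concentrate per tube
tubes = [(3, 3, 0.1), (3, 2, 0.01), (3, 0, 0.001)]
print(f"MPN ≈ {mpn_estimate(tubes):.1f} organisms per g")
```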


2011 · pp. 1-17
Author(s): Stephan Kudyba

Despite the research published, the software developed, and the business applications they can enhance, the terms data mining and multivariate modeling continue to stoke uncertainty, complexity, and sometimes fear in business managers and strategic decision-makers across industry sectors. Why is this? There are a number of reasons, but probably the most common involves the complex nature of the methodologies these analytic techniques incorporate. That complexity involves the use of mathematical equations, sophisticated algorithms, and advanced search and query techniques, not to mention the statistical applications utilized in analyzing data. If that is not enough to throw management back on their heels, consider the data acquisition, normalization, and model optimization often involved in the process. Add one more attribute to the list: the ability not only to understand these complex methods but, more importantly, to understand when and where they can be used to enhance operational efficiency. Is it any wonder that data mining remains a mysterious phenomenon in the world of commerce? To dispel some of these uncertainties, this book provides the reader with expert input on how these quantitative methods are being used in prominent organizations across a variety of industry sectors to help enhance productivity, efficiency, and, to some extent, profitability. Before getting into the applied material, the following chapter provides some general information on what data mining and multivariate modeling are, where they came from, and how they can be used in a corporate setting to enhance operational efficiency.


Author(s): Rifat Kamasak

Purpose – This study aims to investigate the complex interaction of different resource sets and capabilities in the process of performance creation, within the context of resource-based theory. Design/methodology/approach – An inductive case study approach was used, combining multiple data collection methods such as in-depth interviews, observation and documentation. Findings – Organizational culture, reputational assets, human capital, business processes and networking capabilities were found to be the most important determinants of firm performance within the context of the Ülker case study. Originality/value – Although large-scale empirical studies can explore the direct resource–performance relationship, such quantitative methods bypass the complex and embedded nature of intangibles and provide only a limited understanding of why some resources are identified as strategic while others are not, what their roles are, and how they are converted into positions of competitive advantage. Understanding the complex nature of resources embedded in organizations therefore calls for more fieldwork-based qualitative studies. This study addresses that gap by providing a thorough understanding of the managerial and organizational processes through which resources become valuable.


1989 · Vol 33 (18) · pp. 1228-1232
Author(s): Floyd Glenn

This paper examines the appropriate role of human performance micro-models in simulations of human-machine system operations. Requirements for general human micro-models are considered relative to the objectives of simulation studies, the conditions under which simulations are constructed and used, the status of human performance databases and models, and the features provided with general-purpose simulation software. The investigation focuses particularly on a new tool for simulating human-machine systems, the Human Operator Simulator – Version V (HOS-V). A general design principle of HOS-V has been to provide embedded human performance micro-models for the basic performance processes that are most pervasive and most interactive with other processes. These include representations for processes of body movement, cognition, and attention. Key to these representations are the substructures within each area. Body movement models describe locations of body parts and constraints on their movement. Cognition models describe how the human processes information through perception, memory, decision-making, and action initiation. The attention model describes how a limited attentional resource is allocated to the various body movement and cognition processes, each of which has a defined attentional requirement. Plans for implementation of micro-model components of HOS-V are discussed.
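To make the attention model concrete, here is a minimal sketch of one plausible allocation scheme: when total demand exceeds capacity, the resource is shared proportionally and each process slows accordingly. This proportional rule is an illustrative assumption, not HOS-V's documented algorithm, and all names and numbers are invented:

```python
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    demand: float      # defined attentional requirement (fraction of resource)
    base_time: float   # completion time with full attention, seconds

def allocate_attention(processes, capacity=1.0):
    """Share a limited attentional resource proportionally to demand,
    then scale each process's completion time by its shortfall."""
    total = sum(p.demand for p in processes)
    results = {}
    for p in processes:
        share = p.demand if total <= capacity else capacity * p.demand / total
        slowdown = p.demand / share      # > 1 when attention is scarce
        results[p.name] = p.base_time * slowdown
    return results

tasks = [Process("reach", 0.3, 0.8),
         Process("monitor display", 0.5, 2.0),
         Process("decide", 0.6, 1.5)]
print(allocate_attention(tasks))  # total demand 1.4 > capacity, so all slow down
```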


1978 · Vol 26 (1) · pp. 14-21
Author(s): G Berlin, L Enerbäck

A cytofluorometric method, based on berberine staining of mast cell heparin, was used for flow cytofluorometric counting and heparin quantitation of mast cells in crude peritoneal suspensions of growing rats. The automatic flow cytofluorometric counts of mast cells correlated well with hemocytometer cell counts. The mean mast cell heparin content obtained by flow cytofluorometry agreed well with that obtained by cytofluorometry of microscopically identified mast cells. The number of peritoneal mast cells and the mean mast cell heparin content were found to increase as the animals grew older. The results of the microscope fluorometric measurements suggested that heparin content was normally distributed within mast cell populations of both young and old rats. However, the heparin distributions obtained by flow cytofluorometry were often positively skewed, yet did not conform to a log-normal distribution.
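Distinguishing a normal from a log-normal (or merely right-skewed) distribution, as in the comparison above, can be sketched as follows. This is a generic present-day check with invented data, not the authors' 1978 procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# invented stand-in for per-cell heparin fluorescence values (arbitrary units)
heparin = rng.lognormal(mean=3.0, sigma=0.4, size=500)

for label, x in [("raw", heparin), ("log", np.log(heparin))]:
    w, p = stats.shapiro(x)   # Shapiro-Wilk normality test
    print(f"{label}: skew={stats.skew(x):+.2f}, Shapiro-Wilk p={p:.3f}")
# log-normal data: raw values are positively skewed, log values pass normality;
# data that are skewed on both scales fit neither distribution
```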


This study aims to: (1) map the components of the heutagogy model used by lecturers; (2) analyze the percentage of each heutagogy model component used by lecturers; and (3) interpret the components of the heutagogy model in relation to the Education 4.0 era. The study uses quantitative methods. The population is all vocational education lecturers at the State University of Malang (UM), Indonesia; the sample comprises 200 vocational education lecturers at UM. Data were analyzed with SPSS 24 using descriptive statistics. The findings are: (1) the components of the heutagogy model are Explore, Create, Collaborate, Connect, Share, and Reflect; (2) the percentages of the heutagogy model components used by lecturers are Explore (86.92%), Create (87.87%), Collaborate (87.42%), Connect (87.89%), Share (88.72%), and Reflect (89.30%); and (3) all components of the heutagogy model relate to the Education 4.0 era.


2017 · Vol 30 (4) · pp. 885-891
Author(s): Silvana Silva Red Quintal, Alexandre Pio Viana, Bianca Machado Campos, Marcelo Vivas, Antonio Teixeira do Amaral Júnior

ABSTRACT The present study was conducted to analyze the covariance structure and repeatability estimates of variables related to guava productivity, namely fruit weight (FW), fruit number (FN) and fruit production (FP), across three harvests of 95 genotypes of a segregating population. The study also aimed to choose the most appropriate covariance structure for observations within the same individual by means of the AIC (Akaike's Information Criterion) and SBC (Schwarz's Bayesian Criterion). A covariance structure between repeated measures could be incorporated into the statistical model, with the autoregressive and compound symmetry forms being the most adequate. The repeatability coefficients obtained for FW (0.25), FN (0.14) and FP (0.29) were considered low, indicating that three harvests were not sufficient to select the best individuals with high accuracy in this population. For the variables FW and FP, estimates of accuracy around 0.50 could be obtained from five measurements, while for FN more harvests would be necessary. These values indicate that in guava segregating populations, evaluations of the first harvests alone are not enough to select more stable genotypes for the variables considered in this study.
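The link between repeatability and the number of harvests needed can be sketched with the standard formulas from repeatability analysis (a generic illustration, assuming the usual definitions, not the exact computation performed in the paper):

```python
def determination(r, m):
    """Coefficient of determination from m repeated measures, given
    repeatability r: R = m*r / (1 + (m - 1)*r)."""
    return m * r / (1.0 + (m - 1) * r)

def measures_needed(r, R_target):
    """Measurements needed to reach determination R_target:
    m0 = R*(1 - r) / (r*(1 - R))."""
    return R_target * (1.0 - r) / (r * (1.0 - R_target))

# repeatability coefficients reported in the abstract
for trait, r in [("FW", 0.25), ("FN", 0.14), ("FP", 0.29)]:
    print(f"{trait}: R with 3 harvests = {determination(r, 3):.2f}, "
          f"harvests for R = 0.80: {measures_needed(r, 0.8):.1f}")
```

With r = 0.14 (FN), dozens of harvests would be required to reach high determination, consistent with the abstract's conclusion that FN needs more harvests than FW or FP.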


2005 · Vol 13 (2) · pp. 67-77
Author(s): Ioana Banicescu, Ricolindo L. Cariño, Jane L. Harvill, John Patrick Lestrade

The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is fitted to multiple datasets and tested for goodness of fit within a fixed analytical framework; analyzing the datasets together is intended to yield more definitive conclusions. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and on a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general-purpose Linux cluster demonstrate the effectiveness of the strategy.
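The load-balancing idea can be illustrated with guided self-scheduling, one classic dynamic loop scheduling rule: each idle worker grabs a chunk sized to the remaining work divided by the number of workers, so chunks shrink as the loop drains and uneven per-dataset costs even out. A minimal threaded sketch of that chunking rule only (the paper's strategy additionally partitions processors into communication groups, which is omitted here):

```python
import math
import random
import threading
import time

def guided_chunk(remaining, n_workers, min_chunk=1):
    """Guided self-scheduling: next chunk = ceil(remaining / workers)."""
    return max(min_chunk, math.ceil(remaining / n_workers))

def run(n_datasets=64, n_workers=4):
    next_idx = 0
    lock = threading.Lock()

    def worker(wid):
        nonlocal next_idx
        while True:
            with lock:                       # claim the next chunk atomically
                if next_idx >= n_datasets:
                    return
                size = guided_chunk(n_datasets - next_idx, n_workers)
                start, next_idx = next_idx, next_idx + size
            for i in range(start, start + size):
                time.sleep(random.uniform(0.001, 0.01))  # stand-in for a model fit

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run()
```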


Radiocarbon · 2002 · Vol 44 (1) · pp. 195-212
Author(s): Delil Gómez Portugal Aguilar, Cliff D Litton, Anthony O'Hagan

The process of calibrating radiocarbon determinations onto the calendar scale requires the specification of a statistical model for the calibration curve. This model specification is of fundamental importance for the resulting inference about the parameter of interest, namely, in general, the calendar age associated with the sample that has been ¹⁴C-dated. Traditionally, the ¹⁴C calibration curve has been modelled simply as the piece-wise linear curve joining the (internationally agreed) high-precision calibration data points, or, less frequently, by spline functions that yield a smoother curve. We present a model for the ¹⁴C calibration curve which, based on specific characteristics of the dating method, yields a piece-wise linear curve, but one which smooths the data points rather than interpolating them. We show that under this model, requiring a piece-wise linear curve implies an underlying random walk covariance structure (and vice versa). Furthermore, by making comprehensive use of all the information provided by the calibration data, we improve on current models by obtaining more realistic variance values for the calibration curve.
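The random-walk/piece-wise-linear connection can be seen in a small numerical sketch: with a Brownian-motion (random walk) covariance k(s, t) = σ² min(s, t), the posterior mean given noisy observations is piece-wise linear with kinks at the data points, and it smooths rather than interpolates whenever the error variance is positive. The calibration points below are invented stand-ins, not real IntCal data:

```python
import numpy as np

def brownian_cov(s, t, sigma2=1.0):
    """Random-walk (Brownian motion) covariance: k(s, t) = sigma2 * min(s, t)."""
    return sigma2 * np.minimum.outer(s, t)

t_obs  = np.array([1.0, 2.0, 4.0, 7.0])   # calendar ages (invented)
y_obs  = np.array([1.2, 1.9, 4.3, 6.8])   # radiocarbon determinations (invented)
tau2   = 0.05                             # measurement error variance

t_grid = np.linspace(0.5, 8.0, 200)
K      = brownian_cov(t_obs, t_obs) + tau2 * np.eye(len(t_obs))
k_star = brownian_cov(t_grid, t_obs)

# posterior mean of the curve: piece-wise linear, kinked at the data points,
# passing near (not through) the observations because tau2 > 0
curve  = k_star @ np.linalg.solve(K, y_obs)
```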

