A brief introduction to mixed effects modelling and multi-model inference in ecology

PeerJ
2018
Vol 6
pp. e4794
Author(s):
Xavier A. Harrison
Lynda Donaldson
Maria Eugenia Correa-Cano
Julian Evans
David N. Fisher
...

The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology, and demonstrate the tendency for data dredging to lead to a greatly inflated Type I error rate (false positives) and impaired inference. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
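To make the workflow concrete, the sketch below shows the kind of analysis the abstract refers to: fitting an LMM with a random intercept in R and comparing candidate models on the AIC scale. It is an illustrative sketch only, not code from the paper; the lme4 package is assumed, and the simulated data frame and variable names (df, y, x, site) are hypothetical.

# Illustrative sketch (not from the paper): fit a random-intercept LMM with lme4
# and compare two candidate fixed-effect structures by AIC.
library(lme4)

set.seed(42)
site_effect <- rnorm(10, sd = 1)                          # simulated random intercepts
df <- data.frame(site = factor(rep(1:10, each = 8)),
                 x    = rnorm(80))
df$y <- 2 + 0.5 * df$x + site_effect[df$site] + rnorm(80, sd = 0.5)

m0 <- lmer(y ~ 1 + (1 | site), data = df, REML = FALSE)   # intercept-only candidate
m1 <- lmer(y ~ x + (1 | site), data = df, REML = FALSE)   # adds the fixed effect of x

AIC(m0, m1)    # information-theoretic comparison of the candidates
summary(m1)    # fixed-effect estimates and random-effect variances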

2021

Abstract. R is an open-source statistical environment modelled after the formerly widespread commercial programs S and S-Plus, and it pairs powerful statistical analysis tools with equally capable graphics output. Beyond its statistical and graphical capabilities, R is a programming language suitable for medium-sized projects. This book presents a set of studies that collectively cover almost all the R operations needed by beginners analysing their own data, up to perhaps the early years of a PhD. Although the chapters are organized around topics such as graphing, classical statistical tests, statistical modelling, mapping and text parsing, the examples are drawn largely from real scientific studies at the appropriate level, and each chapter nearly always introduces more R functions than are strictly necessary just to obtain a p-value or a graph. R comes with around a thousand base functions that are installed automatically when R is downloaded; this book covers those most relevant to biological data analysis, modelling and graphics. Throughout each chapter, the functions introduced and used in that chapter are summarized in Tool Boxes. The book also shows the user how to adapt and write their own code and functions. A selection of base graphics functions not necessarily covered in the main text is described in Appendix 1, and additional housekeeping functions in Appendix 2.
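As a flavour of that base-R workflow (a minimal sketch of the kind of task the book addresses, not an example taken from it), a classical test and an accompanying graph need nothing beyond functions shipped with base R:

# Minimal base-R sketch: simulate two groups, run a classical test, draw a graph.
set.seed(1)
control   <- rnorm(20, mean = 10, sd = 2)   # simulated control group
treatment <- rnorm(20, mean = 12, sd = 2)   # simulated treatment group

t.test(control, treatment)                  # Welch two-sample t-test

boxplot(list(Control = control, Treatment = treatment),
        ylab = "Response", main = "Two simulated groups")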


2018
Vol 22 (8)
pp. 4565-4581
Author(s):
Florian U. Jehn
Lutz Breuer
Tobias Houska
Konrad Bestian
Philipp Kraft

Abstract. The ambiguous representation of hydrological processes has led to the formulation of the multiple hypotheses approach in hydrological modeling, which requires new ways of model construction. However, most recent studies focus only on comparing predefined model structures or on building a model step by step. This study tackles the problem the other way around: we start with one complex model structure, which includes all processes deemed to be important for the catchment. Next, we create 13 additional simplified models, in which some of the processes from the starting structure are disabled. The performance of those models is evaluated using three objective functions (the logarithmic Nash–Sutcliffe efficiency; the percentage bias, PBIAS; and the ratio of the root mean square error to the standard deviation of the measured data). Through this incremental breakdown, we identify the most important processes and detect the restraining ones. This procedure allows us to construct a more streamlined 15th model with improved performance, less uncertainty and higher model efficiency. We benchmark the original Model 1 and the final Model 15 against HBV Light. The final model is not able to outperform HBV Light, but we find that the incremental model breakdown leads to a structure with good model performance, fewer but more relevant processes and fewer model parameters.
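For reference, the three objective functions can be computed directly from observed and simulated discharge. The R sketch below uses common textbook definitions (positive flows are assumed for the logarithmic Nash–Sutcliffe, and the PBIAS sign convention may differ from the authors' implementation); the data are toy values, not the catchment records.

# Sketch of the three objective functions (standard definitions, not the authors' code).
# obs and sim are numeric vectors of observed and simulated discharge (assumed > 0).
log_nse <- function(obs, sim) {
  lo <- log(obs); ls <- log(sim)
  1 - sum((lo - ls)^2) / sum((lo - mean(lo))^2)   # Nash-Sutcliffe on log-transformed flows
}
pbias <- function(obs, sim) 100 * sum(sim - obs) / sum(obs)       # percentage bias
rsr   <- function(obs, sim) sqrt(mean((obs - sim)^2)) / sd(obs)   # RMSE / sd of observations

obs <- c(1.2, 0.9, 2.5, 3.1, 1.8)   # toy observed discharge
sim <- c(1.0, 1.1, 2.2, 3.4, 1.6)   # toy simulated discharge
c(logNSE = log_nse(obs, sim), PBIAS = pbias(obs, sim), RSR = rsr(obs, sim))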


2014
Vol 15 (1)
Author(s):
Koen Van der Borght
Geert Verbeke
Herman van Vlijmen

Author(s):  
José Caldas
Samuel Kaski

Biclustering is the unsupervised learning task of mining a data matrix for useful submatrices, for instance groups of genes that are co-expressed under particular biological conditions. As these submatrices are expected to partly overlap, a significant challenge in biclustering is to develop methods that can detect overlapping biclusters. The authors propose a probabilistic mixture modelling framework for biclustering biological data that lends itself to various data types and allows biclusters to overlap. Their framework is akin to the latent feature and mixture-of-experts model families, with inference and parameter estimation performed via a variational expectation-maximization algorithm. The model compares favorably with competing approaches, both on a binary DNA copy number variation data set and on a miRNA expression data set, indicating that it may potentially be used as a general problem-solving tool in biclustering.
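The variational expectation-maximization machinery is beyond a short example, but the representation the abstract describes, overlapping biclusters encoded as binary row and column memberships whose effects add up in the expected data matrix, can be illustrated with a toy R sketch (all dimensions, memberships and effect sizes below are invented for illustration):

# Toy illustration of overlapping biclusters (not the authors' model).
# Bicluster k has binary row membership r[, k] and column membership cmat[, k];
# it contributes effect mu[k] to every cell where both memberships are 1.
n <- 6; m <- 5; K <- 2
r    <- matrix(0, n, K)
cmat <- matrix(0, m, K)
r[1:4, 1] <- 1; cmat[1:3, 1] <- 1   # bicluster 1: rows 1-4, columns 1-3
r[3:6, 2] <- 1; cmat[3:5, 2] <- 1   # bicluster 2: rows 3-6, columns 3-5 (overlaps bicluster 1)
mu <- c(2, -1)                      # bicluster-specific effects

expected <- matrix(0, n, m)
for (k in 1:K) expected <- expected + mu[k] * (r[, k] %o% cmat[, k])
expected                            # cells in the overlap receive both effects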


Author(s):  
José Antonio Seoane Fernández
Mónica Miguélez Rico

Large worldwide projects like the Human Genome Project, which in 2003 successfully concluded the sequencing of the human genome, and the recently completed HapMap Project have opened new perspectives in the study of complex multigene illnesses: they have provided us with new information for tackling the complex mechanisms and relationships between genes and environmental factors that generate complex illnesses (Lopez, 2004; Dominguez, 2006). Thanks to these new genomic and proteomic data, it becomes increasingly possible to develop new medicines and therapies, establish early diagnoses, and even discover new solutions for old problems. These tasks, however, inevitably require the analysis, filtering, and comparison of the large amounts of data generated in the laboratory against the enormous amounts of data stored in public databases, such as those of the NCBI and the EBI. Computer science equips biomedicine with an environment that simplifies our understanding of the biological processes that take place at every organizational level of living matter (molecular level, genetic level, cell, tissue, organ, individual, and population) and the intrinsic relationships between them. Bioinformatics can be described as the application of computational methods to biological discoveries (Baldi, 1998). It is a multidisciplinary area that includes computer science, biology, chemistry, mathematics, and statistics. The three main tasks of bioinformatics are the following: to develop algorithms and mathematical models for testing the relationships between the members of large biological datasets, to analyze and interpret heterogeneous data types, and to implement tools that allow the storage, retrieval, and management of large amounts of biological data.


2019
Vol 16 (151)
pp. 20180747
Author(s):
Bernat Bramon Mora
Giulio V. Dalla Riva
Daniel B. Stouffer

Null models have become a crucial tool for understanding structure within incidence matrices across multiple biological contexts. For example, they have been widely used in the study of ecological and biogeographic questions, testing hypotheses about patterns of community assembly, species co-occurrence and biodiversity. However, to our knowledge we still lack a general and flexible approach for studying the mechanisms explaining such structures. Here, we provide a method for generating 'correlation-informed' null models, which combines the classic concept of null models with tools from community ecology, such as joint statistical modelling. In general, this model allows us to assess whether the information encoded within any given correlation matrix is predictive of the structural patterns observed within an incidence matrix. To demonstrate its utility, we apply our approach to two case studies that represent common scenarios encountered in community ecology. First, we use a phylogenetically informed null model to detect a strong evolutionary fingerprint within empirically observed food webs, reflecting key differences in the impact of shared evolutionary history on the interactions of predators and prey. Second, we use multiple informed null models to identify which factors determine the structural patterns of species assemblages, focusing on the study of nestedness and the influence of site size, isolation, species range and species richness. In addition to offering a versatile way to study the mechanisms shaping the structure of any incidence matrix, including those describing ecological communities, our approach can be adapted further to test even more sophisticated hypotheses.
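The correlation-informed construction relies on the authors' joint-modelling machinery, but the classic null-model logic it extends can be sketched briefly: randomize the incidence matrix, recompute a structural statistic, and compare the observed value with the null distribution. The R sketch below uses a deliberately simple fill-preserving randomization and a placeholder statistic, purely for illustration; it is not the authors' method.

# Basic, uninformed null-model test on a toy incidence matrix (illustration only).
set.seed(1)
obs_mat <- matrix(rbinom(40, 1, 0.4), nrow = 8)   # toy presence-absence matrix

stat <- function(m) mean(crossprod(m))            # placeholder statistic: mean pairwise
                                                  # column co-occurrence count
obs_stat <- stat(obs_mat)

null_stats <- replicate(999, {
  shuffled <- matrix(sample(obs_mat), nrow = nrow(obs_mat))  # shuffle cells, preserving fill
  stat(shuffled)
})

p_value <- mean(c(null_stats, obs_stat) >= obs_stat)   # one-tailed permutation p-value
p_value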

