robCompositions: An R-package for Robust Statistical Analysis of Compositional Data

2011 ◽  
pp. 341-355 ◽  
Author(s):  
Matthias Templ ◽  
Karel Hron ◽  
Peter Filzmoser

2021 ◽  
Vol 50 (2) ◽  
pp. 16-37
Author(s):  
Valentin Todorov

In a number of recent articles, Riani, Cerioli, Atkinson and others advocate the technique of monitoring robust estimates computed over a range of key parameter values. Through this approach, the diagnostic tools of choice can be tuned so that highly robust estimators that are as efficient as possible are obtained. The approach is applicable to various robust multivariate estimates such as S- and MM-estimates, MVE and MCD, as well as to the Forward Search, in which monitoring is part of the robust method. The key tools for detecting multivariate outliers and for monitoring robust estimates are the Mahalanobis distances and statistics related to these distances. However, the results obtained with these tools for compositional data might be unrealistic, since compositional data contain relative rather than absolute information and need to be transformed to the usual Euclidean geometry before the standard statistical tools can be applied. Various transformations of compositional data have been introduced in the literature, and theoretical results exist on the equivalence of the additive, the centered, and the isometric logratio transformations in the context of outlier identification. To illustrate the problem of monitoring compositional data and to demonstrate the usefulness of monitoring in this case, we start with a simple example and then analyze a real-life data set presenting the technological structure of manufactured exports. The analysis is conducted with the R package fsdaR, which makes the analytical and graphical tools provided in the MATLAB FSDA library available to R users.
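The equivalence of logratio transformations for distance-based outlier detection can be illustrated with a small sketch. The Python code below is not taken from fsdaR or FSDA; it is a minimal illustration, assuming three-part compositions, pivot-type isometric logratio coordinates, and classical (non-robust) mean and covariance estimates. The function names and the simulated data are hypothetical. In a robust analysis, the mean and covariance would be replaced by robust estimates such as MCD.

```python
import math
import random


def alr(x):
    """Additive logratio of a 3-part composition: log(x1/x3), log(x2/x3)."""
    return (math.log(x[0] / x[2]), math.log(x[1] / x[2]))


def ilr(x):
    """Isometric logratio (pivot coordinates) of a 3-part composition."""
    z1 = math.sqrt(2 / 3) * math.log(x[0] / math.sqrt(x[1] * x[2]))
    z2 = math.sqrt(1 / 2) * math.log(x[1] / x[2])
    return (z1, z2)


def mahalanobis_2d(points):
    """Classical (non-robust) Mahalanobis distances for 2-D points."""
    n = len(points)
    mu = [sum(p[k] for p in points) / n for k in range(2)]
    dev = [(p[0] - mu[0], p[1] - mu[1]) for p in points]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(d[0] * d[0] for d in dev) / (n - 1)
    b = sum(d[0] * d[1] for d in dev) / (n - 1)
    c = sum(d[1] * d[1] for d in dev) / (n - 1)
    det = a * c - b * b
    # Quadratic form d' S^{-1} d via the closed-form 2x2 inverse.
    return [
        math.sqrt((c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det)
        for d in dev
    ]


# Simulated toy data (an assumption for illustration only).
random.seed(0)
raw = [[random.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(50)]
comp = [[v / sum(row) for v in row] for row in raw]  # rows sum to 1

md_alr = mahalanobis_2d([alr(x) for x in comp])
md_ilr = mahalanobis_2d([ilr(x) for x in comp])
```

Because the two coordinate systems differ only by a nonsingular linear map, and the Mahalanobis distance is affine invariant, `md_alr` and `md_ilr` agree up to floating-point error. Applying the same distance to the raw proportions would not respect the relative nature of the data.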


2020 ◽  
Vol 36 (9) ◽  
pp. 2943-2945 ◽  
Author(s):  
Francisco Madrid-Gambin ◽  
Sergio Oller-Moreno ◽  
Luis Fernandez ◽  
Simona Bartova ◽  
Maria Pilar Giner ◽  
...  

Abstract. Summary: Nuclear magnetic resonance (NMR)-based metabolomics is widely used to obtain metabolic fingerprints of biological systems. While targeted workflows require prior knowledge of metabolites before statistical analysis, untargeted approaches remain a challenge. Computational tools dealing with fully untargeted NMR-based metabolomics are still scarce or not user-friendly. Therefore, we developed AlpsNMR (Automated spectraL Processing System for NMR), an R package that provides automated and efficient signal processing for untargeted NMR metabolomics. AlpsNMR includes spectra loading, metadata handling, automated outlier detection, spectra alignment and peak-picking, integration and normalization. The resulting output can be used for further statistical analysis. AlpsNMR proved effective in detecting metabolite changes in a test case. The tool allows less experienced users to easily implement this workflow, from spectra to a ready-to-use dataset, in their routines. Availability and implementation: The AlpsNMR R package and tutorial are freely available to download from http://github.com/sipss/AlpsNMR under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 17 (2) ◽  
pp. 130-136 ◽  
Author(s):  
Misha Z. Pesenson ◽  
Santosh K. Suram ◽  
John M. Gregoire

Author(s):  
Marne C Hagemeijer ◽  
Annelotte M Vonk ◽  
Nikhil T Awatade ◽  
Iris A L Silva ◽  
Christian Tischer ◽  
...  

Abstract. Motivation: The forskolin-induced swelling (FIS) assay has become the preferred assay to predict the efficacy of approved and investigational CFTR-modulating drugs for individuals with cystic fibrosis (CF). Currently, no standardized quantification method for FIS data exists, which hampers inter-laboratory reproducibility. Results: We developed a complete open-source workflow for standardized high-content analysis of CFTR function measurements in intestinal organoids using raw microscopy images as input. The workflow includes tools for (i) file and metadata handling; (ii) image quantification; and (iii) statistical analysis. Our workflow reproduced results generated by published proprietary analysis protocols and enables standardized CFTR function measurements in CF organoids. Availability: All workflow components are open-source and freely available: the htmrenamer R package for file handling https://github.com/hmbotelho/htmrenamer; CellProfiler and ImageJ analysis scripts/pipelines https://github.com/hmbotelho/FIS_image_analysis; the Organoid Analyst application for statistical analysis https://github.com/hmbotelho/organoid_analyst; detailed usage instructions and a demonstration dataset https://github.com/hmbotelho/FIS_analysis. Distributed under GPL v3.0. Supplementary information: Supplementary information and a stepwise guide for software installation and data analysis for training purposes are available at Bioinformatics online.


2021 ◽  
Vol 17 (7) ◽  
pp. e1009131
Author(s):  
Maciej Migdal ◽  
Dan Fu Ruan ◽  
William F. Forrest ◽  
Amir Horowitz ◽  
Christian Hammer

Human immunogenetic variation in the form of HLA and KIR types has been shown to be strongly associated with a multitude of immune-related phenotypes. However, association studies involving immunogenetic loci most commonly involve simple analyses of classical HLA allelic diversity, which limits the interpretability and reproducibility of results. Here we present MiDAS, a comprehensive R package for immunogenetic data transformation and statistical analysis. MiDAS recodes input data in the form of HLA alleles and KIR types into biologically meaningful variables, allowing HLA amino acid fine mapping and analyses of HLA evolutionary divergence as well as of experimentally validated HLA-KIR interactions. Further, MiDAS enables comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS thus closes the gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to immune and disease biology. It is freely available under an MIT license.


2021 ◽  
Vol 50 (2) ◽  
pp. 38-55
Author(s):  
Carolina Navarro ◽  
Silvia Gonzalez-Morcillo ◽  
Carles Mulet-Forteza ◽  
Salvador Linares-Mustaros

This study presents a comprehensive bibliometric analysis of the paper published by John Aitchison in the Journal of the Royal Statistical Society. Series B (Methodological) in 1982. Having recently reached the milestone of 35 years since its publication, this pioneering paper was the first to illustrate the use of the methodology known as "Compositional Data Analysis" or "CoDA". By October 2019, the paper had received over 780 citations, making it the most widely cited and influential article among those using this methodology. The bibliometric approach used in this study encompasses a wide range of techniques, including a specific analysis of the main authors and institutions that have cited Aitchison's paper. The VOSviewer software was also used to develop network maps for the publication, specifically through co-citation and bibliographic coupling analysis. The results clearly show the significant impact the paper has had on scientific research, having been cited by authors and institutions publishing all around the world.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1105 ◽  
Author(s):  
Paul Brennan

Protein schematics are valuable for research, teaching and knowledge communication. However, the tools used to automate the process are challenging. The purpose of the drawProteins package is to enable the automated generation of protein schematics in a way that integrates with the Bioconductor/R suite of tools for bioinformatics and statistical analysis. Given UniProt accession numbers, the package uses the UniProt API to retrieve the features of the protein from the UniProt database. The features are assembled into a data frame and visualized using adaptations of the ggplot2 package. Visualizations can be customised in many ways, including adding protein feature information from other data frames, altering colors and protein names, and adding extra layers using other ggplot2 functions. This can be completed within a script, which makes the workflow reproducible and sharable.


2019 ◽  
Vol 67 (2SUPL) ◽  
pp. S228-S248
Author(s):  
Luis-Ricardo Murillo-Hiller ◽  
Oscar-Antonio Segura-Bermúdez ◽  
Juan-Diego Barquero ◽  
Federico Bolaños

Hesperiidae is one of the most diverse families of butterflies in Costa Rica, with approximately 486 species. Even so, few butterfly lists include this group. In this paper, we present information on seasonality, abundance and natural history features of this family for the Leonelo Oviedo Ecological Reserve (RELO), a 2 ha forest embedded in an urban matrix. Over the course of two years, monthly sampling was carried out on a 270 m trail across the Reserve from 08:00 to 12:00, collecting all the individuals located within 5 m on each side of the trail. To better represent the richness, individuals were also randomly collected for more than ten years, but the butterflies collected in this way were not included in the statistical analysis. Photographs were taken of all the species in order to provide an identification guide. For the cryptic species, drawings and dissections of the genitalia were made. For the community indices, we used Microsoft Excel and the Shannon index with a base-two logarithm. The monthly data were summarized by dry and wet season. To compare richness and abundance, we used a G-test to evaluate whether there are differences between seasons; in addition, a hierarchical cluster analysis was carried out with the R package vegan using the Jaccard index and Ward's minimum variance agglomerative method. With the R package pvclust, the uncertainty of the clusters was assessed based on a bootstrap with 10 000 iterations. A total of 423 individuals of 49 species were included in the statistical analysis, from a total of 435 individuals of 58 species. A tendency to greater richness and abundance of skippers was found during the dry season. Through the cluster analysis, it was possible to determine that, in relation to the diversity of skippers, both wet seasons are grouped significantly (P = 0.05). The dry seasons are also grouped significantly (P = 0.05). The reserve has connectivity with other green areas via a stream. During the wet season, plant growth increases connectivity, which could lead to the entry of new individuals of different species that are not permanent residents of RELO and that establish small populations, increasing the richness and abundance of species. This, added to the variation in the occurrence of some butterfly species in response to seasonal variations and differences in the availability of resources in different seasons, explains the grouping of species between seasons.
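The two community measures named in this abstract can be sketched briefly. The Python code below is not the authors' Excel or vegan workflow; it is a minimal illustration, assuming the standard Shannon formula with a base-two logarithm and the presence/absence form of the Jaccard dissimilarity. The counts and species labels are invented toy values, not data from the study.

```python
import math


def shannon_index(counts, base=2):
    """Shannon diversity H = -sum(p * log_base(p)) over observed species."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def jaccard_dissimilarity(species_a, species_b):
    """1 - |A intersect B| / |A union B| for two species lists."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(a & b) / len(union)


# Toy abundance counts per species for one season (hypothetical).
dry_counts = [10, 10, 10, 10]
h_dry = shannon_index(dry_counts)

# Toy species lists for two seasons (hypothetical labels).
dry_species = ["sp_a", "sp_b", "sp_c"]
wet_species = ["sp_b", "sp_c", "sp_d"]
d_jaccard = jaccard_dissimilarity(dry_species, wet_species)
```

With four equally abundant species, the base-two Shannon index is 2 bits, and the two toy species lists share two of four species in total, giving a Jaccard dissimilarity of 0.5. A matrix of such pairwise dissimilarities is the kind of input a hierarchical clustering method would use.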

