Automatic classification of constitutive and non-constitutive metabolites with gcProfileMakeR

Author(s): Fernando Pérez-Sanz, Victoria Ruiz-Hernández, Marta Isabel Terry, Sara Arce-Gallego, Julia Weiss, ...

Abstract: Data analysis in non-targeted metabolomics is extremely time-consuming. Genetic factors and environmental cues affect the composition and quantity of the metabolites present, i.e., the constitutive and non-constitutive metabolites. We developed gcProfileMakeR, an R package that uses standard output files from GC-MS for automatic data analysis based on CAS numbers. gcProfileMakeR produces three outputs: a core, or constitutive, metabolome; a second list of non-constitutive compounds with high-quality matches; and a third set of compounds with low-quality matches to MS libraries. As a proof of concept, we defined the floral scent emission of Antirrhinum majus using wild-type plants, the floral identity mutants deficiens and compacta, and RNAi lines of AmLHY. Loss of petal identity was accompanied by the appearance of aldehydes typical of green leaf volatile profiles. Decreased levels of AmLHY caused a major increase in volatile complexity and activated the synthesis of benzyl acetate, which is absent in WT. Furthermore, some volatiles emitted in a gated fashion in WT, such as methyl 3,5-dimethoxybenzoate or linalool, became constitutive. Using sixteen volatiles of the constitutive profile, all genotypes were classified by machine learning with 0% error. gcProfileMakeR may thus help define core and pan-metabolomes. It enhances the quality of data reported in metabolomic profiles, as text outputs rely on CAS numbers. This is especially important for FAIR data implementation.

One-sentence summary: gcProfileMakeR allows the automatic annotation of the core metabolome and non-constitutive metabolites, increasing the speed and accuracy of non-targeted metabolomics.

Metabolites, 2021, Vol 11 (4), pp. 211
Author(s): Fernando Perez-Sanz, Victoria Ruiz-Hernández, Marta I. Terry, Sara Arce-Gallego, Julia Weiss, ...

Metabolomes comprise constitutive and non-constitutive metabolites produced as a result of physiological, genetic or environmental effects. However, identifying constitutive and non-constitutive metabolites in large datasets is technically challenging. We developed gcProfileMakeR, an R package that uses standard Excel output files from an Agilent ChemStation GC-MS for automatic data analysis based on CAS numbers. gcProfileMakeR has two preprocessing filters that remove contaminants and low-quality peaks. The first function, NormalizeWithinFiles, assigns retention times to CAS numbers within each sample. The second function, NormalizeBetweenFiles, reaches a consensus between files by grouping together compounds with close retention times. The third function, getGroups, establishes what is considered the Constitutive Profile, Non-constitutive by Frequency (i.e., not present in all samples) and Non-constitutive by Quality. Results can be plotted with the plotGroup function. We used the package to analyse floral scent emissions in four snapdragon genotypes: a wild type; Deficiens nicotianoides and compacta, which affect floral identity; and RNAi:AmLHY, which targets a circadian clock gene. We identified differences in constitutive and non-constitutive scent profiles as well as in the timing of emission. gcProfileMakeR is a very useful tool for defining constitutive and non-constitutive scent profiles. It also allows genotype and circadian datasets to be analysed to identify differing metabolites.
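The three-way split performed by getGroups can be illustrated with a short sketch. This is not gcProfileMakeR's code. The package is written in R, and its exact rules and parameters are not given here. The Python function `classify_compounds`, the match-quality cutoff `min_quality` (assumed to be on a 0-100 scale, default 80), and the input layout (per-sample lists of CAS number and library match quality) are all hypothetical. The sketch assumes the following rules:

- A compound is constitutive if it has a high-quality match in every sample.
- It is non-constitutive by frequency if it has a high-quality match in some, but not all, samples.
- It is non-constitutive by quality if it is only ever matched below the cutoff.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input layout: sample name -> list of (CAS number, library match quality).
Sample = List[Tuple[str, float]]


def classify_compounds(samples: Dict[str, Sample],
                       min_quality: float = 80.0) -> Dict[str, List[str]]:
    """Split CAS numbers into three groups under assumed rules (not gcProfileMakeR's).

    - constitutive: high-quality match (>= min_quality) in every sample
    - non_constitutive_frequency: high-quality match in some, but not all, samples
    - non_constitutive_quality: detected only with matches below min_quality
    """
    n_samples = len(samples)
    high_hits: Dict[str, Set[str]] = {}  # CAS -> samples with a high-quality match
    all_cas: Set[str] = set()            # every CAS seen, regardless of quality

    for sample_name, peaks in samples.items():
        for cas, quality in peaks:
            all_cas.add(cas)
            if quality >= min_quality:
                high_hits.setdefault(cas, set()).add(sample_name)

    groups: Dict[str, List[str]] = {
        "constitutive": [],
        "non_constitutive_frequency": [],
        "non_constitutive_quality": [],
    }
    for cas in sorted(all_cas):
        hits = len(high_hits.get(cas, set()))
        if hits == n_samples:
            groups["constitutive"].append(cas)
        elif hits > 0:
            groups["non_constitutive_frequency"].append(cas)
        else:
            groups["non_constitutive_quality"].append(cas)
    return groups


if __name__ == "__main__":
    # Toy data for illustration only, not measurements from the study.
    demo = {
        "WT_1": [("78-70-6", 95.0), ("140-11-4", 60.0)],
        "WT_2": [("78-70-6", 91.0), ("123-35-3", 88.0)],
    }
    print(classify_compounds(demo))
```

Under these assumed rules, a compound matched well in one sample and poorly in another falls into the frequency group rather than the quality group. Only samples with a trustworthy identification count towards its presence.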


2019, Vol 26 (1), pp. 244-252
Author(s): Shibom Basu, Jakub W. Kaminski, Ezequiel Panepucci, Chia-Ying Huang, Rangana Warshamanage, ...

At the Swiss Light Source macromolecular crystallography (MX) beamlines, the collection of serial synchrotron crystallography (SSX) diffraction data is facilitated by recent developments in the DA+ data acquisition and analysis software. The SSX suite allows easy, efficient and high-throughput measurements on a large number of crystals. A fast, continuous, diffraction-based two-dimensional grid scan method allows the initial location of microcrystals. The CY+ GUI utility enables efficient assessment of a grid scan's analysis output and the subsequent collection of multiple wedges of data (so-called minisets) from automatically selected positions in a serial, automated way. The automated data processing (adp) routines adapted to the SSX data collection mode provide near-real-time analysis of data in both CBF and HDF5 formats. Automatic data merging (adm) is the latest extension of the DA+ data analysis routines. It uses the sxdm (SSX data merging) package, which provides automatic online scaling and merging of minisets and allows identification of the subset of minisets that yields the best quality of the final merged data. The results of both adp and adm are sent to the MX MongoDB database and displayed in the web-based tracker, which gives the user on-the-fly feedback about the experiment.


Author(s): Pedro M. Esperança, Dari F. Da, Ben Lambert, Roch K. Dabiré, Thomas S. Churcher

Abstract: Near-infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has made it difficult to generalise predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework that includes efficient functional representation of spectra, spectral smoothing and regularisation. To improve the generalisation of model predictions from one training set to another, we use cross-validation procedures that favour smoother representations of spectra. To illustrate the performance of our approach, we collected spectra from field-caught specimens of Anopheles gambiae complex mosquitoes (the most epidemiologically important vector species on the planet) at two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species at another site over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can alternatively handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available to field entomologists, we have created an open-source R package, mlevcm. All data used are also publicly available.


2019, Vol 20 (1)
Author(s): Benjamin Ulfenborg

Abstract. Background: Studies on multiple modalities of omics data, such as transcriptomics, genomics and proteomics, are growing in popularity, since they allow us to investigate complex mechanisms across molecular layers. It is widely recognized that integrative omics analysis holds the promise of unlocking novel and actionable biological insights into health and disease. Integration of multi-omics data remains challenging, however, and requires combining several software tools and extensive technical expertise to account for the properties of heterogeneous data. Results: This paper presents the miodin R package, which provides a streamlined, workflow-based syntax for multi-omics data analysis. The package allows users to analyze omics data either across experiments on the same samples (vertical integration) or across studies on the same variables (horizontal integration). Workflows have been designed to promote transparent data analysis and to reduce the technical expertise required for low-level data import and processing. Conclusions: The miodin package is implemented in R and is freely available for use and extension under the GPL-3 license. Package source, reference documentation and user manual are available at https://gitlab.com/algoromics/miodin.


2016, Vol 23 (2), pp. 109-123
Author(s): Tippawan Liabsuetrakul, Tagoon Prappre, Pakamas Pairot, Nurlisa Oumudee, Monir Islam

Surveillance systems have yet to be integrated with health information systems to improve the health of pregnant mothers and their newborns, particularly in developing countries. This study aimed to develop a web-based epidemiological surveillance system for maternal and newborn health that integrates action-oriented responses and automatic data analysis with presentation of results. It also aimed to assess the system's acceptance by nurses and doctors in various hospitals in southern Thailand. Freeware and scripting languages were used. The system can run on different platforms and is accessible via various electronic devices. Automatic data analysis, with results presented as graphs, tables and maps, was part of the system. A multi-level security system was incorporated into the program. Most doctors and nurses involved in the study felt the system was easy to use and useful. This system can be integrated into a country's routine reporting system for monitoring maternal and newborn health and survival.


2015, Vol 2015, pp. 1-14
Author(s): Jaemun Sim, Jonathan Sangyun Lee, Ohbyung Kwon

In a ubiquitous environment, high-accuracy data analysis is essential because it affects real-world decision-making. However, in the real world, user-related data from information systems are often missing because of users' privacy concerns or the lack of any obligation to provide complete data. This incompleteness can impair the accuracy of data analysis using classification algorithms, which can degrade the value of the data. Many studies have attempted to overcome these issues and to improve the quality of data analysis using classification algorithms. The performance of classification algorithms may be affected by the characteristics and patterns of the missing data, such as the ratio of missing to complete data. We perform a concrete causal analysis of how various factors cause differences in the performance of classification algorithms. In this analysis, we examine the characteristics of missing values, datasets and imputation methods. We also propose imputation and classification algorithms appropriate to different datasets and circumstances.


2010, Vol 22 (1), pp. 278
Author(s): A. Gad, M. Hoelker, F. Rings, N. Ghanem, D. Salilew-Wondim, ...

Estrus synchronization and superovulation are the most widely used procedures in embryo transfer technology. However, the changes these procedures cause in the oviduct and uterine environment, and their subsequent influence on embryos, have not yet been investigated. This study was conducted to investigate how the oviduct environment of cyclic heifers that were either only synchronized or superovulated affects the gene expression profile of blastocysts. Bovine Affymetrix array analysis was performed using 2 groups of blastocysts. The first group consisted of bovine blastocysts produced after superovulation of Simmental heifers (n = 9) using 8 consecutive FSH injections over 4 days in decreasing doses (in total, 300-400 mg of FSH equivalent according to body weight) and flushed at Day 7 by a nonsurgical endoscopic method. The second group consisted of bovine blastocysts derived from synchronized Simmental heifers (n = 4) after transfer of 2-cell stage embryos from superovulated donor Simmental heifers (n = 9) by a nonsurgical transvaginal endoscopic tubal transfer method. Total RNA was extracted from 3 pools of embryos from each experimental group (6 embryos per pool). A total of 6 biotin-labeled cRNA samples were hybridized on 6 bovine Affymetrix arrays. Data analysis was performed using the LIMMA R package, which is maintained by Bioconductor. Array data analysis revealed a total of 454 transcripts to be differentially expressed (P < 0.05, fold change > 2) between the 2 groups. Of these, 429 and 25 were up- and down-regulated, respectively, in blastocysts derived from superovulated heifers compared with those derived from synchronized animals. Genes involved in response to stress (HSPA14 and HSPE1), cellular and metabolic processes (CPSF3, ATPIF1, POMP, and MDH2), translation (RPS17, EEF1B2, and EIF4E), and cell communication (FN1, KRT18, and DSG2) were found to be enriched in blastocysts derived from superovulated animals.
On the other hand, genes related to protein metabolic processes (CLGN) were found to be enriched in blastocysts derived from the synchronized group. KEGG analysis of the differentially expressed genes showed that the ribosome and oxidative phosphorylation pathways were dominant, and that genes involved in these pathways were highly abundant in blastocysts derived from superovulated animals. Quantitative real-time PCR confirmed the transcript abundance of 7 out of 8 genes selected for validation. In conclusion, blastocysts that developed in synchronized animals after the 2-cell stage showed significant differences in transcriptome profile compared with their counterparts that remained in superovulated heifers until Day 7. Further functional analysis of selected candidate genes could give new insights into the mechanisms regulating the ability of embryos to survive after transfer.


Talanta, 2017, Vol 169, pp. 77-84
Author(s): A. Garrido-Fernández, A. Montaño, A.H. Sánchez-Gómez, A. Cortés-Delgado, A. López-López
