MIPMLP – Microbiome Preprocessing Machine Learning Pipeline

2020 ◽  
Author(s):  
Yoel Y Jasner ◽  
Anna Belogolovski ◽  
Meirav Ben-Itzhak ◽  
Omry Koren ◽  
Yoram Louzoun

Abstract: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with a taxonomic representation. Raw feature counts may not be the optimal representation for ML. We checked multiple preprocessing steps and tested the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy, as measured by the Area Under the Curve (AUC) of the classification. We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results. These preprocessing steps are integrated into MIPMLP (Microbiome Preprocessing Machine Learning Pipeline), which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Importance: Microbiome composition has been proposed as a biomarker (mic-marker) for multiple diseases. However, a clear analysis of the optimal way to represent the gene sequence counts is still lacking. We propose a simple and straightforward method that significantly improves the accuracy of mic-marker studies. This method can help bring together two of the most important advances in biology in the last decade: microbiome analysis and the introduction of machine learning methods to biological studies.
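The three transformations compared in this abstract (relative counts, log of counts, z-scoring) can be illustrated with a minimal sketch. It assumes a samples × features table of raw 16S counts, adds a pseudocount `epsilon` = 0.1 before taking log10 to avoid log(0) (the pseudocount and the log base are assumptions, not taken from the abstract), and z-scores each feature across samples with the population standard deviation. All function names and the toy counts are hypothetical; this is not the MIPMLP code.

```python
import math

def relative_abundance(counts):
    """Divide each feature count by the sample's total read count."""
    total = sum(counts)
    return [c / total for c in counts]

def log_transform(counts, epsilon=0.1):
    """log10(count + epsilon); the pseudocount epsilon (assumed) avoids log10(0)."""
    return [math.log10(c + epsilon) for c in counts]

def z_score_columns(matrix):
    """Standardize each feature (column) across samples to mean 0, SD 1.

    Uses the population standard deviation; constant columns become 0.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [list(row) for row in matrix]
    for j in range(n_cols):
        col = [matrix[i][j] for i in range(n_rows)]
        mean = sum(col) / n_rows
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n_rows)
        for i in range(n_rows):
            out[i][j] = 0.0 if sd == 0 else (matrix[i][j] - mean) / sd
    return out

# Hypothetical raw counts: 3 samples x 4 features.
samples = [
    [1000, 10, 0, 1],
    [900, 50, 3, 0],
    [1200, 5, 1, 2],
]
relative = [relative_abundance(s) for s in samples]
logged = [log_transform(s) for s in samples]
standardized = z_score_columns(logged)
```

One possible intuition for the reported result: in the relative representation the first, dominant feature takes almost the whole sample, squeezing the rare features toward zero, whereas after the log transform the rare features occupy a comparable numeric range.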

2021 ◽  
Vol 12 ◽  
Author(s):  
Yoel Jasner ◽  
Anna Belogolovski ◽  
Meirav Ben-Itzhak ◽  
Omry Koren ◽  
Yoram Louzoun

Background: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with a taxonomic representation. Raw feature counts may not be the optimal representation for ML. Methods: We checked multiple preprocessing steps and tested the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy, as measured by the Area Under the Curve (AUC) of the classification. Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results. Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into MIPMLP (Microbiome Preprocessing Machine Learning Pipeline), which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
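The taxonomy-merging step can be sketched as follows, assuming features are named by ';'-separated taxonomy strings (e.g. `k__Bacteria;p__Firmicutes;...`) and that values have already been log-transformed. Features sharing a prefix at the chosen level are grouped; a singleton group is kept as-is, and a larger group is replaced by its first principal component, computed here by power iteration to stay dependency-free. Keeping exactly one component per group is a simplification chosen for illustration; MIPMLP's own options may differ, all names are hypothetical, and a real implementation would more likely call a library PCA.

```python
import math
from collections import defaultdict

def truncate_taxonomy(name, level):
    """Keep the first `level` ranks of a ';'-separated taxonomy string."""
    return ";".join(name.split(";")[:level])

def first_pc_scores(columns, n_iter=200):
    """Scores of each sample on the first principal component of `columns`.

    `columns` holds one list per feature (one entry per sample); at least two
    samples are required. Power iteration on the sample covariance matrix
    approximates the leading eigenvector.
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    k = len(centered)
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Arbitrary non-uniform starting vector, normalized to unit length.
    v = [float(i + 1) for i in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # degenerate start (orthogonal to all variance); a library PCA avoids this
        v = [x / norm for x in w]
    return [sum(v[i] * centered[i][s] for i in range(k)) for s in range(n)]

def merge_by_taxonomy(features, level):
    """Group features by taxonomy prefix and reduce each group to one column.

    `features` maps a taxonomy string to its per-sample (log-transformed) values.
    """
    groups = defaultdict(list)
    for name, values in features.items():
        groups[truncate_taxonomy(name, level)].append(values)
    merged = {}
    for taxon, cols in groups.items():
        # A single feature is kept unchanged; larger groups collapse to PC1.
        merged[taxon] = list(cols[0]) if len(cols) == 1 else first_pc_scores(cols)
    return merged
```

The sign of a principal component is arbitrary, so only the relative ordering of samples within a merged group carries meaning.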


Procedia CIRP ◽  
2021 ◽  
Vol 96 ◽  
pp. 272-277
Author(s):  
Hannah Lickert ◽  
Aleksandra Wewer ◽  
Sören Dittmann ◽  
Pinar Bilge ◽  
Franz Dietrich

mSphere ◽  
2019 ◽  
Vol 4 (3) ◽  
Author(s):  
Artur Yakimovich

ABSTRACT Artur Yakimovich works in the field of computational virology and applies machine learning algorithms to study host-pathogen interactions. In this mSphere of Influence article, he reflects on two papers “Holographic Deep Learning for Rapid Optical Screening of Anthrax Spores” by Jo et al. (Y. Jo, S. Park, J. Jung, J. Yoon, et al., Sci Adv 3:e1700606, 2017, https://doi.org/10.1126/sciadv.1700606) and “Bacterial Colony Counting with Convolutional Neural Networks in Digital Microbiology Imaging” by Ferrari and colleagues (A. Ferrari, S. Lombardi, and A. Signoroni, Pattern Recognition 61:629–640, 2017, https://doi.org/10.1016/j.patcog.2016.07.016). Here he discusses how these papers made an impact on him by showcasing that artificial intelligence algorithms can be equally applicable to both classical infection biology techniques and cutting-edge label-free imaging of pathogens.


Author(s):  
Hannah Bolinger ◽  
David Tran ◽  
Kenneth Harary ◽  
George C. Paoli ◽  
Giselle Guron ◽  
...  

Traditional microbiological testing methods are slow, and many molecular-based techniques rely on culture-based enrichment to overcome low limits of detection. Recent advances in sequencing technologies may make it possible to use machine learning (ML) to identify patterns in microbiome data and potentially predict the presence or absence of pathogens. In this study, 299 poultry rinsate samples from various points in the processing chain were analyzed to determine whether the microbiota could indicate a sample’s risk of containing Salmonella. Samples were culture-confirmed as Salmonella-positive or -negative following modified USDA MLG protocols. The culture confirmation result was used as the reference against which the 16S sequencing data were compared. Pre-chill samples tested positive (71/82) at a higher frequency than post-chill samples (30/217) and contained greater microbial diversity. Because of their larger sample size, post-chill samples were analyzed in more depth. Analysis of variance (ANOVA) identified a significant effect of chilling on the number of genera (p < 0.001), but analysis of similarities (ANOSIM) failed to provide evidence for microbial dissimilarity between pre- and post-chill samples (p = 0.001, R = 0.443). Various ML models were trained using post-chill samples to predict whether a sample contained Salmonella based on the sample’s pre-enrichment microbiota. The optimal model was a Random Forest-based model with the following performance: accuracy 88%, sensitivity 85%, specificity 90%. While the algorithms described in this paper are prototypes, these risk-based algorithms demonstrate the potential of, and need for, further studies that provide insight alongside diagnostic tests. Combining risk-based information with diagnostic tools can help poultry processors make informed decisions to identify and prevent the spread of Salmonella. These data add to the growing body of literature exploring novel ways to use microbiome data for predictive food safety.
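The counts and performance measures in this abstract follow standard definitions. The sketch below computes the positive rates from the counts reported above and defines accuracy, sensitivity, and specificity from a confusion matrix; the study's own confusion matrix is not given in the abstract, so no values from it are used. Function names are hypothetical.

```python
def positive_rate(positives, total):
    """Fraction of samples that were culture-positive."""
    return positives / total

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Counts reported in the abstract; 82 + 217 = 299 samples in total.
pre_chill_rate = positive_rate(71, 82)    # about 0.866
post_chill_rate = positive_rate(30, 217)  # about 0.138
```

For a purely hypothetical confusion matrix with tp=8, fp=1, tn=9, fn=2, `classification_metrics` gives accuracy 0.85, sensitivity 0.8, and specificity 0.9.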


2021 ◽  
Author(s):  
Cassandra Velasco ◽  
Christopher Dunn ◽  
Cassandra Sturdy ◽  
Vladislav Izda ◽  
Jake Martin ◽  
...  

Objective: Adult cartilage has limited repair capacity. MRL/MpJ (MRL) mice, by contrast, are capable of spontaneously healing ear punctures. This study was undertaken to characterize microbiome differences between healer and nonhealer mice and to evaluate microbiome transplantation as a novel regenerative therapy. Methods: We transplanted C57BL/6J (B6) mice with MRL/MpJ cecal contents at weaning and as adults (n=57), measured earhole closure 4 weeks after a 2.0 mm punch, and compared them to vehicle-transplanted MRL and B6 (n=25) and B6-transplanted MRL (n=20) mice. Sex effects, timing of transplant relative to ear punch, and transgenerational heritability were evaluated. In a subset (n=58), cecal microbiomes were profiled by 16S sequencing and compared to earhole closure rates. Microbial metagenomes were imputed using PICRUSt. Results: Transplantation of B6 mice with MRL microbiota, either in weanlings or adults, improved earhole closure rates. Transplantation prior to ear punch was associated with the greatest earhole closure. Offspring of transplanted mice healed better than controls. Several microbiome clades were correlated with healing, including Firmicutes, Lactobacillales, and Verrucomicrobia. Gram-negative organisms were reduced. Females of all groups tended to heal better than males, and female microbiota resembled those of MRL mice. Conclusion: In this study, we found an association between the microbiome and tissue regeneration in MRL mice and demonstrated that this trait can be transferred to nonhealer mice via microbiome transplantation. We identified several microbiome clades associated with healing. Future studies should evaluate the mechanisms underlying these findings and confirm our results in murine osteoarthritis (OA).
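Relating a clade's abundance to earhole closure across the profiled mice amounts to a correlation computed across animals. A minimal sketch, assuming a Pearson coefficient (the abstract does not say which coefficient was used) and hypothetical per-mouse input vectors; the function name is illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, `x` would hold one clade's abundance per mouse and `y` the same mice's earhole closure; a positive r means mice carrying more of the clade tended to heal more.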


2021 ◽  
Vol 118 (40) ◽  
pp. e2026053118
Author(s):  
Miles Cranmer ◽  
Daniel Tamayo ◽  
Hanno Rein ◽  
Peter Battaglia ◽  
Samuel Hadden ◽  
...  

We introduce a Bayesian neural network model that can accurately predict not only if, but also when, a compact planetary system with three or more planets will go unstable. Our model, trained directly on short N-body time series of raw orbital elements, is more than two orders of magnitude more accurate at predicting instability times than analytical estimators, while also reducing the bias of existing machine learning algorithms by nearly a factor of three. Despite being trained on compact resonant and near-resonant three-planet configurations, the model generalizes robustly to both nonresonant and higher-multiplicity configurations, in the latter case outperforming models fit to that specific set of integrations. The model computes instability estimates up to 10⁵ times faster than a numerical integrator and, unlike previous efforts, provides confidence intervals on its predictions. Our inference model is publicly available in the SPOCK (https://github.com/dtamayo/spock) package, with training code open-sourced (https://github.com/MilesCranmer/bnn_chaos_model).
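A predictive distribution, rather than a single point estimate, is what makes confidence intervals possible. As a generic, hedged sketch that is not the SPOCK API, suppose a model returns many samples of the predicted instability time (for example in log10 units) for one system; a median and a central interval can then be read off the sorted samples. The 68% default and the function names are assumptions.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def summarize_samples(samples, level=68.0):
    """Median and central `level`% interval (lower, upper) of predictive samples."""
    s = sorted(samples)
    tail = (100 - level) / 2
    return percentile(s, 50), percentile(s, tail), percentile(s, 100 - tail)
```

The interpolation matches the common "linear" percentile convention; with few samples the interval edges are coarse, so in practice one would draw many samples per system.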


2020 ◽  
Author(s):  
Trang T. Le ◽  
Jason H. Moore

Summary: treeheatr is an R package for creating interpretable decision tree visualizations in which the data are represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.


Author(s):  
Wei Hao Khoong

In this paper, we introduce deboost, a Python library devoted to weighted distance ensembling of predictions for regression and classification tasks. It is built on the scikit-learn library, which supplies its default models and data preprocessing functions. It offers a flexible choice of models for the ensemble, as long as they implement a predict method, like the models available in scikit-learn. deboost is released under the MIT open-source license and can be downloaded from the Python Package Index (PyPI) at https://pypi.org/project/deboost. The source scripts are also available in a GitHub repository at https://github.com/weihao94/DEBoost.
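The abstract does not spell out the weighting rule, so the following is only one plausible reading of "weighted distance ensembling", written from scratch rather than with deboost's own API. Each model's weight is proportional to the inverse of its summed Euclidean distance to the other models' prediction vectors, so models closer to the consensus count more. The function names, the weighting rule, and the choice of Euclidean distance are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two prediction vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_weighted_ensemble(predictions, distance=euclidean):
    """Weighted average of per-model predictions (hypothetical weighting rule).

    `predictions` is a list of equal-length vectors, one per model. Each
    model's weight is proportional to 1 / (sum of its distances to the other
    models), so models closer to the consensus receive more weight.
    """
    m = len(predictions)
    totals = [sum(distance(predictions[i], predictions[j]) for j in range(m) if j != i)
              for i in range(m)]
    if any(t == 0 for t in totals):
        # A zero total means all models agree (or there is only one): equal weights.
        weights = [1 / m] * m
    else:
        inverse = [1 / t for t in totals]
        norm = sum(inverse)
        weights = [w / norm for w in inverse]
    n = len(predictions[0])
    combined = [sum(weights[i] * predictions[i][k] for i in range(m)) for k in range(n)]
    return combined, weights
```

Any model exposing a predict method could feed this function by stacking its outputs on a common input set, which mirrors the flexibility the abstract describes.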


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0248322
Author(s):  
Cassandra Velasco ◽  
Christopher Dunn ◽  
Cassandra Sturdy ◽  
Vladislav Izda ◽  
Jake Martin ◽  
...  

Objective: Adult elastic cartilage has limited repair capacity. MRL/MpJ (MRL) mice, by contrast, are capable of spontaneously healing ear punctures. This study was undertaken to characterize microbiome differences between healer and non-healer mice and to evaluate whether this healing phenotype can be transferred via gut microbiome transplantation. Methods: We orally transplanted C57BL/6J (B6) mice with MRL/MpJ cecal contents at weaning and as adults (n = 57), measured ear hole closure 4 weeks after a 2.0 mm punch, and compared them to vehicle-transplanted MRL and B6 (n = 25) and B6-transplanted MRL (n = 20) mice. Sex effects, timing of transplant relative to ear punch, and transgenerational heritability were evaluated. In a subset (n = 58), cecal microbiomes were profiled by 16S sequencing and compared to ear hole closure. Microbial metagenomes were imputed using PICRUSt. Results: Transplantation of B6 mice with MRL microbiota, either in weanlings or adults, improved ear hole closure. B6-vehicle mice healed ear hole punches poorly (0.25±0.03 mm, mean±SEM, where healing is the 2.0 mm punch diameter minus the final ear hole size 4 weeks after punching), whereas MRL-vehicle mice healed well (1.4±0.1 mm). MRL-transplanted B6 mice healed roughly three times as well as B6-vehicle mice and half as well as MRL-vehicle mice (0.74±0.05 mm; P = 6.9E-10 vs. B6-vehicle, P = 5.2E-12 vs. MRL-vehicle). Transplantation of MRL mice with B6 cecal material did not reduce MRL healing (B6-transplanted MRL 1.3±0.1 mm vs. MRL-vehicle 1.4±0.1 mm, P = 0.36). Transplantation prior to ear punch was associated with the greatest ear hole closure. Offspring of transplanted mice healed significantly better than non-transplanted control mice (offspring: 0.63±0.03 mm, n = 39, vs. B6-vehicle control: 0.25±0.03 mm, mean±SEM; P = 4.6E-11). Several microbiome clades were correlated with healing, including Firmicutes (R = 0.84, P = 8.0E-7), Lactobacillales (R = 0.65, P = 1.1E-3), and Verrucomicrobia (R = -0.80, P = 9.2E-6).
Females of all groups tended to heal better than males (B6-vehicle P = 0.059, MRL-transplanted B6 P = 0.096, offspring of MRL-transplanted B6 P = 0.0038, B6-transplanted MRL P = 1.6E-6, MRL-vehicle P = 0.0031). Many clades characteristic of female vs. male mouse cecal microbiota were the same as clades characteristic of MRL and MRL-transplanted B6 mice vs. B6 controls, including increases in Clostridia and reductions in Verrucomicrobia in female mice. Conclusion: In this study, we found an association between the microbiome and tissue regeneration in MRL mice and demonstrated that this trait can be transferred to non-healer mice via microbiome transplantation. We identified several microbiome clades associated with healing.
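The healing measure and the "three times" and "half" comparisons in this abstract can be checked directly from the reported group means. A minimal sketch; the constant and function names are hypothetical, and the ratios use the means only, ignoring the SEMs.

```python
PUNCH_DIAMETER_MM = 2.0

def healing_mm(final_hole_mm, punch_mm=PUNCH_DIAMETER_MM):
    """Healing = punch diameter minus final ear hole size (larger is better)."""
    return punch_mm - final_hole_mm

# Group means of healing (mm) reported in the abstract.
B6_VEHICLE = 0.25
MRL_VEHICLE = 1.4
MRL_TRANSPLANTED_B6 = 0.74

ratio_vs_b6 = MRL_TRANSPLANTED_B6 / B6_VEHICLE    # about 2.96: "roughly three times"
ratio_vs_mrl = MRL_TRANSPLANTED_B6 / MRL_VEHICLE  # about 0.53: "half as well"
```

For example, a hypothetical final hole of 1.75 mm corresponds to 0.25 mm of healing, the B6-vehicle mean.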


2016 ◽  
Vol 2 ◽  
pp. e90 ◽  
Author(s):  
Ranko Gacesa ◽  
David J. Barlow ◽  
Paul F. Long

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic, as there are no universally accepted methods by which to attribute toxin function using sequence data alone. Existing bioinformatics tools are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy, and we compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing a toxin to be placed into the most appropriate toxin protein family or related to the non-toxic protein with the closest homology, which enhances curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).

