Estimating nitrogen and phosphorus concentrations in streams and rivers across the contiguous United States: a machine learning framework

2019 ◽  
Author(s):  
Longzhu Shen ◽  
Giuseppe Amatulli ◽  
Tushar Sethi ◽  
Peter Raymond ◽  
Sami Domisch

Nitrogen (N) and phosphorus (P) are essential nutrients for life processes in water bodies, but in excessive quantities they are a significant source of aquatic pollution. Eutrophication driven by such nutrient imbalances is now widespread and is largely attributed to anthropogenic activity. In view of this phenomenon, we present a new dataset and statistical method for estimating and mapping elemental and compound concentrations of N and P at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. The model is based on a Random Forest (RF) machine learning algorithm fitted with environmental variables and seasonal N and P concentration observations from 230,000 stations across US stream networks. Accounting for spatial and temporal variability improves the accuracy of analyses of N and P cycles. The model was assessed with internal and external validation procedures and explains 70-83% of the variance. The dataset is ready for use as input to a variety of environmental models and analyses, and the methodological framework can be applied to large-scale studies of N and P pollution, including water quality, species distribution and water ecology research worldwide.
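The modelling step described above can be sketched roughly as follows. This is a minimal illustration only: the file name station_observations.csv, the predictor and response column names, and the hyperparameters are assumptions, with scikit-learn's RandomForestRegressor standing in for the authors' RF implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical table: one row per station-season observation (column names assumed).
obs = pd.read_csv("station_observations.csv")
predictors = ["elevation", "slope", "precip", "temperature", "land_cover", "season"]
X = pd.get_dummies(obs[predictors], columns=["land_cover", "season"])
y = obs["nitrate_mg_per_l"]                     # one of several N/P response variables

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

# Hold-out check, loosely analogous to the internal validation mentioned above.
print("Variance explained (R^2):", r2_score(y_test, rf.predict(X_test)))
```

A real workflow would additionally require the upstream catchment aggregation of environmental variables and the seasonal splits described in the abstract.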


2020 ◽  
Vol 4 (Supplement_2) ◽  
pp. 1559-1559
Author(s):  
Wanglong Gou ◽  
Chu-Wen Ling ◽  
Yan He ◽  
Zengliang Jiang ◽  
Yuanqing Fu ◽  
...  

Abstract Objectives Findings on the relationship between the gut microbiome and type 2 diabetes (T2D) have been inconsistent across human cohorts. We hypothesized that this limitation could be addressed by integrating an interpretable machine learning framework with large-scale human cohort studies. Methods Three independent cohorts with more than 9,000 participants were included in this study. We proposed a new machine learning-based analytic framework, using LightGBM to infer the relationship between the incorporated features and T2D, and SHapley Additive exPlanations (SHAP) to identify microbiome features associated with the risk of T2D. We then generated a microbiome risk score (MRS), integrating the threshold and direction of the identified microbiome features, to predict T2D risk. Results We identified 15 microbiome features associated with the risk of T2D (two indicators of microbial diversity, the others taxon-related features). The identified T2D-related gut microbiome features showed superior T2D prediction accuracy compared to host genetics or traditional risk factors. Furthermore, the MRS (per unit change) consistently showed a positive association with T2D risk in the discovery cohort (RR 1.28, 95% CI 1.23-1.33), external validation cohort 1 (RR 1.23, 95% CI 1.13-1.34) and external validation cohort 2 (GGMP; RR 1.12, 95% CI 1.06-1.18). The MRS could also predict future glucose increments. We subsequently identified dietary and lifestyle factors that could prospectively modulate the microbiome features, and found that body fat distribution may be the key factor modulating the gut microbiome-T2D relationship. Conclusions Taken together, we propose a new analytical framework for investigating microbiome-disease relationships. The identified microbiome features may serve as potential drug targets for T2D in the future. Funding Sources This study was funded by the National Natural Science Foundation of China (81903316, 81773416), Westlake University (101396021801) and the 5010 Program for Clinical Researches (2007032) of Sun Yat-sen University (Guangzhou, China).
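As a rough illustration of the LightGBM-plus-SHAP step described above, the sketch below fits a gradient-boosted classifier and ranks features by mean absolute SHAP value. The input table microbiome_features.csv, its column names, and the hyperparameters are assumptions for illustration, not the study's actual pipeline.

```python
import lightgbm as lgb
import numpy as np
import pandas as pd
import shap

data = pd.read_csv("microbiome_features.csv")          # hypothetical per-participant feature table
X = data.drop(columns=["t2d_status"])
y = data["t2d_status"]                                  # assumed binary T2D label

model = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X, y)

# SHAP values quantify each feature's contribution to the predicted T2D risk.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Some shap versions return a list [class 0, class 1] for binary models; keep the positive class.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values
importance = pd.Series(np.abs(sv).mean(axis=0), index=X.columns).sort_values(ascending=False)
print(importance.head(15))                              # top candidate microbiome features
```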


2020 ◽  
Author(s):  
Ahmed Alaa ◽  
Deepti Gurdasani ◽  
Adrian Harris ◽  
Jem Rashbass ◽  
Mihaela van der Schaar

Abstract Accurate prediction of the individualized survival benefit of adjuvant therapy is key to making informed therapeutic decisions for patients with early invasive breast cancer. Here, we use a state-of-the-art automated and interpretable machine learning algorithm to develop a breast cancer prognostication and treatment benefit prediction model, Adjutorium, using data from large-scale cohorts of nearly 1 million women captured in the national cancer registries of the United Kingdom and the United States. We trained and internally validated the Adjutorium model on 395,862 patients from the UK National Cancer Registration and Analysis Service (NCRAS); we then externally validated the model among 571,635 patients from the US Surveillance, Epidemiology, and End Results (SEER) Program. Adjutorium exhibited significantly improved accuracy compared to the major prognostic tool in current clinical use (PREDICT v2.1) in both internal and external validation (AUC-ROC for 5-year survival prediction in NCRAS was 0.835, 95% CI: 0.833–0.837, and 0.755, 95% CI: 0.753–0.757, for Adjutorium and PREDICT v2.1, respectively; in SEER, the AUC-ROC was 0.815, 95% CI: 0.813–0.817, and 0.775, 95% CI: 0.772–0.778, respectively). Importantly, our model substantially improved accuracy in specific subgroups known to be under-served by existing models. Adjutorium is currently implemented as a web-based decision support tool (vanderschaar-lab.com/adjutorium/) to aid decisions on adjuvant therapy in women with early breast cancer, and can be publicly accessed by patients and clinicians worldwide.
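The discrimination metrics quoted above (AUC-ROC with 95% confidence intervals) can be reproduced in form, though not in substance, with a short sketch on synthetic labels. The arrays below are placeholders rather than NCRAS or SEER data, and the bootstrap helper is an illustrative assumption, not the Adjutorium evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y, scores, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUC-ROC."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if len(np.unique(y[idx])) < 2:          # a resample needs both classes
            continue
        aucs.append(roc_auc_score(y[idx], scores[idx]))
    return np.percentile(aucs, [2.5, 97.5])

rng = np.random.default_rng(1)
event = rng.integers(0, 2, 2000)                                    # placeholder 5-year mortality labels
risk_a = np.clip(0.5 * event + rng.normal(0.3, 0.2, 2000), 0, 1)    # hypothetical "model A" risk scores
risk_b = np.clip(0.3 * event + rng.normal(0.3, 0.25, 2000), 0, 1)   # hypothetical "model B" risk scores

for name, risk in [("model A", risk_a), ("model B", risk_b)]:
    auc = roc_auc_score(event, risk)
    lo, hi = bootstrap_auc_ci(event, risk)
    print(f"{name}: AUC-ROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```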


2020 ◽  
Vol 142 (8) ◽  
pp. 3814-3822 ◽  
Author(s):  
George S. Fanourgakis ◽  
Konstantinos Gkagkas ◽  
Emmanuel Tylianakis ◽  
George E. Froudakis

2018 ◽  
Vol 07 (04) ◽  
pp. 164-173 ◽  
Author(s):  
Ian Campbell ◽  
Samantha Stover ◽  
Andres Hernandez-Garcia ◽  
Shalini Jhangiani ◽  
Jaya Punetha ◽  
...  

Abstract Wolf–Hirschhorn syndrome (WHS) is caused by partial deletion of the short arm of chromosome 4 and is characterized by dysmorphic facies, congenital heart defects, intellectual/developmental disability, and increased risk for congenital diaphragmatic hernia (CDH). In this report, we describe a stillborn girl with WHS and a large CDH. A literature review revealed 15 cases of WHS with CDH, which overlap a 2.3-Mb CDH critical region. We applied a machine-learning algorithm that integrates large-scale genomic knowledge to genes within the 4p16.3 CDH critical region and identified FGFRL1, CTBP1, NSD2, FGFR3, CPLX1, MAEA, CTBP1-AS2, and ZNF141 as genes whose haploinsufficiency may contribute to the development of CDH.


2019 ◽  
Author(s):  
Dimitrios Vitsios ◽  
Slavé Petrovski

Abstract Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses that result in candidate lists of genes. Often these analyses highlight several gene signals that might contribute to pathogenesis but are insufficiently powered to reach experiment-wide significance. This often triggers a laborious evaluation of highly ranked genes through manual inspection of various public knowledge resources to triage those considered sufficiently interesting for deeper investigation. Here, we introduce a novel multi-dimensional, multi-step machine learning framework to objectively and more holistically assess the biological relevance of genes to disease studies, relying on a plethora of gene-associated annotations. We developed mantis-ml to serve as an automated machine learning (AutoML) framework, following a stochastic semi-supervised learning approach to rank known and novel disease-associated genes through iterative training and prediction sessions on random balanced datasets across the protein-coding exome (n=18,626 genes). We applied this framework to a range of disease-specific areas and as a generic disease likelihood estimator, achieving an average Area Under the Curve (AUC) prediction performance of 0.85. Critically, to demonstrate applied utility in exome-wide association studies, we overlapped mantis-ml disease-specific predictions with data from published cohort-level association studies. We retrieved statistically significant enrichment of high mantis-ml predictions among the top-ranked genes from hypothesis-free cohort-level statistics (p<0.05), suggesting the capture of true prioritisation signals. We believe that mantis-ml is a novel, easy-to-use tool that supports objective triage in gene discovery and enhances our overall understanding of complex genotype-phenotype associations.
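A bare-bones sketch of the stochastic semi-supervised idea described above (not the mantis-ml implementation): repeatedly train a classifier on the known disease genes against an equally sized random sample of unlabelled genes, then average out-of-training predictions across sessions to rank genes. The input files, feature columns and iteration count are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = pd.read_csv("gene_annotations.csv", index_col="gene")    # hypothetical gene-level feature matrix
known = pd.read_csv("known_disease_genes.txt", header=None)[0]      # hypothetical seed gene list
unlabelled = features.index.difference(known)

scores = pd.Series(0.0, index=features.index)
counts = pd.Series(0, index=features.index)
rng = np.random.default_rng(0)

for it in range(100):                                    # iterative balanced training sessions
    neg = rng.choice(np.asarray(unlabelled), size=len(known), replace=False)
    train_idx = list(known) + list(neg)
    y = np.r_[np.ones(len(known)), np.zeros(len(neg))]
    clf = RandomForestClassifier(n_estimators=200, random_state=it, n_jobs=-1)
    clf.fit(features.loc[train_idx], y)
    held_out = features.index.difference(train_idx)      # genes not used for training this session
    scores[held_out] += clf.predict_proba(features.loc[held_out])[:, 1]
    counts[held_out] += 1

# Known genes are always in the training set in this simplified version, so rank the unlabelled genes only.
ranking = (scores / counts).loc[unlabelled].sort_values(ascending=False)
print(ranking.head(20))
```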


2017 ◽  
Vol 59 ◽  
pp. 495-541 ◽  
Author(s):  
Ramya Ramakrishnan ◽  
Chongjie Zhang ◽  
Julie Shah

In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. The joint strategies are learned through "perturbation training," a human team-training strategy that requires team members to practice variations of a given task to help their team generalize to new variants of that task. We formally define the problem of human-robot perturbation training and develop and evaluate the first end-to-end framework for such training, which incorporates a multi-agent transfer learning algorithm, a human-robot co-learning framework, and a communication protocol. Our transfer learning algorithm, Adaptive Perturbation Training (AdaPT), is a hybrid of transfer and reinforcement learning techniques that learns quickly and robustly on new task variants. We empirically validate the benefits of AdaPT through comparison to other hybrid reinforcement and transfer learning techniques aimed at transferring knowledge from multiple source tasks to a single target task. We also demonstrate that AdaPT's rapid learning supports live interaction between a person and a robot, during which the human-robot team trains to achieve a high level of performance for new task variants. We augment AdaPT with a co-learning framework and a computational bi-directional communication protocol so that the robot can co-train with a person during live interaction. Results from large-scale human-subject experiments (n=48) indicate that AdaPT enables an agent to learn in a manner compatible with a human's own learning process, and that a robot undergoing perturbation training with a human results in a high level of team performance. Finally, we demonstrate that human-robot training using AdaPT in a simulation environment produces effective performance for a team incorporating an embodied robot partner.
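The transfer-then-adapt idea can be illustrated with a deliberately simplified sketch. It is not the AdaPT algorithm itself, which transfers from multiple source tasks; it is a tabular Q-learner on a toy chain task whose Q-table is warm-started from a previously learned variant, showing how transferred knowledge can speed up adaptation.

```python
import numpy as np

def q_learning(step, reset, n_states, n_actions, q_init=None,
               episodes=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning, optionally warm-started from q_init."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions)) if q_init is None else q_init.copy()
    for _ in range(episodes):
        s, done = reset(), False
        while not done:
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q

# Toy task: a 1-D chain where action 1 moves toward the goal state.
N = 6
def reset(): return 0
def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else -0.01), s2 == N - 1

Q_source = q_learning(step, reset, N, 2)                      # practiced source variant
Q_target = q_learning(step, reset, N, 2, q_init=Q_source,     # new variant adapts quickly
                      episodes=50)                            # from transferred knowledge
print(Q_target.argmax(axis=1))                                # greedy action per state after adaptation
```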


Author(s):  
R. Kyle Martin ◽  
Solvejg Wastvedt ◽  
Ayoosh Pareek ◽  
Andreas Persson ◽  
Håvard Visnes ◽  
...  

Abstract Purpose External validation of machine learning predictive models is achieved through evaluation of model performance on groups of patients different from those used for algorithm development. This important step is uncommonly performed, inhibiting clinical translation of newly developed models. Machine learning analysis of the Norwegian Knee Ligament Register (NKLR) recently led to the development of a tool capable of estimating the risk of anterior cruciate ligament (ACL) revision (https://swastvedt.shinyapps.io/calculator_rev/). The purpose of this study was to determine the external validity of the NKLR model by assessing algorithm performance when applied to patients from the Danish Knee Ligament Registry (DKLR). Methods The primary outcome measure of the NKLR model was probability of revision ACL reconstruction within 1, 2, and/or 5 years. For external validation, all DKLR patients with complete data for the five variables required for NKLR prediction were included. The five variables were graft choice, femur fixation device, KOOS QOL score at surgery, years from injury to surgery, and age at surgery. Predicted revision probabilities were calculated for all DKLR patients. Model performance was assessed using the same metrics as the NKLR study: concordance and calibration. Results In total, 10,922 DKLR patients were included for analysis. Average follow-up time or time-to-revision was 8.4 (± 4.3) years and the overall revision rate was 6.9%. Surgical technique trends (i.e., graft choice and fixation devices) and injury characteristics (i.e., concomitant meniscus and cartilage pathology) were dissimilar between registries. The model produced similar concordance when applied to the DKLR population compared to the original NKLR test data (DKLR: 0.68; NKLR: 0.68–0.69). Calibration was poorer for the DKLR population at one and five years after primary surgery but similar to the NKLR at two years. Conclusion The NKLR machine learning algorithm demonstrated similar performance when applied to patients from the DKLR, suggesting that it is valid for application outside of the initial patient population. This represents the first machine learning model for predicting revision ACL reconstruction that has been externally validated. Clinicians can use this in-clinic calculator to estimate revision risk at a patient-specific level when discussing outcome expectations pre-operatively. While encouraging, it should be noted that the performance of the model on patients undergoing ACL reconstruction outside of Scandinavia remains unknown. Level of evidence III.
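The two validation metrics named above, concordance and calibration, could be computed on an external registry table along the following lines. The file name, column names, and decile-based calibration summary are assumptions for illustration, not the published NKLR/DKLR analysis; the concordance index comes from the lifelines library.

```python
import pandas as pd
from lifelines.utils import concordance_index

ext = pd.read_csv("external_registry.csv")      # assumed columns, described below
# ext["pred_surv_5y"] : model-predicted probability of *no* revision within 5 years
# ext["time_years"]   : observed follow-up or time-to-revision in years
# ext["revised"]      : 1 if revision occurred, 0 if censored

# Concordance: higher predicted survival probability should pair with longer time to revision.
c_index = concordance_index(ext["time_years"], ext["pred_surv_5y"], ext["revised"])
print("External concordance:", round(c_index, 3))

# Crude calibration check: mean predicted vs observed revision within predicted-risk deciles
# (this ignores censoring, which a registry analysis would handle properly).
ext["risk_decile"] = pd.qcut(1 - ext["pred_surv_5y"], 10, labels=False, duplicates="drop")
calibration = ext.groupby("risk_decile").agg(
    mean_predicted_risk=("pred_surv_5y", lambda s: (1 - s).mean()),
    observed_revision_rate=("revised", "mean"),
)
print(calibration)
```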


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Raphaël Pestourie ◽  
Youssef Mroueh ◽  
Thanh V. Nguyen ◽  
Payel Das ◽  
Steven G. Johnson

Abstract Surrogate models for partial differential equations are widely used in the design of metamaterials to rapidly evaluate the behavior of composable components. However, the training cost of accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than the optical wavelength. We present an active-learning algorithm that reduces the number of simulations required by more than an order of magnitude for an NN surrogate model of optical-surface components compared to uniform random samples. Results show that the surrogate evaluation is over two orders of magnitude faster than a direct solve, and we demonstrate how this can be exploited to accelerate large-scale engineering optimization.
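The active-learning loop described above can be sketched generically: train an ensemble of small neural-network surrogates, query the candidate designs where the ensemble members disagree most, and spend the expensive simulation budget only there. The toy solver, ensemble size, and acquisition rule below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_solver(x):
    # Stand-in for a costly PDE / electromagnetic simulation of a design x.
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))            # small initial set of simulated designs
y = expensive_solver(X)

for _ in range(5):                              # active-learning rounds
    ensemble = [MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                             random_state=k).fit(X, y) for k in range(5)]
    candidates = rng.uniform(-1, 1, size=(500, 2))
    preds = np.stack([m.predict(candidates) for m in ensemble])
    uncertainty = preds.std(axis=0)             # ensemble disagreement as an uncertainty proxy
    pick = candidates[np.argsort(uncertainty)[-10:]]    # query only the most uncertain designs
    X = np.vstack([X, pick])
    y = np.concatenate([y, expensive_solver(pick)])

print("Simulations used:", len(X))
```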

