Enki

Data-driven Learning-enabled Systems are limited by the quality of available training data, particularly when trained offline. For systems that must operate in real-world environments, the space of possible conditions that can occur is vast and difficult to comprehensively predict at design time. Environmental uncertainty arises when run-time conditions diverge from design-time training conditions. To address this problem, automated methods can generate synthetic data to fill in gaps for training and test data coverage. We propose an evolution-based technique to assist developers with uncovering limitations in existing data when previously unseen environmental phenomena are introduced. This technique explores unique contexts for a given environmental condition, with an emphasis on diversity. Synthetic data generated by this technique may be used for two purposes: (1) to assess the robustness of a system to uncertain environmental factors and (2) to improve the system’s robustness. This technique is demonstrated to outperform random and greedy methods for multiple adverse environmental conditions applied to image-processing Deep Neural Networks.

A Data-Driven Surrogate Approach for the Temporal Stability Forecasting of Vegetation Covered Dikes

Water ◽

10.3390/w13010107 ◽

2021 ◽

Vol 13 (1) ◽

pp. 107

Author(s):

Elahe Jamalinia ◽

Faraz S. Tehrani ◽

Susan C. Steele-Dunne ◽

Philip J. Vardon

Keyword(s):

Numerical Simulation ◽

Water Flux ◽

Temporal Stability ◽

Synthetic Data ◽

Climatic Conditions ◽

Training Data ◽

Data Driven ◽

Data Set ◽

Surface Cracking ◽

Real Time Analysis

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
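
The data partitioning described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the 20% test fraction, the random seed, and the treatment of 1 January 2018 as the cutoff are all assumptions, and the constant `fos` values are placeholders rather than simulation output. Any regressor (a random forest in the study) would then be fit on `train`, checked on `test`, and finally assessed on `evaluation`.

```python
import random
from datetime import date, timedelta


def split_by_cutoff(records, cutoff, test_fraction=0.2, seed=0):
    """Split daily records into train/test (before cutoff) and evaluation (from cutoff on)."""
    before = [r for r in records if r["day"] < cutoff]
    evaluation = [r for r in records if r["day"] >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(before)  # random train/test split within the pre-cutoff period
    n_test = int(len(before) * test_fraction)
    return before[n_test:], before[:n_test], evaluation


# Hypothetical daily records covering 2009-2019; "fos" is a placeholder target.
start, end = date(2009, 1, 1), date(2019, 12, 31)
records = [
    {"day": start + timedelta(days=i), "fos": 1.5}
    for i in range((end - start).days + 1)
]

train, test, evaluation = split_by_cutoff(records, cutoff=date(2018, 1, 1))
print(len(train), len(test), len(evaluation))
```

Keeping the later period out of both the training and testing sets is what allows the drop in performance after further surface cracking to be observed at all.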

Physics-Driven Regularization of Deep Neural Networks for Enhanced Engineering Design and Analysis

Journal of Computing and Information Science in Engineering ◽

10.1115/1.4044507 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 7

Author(s):

Mohammad Amin Nabian ◽

Hadi Meidani

Keyword(s):

Neural Networks ◽

Engineering Design ◽

Physical System ◽

Deep Neural Networks ◽

Complete Information ◽

Training Data ◽

Data Driven ◽

Training Approach ◽

Domain Expertise ◽

Generalization Errors

In this paper, we introduce a physics-driven regularization method for training of deep neural networks (DNNs) for use in engineering design and analysis problems. In particular, we focus on the prediction of a physical system, for which in addition to training data, partial or complete information on a set of governing laws is also available. These laws often appear in the form of differential equations, derived from first principles, empirically validated laws, or domain expertise, and are usually neglected in a data-driven prediction of engineering systems. We propose a training approach that utilizes the known governing laws and regularizes data-driven DNN models by penalizing divergence from those laws. The first two numerical examples are synthetic examples, where we show that in constructing a DNN model that best fits the measurements from a physical system, the use of our proposed regularization results in DNNs that are more interpretable with smaller generalization errors, compared with other common regularization methods. The last two examples concern metamodeling for a random Burgers’ system and for aerodynamic analysis of passenger vehicles, where we demonstrate that the proposed regularization provides superior generalization accuracy compared with other common alternatives.
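
The idea of penalizing divergence from a governing law can be sketched with a toy loss function. Everything specific here is an assumption made for illustration: the law du/dx + k·u = 0, the finite-difference derivative, the weight `lam`, and the use of a plain Python callable in place of a DNN. It is not the paper's implementation.

```python
import math


def physics_regularized_loss(model, xs, ys, collocation, k, lam=1.0, h=1e-4):
    """Mean squared data misfit plus lam times the mean squared residual of du/dx + k*u = 0."""
    data_term = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def residual(x):
        dudx = (model(x + h) - model(x - h)) / (2 * h)  # central difference
        return dudx + k * model(x)

    physics_term = sum(residual(x) ** 2 for x in collocation) / len(collocation)
    return data_term + lam * physics_term


k = 2.0
xs = [0.0, 0.5, 1.0]                       # hypothetical measurement locations
ys = [math.exp(-k * x) for x in xs]        # measurements that obey the law
collocation = [i / 10 for i in range(11)]  # points where the law is enforced


def consistent(x):
    """Candidate model that satisfies the governing law."""
    return math.exp(-k * x)


def interpolant(x):
    """Quadratic through the three measurements (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


loss_consistent = physics_regularized_loss(consistent, xs, ys, collocation, k)
loss_interpolant = physics_regularized_loss(interpolant, xs, ys, collocation, k)
print(loss_consistent, loss_interpolant)
```

Both candidates reproduce the three measurements exactly, so a purely data-driven loss cannot tell them apart; only the physics term separates the model that obeys the law from the one that merely interpolates.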

WHAT IDENTIFIES A WHALE BY ITS FLUKE? ON THE BENEFIT OF INTERPRETABLE MACHINE LEARNING FOR WHALE IDENTIFICATION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-1005-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 1005-1012

Author(s):

J. Kierdorf ◽

J. Garcke ◽

J. Behley ◽

T. Cheeseman ◽

R. Roscher

Keyword(s):

Machine Learning ◽

Deep Neural Networks ◽

Population Monitoring ◽

Humpback Whale ◽

Spectral Cluster ◽

Data Driven ◽

Special Focus ◽

Machine Learning Methods ◽

Sensitivity Maps

Interpretable and explainable machine learning have proven to be promising approaches to verify the quality of a data-driven model in general as well as to obtain more information about the quality of certain observations in practice. In this paper, we use these approaches for an application in the marine sciences to support the monitoring of whales. Whale population monitoring is an important element of whale conservation, and the identification of whales plays an important role in this process, for example to trace the migration of whales over time and space. Classical approaches use photographs and a manual mapping with special focus on the shape of the whale flukes and their unique pigmentation. However, this is not feasible for comprehensive monitoring. Machine learning methods, especially deep neural networks, have shown that they can efficiently solve the automatic observation of a large number of whales. Despite their success for many different tasks such as identification, further potentials such as interpretability and their benefits have not yet been exploited. Our main contribution is an analysis of interpretation tools, especially occlusion sensitivity maps, and the question of how the gained insights can help a whale researcher. For our analysis, we use images of humpback whale flukes provided by the Kaggle Challenge “Humpback Whale Identification”. By means of spectral cluster analysis of heatmaps, which indicate which parts of the image are important for a decision, we can show that they can be grouped in a meaningful way. Moreover, it appears that characteristics automatically determined by a neural network correspond to those that are considered important by a whale expert.
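
An occlusion sensitivity map of the kind mentioned above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' pipeline: the image is a nested list of floats, `toy_score` is a hypothetical stand-in for a trained network's class score, and the patches do not overlap (stride equals patch size).

```python
def occlusion_sensitivity(image, score_fn, patch=2, fill=0.0):
    """Return a heatmap holding the score drop caused by occluding each patch."""
    rows, cols = len(image), len(image[0])
    baseline = score_fn(image)
    heatmap = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            cells = [
                (r, c)
                for r in range(r0, min(r0 + patch, rows))
                for c in range(c0, min(c0 + patch, cols))
            ]
            for r, c in cells:
                occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            for r, c in cells:
                heatmap[r][c] = drop  # large drop = region mattered for the score
    return heatmap


def toy_score(img):
    """Hypothetical classifier score that depends only on the top-left 2x2 corner."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_sensitivity(image, toy_score)
for row in heatmap:
    print(row)
```

In this toy case only the top-left patch produces a nonzero drop, which mirrors how such heatmaps highlight the image regions a model relies on.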

AI Radar Sensor: Creating Radar Depth Sounder Images Based on Generative Adversarial Network

Sensors ◽

10.3390/s19245479 ◽

2019 ◽

Vol 19 (24) ◽

pp. 5479 ◽

Cited By ~ 1

Author(s):

Maryam Rahnemoonfar ◽

Jimmy Johnson ◽

John Paden

Keyword(s):

Data Augmentation ◽

Synthetic Data ◽

Detection Algorithm ◽

Contour Detection ◽

Training Data ◽

Generative Adversarial Network ◽

Radar Images ◽

Adversarial Network ◽

Radar Imagery

Significant resources have been spent in collecting and storing large and heterogeneous radar datasets during expensive Arctic and Antarctic fieldwork. The vast majority of data available is unlabeled, and the labeling process is both time-consuming and expensive. One possible alternative to the labeling process is the use of synthetically generated data with artificial intelligence. Instead of labeling real images, we can generate synthetic data based on arbitrary labels. In this way, training data can be quickly augmented with additional images. In this research, we evaluated the performance of synthetically generated radar images based on modified cycle-consistent adversarial networks. We conducted several experiments to test the quality of the generated radar imagery. We also tested the quality of a state-of-the-art contour detection algorithm on synthetic data and different combinations of real and synthetic data. Our experiments show that synthetic radar images generated by a generative adversarial network (GAN) can be used in combination with real images for data augmentation and training of deep neural networks. However, the synthetic images generated by GANs cannot be used solely for training a neural network (training on synthetic and testing on real) as they cannot simulate all of the radar characteristics such as noise or Doppler effects. To the best of our knowledge, this is the first work in creating radar sounder imagery based on a generative adversarial network.

Robust Automated Assessment of Human Blastocyst Quality using Deep Learning

10.1101/394882 ◽

2018 ◽

Cited By ~ 4

Author(s):

Pegah Khosravi ◽

Ehsan Kazemi ◽

Qiansheng Zhan ◽

Marco Toschi ◽

Jonas E. Malmsten ◽

...

Keyword(s):

Deep Neural Networks ◽

Embryo Quality ◽

Embryo Implantation ◽

Data Driven ◽

Vitro Fertilization ◽

Human Blastocyst ◽

Blastocyst Quality ◽

Morphological Assessment

Morphology assessment has become the standard method for evaluation of embryo quality and selecting human blastocysts for transfer in in vitro fertilization (IVF). This process is highly subjective for some embryos and thus prone to human bias. As a result, morphological assessment results may vary extensively between embryologists and in some cases may fail to accurately predict embryo implantation and live birth potential. Here we postulated that an artificial intelligence (AI) approach trained on thousands of embryos can reliably predict embryo quality without human intervention. To test this hypothesis, we implemented an AI approach based on deep neural networks (DNNs). Our approach called STORK accurately predicts the morphological quality of blastocysts based on raw digital images of embryos with 98% accuracy. These results indicate that a DNN can automatically and accurately grade embryos based on raw images. Using clinical data for 2,182 embryos, we then created a decision tree that integrates clinical parameters such as embryo quality and patient age to identify scenarios associated with increased or decreased pregnancy chance. This IVF data-driven analysis shows that the chance of pregnancy varies from 13.8% to 66.3%. In conclusion, our AI-driven approach provides a novel way to assess embryo quality and uncovers new, potentially personalized strategies to select embryos with an improved likelihood of pregnancy outcome.

Role of General Adversarial Networks in Mammogram Analysis: A Review

Current Medical Imaging (Formerly Current Medical Imaging Reviews) ◽

10.2174/1573405614666191115102318 ◽

2020 ◽

Vol 16 (7) ◽

pp. 863-877

Author(s):

Annapoorani Gopal ◽

Lathaselvi Gandhimaruthian ◽

Javid Ali

Keyword(s):

Breast Tumor ◽

Deep Neural Networks ◽

Training Data ◽

Learning Technology ◽

Breast Cancers ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Tumor Extraction

Deep Neural Networks have gained prominence in the biomedical domain, becoming the most commonly used networks after machine learning technology. Mammograms can be used to detect breast cancers with high precision with the help of a Convolutional Neural Network (CNN), which is a deep learning technology. Exhaustively labeled data are required to train a CNN from scratch. This requirement can be overcome by deploying a Generative Adversarial Network (GAN), which needs comparatively less training data during mammogram screening. This study extensively reviews the application of GANs in estimating breast density, high-resolution mammogram synthesis for clustered microcalcification analysis, effective segmentation of breast tumors, analysis of the shape of breast tumors, extraction of features, and augmentation of images during mammogram classification.

Simple Index to Assess the Calibration Quality of Safety Performance Functions Based on Multiple Goodness-of-Fit Metrics

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/03611981211008896 ◽

2021 ◽

pp. 036119812110088

Author(s):

Raul E. Avelar ◽

Karen Dixon ◽

Boniphace Kutela ◽

Sam Klump ◽

Beth Wemple ◽

...

Keyword(s):

Goodness Of Fit ◽

Synthetic Data ◽

Calibration Procedure ◽

Safety Performance ◽

Absolute Deviation ◽

Data Set ◽

Safety Database ◽

Simple Index ◽

Safety Performance Functions

The calibration of safety performance functions (SPFs) is a mechanism included in the Highway Safety Manual (HSM) to adjust SPFs in the HSM for use in intended jurisdictions. Critically, the quality of the calibration procedure must be assessed before using the calibrated SPFs. Multiple resources to aid practitioners in calibrating SPFs have been developed in the years following the publication of the HSM 1st edition. Similarly, the literature suggests multiple ways to assess the goodness-of-fit (GOF) of a calibrated SPF to a data set from a given jurisdiction. This paper uses the calibration results of multiple intersection SPFs to a large Mississippi safety database to examine the relations between multiple GOF metrics. The goal is to develop a sensible single index that leverages the joint information from multiple GOF metrics to assess overall quality of calibration. A factor analysis applied to the calibration results revealed three underlying factors explaining 76% of the variability in the data. From these results, the authors developed an index and performed a sensitivity analysis. The key metrics were found to be, in descending order: the deviation of the cumulative residual (CURE) plot from the 95% confidence area, the mean absolute deviation, the modified R-squared, and the value of the calibration factor. This paper also presents comparisons between the index and alternative scoring strategies, as well as an effort to verify the results using synthetic data. The developed index is recommended to comprehensively assess the quality of the calibrated intersection SPFs.

CATH functional families predict functional sites in proteins

Bioinformatics ◽

10.1093/bioinformatics/btaa937 ◽

2020 ◽

Author(s):

Sayoni Das ◽

Harry M Scholes ◽

Neeladri Sen ◽

Christine Orengo

Keyword(s):

Functional Characterization ◽

Functional Site ◽

Training Data ◽

Supplementary Information ◽

Conserved Residues ◽

Functional Sites ◽

Protein Protein Interaction ◽

Evolutionary Features ◽

Functional Families

Motivation: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation: https://github.com/UCL/cath-funsite-predictor. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Genome Biology ◽

10.1186/s13059-021-02368-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ruizhu Huang ◽

Charlotte Soneson ◽

Pierre-Luc Germain ◽

Thomas S.B. Schmidt ◽

Christian Von Mering ◽

...

Keyword(s):

Single Cell ◽

Synthetic Data ◽

Cell Types ◽

Data Driven ◽

RNA Seq ◽

Hierarchical Trees

treeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.

Data-driven deep density estimation

Neural Computing and Applications ◽

10.1007/s00521-021-06281-3 ◽

2021 ◽

Author(s):

Patrik Puchert ◽

Pedro Hermosilla ◽

Tobias Ritschel ◽

Timo Ropinski

Keyword(s):

Data Analysis ◽

Density Estimation ◽

Population Data ◽

Training Data ◽

Data Driven ◽

Discrete Observations ◽

Efficient Manner ◽

Continuous Models ◽

3D Scans ◽

Spatial Locations

Density estimation plays a crucial role in many data analysis tasks, as it infers a continuous probability density function (PDF) from discrete samples. Thus, it is used in tasks as diverse as analyzing population data, spatial locations in 2D sensor readings, or reconstructing scenes from 3D scans. In this paper, we introduce a learned, data-driven deep density estimation (DDE) to infer PDFs in an accurate and efficient manner, while being independent of domain dimensionality or sample size. Furthermore, we do not require access to the original PDF during estimation, neither in parametric form, nor as priors, nor in the form of many samples. This is enabled by training an unstructured convolutional neural network on an infinite stream of synthetic PDFs, as unbounded amounts of synthetic training data generalize better across a deck of natural PDFs than any natural finite training data will do. Thus, we hope that our publicly available DDE method will be beneficial in many areas of data analysis, where continuous models are to be estimated from discrete observations.
