Artificial Metabolic Networks: enabling neural computation with metabolic networks

2022 ◽  
Leon Faure ◽  
Bastien Mollet ◽  
Wolfram Liebermeister ◽  
Jean-Loup Faulon

Metabolic networks have largely been exploited as mechanistic tools to predict the behavior of microorganisms with a defined genotype in different environments. However, flux predictions by constraint-based modeling approaches are limited in quality unless labor-intensive experiments, including the measurement of media intake fluxes, are performed. Using machine learning instead of an optimization of biomass flux, on which most existing constraint-based methods are based, provides ways to improve flux and growth rate predictions. In this paper, we show how Recurrent Neural Networks can serve as surrogates for constraint-based modeling, making metabolic networks suitable for backpropagation and, consequently, usable as an architecture for machine learning. We refer to our hybrid (mechanistic and neural network) models as Artificial Metabolic Networks (AMN). We showcase AMN and illustrate its performance with an experimental dataset of Escherichia coli growth rates in 73 different media compositions. We reach a regression coefficient of R² = 0.78 on cross-validation sets. We expect AMNs to provide easier discovery of metabolic insights and prompt new biotechnological applications.

Palaios ◽  
2020 ◽  
Vol 35 (9) ◽  
pp. 391-402 ◽  

ABSTRACT Accurate taxonomic classification of microfossils in thin-sections is an important biostratigraphic procedure. As paleontological expertise is typically restricted to specific taxonomic groups and experts are not present in all institutions, geoscience researchers often suffer from a lack of quick access to critical taxonomic knowledge for biostratigraphic analyses. Moreover, diminishing emphasis on education and training in systematics poses a major challenge for the future of biostratigraphy and for associated endeavors reliant on systematics. Here we present a machine learning approach to classify and organize fusulinids, microscopic index fossils for the late Paleozoic. The technique we employ has the potential to use such important taxonomic knowledge in models that can be applied to recognize and categorize fossil specimens. Our results demonstrate that, given adequate images and training, convolutional neural network models can correctly identify fusulinids with high levels of accuracy. Continued efforts in digitization of biological and paleontological collections at numerous museums and adoption of machine learning by paleontologists can enable the development of highly accurate and easy-to-use classification tools and, thus, facilitate biostratigraphic analyses by non-experts as well as allow for cross-validation of disparate collections around the world. Automation of classification work would also enable expert paleontologists and others to focus efforts on exploration of more complex interpretations and concepts.

2021 ◽  
Vol 11 (15) ◽  
pp. 6918
Chidubem Iddianozie ◽  
Gavin McArdle

The effectiveness of a machine learning model is impacted by the data representation used. Consequently, it is crucial to investigate robust representations for efficient machine learning methods. In this paper, we explore the link between data representations and model performance for inference tasks on spatial networks. We argue that representations which explicitly encode the relations between spatial entities would improve model performance. Specifically, we consider homogeneous and heterogeneous representations of spatial networks. We recognise that the expressive nature of the heterogeneous representation may benefit spatial networks and could improve model performance on certain tasks. Thus, we carry out an empirical study using Graph Neural Network models for two inference tasks on spatial networks. Our results demonstrate that heterogeneous representations improve model performance for downstream inference tasks on spatial networks.

Energies ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 689 ◽  
Tyler McCandless ◽  
Susan Dettling ◽  
Sue Ellen Haupt

This work compares the solar power forecasting performance of tree-based methods that include implicit regime-based models to explicit regime separation methods that utilize both unsupervised and supervised machine learning techniques. Previous studies have shown an improvement utilizing a regime-based machine learning approach in a climate with diverse cloud conditions. This study compares the machine learning approaches for solar power prediction at the Shagaya Renewable Energy Park in Kuwait, which is in an arid desert climate characterized by abundant sunshine. The regime-dependent artificial neural network models undergo a comprehensive parameter and hyperparameter tuning analysis to minimize the prediction errors on a test dataset. The final results that compare the different methods are computed on an independent validation dataset. The results show that the tree-based method, the regression model tree approach, performs better than the explicit regime-dependent approach. These results appear to be a function of the predominantly sunny conditions that limit the ability of an unsupervised technique to separate regimes for which the relationship between the predictors and the predictand would differ for the supervised learning technique.

2019 ◽  
Vol 207 ◽  
pp. 05004 ◽  
Chiara De Sio

The KM3NeT Collaboration is building a network of underwater Cherenkov telescopes at two sites in the Mediterranean Sea, with the main goals of investigating astrophysical sources of high-energy neutrinos (ARCA) and of determining the neutrino mass hierarchy (ORCA). Various Machine Learning techniques, such as Random Forests, BDTs, and Shallow and Deep Networks, are being used for diverse tasks, such as event-type and particle identification, energy/direction estimation, source identification, signal/background discrimination and data analysis, with sound results as well as promising research paths. The main focus of this work is the application of Convolutional Neural Network models to the tasks of neutrino interaction classification, as well as the estimation of energy and direction of the propagating particles. The performance is also compared to that of the standard reconstruction algorithms used in the Collaboration.

2020 ◽  
pp. 1-22 ◽  
D. Sykes ◽  
A. Grivas ◽  
C. Grover ◽  
R. Tobin ◽  
C. Sudlow ◽  

Abstract Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) at reasonably high accuracy. However, the accurate distinction between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports), developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics 42(5), 839–851), a Python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. 
EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap of the machine learning models to EdIE-R-Neg on the Tayside test set was reduced through adding development Tayside data into the ESS training set, demonstrating the adaptability of the neural network models.

2021 ◽  
Vol 4 (2) ◽  
pp. 34-69
Dávid Burka ◽  
László Kovács ◽  
László Szepesváry

Pricing an insurance product covering motor third-party liability is a major challenge for actuaries. Comprehensive statistical modelling and modern computational power are necessary to solve this problem. The generalised linear and additive modelling approaches have been widely used by insurance companies for a long time. Modelling with modern machine learning methods has started only recently, and applying them properly with relevant features remains a significant difficulty for pricing experts. This study analyses the claim-causing probability by fitting generalised linear modelling, generalised additive modelling, random forest, and neural network models. Several evaluation measures are used to compare these techniques. The best model is a mixture of the base methods. The authors’ hypothesis about the existence of significant interactions between feature variables is proved by the models. A simplified classification and visualisation is performed on the final model, which can support tariff applications later.

2019 ◽  
Emmanuel L.C. de los Santos

ABSTRACT Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. However, current machine learning tools have limitations, since they are specific to the RiPP class they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network models that are suitable for peptide classification with weights trained on PP datasets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP is able to identify PP sequences in significantly more putative RiPP clusters than current tools, while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.

2019 ◽  
J. Christopher D. Terry ◽  
Helen E. Roy ◽  
Tom A. August

Abstract The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording. Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer-led mass participation recording scheme. Each image is associated with metadata: a date, location and recorder ID, which can be cross-referenced with other data sources to determine local weather at the time of recording, habitat types and the experience of the observer. We built multi-input neural network models that synthesise metadata and images to identify records to species level. We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image-only baseline of 48.2%, we observe a 9.1 percentage-point improvement in top-1 accuracy with a multi-input model compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data is being used to interpret an image, beyond just providing a prior expectation. We show that our neural network models appear to be utilising similar pieces of evidence as human naturalists to make identifications. Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualisation offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. 
Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualising images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for large scale verification of submitted records.

Makhamisa Senekane ◽  
Mhlambululi Mafu ◽  
Molibeli Benedict Taele

Weather variations play a significant role in people’s short-term, medium-term or long-term planning. Therefore, understanding of weather patterns has become very important in decision making. Short-term weather forecasting (nowcasting) involves the prediction of weather over a short period of time, typically a few hours. Different techniques have been proposed for short-term weather forecasting. Traditional techniques used for nowcasting are highly parametric, and hence complex. Recently, there has been a shift towards the use of artificial intelligence techniques for weather nowcasting. These include the use of machine learning techniques such as artificial neural networks. In this chapter, we report the use of deep learning techniques for weather nowcasting. Deep learning techniques were tested on meteorological data. Three deep learning techniques, namely multilayer perceptron, Elman recurrent neural networks and Jordan recurrent neural networks, were used in this work. Multilayer perceptron models achieved accuracies of 91% and 75% for sunshine forecasting and precipitation forecasting respectively, Elman recurrent neural network models achieved accuracies of 96% and 97% for sunshine and precipitation forecasting respectively, while Jordan recurrent neural network models achieved accuracies of 97% and 97% for sunshine and precipitation nowcasting respectively. The results obtained underline the utility of using deep learning for weather nowcasting.
