scholarly journals NeuRiPP: Neural network identification of RiPP precursor peptides

2019 ◽  
Author(s):  
Emmanuel L.C. de los Santos

ABSTRACTSignificant progress has been made in the past few years on the computational identification biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes remains challenging. To address this, machine learning has been used to accurately identify PP sequences. However, current machine learning tools have limitations, since they are specific to the RiPP-class they are trained for, and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network models that are suitable for peptide classification with weights trained on PP datasets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP is able to identify PP sequences in significantly more putative RiPP clusters than current tools, while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that are recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Emmanuel L. C. de los Santos

Abstract Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations, since they are specific to the RiPPclass they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network archictectures that are suitable for peptide classification with weights trained on PP datasets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.


2017 ◽  
Author(s):  
Charlie W. Zhao ◽  
Mark J. Daley ◽  
J. Andrew Pruszynski

AbstractFirst-order tactile neurons have spatially complex receptive fields. Here we use machine learning tools to show that such complexity arises for a wide range of training sets and network architectures, and benefits network performance, especially on more difficult tasks and in the presence of noise. Our work suggests that spatially complex receptive fields are normatively good given the biological constraints of the tactile periphery.


Healthcare ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 181 ◽  
Author(s):  
Patricia Melin ◽  
Julio Cesar Monica ◽  
Daniela Sanchez ◽  
Oscar Castillo

In this paper, a multiple ensemble neural network model with fuzzy response aggregation for the COVID-19 time series is presented. Ensemble neural networks are composed of a set of modules, which are used to produce several predictions under different conditions. The modules are simple neural networks. Fuzzy logic is then used to aggregate the responses of several predictor modules, in this way, improving the final prediction by combining the outputs of the modules in an intelligent way. Fuzzy logic handles the uncertainty in the process of making a final decision about the prediction. The complete model was tested for the case of predicting the COVID-19 time series in Mexico, at the level of the states and the whole country. The simulation results of the multiple ensemble neural network models with fuzzy response integration show very good predicted values in the validation data set. In fact, the prediction errors of the multiple ensemble neural networks are significantly lower than using traditional monolithic neural networks, in this way showing the advantages of the proposed approach.


Author(s):  
A. Saravanan ◽  
J. Jerald ◽  
A. Delphin Carolina Rani

AbstractThe objective of the paper is to develop a new method to model the manufacturing cost–tolerance and to optimize the tolerance values along with its manufacturing cost. A cost–tolerance relation has a complex nonlinear correlation among them. The property of a neural network makes it possible to model the complex correlation, and the genetic algorithm (GA) is integrated with the best neural network model to optimize the tolerance values. The proposed method used three types of neural network models (multilayer perceptron, backpropagation network, and radial basis function). These network models were developed separately for prismatic and rotational parts. For the construction of network models, part size and tolerance values were used as input neurons. The reference manufacturing cost was assigned as the output neuron. The qualitative production data set was gathered in a workshop and partitioned into three files for training, testing, and validation, respectively. The architecture of the network model was identified based on the best regression coefficient and the root-mean-square-error value. The best network model was integrated into the GA, and the role of genetic operators was also studied. Finally, two case studies from the literature were demonstrated in order to validate the proposed method. A new methodology based on the neural network model enables the design and process planning engineers to propose an intelligent decision irrespective of their experience.


2020 ◽  
pp. 1-22 ◽  
Author(s):  
D. Sykes ◽  
A. Grivas ◽  
C. Grover ◽  
R. Tobin ◽  
C. Sudlow ◽  
...  

Abstract Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) at reasonably high accuracy. However, the accurate distinction between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports), developed to process brain imaging reports, (https://www.ltg.ed.ac.uk/software/edie-r/) and two machine learning approaches; one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics42(5), 839–851), a python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed, as well as the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap of the machine learning models to EdIE-R-Neg on the Tayside test set was reduced through adding development Tayside data into the ESS training set, demonstrating the adaptability of the neural network models.


2019 ◽  
Author(s):  
J. Christopher D. Terry ◽  
Helen E. Roy ◽  
Tom A. August

AbstractThe accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording.Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer-led mass participation recording scheme. Each image is associated with metadata; a date, location and recorder ID, which can be cross-referenced with other data sources to determine local weather at the time of recording, habitat types and the experience of the observer. We built multi-input neural network models that synthesise metadata and images to identify records to species level.We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image-only baseline of 48.2%, we observe a 9.1 percentage-point improvement in top-1 accuracy with a multi-input model compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data is being used to interpret an image, beyond just providing a prior expectation. We show that our neural network models appear to be utilising similar pieces of evidence as human naturalists to make identifications.Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualisation offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualising images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for large scale verification of submitted records.


2020 ◽  
pp. 147592172096544
Author(s):  
Aravinda S Rao ◽  
Tuan Nguyen ◽  
Marimuthu Palaniswami ◽  
Tuan Ngo

With the growing number of aging infrastructure across the world, there is a high demand for a more effective inspection method to assess its conditions. Routine assessment of structural conditions is a necessity to ensure the safety and operation of critical infrastructure. However, the current practice to detect structural damages, such as cracks, depends on human visual observation methods, which are prone to efficiency, cost, and safety concerns. In this article, we present an automated detection method, which is based on convolutional neural network models and a non-overlapping window-based approach, to detect crack/non-crack conditions of concrete structures from images. To this end, we construct a data set of crack/non-crack concrete structures, comprising 32,704 training patches, 2074 validation patches, and 6032 test patches. We evaluate the performance of our approach using 15 state-of-the-art convolutional neural network models in terms of number of parameters required to train the models, area under the curve, and inference time. Our approach provides over 95% accuracy and over 87% precision in detecting the cracks for most of the convolutional neural network models. We also show that our approach outperforms existing models in literature in terms of accuracy and inference time. The best performance in terms of area under the curve was achieved by visual geometry group-16 model (area under the curve = 0.9805) and best inference time was provided by AlexNet (0.32 s per image in size of 256 × 256 × 3). Our evaluation shows that deeper convolutional neural network models have higher detection accuracies; however, they also require more parameters and have higher inference time. We believe that this study would act as a benchmark for real-time, automated crack detection for condition assessment of infrastructure.


2020 ◽  
Vol 34 (09) ◽  
pp. 13693-13696
Author(s):  
Emma Strubell ◽  
Ananya Ganesh ◽  
Andrew McCallum

The field of artificial intelligence has experienced a dramatic methodological shift towards large neural networks trained on plentiful data. This shift has been fueled by recent advances in hardware and techniques enabling remarkable levels of computation, resulting in impressive advances in AI across many applications. However, the massive computation required to obtain these exciting results is costly both financially, due to the price of specialized hardware and electricity or cloud compute time, and to the environment, as a result of non-renewable energy used to fuel modern tensor processing hardware. In a paper published this year at ACL, we brought this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training and tuning neural network models for NLP (Strubell, Ganesh, and McCallum 2019). In this extended abstract, we briefly summarize our findings in NLP, incorporating updated estimates and broader information from recent related publications, and provide actionable recommendations to reduce costs and improve equity in the machine learning and artificial intelligence community.


2021 ◽  
Author(s):  
V.Y. Ilichev ◽  
I.V. Chukhraev

The article is devoted to the consideration of one of the areas of application of modern and promising computer technology – machine learning. This direction is based on the creation of models consisting of neural networks and their deep learning. At present, there is a need to generate new, not yet existing, images of objects of different types. Most often, text files or images act as such objects. To achieve a high quality of results, a generation method based on the adversarial work of two neural networks (generator and discriminator) was once worked out. This class of neural network models is distinguished by the complexity of topography, since it is necessary to correctly organize the structure of neural layers in order to achieve maximum accuracy and minimal error. The described program is created using the Python language and special libraries that extend the set of commands for performing additional functions: working with neural networks Keras (main library), integrating with the operating system Os, outputting graphs Matplotlib, working with data arrays Numpy and others. A description is given of the type and features of each neural layer, as well as the use of library connection functions, input of initial data, compilation and training of the obtained model. Next, the implementation of the procedure for outputting the results of evaluating the errors of the generator and discriminator and the accuracy achieved by the model depending on the number of cycles (eras) of its training is considered. Based on the results of the work, conclusions were drawn and recommendations were made for the use and development of the considered methodology for creating and training generative and adversarial neural networks. Studies have demonstrated the procedure for operating with comparatively simple and accessible, but effective means of a universal Python language with the Keras library to create and teach a complex neural network model. In fact, it has been proved that the use of this method allows to achieve high-quality results of machine learning, previously achievable only when using special software systems for working with neural networks.


Sign in / Sign up

Export Citation Format

Share Document