NeuRiPP: Neural network identification of RiPP precursor peptides

Abstract Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations, since they are specific to the RiPP class they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations. It does this by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network architectures that are suitable for peptide classification with weights trained on PP datasets. It is able to identify known PP sequences, and sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP was able to successfully identify PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.
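To make the idea of context-independent peptide classification concrete, the sketch below one-hot encodes an amino-acid sequence into a fixed-length matrix, which is one common way to present peptides to a neural network. The encoding scheme, the padding length `MAX_LEN`, and the function name `one_hot_encode` are assumptions for illustration only; the abstract does not state which input representation NeuRiPP actually uses.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MAX_LEN = 120  # hypothetical padding length, not taken from the paper


def one_hot_encode(sequence, max_len=MAX_LEN):
    """Return a max_len x 20 matrix of 0/1 values for a peptide sequence.

    Padding positions and residues outside the 20 standard amino acids
    are left as all-zero rows. Sequences longer than max_len are truncated.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        col = index.get(residue)
        if col is not None:
            matrix[pos][col] = 1
    return matrix
```

Because the matrix depends only on the peptide sequence itself, a classifier built on such an input needs no information about neighbouring genes.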


10.1101/616060 ◽

2019 ◽

Cited By ~ 1

Author(s):

Emmanuel L.C. de los Santos

Keyword(s):

Neural Network ◽

Machine Learning ◽

Network Models ◽

Gene Clusters ◽

Learning Tools ◽

Neural Network Models ◽

Data Set ◽

The Rich ◽

Tailoring Enzymes ◽

Rich Data


Machine learning for robo-advisors: testing for neurons specialization

Investment Management and Financial Innovations ◽

10.21511/imfi.16(4).2019.18 ◽

2019 ◽

Vol 16 (4) ◽

pp. 205-214

Author(s):

Roman Semko

Keyword(s):

Neural Network ◽

Machine Learning ◽

Black Box ◽

Learning Tools ◽

Data Set ◽

Comprehensive Review ◽

Risk Return ◽

Wealth Management ◽

Intermediate Layers ◽

Trained Neural Network

The rise of robo-advisor wealth management services, which constitute a key element of the fintech revolution, raises the question of whether they can outperform human-based advice, namely how to address the client’s behavioral biases in an automated way. One approach would be to apply machine learning tools during client profiling. However, a trained neural network is often considered a black box, which may raise concerns from customers and regulators about model validity, transparency, and related risks. To address these issues and shed more light on how neurons work, especially how they perform computation at intermediate layers, this paper visualizes and estimates the neurons’ sensitivity to different input parameters. Before that, a comprehensive review of the most popular optimization algorithms is presented, and a data set is generated from them to train a convolutional neural network. It was found that selected hidden units not only specialize, to some extent, in reacting to features such as risk, return, or risk-aversion level, but also learn more complex concepts such as the Sharpe ratio. These findings should help to deepen the understanding of robo-advisor mechanics, which in turn will provide more room to improve and significantly innovate the automated wealth management process and make it more transparent.
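For readers unfamiliar with the Sharpe ratio mentioned above, a minimal sketch of the standard definition follows: mean excess return divided by the standard deviation of excess returns. The function name `sharpe_ratio` and the use of the sample standard deviation are choices made here for illustration; the abstract does not say how the paper computes it.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)  # requires at least two observations
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined when returns have zero variance")
    return mean(excess) / spread
```

Since the ratio combines a return measure with a risk measure, a hidden unit that tracks it would be responding to an interaction of inputs and not to a single feature.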


Toward Optimal Heparin Dosing by Comparing Multiple Machine Learning Methods: Retrospective Study (Preprint)

10.2196/preprints.17648 ◽

2019 ◽

Author(s):

Longxiang Su ◽

Chun Liu ◽

Dongkai Li ◽

Jie He ◽

Fanglan Zheng ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Intensive Care Unit ◽

Intensive Care ◽

Data Set ◽

Machine Learning Methods ◽

Heparin Treatment ◽

Neural Network Algorithm ◽

Mimic Iii ◽

Machine Learning Models

BACKGROUND Heparin is one of the most commonly used medications in intensive care units. In clinical practice, the use of a weight-based heparin dosing nomogram is standard practice for the treatment of thrombosis. Recently, machine learning techniques have dramatically improved the ability of computers to provide clinical decision support and have allowed for the possibility of computer-generated, algorithm-based heparin dosing recommendations. OBJECTIVE The objective of this study was to predict the effects of heparin treatment using machine learning methods to optimize heparin dosing in intensive care units based on the predictions. Patient state predictions were based upon activated partial thromboplastin time in 3 different ranges: subtherapeutic, normal therapeutic, and supratherapeutic. METHODS Retrospective data from 2 intensive care unit research databases (Multiparameter Intelligent Monitoring in Intensive Care III, MIMIC-III; e–Intensive Care Unit Collaborative Research Database, eICU) were used for the analysis. Candidate machine learning models (random forest, support vector machine, adaptive boosting, extreme gradient boosting, and shallow neural network) were compared in 3 patient groups to evaluate the classification performance for predicting the subtherapeutic, normal therapeutic, and supratherapeutic patient states. The model results were evaluated using precision, recall, F1 score, and accuracy. RESULTS Data from the MIMIC-III database (n=2789 patients) and from the eICU database (n=575 patients) were used. In 3-class classification, the shallow neural network algorithm performed the best (F1 scores of 87.26%, 85.98%, and 87.55% for data set 1, 2, and 3, respectively). The shallow neural network algorithm achieved the highest F1 scores within the patient therapeutic state groups: subtherapeutic (data set 1: 79.35%; data set 2: 83.67%; data set 3: 83.33%), normal therapeutic (data set 1: 93.15%; data set 2: 87.76%; data set 3: 84.62%), and supratherapeutic (data set 1: 88.00%; data set 2: 86.54%; data set 3: 95.45%). CONCLUSIONS The most appropriate model for predicting the effects of heparin treatment was found by comparing multiple machine learning models and can be used to further guide optimal heparin dosing. Using multicenter intensive care unit data, our study demonstrates the feasibility of predicting the outcomes of heparin treatment using data-driven methods, and thus, how machine learning–based models can be used to optimize and personalize heparin dosing to improve patient safety. Manual analysis and validation suggested that the model outperformed standard-practice heparin treatment dosing.
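The per-class F1 scores reported above follow from a 3-class confusion matrix. As a minimal sketch of how such scores are typically derived, the function below computes precision, recall, and F1 for each class; the function name `per_class_scores` and the matrix layout (rows are true classes, columns are predicted classes) are assumptions made here, not details from the study.

```python
def per_class_scores(confusion, labels):
    """Compute precision, recall, and F1 for each class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

Reporting F1 per class, as the abstract does, shows whether a model that looks strong overall is weaker on one therapeutic range.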


Machine Learning-Based Estimation of Ground Reaction Forces and Knee Joint Kinetics from Inertial Sensors While Performing a Vertical Drop Jump

Sensors ◽

10.3390/s21227709 ◽

2021 ◽

Vol 21 (22) ◽

pp. 7709

Author(s):

Serena Cerfoglio ◽

Manuela Galli ◽

Marco Tarabini ◽

Filippo Bertozzi ◽

Chiarella Sforza ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Knee Joint ◽

Field Analysis ◽

Ground Reaction Forces ◽

Drop Jump ◽

Joint Moments ◽

Data Set ◽

Joint Kinetics ◽

Reaction Forces

Nowadays, the use of wearable inertial-based systems together with machine learning methods opens new pathways to assess athletes’ performance. In this paper, we developed a neural network-based approach for the estimation of the Ground Reaction Forces (GRFs) and the three-dimensional knee joint moments during the first landing phase of the Vertical Drop Jump. Data were simultaneously recorded from three commercial inertial units and an optoelectronic system during the execution of 112 jumps performed by 11 healthy participants. Data were processed and sorted to obtain a time-matched dataset, and a non-linear autoregressive with external input neural network was implemented in Matlab. The network was trained through a train-test split technique, and performance was evaluated in terms of Root Mean Square Error (RMSE). The network was able to estimate the time course of GRFs and joint moments with a mean RMSE of 0.02 N/kg and 0.04 N·m/kg, respectively. Despite the comparatively restricted data set and slight boundary errors, the results supported the use of the developed method to estimate joint kinetics, opening a new perspective for the development of an in-field analysis method.
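The Root Mean Square Error used above to score the network can be written in a few lines. This is a generic sketch of the standard formula, with `rmse` and its argument names chosen here for illustration; the study itself was implemented in Matlab.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

RMSE carries the units of the quantity being estimated, which is why the abstract reports it in N/kg for forces and N·m/kg for moments.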


Evaluation of Machine-Learning Tools for Predicting Sand Production

10.2118/207193-ms ◽

2021 ◽

Author(s):

Afungchwi Ronald Ngwashi ◽

David O. Ogbe ◽

Dickson O. Udebhulu

Keyword(s):

Machine Learning ◽

Niger Delta ◽

Oil And Gas ◽

Back Propagation ◽

Oil And Gas Industry ◽

Learning Tools ◽

Sand Production ◽

Data Set ◽

Test Set ◽

Gas Industry

Abstract Data analytics has only recently piqued the interest of the oil and gas industry as it has made data visualization much simpler, faster, and cost-effective. This is driven by the promising innovative techniques in developing artificial intelligence and machine-learning tools to provide sustainable solutions to ever-increasing problems of the petroleum industry activities. Sand production is one of these real issues faced by the oil and gas industry. Understanding whether a well will produce sand or not is the foundation of every completion job in sandstone formations. The Niger Delta Province is a region characterized by friable and unconsolidated sandstones and is therefore more prone to sanding. It is economically unattractive in this region to design sand equipment for a well that will not produce sand. This paper is aimed at developing a fast and more accurate machine-learning algorithm to predict sanding in sandstone formations. A two-layered Artificial Neural Network (ANN) with a back-propagation algorithm was developed using the Python programming language. The algorithm uses 11 geological and reservoir parameters that are associated with the onset of sanding. These parameters include depth, overburden, pore pressure, maximum and minimum horizontal stresses, well azimuth, well inclination, Poisson's ratio, Young's Modulus, friction angle, and shale content. Data typical of the Niger Delta were collected to validate the algorithm. The data were further split into a training set (70%) and a test set (30%). Statistical analyses of the data yielded correlations between the parameters and were plotted for better visualization. The accuracy of the ANN algorithm is found to depend on the number of parameters, number of epochs, and the size of the data set. For a completion engineer, the answer to the question of whether or not a well will require sand production control is binary: either a well will produce sand or it will not. Support vector machines (SVM) are known to be better suited as machine-learning tools for binary identification. This study also presents a comparative analysis between ANN and SVM models as tools for predicting sand production. Analysis of the Niger Delta data set indicated that the SVM outperformed the ANN model even when the training data set is sparse. Using the 30% test set, the ANN gives an accuracy, precision, recall, and F1 score of about 80%, while the SVM performance was 100% for the four metrics. It is then concluded that machine learning tools such as ANN with back-propagation and SVM are simple, accurate, and easy-to-use tools for effectively predicting sand production.
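The 70%/30% split described above can be sketched with the standard library alone. The function name `train_test_split`, the fixed seed, and the shuffle-then-slice strategy are assumptions for illustration; the abstract does not say how the authors partitioned their data.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rng = random.Random(seed)   # local generator, so global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With a small data set, a single split of this kind can make one model look perfect by chance, so a result such as 100% on all four metrics is worth confirming with repeated splits or cross-validation.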


Spoken words as biomarkers: using machine learning to gain insight into communication as a predictor of anxiety

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa049 ◽

2020 ◽

Vol 27 (6) ◽

pp. 929-933

Author(s):

George Demiris ◽

Kristin L Corey Magan ◽

Debra Parker Oliver ◽

Karla T Washington ◽

Chad Chadwick ◽

...

Keyword(s):

Machine Learning ◽

Secondary Data ◽

Health Indicators ◽

Machine Learning Algorithms ◽

Standardized Assessments ◽

Learning Tools ◽

Data Set ◽

Problem Solving Therapy ◽

Audio Communication ◽

The Impact

Abstract Objective The goal of this study was to explore whether features of recorded and transcribed audio communication data extracted by machine learning algorithms can be used to train a classifier for anxiety. Materials and Methods We used a secondary data set generated by a clinical trial examining problem-solving therapy for hospice caregivers consisting of 140 transcripts of multiple, sequential conversations between an interviewer and a family caregiver along with standardized assessments of anxiety prior to each session; 98 of these transcripts (70%) served as the training set, holding the remaining 30% of the data for evaluation. Results A classifier for anxiety was developed relying on language-based features. An 86% precision, 78% recall, 81% accuracy, and 84% specificity were achieved with the use of the trained classifiers. High anxiety inflections were found among recently bereaved caregivers and were usually connected to issues related to transitioning out of the caregiving role. This analysis highlighted the impact of lowering anxiety by increasing reciprocity between interviewers and caregivers. Conclusion Verbal communication can provide a platform for machine learning tools to highlight and predict behavioral health indicators and trends.


A comparison of deep machine learning and Monte Carlo methods for facies classification from seismic data

Geophysics ◽

10.1190/geo2019-0405.1 ◽

2020 ◽

Vol 85 (4) ◽

pp. WA41-WA52 ◽

Cited By ~ 3

Author(s):

Dario Grana ◽

Leonardo Azevedo ◽

Mingliang Liu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Monte Carlo ◽

Monte Carlo Methods ◽

Seismic Data ◽

Transition Probabilities ◽

Machine Learning Algorithms ◽

Data Set ◽

Facies Classification ◽

The Neural Network

Among the large variety of mathematical and computational methods for estimating reservoir properties such as facies and petrophysical variables from geophysical data, deep machine-learning algorithms have gained significant popularity for their ability to obtain accurate solutions for geophysical inverse problems in which the physical models are partially unknown. Solutions of classification and inversion problems are generally not unique, and uncertainty quantification studies are required to quantify the uncertainty in the model predictions and determine the precision of the results. Probabilistic methods, such as Monte Carlo approaches, provide a reliable approach for capturing the variability of the set of possible models that match the measured data. Here, we focused on the classification of facies from seismic data and benchmarked the performance of three different algorithms: recurrent neural network, Monte Carlo acceptance/rejection sampling, and Markov chain Monte Carlo. We tested and validated these approaches at the well locations by comparing classification predictions to the reference facies profile. The accuracy of the classification results is defined as the mismatch between the predictions and the log facies profile. Our study found that when the training data set of the neural network is large enough and the prior information about the transition probabilities of the facies in the Monte Carlo approach is not informative, machine-learning methods lead to more accurate solutions; however, the uncertainty of the solution might be underestimated. When some prior knowledge of the facies model is available, for example, from nearby wells, Monte Carlo methods provide solutions with similar accuracy to the neural network and allow a more robust quantification of the uncertainty of the solution.
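As a toy illustration of Monte Carlo acceptance/rejection sampling, the sketch below draws candidate models from a prior and accepts each with probability proportional to its likelihood. It is a generic discrete example, not the paper's seismic workflow: the function name `rejection_sample`, the dictionary inputs, and any facies labels passed to it are hypothetical.

```python
import random


def rejection_sample(prior, likelihood, n_samples, seed=0):
    """Draw samples distributed proportionally to prior[m] * likelihood[m].

    prior: dict mapping each candidate model to its prior probability.
    likelihood: dict mapping each candidate model to a non-negative value.
    """
    rng = random.Random(seed)
    models = list(prior)
    weights = [prior[m] for m in models]
    bound = max(likelihood.values())  # normalizes acceptance probabilities to [0, 1]
    if bound <= 0:
        raise ValueError("at least one model must have positive likelihood")
    accepted = []
    while len(accepted) < n_samples:
        candidate = rng.choices(models, weights=weights, k=1)[0]  # propose from the prior
        if rng.random() < likelihood[candidate] / bound:          # accept or reject
            accepted.append(candidate)
    return accepted
```

The spread of the accepted samples is what provides the uncertainty estimate: with an informative prior, fewer proposals are wasted and the accepted set concentrates on plausible models.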


Machine Learning Models of Survival Prediction in Trauma Patients

Journal of Clinical Medicine ◽

10.3390/jcm8060799 ◽

2019 ◽

Vol 8 (6) ◽

pp. 799 ◽

Cited By ~ 7

Author(s):

Cheng-Shyuan Rau ◽

Shao-Chun Wu ◽

Jung-Fang Chuang ◽

Chun-Ying Huang ◽

Hang-Tsung Liu ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Predictive Performance ◽

Original Data ◽

High Accuracy ◽

Validation Dataset ◽

Survival Prediction ◽

Trauma Patients ◽

Data Set ◽

Test Dataset

Background: We aimed to build a model using machine learning for the prediction of survival in trauma patients and compared these model predictions to those predicted by the most commonly used algorithm, the Trauma and Injury Severity Score (TRISS). Methods: Enrolled hospitalized trauma patients from 2009 to 2016 were divided into a training dataset (70% of the original data set) for generation of a plausible model under supervised classification, and a test dataset (30% of the original data set) to test the performance of the model. The training and test datasets comprised 13,208 (12,871 survival and 337 mortality) and 5603 (5473 survival and 130 mortality) patients, respectively. With the provision of additional information such as pre-existing comorbidity status or laboratory data, logistic regression (LR), support vector machine (SVM), and neural network (NN) (with the Stuttgart Neural Network Simulator (RSNNS)) were used to build models of survival prediction and compared to the predictive performance of TRISS. Predictive performance was evaluated by accuracy, sensitivity, and specificity, as well as by area under the curve (AUC) measures of receiver operating characteristic curves. Results: In the validation dataset, NN and the TRISS presented the highest score (82.0%) for balanced accuracy, followed by SVM (75.2%) and LR (71.8%) models. In the test dataset, NN had the highest balanced accuracy (75.1%), followed by the TRISS (70.2%), SVM (70.6%), and LR (68.9%) models. All four models (LR, SVM, NN, and TRISS) exhibited a high accuracy of more than 97.5% and a sensitivity of more than 98.6%. However, NN exhibited the highest specificity (51.5%), followed by the TRISS (41.5%), SVM (40.8%), and LR (38.5%) models. Conclusions: These four models (LR, SVM, NN, and TRISS) exhibited a similar high accuracy and sensitivity in predicting the survival of the trauma patients. In the test dataset, the NN model had the highest balanced accuracy and predictive specificity.
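The gap between the very high accuracy and the much lower balanced accuracy reported above comes from class imbalance. The sketch below uses the standard definition of balanced accuracy (the mean of sensitivity and specificity) and the test-set class counts from the abstract; the function names are chosen here for illustration.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


def majority_class_accuracy(n_majority, n_minority):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return n_majority / (n_majority + n_minority)


# Test-set class counts reported in the abstract: 5473 survivors, 130 deaths.
baseline_accuracy = majority_class_accuracy(5473, 130)
# Always predicting survival gives sensitivity 1.0 and specificity 0.0.
baseline_balanced = balanced_accuracy(1.0, 0.0)
```

Here `baseline_accuracy` is about 97.7%, yet `baseline_balanced` is only 50%. A classifier that never predicts death would therefore roughly match the reported accuracies, so specificity and balanced accuracy are the more informative comparisons between models.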


Multi-class Emotion AI by reconstructing linguistic context of words

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.20.11763 ◽

2018 ◽

Vol 7 (2.20) ◽

pp. 97 ◽

Cited By ~ 2

Author(s):

K Sripath Roy ◽

Farhaan Ahmed Shaik ◽

K Uday Kiran ◽

M Naga Teja ◽

Subhani Kurra

Keyword(s):

Neural Network ◽

Machine Learning ◽

Social Networking ◽

Sentiment Analysis ◽

Learning Strategies ◽

Point Of View ◽

Linguistic Context ◽

Social Networking Websites ◽

Rich Data ◽

Technological World

In today’s technological world, social networking websites such as Twitter, Instagram, Facebook, and Tumblr play a very significant role. Emotion AI deals with recognizing and analyzing the sentiments or opinions conveyed in a person’s text; it is most frequently called sentiment analysis. It helps us to understand people’s points of view. A vast amount of sentiment-rich data is produced by social networking websites in the form of posts, tweets, statuses, blogs, etc. Some users post reviews of products on social media, which influence customers to buy the product. Companies can analyze such review data and use it to improve the product. Sentiment analysis of Twitter is more troublesome than that of other social networking websites because of the many short words, misspellings, and slang words, which make applying emotion analysis to such data more challenging. We have classified the sentiment into 5 categories. Machine learning strategies are mostly preferred for Emotion AI. We have used the neural network model word2vec with a TF-IDF approach to predict the sentiment of a tweet.
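To show what the TF-IDF part of such a pipeline computes, here is a minimal sketch using one common unsmoothed variant: term frequency multiplied by the logarithm of inverse document frequency. The function name `tf_idf` and the exact weighting formula are assumptions; the abstract does not specify which variant the authors used or how it was combined with word2vec.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per document.

    documents: list of token lists, e.g. tokenized tweets.
    """
    n_docs = len(documents)
    # Number of documents in which each token appears at least once.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weighted
```

Tokens that occur in every document receive a weight of zero, so the remaining weights emphasize the words that distinguish one tweet from another.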


Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network

Applied Sciences ◽

10.3390/app10061999 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1999 ◽

Cited By ~ 7

Author(s):

Milica M. Badža ◽

Marko Č. Barjaktarović

Keyword(s):

Neural Network ◽

Machine Learning ◽

Brain Tumors ◽

Convolutional Neural Network ◽

Cross Validation ◽

Magnetic Resonance Images ◽

Generalization Capability ◽

Data Set ◽

Fold Cross Validation

The classification of brain tumors is performed by biopsy, which is not usually conducted before definitive brain surgery. The improvement of technology and machine learning can help radiologists in tumor diagnostics without invasive measures. A machine-learning algorithm that has achieved substantial results in image segmentation and classification is the convolutional neural network (CNN). We present a new CNN architecture for brain tumor classification of three tumor types. The developed network is simpler than already-existing pre-trained networks, and it was tested on T1-weighted contrast-enhanced magnetic resonance images. The performance of the network was evaluated using four approaches: combinations of two 10-fold cross-validation methods and two databases. The generalization capability of the network was tested with one of the 10-fold methods, subject-wise cross-validation, and the improvement was tested by using an augmented image database. The best result for the 10-fold cross-validation method was obtained for the record-wise cross-validation for the augmented data set, and, in that case, the accuracy was 96.56%. With good generalization capability and good execution speed, the newly developed CNN architecture could be used as an effective decision-support tool for radiologists in medical diagnostics.
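The difference between record-wise and subject-wise cross-validation lies in how images are assigned to folds. The sketch below shows a subject-wise assignment, in which every image from one patient lands in the same fold; the function name `subject_wise_folds` and the round-robin assignment are illustrative assumptions, not the authors' procedure.

```python
def subject_wise_folds(subject_ids, n_folds=10):
    """Return a fold index per record so that all records of a subject share a fold.

    subject_ids: one subject identifier per record (e.g. per MRI image).
    """
    unique_subjects = sorted(set(subject_ids))
    # Round-robin over subjects, not over records, to keep each subject together.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    return [fold_of_subject[s] for s in subject_ids]
```

In a record-wise split, images from one patient can appear in both training and test folds, which tends to inflate accuracy. This is why the subject-wise result is the one used above to judge generalization.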
