A High Accurate Machine Learning Meta-Strategy for the Prediction of Intrinsically Disorder Proteins

Background: Many proteins, or regions of proteins, lack a stable, well-defined three-dimensional structure in vitro. Understanding intrinsically disordered proteins (IDPs) is important for interpreting biological function and for studying many diseases. More than 70 disorder predictors have been developed, but many of them are limited in the protein characteristics they capture and do not reach very high accuracy. New strategies for disordered-protein prediction are therefore needed. Results: Here, we propose a machine learning meta-strategy to improve the prediction accuracy for disordered proteins and disordered regions. We first use logistic forward selection to choose the eight most significant predictors from the currently available IDP predictors. We then design a novel meta-strategy built on several machine learning models: a decision-tree-based algorithm, naive Bayes, random forest, and a convolutional neural network (CNN). Across the strategies tested, random forest improves per-residue (single amino acid) prediction accuracy to 93.35%. When the combined output vectors of the eight selected predictors are used as input, the CNN improves whole-protein prediction accuracy to 95.62%. Conclusion: Based on the performance of our machine learning meta-strategy, the random forest and CNN models improve the accuracy of IDP prediction.
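The predictor-selection step can be illustrated with a greedy forward-selection loop. This is a minimal sketch and not the authors' method. The abstract uses logistic forward selection; the sketch swaps in a simpler combiner, which averages the selected predictors' per-residue disorder scores and thresholds the mean at 0.5. It then scores each candidate set by per-residue accuracy. The predictor names and scores are hypothetical.

```python
from typing import Dict, List


def accuracy(pred: List[int], truth: List[int]) -> float:
    """Fraction of residues whose predicted label (1 = disordered) matches the truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def combine(scores: Dict[str, List[float]], names: List[str], threshold: float = 0.5) -> List[int]:
    """Average the per-residue scores of the chosen predictors and threshold them.

    Stand-in combiner (assumption); the paper uses logistic regression for selection.
    """
    n_residues = len(next(iter(scores.values())))
    labels = []
    for i in range(n_residues):
        mean_score = sum(scores[name][i] for name in names) / len(names)
        labels.append(1 if mean_score >= threshold else 0)
    return labels


def forward_select(scores: Dict[str, List[float]], truth: List[int], k: int) -> List[str]:
    """Greedily add the predictor that gives the best combined accuracy until k are chosen."""
    selected: List[str] = []
    remaining = list(scores)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda name: accuracy(combine(scores, selected + [name]), truth))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical per-residue disorder scores from three predictors.
    truth = [1, 0, 1, 0]
    scores = {
        "predictor_a": [0.9, 0.1, 0.8, 0.2],
        "predictor_b": [0.1, 0.9, 0.2, 0.8],
        "predictor_c": [0.6, 0.6, 0.4, 0.4],
    }
    print(forward_select(scores, truth, k=2))  # ['predictor_a', 'predictor_c']
```

In the paper, eight predictors are selected. Here `k` simply caps how many are kept.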



10.1101/2020.05.18.103200 ◽

2020 ◽

Author(s):

Chengbin Hu ◽

Yiru Qin ◽

Chuan Ye ◽

Jiao jin ◽

Ting Zhou ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Three Dimensional ◽

Disordered Proteins ◽

Single Amino Acid ◽

Vector Data ◽

Protein Prediction ◽

Intrinsically Disorder



Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time when many animals must be observed or when observation continues for a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to detect and estimate whether a goat is in estrus based on its behavior, and the adequacy of these methods was verified. Goat tracking data were obtained with a video tracking system. The data were used to estimate whether goats in “estrus” or “non-estrus” were in one of two states: “approaching the male” or “standing near the male”. Overall, random forest appears to achieve the highest percentage concordance (PC). However, its PC for goats other than those whose data were used for training is relatively low, which suggests that random forest tends to overfit the training data. Apart from random forest, HMMs and SVMs also achieve high PC. The HMM is the better method, given its shorter calculation time and its advantage as a time-series model. The PC of the neural network is low overall; however, if data from more goats were acquired, a neural network could become an adequate method for estimation.
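The HMM is favored partly because it treats behavior as a time series. The sketch below shows Viterbi decoding of a two-state HMM over the two behavioral states named in the abstract, together with a percentage-concordance score. Several details are assumptions for illustration only:

- the discretized observation symbols ("moving", "still");
- every probability value;
- the definition of PC as the percentage of time steps where the decoded state matches the observed state.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely hidden-state sequence, computed in log space."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for being in s at time t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = score[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def percentage_concordance(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Percent of time steps where the estimated state equals the observed state (assumed definition)."""
    return 100.0 * sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    states = ["approaching", "standing_near"]
    start = {"approaching": 0.5, "standing_near": 0.5}
    trans = {"approaching": {"approaching": 0.9, "standing_near": 0.1},
             "standing_near": {"approaching": 0.1, "standing_near": 0.9}}
    emit = {"approaching": {"moving": 0.8, "still": 0.2},
            "standing_near": {"moving": 0.2, "still": 0.8}}
    decoded = viterbi(["moving", "moving", "still", "still"], states, start, trans, emit)
    print(decoded)  # ['approaching', 'approaching', 'standing_near', 'standing_near']
    observed = ["approaching", "approaching", "approaching", "standing_near"]
    print(percentage_concordance(decoded, observed))  # 75.0
```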


Comparative Analysis on Machine Learning and Deep Learning to Predict Post-Induction Hypotension

Sensors ◽

10.3390/s20164575 ◽

2020 ◽

Vol 20 (16) ◽

pp. 4575 ◽

Cited By ~ 1

Author(s):

Jihyun Lee ◽

Jiyoung Woo ◽

Ah Reum Kang ◽

Young-Seob Jeong ◽

Woohyun Jung ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Feature Selection ◽

Deep Learning ◽

Random Forest ◽

Tracheal Intubation ◽

Feature Engineering ◽

Learning Models ◽

Raw Data ◽

Vital Records

Hypotensive events in the initial stage of anesthesia can cause serious, potentially fatal, complications after surgery. In this study, we aimed to use machine learning and deep learning techniques to predict hypotension after tracheal intubation one minute in advance. Two groups of models were trained to predict hypotension occurring between tracheal intubation and incision, using data from four minutes to one minute before tracheal intubation. The first group comprised ensemble machine learning models: random forest and extreme gradient boosting (XGBoost). The second comprised deep learning models: a convolutional neural network (CNN) and a deep neural network (DNN). Vital records and electronic health records (EHR) were collected for 282 of 319 patients who underwent laparoscopic cholecystectomy between October 2018 and July 2019. Of these 282 patients, 151 developed post-induction hypotension. The experiments covered two scenarios: using raw vital records and using features engineered from the vital records. On raw data, CNN achieved the best accuracy (72.63%), followed by random forest (70.32%) and XGBoost (64.6%). With feature engineering, random forest combined with feature selection achieved the best accuracy (74.89%), while CNN's accuracy fell to 68.95%, lower than on raw data. This study extends previous work on detecting hypotension before intubation by predicting it one minute in advance. To improve accuracy, we built models using state-of-the-art algorithms. We found that CNN performed well, but random forest performed better when combined with feature selection. We also found that the examination period (the data period) is important.
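The abstract specifies the input window (four minutes to one minute before intubation) but not the engineered features. The sketch below is a minimal illustration under assumptions. It assumes a vital sign sampled at 1 Hz, such as mean arterial pressure, and computes a hypothetical feature set (mean, min, max, standard deviation, and least-squares slope) over that window.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def extract_window(samples: Sequence[float], t_intubation_s: int,
                   start_min: int = 4, end_min: int = 1) -> List[float]:
    """Slice 1 Hz samples from start_min to end_min minutes before intubation.

    samples[i] is assumed to be the reading at second i of the recording.
    """
    lo = t_intubation_s - start_min * 60
    hi = t_intubation_s - end_min * 60
    if lo < 0 or hi > len(samples):
        raise ValueError("recording does not cover the requested window")
    return list(samples[lo:hi])


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical engineered features: level, spread, and linear trend."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    # Least-squares slope of value against time (units per second).
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return {"mean": y_bar, "min": min(values), "max": max(values),
            "std": stdev(values), "slope": num / den}


if __name__ == "__main__":
    # Hypothetical pressure trace drifting down by 0.05 mmHg per second.
    trace = [90 - 0.05 * s for s in range(600)]
    feats = summarize(extract_window(trace, t_intubation_s=300))
    print(round(feats["slope"], 3))  # -0.05
```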


Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study developed machine learning models to predict inflow into the Soyang River Dam in South Korea, using 40 years of weather and dam inflow data. Six algorithms were used: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). The MLP model predicted dam inflow best, with the following scores:

- Nash–Sutcliffe efficiency (NSE): 0.812
- root mean squared error (RMSE): 77.218 m³/s
- mean absolute error (MAE): 29.034 m³/s
- correlation coefficient (R): 0.924
- coefficient of determination (R²): 0.817

However, when dam inflow was below 100 m³/s, the ensemble models (random forest and gradient boosting) outperformed the MLP. Therefore, two combined machine learning (CombML) models, RF_MLP and GB_MLP, were developed. They use the ensemble method (RF or GB) when precipitation is below 16 mm and the MLP when precipitation is above 16 mm. The 16 mm threshold is the average daily precipitation at inflows of 100 m³/s or more. Validation gave NSE 0.857, RMSE 68.417 m³/s, MAE 18.063 m³/s, R 0.927, and R² 0.859 for RF_MLP. For GB_MLP, it gave NSE 0.829, RMSE 73.918 m³/s, MAE 18.093 m³/s, R 0.912, and R² 0.831. These results indicate that the combined models predict dam inflow most accurately. The CombML results show that combining several machine learning algorithms can predict inflow while accounting for flow characteristics such as flow regimes.
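The CombML switching rule and the reported error measures can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not say which model handles a day with exactly 16 mm, so this sketch assigns such days to the MLP (assumption). All precipitation values and model outputs below are hypothetical placeholders.

```python
import math
from typing import List, Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus squared error over the variance of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def comb_ml(precip_mm: Sequence[float], ensemble_pred: Sequence[float],
            mlp_pred: Sequence[float], threshold_mm: float = 16.0) -> List[float]:
    """Use the ensemble (RF or GB) prediction below the threshold, the MLP prediction otherwise.

    Days with exactly threshold_mm go to the MLP (assumption; the abstract does not say).
    """
    return [ens if p < threshold_mm else mlp
            for p, ens, mlp in zip(precip_mm, ensemble_pred, mlp_pred)]


if __name__ == "__main__":
    # Hypothetical daily precipitation (mm), model outputs and observed inflow (m^3/s).
    precip = [2.0, 30.0, 8.0]
    rf_out = [40.0, 260.0, 55.0]
    mlp_out = [70.0, 310.0, 80.0]
    observed = [45.0, 300.0, 50.0]
    combined = comb_ml(precip, rf_out, mlp_out)
    print(combined)  # [40.0, 310.0, 55.0]
    print(round(nse(observed, combined), 3), round(rmse(observed, combined), 3), round(mae(observed, combined), 3))
```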


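Using Machine Learning and Deep Learning Techniques to Predict Dengue Hospitalizations in Municipalities of Paraíba (Utilização de técnicas de Machine Learning e de Deep Learning para a predição de casos de internações causadas por dengue em municípios da Paraíba)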

10.5753/ercemapi.2021.17914 ◽

2021 ◽

Author(s):

Ewerthon Dyego de Araújo Batista ◽

Wellington Candeia de Araújo ◽

Romeryto Vieira Lira ◽

Laryssa Izabel de Araújo Batista

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Support Vector

Dengue is a public health problem in Brazil, and cases of the disease have begun rising again in Paraíba. The Paraíba epidemiological bulletin released in August 2021 reports a 53% increase in cases compared with the previous year. Machine learning (ML) and deep learning techniques are being used as tools to predict the disease and support efforts to combat it. This paper presents a system that forecasts dengue hospitalizations for the cities of Bayeux, Cabedelo, João Pessoa, and Santa Rita. The system uses random forest (RF), support vector regression (SVR), multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN) techniques. It achieved forecast error rates of 0.5290 for Bayeux, 0.92742 for Cabedelo, 9.55288 for João Pessoa, and 0.74551 for Santa Rita.
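The abstract does not describe how the hospitalization series is presented to the models. A common framing, assumed here, is a sliding window of lagged values. RF, SVR, and MLP can use the windows directly as feature vectors, and LSTM and CNN models can consume the same windows as short sequences. The sketch below shows that framing on a hypothetical series of counts.

```python
from typing import List, Sequence, Tuple


def make_supervised(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Frame a series as (X, y): each X row holds n_lags past values, y the value that follows."""
    if not 1 <= n_lags < len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = [list(series[i - n_lags:i]) for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y


if __name__ == "__main__":
    # Hypothetical weekly dengue hospitalization counts for one municipality.
    counts = [4, 7, 6, 9, 12, 10]
    X, y = make_supervised(counts, n_lags=3)
    print(X[0], y[0])  # [4, 7, 6] 9
```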


Performance Comparison of Oil Spill and Ship Classification from X-Band Dual- and Single-Polarized SAR Image Using Support Vector Machine, Random Forest, and Deep Neural Network

Remote Sensing ◽

10.3390/rs13163203 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3203

Author(s):

Won-Kyung Baek ◽

Hyung-Sup Jung

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Performance Improvement ◽

Oil Spill ◽

Deep Neural Network ◽

Support Vector ◽

Sar Image ◽

X Band

It is well known that the polarization characteristics of X-band synthetic aperture radar (SAR) images provide additional information for marine target classification and detection. SAR satellites typically acquire both dual- and single-polarized images. We therefore need to determine how much more accurate marine mapping from dual-polarized (dual-pol) images is than mapping from single-pol images for a given machine learning model. The purpose of this study is to compare single- and dual-pol SAR image classification performance using support vector machine (SVM), random forest (RF), and deep neural network (DNN) models. The test image is a TerraSAR-X dual-pol image acquired during the 2007 Kerch Strait oil spill. From this image, 824,026 pixels were extracted for training and 1,648,051 pixels for testing. Sea, ship, oil, and land objects were then classified with the three machine learning methods. The mean F1-scores of the SVM, RF, and DNN models were approximately 0.822, 0.882, and 0.889 on the single-pol image, and about 0.852, 0.908, and 0.898 on the dual-pol image, respectively. Dual-pol imagery thus improved performance by about 3.6%, 2.9%, and 1% for SVM, RF, and DNN, respectively. The DNN model performed best (0.889) in the single-pol test, whereas the RF model performed best (0.908) in the dual-pol test; the improvement between these best results was only about 2.1%. Dual-pol images have half the spatial resolution of single-pol images in the azimuth direction, so such a small improvement may not be worthwhile. The results therefore indicate that X-band dual-pol imagery may not substantially improve the classification of sea, ship, oil spill, and land surfaces.
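The reported improvements appear to be gains relative to the single-pol score, and recomputing them from the listed F1-scores reproduces 3.6%, 2.9%, 1%, and 2.1%. The sketch below shows that arithmetic. It also shows one way to compute a "mean F1-score", assuming the mean is the unweighted average of per-class F1 over the four classes. The abstract does not state how the mean was taken.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) (assumed averaging)."""
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def relative_gain_pct(new: float, old: float) -> float:
    """Improvement of new over old, as a percentage of old."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Mean F1-scores from the abstract: (single-pol, dual-pol).
    reported = {"SVM": (0.822, 0.852), "RF": (0.882, 0.908), "DNN": (0.889, 0.898)}
    for model, (single_pol, dual_pol) in reported.items():
        print(model, round(relative_gain_pct(dual_pol, single_pol), 1))  # SVM 3.6, RF 2.9, DNN 1.0
    # Best dual-pol model (RF) versus best single-pol model (DNN).
    print(round(relative_gain_pct(0.908, 0.889), 1))  # 2.1
```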


Machine learning model for predicting the optimal depth of tracheal tube insertion in pediatric patients: A retrospective cohort study

PLoS ONE ◽

10.1371/journal.pone.0257069 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257069

Author(s):

Jae-Geum Shim ◽

Kyoung-Ho Ryu ◽

Sung Hyun Lee ◽

Eun-Ah Cho ◽

Sungho Lee ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Random Forest ◽

Tracheal Tube ◽

Pediatric Patients ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Objective: To construct a model that uses machine learning to predict the optimal tracheal tube depth in pediatric patients.

Methods: This retrospective study investigated pediatric patients aged <7 years who received postoperative ventilation after surgery between January 2015 and December 2018. The optimal tracheal tube location was defined as the midpoint between the upper margin of the first thoracic (T1) vertebral body and the lower margin of the third thoracic (T3) vertebral body. We applied four machine learning models: random forest, elastic net, support vector machine, and artificial neural network. We compared their prediction accuracy with that of three formula-based methods based on age, height, and tracheal tube internal diameter (ID).

Results: The percentage of test-set predictions at the optimal tracheal tube depth was:

- random forest: 79.0% (95% confidence interval [CI], 73.5 to 83.6)
- elastic net: 77.4% (95% CI, 71.8 to 82.2; P = 0.719)
- support vector machine: 77.0% (95% CI, 71.4 to 81.8; P = 0.486)
- artificial neural network: 76.6% (95% CI, 71.0 to 81.5; P = 1.0)
- age-based formula: 66.9% (95% CI, 60.9 to 72.5; P < 0.001)
- tube ID-based formula: 58.5% (95% CI, 52.3 to 64.4; P < 0.001)
- height-based formula: 44.4% (95% CI, 38.3 to 50.6; P < 0.001)

Conclusions: In this study, the machine learning models predicted the optimal tracheal tube tip location in pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions about the optimal tracheal tube depth in pediatric patients.
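The evaluation can be sketched as a success count plus a binomial confidence interval. Two points are assumptions and are not stated in the abstract. First, a prediction counts as optimal when the predicted tube tip lies between the T1 upper margin and the T3 lower margin. Second, the 95% CI is a Wilson score interval; the study's actual interval method is not given. Only the midpoint definition of the optimal location comes from the study. All depths below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def optimal_depth_cm(t1_upper_cm: float, t3_lower_cm: float) -> float:
    """Midpoint between the T1 upper margin and the T3 lower margin (the study's definition)."""
    return (t1_upper_cm + t3_lower_cm) / 2


def count_in_zone(pred_cm: Sequence[float], t1_upper_cm: Sequence[float],
                  t3_lower_cm: Sequence[float]) -> int:
    """Count predicted tip positions lying between the T1 and T3 margins (assumed success criterion)."""
    return sum(lo <= p <= hi for p, lo, hi in zip(pred_cm, t1_upper_cm, t3_lower_cm))


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned as fractions."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical depths (cm from a fixed reference point) for three patients.
    preds = [11.4, 13.0, 10.9]
    t1 = [10.5, 11.0, 10.0]
    t3 = [12.5, 12.8, 12.0]
    k = count_in_zone(preds, t1, t3)
    lo, hi = wilson_ci(k, len(preds))
    print(k, round(100 * k / len(preds), 1), round(100 * lo, 1), round(100 * hi, 1))
```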


Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

10.21203/rs.3.rs-39653/v3 ◽

2020 ◽

Author(s):

Yuanren Tong ◽

Keming Lu ◽

Yingyun Yang ◽

Ji Li ◽

Yucong Lin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Language Processing ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Intestinal Diseases ◽

Specificity And Sensitivity

Background: Differentiating between ulcerative colitis (UC), Crohn’s disease (CD), and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to achieve automatic differential diagnosis among these diseases with machine learning algorithms.

Methods: A total of 6399 consecutive patients (5128 UC, 875 CD, and 396 ITB) were enrolled. All had undergone colonoscopy at Peking Union Medical College Hospital between January 2008 and November 2018. The input was the free-text description of the endoscopic images. Word segmentation and keyword filtering were performed as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to the different disease entities. Three two-class classifiers (UC vs. CD, UC vs. ITB, and CD vs. ITB) and one three-class classifier (UC, CD, and ITB) were built.

Results: The classifiers performed well, and the CNN generally performed better. The RF sensitivity/specificity for UC-CD, UC-ITB, and CD-ITB was 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the CNN achieved 0.90/0.77 for CD-ITB. For the three-class UC-CD-ITB classifier, precision/recall was 0.97/0.97, 0.65/0.53, and 0.68/0.76 with RF, and 0.99/0.97, 0.87/0.83, and 0.52/0.81 with the CNN, respectively.

Conclusions: Classifiers built with RF and CNN approaches performed excellently in distinguishing UC from CD or ITB. High specificity and sensitivity were also achieved in differentiating CD from ITB. Artificial intelligence based on machine learning is very promising for helping inexperienced endoscopists differentiate inflammatory intestinal diseases.
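The abstract reports sensitivity/specificity for the two-class classifiers and precision/recall for the three-class classifier, where recall is the same quantity as sensitivity. The sketch below computes all three figures by treating one class at a time as positive. It is a minimal illustration assuming one-vs-rest counting; the abstract does not state how the per-class figures were derived. The labels and predictions are hypothetical, and a zero denominator returns 0.0 by choice.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """Safe division; returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Sensitivity (recall), specificity and precision with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return {"sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "precision": _ratio(tp, tp + fp)}


if __name__ == "__main__":
    truth = ["UC", "UC", "CD", "ITB"]
    pred = ["UC", "CD", "CD", "UC"]
    for label in ("UC", "CD", "ITB"):
        print(label, one_vs_rest(truth, pred, label))
```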


Estimating Design Floods at Ungauged Watersheds in South Korea Using Machine Learning Models

Water ◽

10.3390/w12113022 ◽

2020 ◽

Vol 12 (11) ◽

pp. 3022

Author(s):

Jin-Young Lee ◽

Changhyun Choi ◽

Doosun Kang ◽

Byung Sik Kim ◽

Tae-Woong Kim

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

South Korea ◽

Recurrent Neural Network ◽

Flood Damage ◽

Flood Frequency Analysis ◽

Support Vector ◽

Design Floods ◽

Ungauged Watersheds

With recent increases in heavy rainfall during the summer season, South Korea suffers substantial flood damage every year. To reduce such damage and cope with flood disasters, design floods must be estimated reliably. Despite ongoing efforts to develop practical design methods, a standardized guideline has been difficult to establish because hydrologic data, especially flood data, are lacking. In particular, flood frequency analysis (FFA) is impractical for ungauged watersheds, and design rainfall–runoff analysis (DRRA) overestimates design floods. This study estimated appropriate design floods at ungauged watersheds by combining DRRA with watershed characteristics using machine learning methods. The methods were decision tree, random forest, support vector machine, deep neural network, the Elman recurrent neural network, and the Jordan recurrent neural network. The proposed models were validated with K-fold cross-validation to reduce overfitting and were evaluated with various error measures. In our study areas, DRRA overestimated design floods by 160% on average. The proposed random forest model reduced these errors, estimating design floods at 99% of the FFA values on average.
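K-fold cross-validation holds out each of K disjoint subsets of the samples once for testing and trains on the rest. The abstract does not give K or say whether samples were shuffled. The sketch below is a plain index splitter with an optional seeded shuffle; the watershed count in the usage example is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def k_fold_indices(n: int, k: int, seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Hypothetical: 7 watersheds split into 3 folds.
    for train, test in k_fold_indices(7, 3, seed=0):
        print("train", train, "test", test)
```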


Spatio-temporal ensemble predictions for wind and solar energy combining dispersion modelling methods and machine learning

10.5194/egusphere-egu21-15646 ◽

2021 ◽

Author(s):

Irene Schicker ◽

Petrina Papazek ◽

Elisa Perrone ◽

Delia Arnold

Keyword(s):

Neural Network ◽

Machine Learning ◽

Renewable Energy ◽

Random Forest ◽

Energy Production ◽

Energy Systems ◽

Processing Algorithm ◽

Post Processing ◽

Processing Methods ◽

Renewable Energy Systems

As renewable energy systems are used increasingly to meet the aims of the climate agreement, accurate predictions of the energy they can produce are needed. These predictions, and their uncertainty, serve many purposes:

- estimating the load on the power grid;
- taking measures when renewable energy is too plentiful or too scarce while nuclear energy availability is reduced;
- rescheduling or adjusting energy production;
- maintenance, trading, and more.

Furthermore, transmission system operators (TSOs) and energy providers need this information as fine-grained, spatially and temporally, as possible. They need it at the third-level hub or even at the solar farm or wind turbine level, over a comparatively large area. These needs pose a challenge for numerical weather prediction (NWP) post-processing methods. Typically, selected NWP fields, as well as observations if available, are used as input to post-processing methods. Here, we combine two post-processing methods, a neural network and a random forest approach, with the Flex_extract algorithm. Flex_extract is the pre-processing algorithm for the Lagrangian particle dispersion model FLEXPART and the trajectory model FLEXTRA. Flex_extract uses the three-dimensional wind fields of the NWP model and additionally calculates the instantaneous surface fluxes. Coupling Flex_extract with a machine learning post-processing algorithm therefore enables the use of native NWP fields, which have a higher vertical accuracy than pressure levels. Different tools are available to generate an ensemble from deterministic sources during post-processing; here, we will apply the Schaake Shuffle. In this study, we will present a neural network and a random forest approach for probabilistic forecasting of wind speed and global horizontal irradiance for Austria. The forecasts will have a high horizontal grid resolution (1 km) and a high temporal forecasting frequency. Evaluation will be carried out against gridded analysis fields and observations.
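The Schaake Shuffle reorders ensemble members so that, at every location or lead time, their rank order matches that of a set of historical observation dates. This restores a realistic spatio-temporal dependence structure while keeping each point's set of ensemble values unchanged. The sketch below covers only that reordering step, not how the ensemble values themselves are generated. It assumes the ensemble and the historical template have the same number of rows (members and dates) and columns (locations or lead times). All numbers are made up.

```python
from typing import List, Sequence


def schaake_shuffle(ensemble: Sequence[Sequence[float]],
                    template: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reorder each column of `ensemble` to follow the rank order of `template`.

    Rows are members (ensemble) or historical dates (template); columns are
    locations or lead times. Each column keeps its values; only their row
    assignment changes.
    """
    n = len(ensemble)
    if n == 0 or len(template) != n:
        raise ValueError("ensemble and template need the same, non-zero number of rows")
    d = len(ensemble[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        sorted_vals = sorted(row[j] for row in ensemble)
        # Template row indices ordered from smallest to largest value in column j.
        order = sorted(range(n), key=lambda i: template[i][j])
        for rank, i in enumerate(order):
            out[i][j] = sorted_vals[rank]
    return out


if __name__ == "__main__":
    # Hypothetical 3-member wind-speed ensemble (m/s) at 2 grid points, and 3 historical dates.
    ens = [[3.0, 6.0], [1.0, 8.0], [2.0, 7.0]]
    hist = [[4.1, 5.0], [6.3, 1.0], [5.2, 3.0]]
    print(schaake_shuffle(ens, hist))  # [[1.0, 8.0], [3.0, 6.0], [2.0, 7.0]]
```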
