A High Accurate Machine Learning Meta-Strategy for the Prediction of Intrinsically Disorder Proteins

Author(s):  
Chengbin Hu ◽  
Yiru Qin ◽  
Chuan Ye ◽  
Jiao Jin ◽  
Ting Zhou ◽  
...  

Abstract Background: Many proteins or partial regions of proteins do not have stable and well-defined three-dimensional structures in vitro. Understanding intrinsically disordered proteins (IDPs) is important for interpreting biological function and for studying many diseases. Although more than 70 disorder predictors have been developed, many existing predictors are limited to particular protein characteristics and do not achieve very high accuracy. Therefore, it is critical to formulate new strategies for disordered protein prediction. Results: Here, we propose a machine learning meta-strategy to improve the accuracy of predicting disordered proteins and disordered regions. We first use logistic forward parameter selection to select the eight most significant predictors from the currently available IDP predictors. We then design a novel meta-strategy using several machine learning models, including a decision-tree-based algorithm, Naive Bayes, random forest, and a convolutional neural network (CNN). Across the different strategies, the results suggest that random forest can significantly improve the single-amino-acid prediction accuracy to 93.35%. Using the combined vector data of the eight most significant predictors as input, the CNN can improve whole-protein prediction accuracy to 95.62%. Conclusion: According to the performance of our machine learning meta-strategy, the random forest and CNN models can improve the accuracy of IDP prediction.
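The predictor-selection step can be illustrated with a greedy forward-selection loop. This is a minimal sketch, not the authors' implementation: the abstract says only that "logistic forward parameter selection" was used, so the `score` argument here is a hypothetical caller-supplied function (for example, the cross-validated accuracy of a logistic model fitted on the chosen predictors), and the predictor names and weights in the usage are invented placeholders.

```python
from __future__ import annotations

from typing import Callable, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[list[str]], float],
    k: int,
) -> list[str]:
    """Greedily pick up to k candidates, adding the best-scoring one each round."""
    selected: list[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < k:
        # Score every one-element extension of the current subset.
        trial_scores = {c: score(selected + [c]) for c in remaining}
        best = max(remaining, key=lambda c: trial_scores[c])
        if trial_scores[best] <= best_score:
            break  # No candidate improves the score, so stop early.
        best_score = trial_scores[best]
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage with hypothetical weights and a simple additive score.
toy_weights = {"pred_a": 0.30, "pred_b": 0.10, "pred_c": 0.25, "pred_d": 0.0}


def toy_score(subset: list[str]) -> float:
    return sum(toy_weights[p] for p in subset)


chosen = forward_select(list(toy_weights), toy_score, k=3)
print(chosen)
```

In a meta-strategy of the kind described, the outputs of the selected predictors would then form the feature vector passed to a second-stage model such as a random forest or CNN.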

2020 ◽  


Animals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 771
Author(s):  
Toshiya Arakawa

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time when the number of mammals to be observed is large or when the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to detect and estimate whether a goat is in estrus based on the goat’s behavior, and the adequacy of each method was verified. Tracking data for goats in “estrus” or “non-estrus” were obtained using a video tracking system and used to estimate which of two states the goats were in: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats whose data were not used in the training data sets is relatively low, which suggests that random forest tends to overfit the training data. Apart from random forest, the PCs of HMMs and SVMs are high. Considering calculation time and the HMM’s advantage as a time-series model, the HMM is the better method. The PC of the neural network is low overall; however, if data from more goats were acquired, the neural network could become an adequate method for estimation.


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4575 ◽  
Author(s):  
Jihyun Lee ◽  
Jiyoung Woo ◽  
Ah Reum Kang ◽  
Young-Seob Jeong ◽  
Woohyun Jung ◽  
...  

Hypotensive events in the initial stage of anesthesia can cause serious, potentially fatal complications in patients after surgery. In this study, we aimed to predict hypotension after tracheal intubation one minute in advance using machine learning and deep learning techniques. Meta-learning models, such as random forest and extreme gradient boosting (Xgboost), and deep learning models, specifically the convolutional neural network (CNN) and the deep neural network (DNN), were trained to predict hypotension occurring between tracheal intubation and incision, using data from four minutes to one minute before tracheal intubation. Vital records and electronic health records (EHR) were collected for 282 of 319 patients who underwent laparoscopic cholecystectomy from October 2018 to July 2019. Among the 282 patients, 151 developed post-induction hypotension. Our experiments covered two scenarios: using raw vital records and using feature engineering on vital records. The experiments on raw data showed that CNN had the best accuracy of 72.63%, followed by random forest (70.32%) and Xgboost (64.6%). The experiments on feature engineering showed that random forest combined with feature selection had the best accuracy of 74.89%, while CNN had an accuracy of 68.95%, lower than in the experiment on raw data. Our study extends previous studies by detecting hypotension one minute before intubation. To improve accuracy, we built a model using state-of-the-art algorithms. We found that CNN performed well but that random forest performed better when combined with feature selection. In addition, we found that the examination period (data period) is also important.


Water ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 2927
Author(s):  
Jiyeong Hong ◽  
Seoro Lee ◽  
Joo Hyun Bae ◽  
Jimin Lee ◽  
Woon Ji Park ◽  
...  

Predicting dam inflow is necessary for effective water management. This study built machine learning models to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with a Nash–Sutcliffe efficiency (NSE) of 0.812, root mean squared error (RMSE) of 77.218 m³/s, mean absolute error (MAE) of 29.034 m³/s, correlation coefficient (R) of 0.924, and coefficient of determination (R²) of 0.817. However, when the amount of dam inflow is below 100 m³/s, the ensemble models (random forest and gradient boosting) performed better than MLP. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow, using the ensemble methods (RF and GB) at precipitation below 16 mm and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at an inflow of 100 m³/s or more. Accuracy verification gave NSE 0.857, RMSE 68.417 m³/s, MAE 18.063 m³/s, R 0.927, and R² 0.859 for RF_MLP, and NSE 0.829, RMSE 73.918 m³/s, MAE 18.093 m³/s, R 0.912, and R² 0.831 for GB_MLP, which indicates that the combined models predict the dam inflow most accurately. The CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.
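The error measures and the precipitation-based switching rule in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's code: NSE, RMSE, and MAE follow their standard definitions, the two models are stand-in callables, the observed and simulated values are invented toy numbers, and routing a reading of exactly 16 mm to the MLP is an arbitrary choice because the abstract only specifies "below" and "above".

```python
import math
from typing import Callable, Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - squared error / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def comb_ml(
    precip_mm: float,
    ensemble_model: Callable[[float], float],
    mlp_model: Callable[[float], float],
    threshold_mm: float = 16.0,
) -> float:
    """Use the ensemble model below the threshold and the MLP otherwise."""
    model = ensemble_model if precip_mm < threshold_mm else mlp_model
    return model(precip_mm)


# Toy data (hypothetical inflows in m^3/s, not values from the study).
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
print(nse(observed, simulated), rmse(observed, simulated), mae(observed, simulated))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than predicting the mean of the observations.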


2021 ◽  
Author(s):  
Ewerthon Dyego de Araújo Batista ◽  
Wellington Candeia de Araújo ◽  
Romeryto Vieira Lira ◽  
Laryssa Izabel de Araújo Batista

Dengue is a public health problem in Brazil, and cases of the disease have risen again in Paraíba. The Paraíba epidemiological bulletin, released in August 2021, reports a 53% increase in cases relative to the previous year. Machine Learning (ML) and Deep Learning techniques are being used as tools for predicting the disease and supporting efforts to combat it. Using Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) techniques, this article presents a system capable of forecasting hospitalizations caused by dengue for the cities of Bayeux, Cabedelo, João Pessoa, and Santa Rita. The system produced forecasts for Bayeux with an error rate of 0.5290, while the error was 0.92742 for Cabedelo, 9.55288 for João Pessoa, and 0.74551 for Santa Rita.


2021 ◽  
Vol 13 (16) ◽  
pp. 3203
Author(s):  
Won-Kyung Baek ◽  
Hyung-Sup Jung

It is well known that the polarization characteristics in X-band synthetic aperture radar (SAR) image analysis can provide additional information for marine target classification and detection. Normally, SAR satellites acquire dual- and single-polarized SAR images, so we must determine how the marine mapping performance from dual-polarized (pol) images compares with that from single-pol images in a given machine learning model. The purpose of this study is to compare the performance of single- and dual-pol SAR image classification achieved by the support vector machine (SVM), random forest (RF), and deep neural network (DNN) models. The test image is a TerraSAR-X dual-pol image acquired from the 2007 Kerch Strait oil spill event. For this, 824,026 pixels and 1,648,051 pixels were extracted from the image for training and testing, respectively, and sea, ship, oil, and land objects were classified from the image by using the three machine learning methods. The mean F1-scores of the SVM, RF, and DNN models resulting from the single-pol image were approximately 0.822, 0.882, and 0.889, respectively, and those from the dual-pol image were about 0.852, 0.908, and 0.898, respectively. The performance improvement achieved by dual-pol was about 3.6%, 2.9%, and 1% in SVM, RF, and DNN, respectively. The DNN model had the best performance (0.889) in the single-pol test, while the RF model was best (0.908) in the dual-pol test. The performance improvement was approximately 2.1% and was not noticeable. If we consider that dual-pol images have two-times lower spatial resolution than single-pol images in the azimuth direction, such a small improvement may not be valuable. Therefore, the results show that the performance improvement from the X-band dual-pol image may not be remarkable when classifying the sea, ships, oil spills, and sea and land surfaces.
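The improvement percentages quoted in this abstract can be checked against the reported F1-scores. The abstract does not say how the improvements were computed; the sketch below assumes they are relative gains (dual-pol score minus single-pol score, divided by the single-pol score), which reproduces the quoted figures after rounding. The function name `relative_gain_pct` is introduced here for illustration.

```python
# Mean F1-scores as reported in the abstract.
single_pol = {"SVM": 0.822, "RF": 0.882, "DNN": 0.889}
dual_pol = {"SVM": 0.852, "RF": 0.908, "DNN": 0.898}


def relative_gain_pct(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Per-model gain from single-pol to dual-pol input.
gains = {m: relative_gain_pct(single_pol[m], dual_pol[m]) for m in single_pol}

# Gain of the best dual-pol model (RF) over the best single-pol model (DNN).
best_gain = relative_gain_pct(max(single_pol.values()), max(dual_pol.values()))

for model, gain in gains.items():
    print(f"{model}: {gain:.1f}%")
print(f"best vs. best: {best_gain:.1f}%")
```

Under this assumption, the 2.1% figure corresponds to comparing the best dual-pol result (0.908) with the best single-pol result (0.889).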


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257069
Author(s):  
Jae-Geum Shim ◽  
Kyoung-Ho Ryu ◽  
Sung Hyun Lee ◽  
Eun-Ah Cho ◽  
Sungho Lee ◽  
...  

Objective To construct a prediction model for optimal tracheal tube depth in pediatric patients using machine learning. Methods Pediatric patients aged <7 years who received post-operative ventilation after undergoing surgery between January 2015 and December 2018 were investigated in this retrospective study. The optimal location of the tracheal tube was defined as the median of the distance between the upper margin of the first thoracic (T1) vertebral body and the lower margin of the third thoracic (T3) vertebral body. We applied four machine learning models (random forest, elastic net, support vector machine, and artificial neural network) and compared their prediction accuracy to three formula-based methods, which were based on age, height, and tracheal tube internal diameter (ID). Results For each method, the percentage with optimal tracheal tube depth predictions in the test set was calculated as follows: 79.0 (95% confidence interval [CI], 73.5 to 83.6) for random forest, 77.4 (95% CI, 71.8 to 82.2; P = 0.719) for elastic net, 77.0 (95% CI, 71.4 to 81.8; P = 0.486) for support vector machine, 76.6 (95% CI, 71.0 to 81.5; P = 1.0) for artificial neural network, 66.9 (95% CI, 60.9 to 72.5; P < 0.001) for the age-based formula, 58.5 (95% CI, 52.3 to 64.4; P < 0.001) for the tube ID-based formula, and 44.4 (95% CI, 38.3 to 50.6; P < 0.001) for the height-based formula. Conclusions In this study, the machine learning models predicted the optimal tracheal tube tip location for pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions regarding optimal tracheal tube depth in pediatric patients.


2020 ◽  
Author(s):  
Yuanren Tong ◽  
Keming Lu ◽  
Yingyun Yang ◽  
Ji Li ◽  
Yucong Lin ◽  
...  

Abstract Background: Differentiating between ulcerative colitis (UC), Crohn’s disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. Methods: A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built. Results: The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively. Conclusions: Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases.
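The sensitivity, specificity, precision, and recall figures quoted above all derive from confusion-matrix counts. The following sketch shows the standard definitions for a two-class classifier; the counts in the usage are invented for illustration and are not data from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard metrics from true/false positive and negative counts."""
    return {
        # Share of actual positives that were detected (also called recall).
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp),
        # Share of predicted positives that were actually positive.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 50 actual positives and 50 actual negatives.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

For a three-class classifier, the same precision and recall formulas are typically applied per class, treating that class as positive and the other two as negative.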


Water ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 3022
Author(s):  
Jin-Young Lee ◽  
Changhyun Choi ◽  
Doosun Kang ◽  
Byung Sik Kim ◽  
Tae-Woong Kim

With recent increases in heavy rainfall during the summer season, South Korea is hit by substantial flood damage every year. To reduce such flood damage and cope with flood disasters, it is necessary to reliably estimate design floods. Despite ongoing efforts to develop practical design practices, it has been difficult to develop a standardized guideline due to the lack of hydrologic data, especially flood data. In fact, flood frequency analysis (FFA) is impractical for ungauged watersheds, and design rainfall–runoff analysis (DRRA) overestimates design floods. This study estimated appropriate design floods at ungauged watersheds by combining the DRRA and watershed characteristics using machine learning methods, including decision tree, random forest, support vector machine, deep neural network, the Elman recurrent neural network, and the Jordan recurrent neural network. The proposed models were validated using K-fold cross-validation to reduce overfitting and were evaluated based on various error measures. Even though the DRRA overestimated the design floods by 160%, on average, for our study areas, the proposed model using random forest reduced the errors and estimated design floods at 99% of the FFA, on average.
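K-fold cross-validation, as mentioned above, splits the samples into K parts and uses each part once as the held-out test set. The sketch below generates the index splits in plain Python. It is a generic illustration: the abstract does not state the value of K or whether the data were shuffled, so the contiguous, unshuffled folds and the values in the usage are assumptions.

```python
from typing import Iterator


def k_fold_indices(n_samples: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Toy usage: 10 samples (for example, watersheds) split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each model would be fitted on `train_idx` and scored on `test_idx`, and the error measures averaged over the folds.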


2021 ◽  
Author(s):  
Irene Schicker ◽  
Petrina Papazek ◽  
Elisa Perrone ◽  
Delia Arnold

With the increasing use of renewable energy systems to meet the aims of the climate agreement, accurate predictions of the possible amount of energy production from renewable energy systems are needed. The need for such predictions and their uncertainty is manifold: to estimate the load on the power grid, to take measures in case of too much or not enough renewable energy with reduced nuclear energy availability, rescheduling and adjusting energy production, maintenance, trading, and more. Furthermore, TSOs and energy providers need the information to be as fine-grained as possible, spatially and temporally, at third-level hub or even at solar farm or wind turbine level for a comparatively large area.

These needs pose a challenge to numerical weather prediction (NWP) post-processing methods. Typically, one uses selected NWP fields as well as observations, if available, as input to post-processing methods. Here, we combine two post-processing methods, namely a neural network and a random forest approach, with the Flex_extract algorithm. Flex_extract is the pre-processing algorithm for the Lagrangian particle dispersion model FLEXPART and the trajectory model FLEXTRA. Flex_extract uses the three-dimensional wind fields of the NWP model and additionally calculates the instantaneous surface fluxes. Thus, coupling Flex_extract with a machine learning post-processing algorithm enables the use of native NWP fields with a higher vertical accuracy than pressure levels. Different tools are available to generate an ensemble in post-processing from deterministic sources. Here, we will apply the Schaake Shuffle.

In this study, a neural network and a random forest approach for probabilistic forecasting of wind speed and global horizontal irradiance for Austria, with a high horizontal grid resolution (1 km) and a high temporal forecasting frequency, will be presented. Evaluation will be carried out against gridded analysis fields and observations.
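The Schaake Shuffle named in this abstract reorders ensemble members so that their rank structure matches that of a template, usually a set of historical observations. The sketch below shows the core reordering for a single variable at a single location. It is a simplified illustration and not the authors' pipeline: the function name and the toy values are invented, and ties in the template are broken by position.

```python
from typing import Sequence


def schaake_shuffle(forecast: Sequence[float], template: Sequence[float]) -> list[float]:
    """Reorder forecast members so their ranks match the ranks of the template."""
    if len(forecast) != len(template):
        raise ValueError("forecast and template must have the same length")
    sorted_forecast = sorted(forecast)
    # Template positions ordered from smallest to largest template value.
    order = sorted(range(len(template)), key=lambda i: template[i])
    shuffled = [0.0] * len(forecast)
    for rank, member in enumerate(order):
        # The member holding the rank-th smallest template value receives
        # the rank-th smallest forecast value.
        shuffled[member] = sorted_forecast[rank]
    return shuffled


# Toy usage: four sampled wind speeds and four historical values (hypothetical).
samples = [5.0, 2.0, 9.0, 4.0]
historical = [10.0, 30.0, 20.0, 0.0]
print(schaake_shuffle(samples, historical))  # [4.0, 9.0, 5.0, 2.0]
```

Applying the same reordering at every location and lead time, with template values taken from the same historical dates, is what transfers the observed spatial and temporal dependence to the ensemble.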

