Pathogenicity Prediction of Single Amino Acid Variants with Machine Learning Model Based on Protein Structural Energies

Author(s):  
Tzu-Hsuan Wu ◽  
Peng-Chan Lin ◽  
Hsin-Hung Chou ◽  
Meng-Ru Shen ◽  
Sun-Yuan Hsieh
2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

The predictive performance of a machine learning model depends heavily on its hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, transferring data over low-bandwidth connections reduces the time available for tuning. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on optimizing the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., tuning with MBO on each node individually using its local sub-data set.
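As a rough illustration of what "combined prediction accuracy" could mean, the sketch below scores an ensemble of per-node models by majority vote. The abstract does not say how the node predictions are combined, so the majority-vote rule, the function names, and the toy predictions are all assumptions made for illustration only.

```python
from collections import Counter


def combined_predictions(per_node_predictions):
    """Majority vote across nodes (assumed combination rule).

    per_node_predictions[k][i] is the label node k predicts for sample i.
    Ties are resolved in favour of the label encountered first.
    """
    n_samples = len(per_node_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in per_node_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions from three nodes on four validation samples.
node_preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
labels = [1, 0, 1, 0]

ensemble = combined_predictions(node_preds)
ensemble_accuracy = accuracy(ensemble, labels)
per_node_accuracy = [accuracy(p, labels) for p in node_preds]
```

Under this assumption, a tuner that treats the ensemble as one black box would use `ensemble_accuracy` as its objective, whereas tuning each node in isolation would only see the entries of `per_node_accuracy`.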


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10381
Author(s):  
Rohit Nandakumar ◽  
Valentin Dinu

Throughout the history of drug discovery, enzyme-based approaches have been the primary means of identifying new drug molecules. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The model achieves an AUROC of 0.842, a sensitivity/recall of 0.833, and a specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers, which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its ability to predict drug-disease associations previously identified in the literature, including cimetidine, idarubicin, and pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
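For readers less familiar with the reported metrics, the following sketch shows how sensitivity (recall) and specificity are computed from confusion-matrix counts. The counts are made up for illustration, chosen only so that the ratios round to the reported figures; they are not the study's confusion matrix, which the abstract does not give.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual hotspots that the model labels as hotspots."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual non-hotspots that the model labels as non-hotspots."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the paper.
tp, fn = 5, 1
tn, fp = 17, 3

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

AUROC is not derivable from a single confusion matrix, since it summarises performance across all decision thresholds, so it is left out of this sketch.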


Minerals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1294
Author(s):  
Honglei Wang ◽  
Zhenlei Li ◽  
Dazhao Song ◽  
Xueqiu He ◽  
Aleksei Sobolev ◽  
...  

Rockburst is a serious hazard in underground engineering, and accurate prediction of rockburst risk is challenging. To construct an intelligent prediction model of rockburst risk with interpretability and high accuracy, three binary scorecards predicting different risk levels of rockburst were constructed using ChiMerge, evidence weight theory, and the logistic regression algorithm. An intelligent rockburst prediction model based on scorecard methodology (IRPSC) was obtained by integrating the three scorecards. The effects of hazard sample category weights on the missed alarm rate, false alarm rate, and accuracy of the IRPSC were analyzed. Results show that the accuracy, false alarm rate, and missed alarm rate of the IRPSC for rockburst prediction in riverside hydropower stations are 75%, 12.5%, and 12.5%, respectively. Setting higher hazard sample category weights can reduce the missed alarm rate of the IRPSC, but it leads to a higher false alarm rate. The IRPSC can adaptively adjust the threshold and weight value of each indicator and convert the abstract machine learning model into a tabular form, which overcomes the common black-box problem of machine learning models and is of great significance to the application of machine learning in rockburst risk prediction.
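Assuming "evidence weight theory" refers to the weight-of-evidence (WoE) encoding commonly used in scorecard modelling, the sketch below computes WoE values for the bins of one indicator. The sign convention (non-events over events), the function name, and the bin counts are assumptions for illustration; the abstract does not specify them.

```python
import math


def weight_of_evidence(bins):
    """Compute WoE per bin for one binned indicator.

    bins is a list of (non_event_count, event_count) pairs, one per bin.
    Assumed convention: WoE = ln(share of non-events / share of events).
    Bins with a zero count would need smoothing, which is omitted here.
    """
    total_non_events = sum(non for non, _ in bins)
    total_events = sum(evt for _, evt in bins)
    return [
        math.log((non / total_non_events) / (evt / total_events))
        for non, evt in bins
    ]


# Hypothetical bins for a single indicator: (no rockburst, rockburst).
example_bins = [(30, 10), (10, 30)]
woe = weight_of_evidence(example_bins)
```

In a typical scorecard workflow, each raw indicator value is replaced by the WoE of its bin before logistic regression is fitted, which is what lets the fitted model be written out as a table of per-bin scores.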

