Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, there is strong demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce trial and error. Methods: Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first kind is based on the similarity of substituents and is realized by matched molecular pair analysis; it includes SA, SA_BR, SR, and SR_BR. The second kind is based on SAR transferability and is realized by matched molecular series analysis; it includes Single MMS pair, Full MMS series, and Multi single MMS pairs. We also defined the application domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) showed the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). Predictive accuracy was further improved by consensus modeling (R2 = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate bioactivity prediction model was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
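
The three reported metrics and the consensus step can be illustrated with a minimal sketch. It assumes R2 is the coefficient of determination and that the consensus is an unweighted mean of the individual model predictions; the abstract states neither, and the function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired observed and predicted values.

    Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


def consensus_prediction(model_predictions):
    """Average the predictions of several models, compound by compound.

    Assumption: an unweighted mean; the source does not specify the scheme.
    model_predictions is a list with one list of predictions per model.
    """
    n_models = len(model_predictions)
    return [sum(column) / n_models for column in zip(*model_predictions)]
```

Averaging tends to help when the individual models make partly independent errors, which is one plausible reading of why the consensus model outperformed each single model.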

2020 ◽  
Author(s):  
Young Min Park ◽  
Byung-Joo Lee

Abstract Background: This study analyzed the prognostic significance of nodal factors, including the number of metastatic lymph nodes (LNs) and the lymph node ratio (LNR), in patients with papillary thyroid cancer (PTC), and attempted to construct a disease recurrence prediction model using machine learning techniques. Methods: We retrospectively analyzed clinico-pathologic data from 1040 patients diagnosed with PTC between 2003 and 2009. Results: We analyzed clinico-pathologic factors related to recurrence through logistic regression analysis. Among the factors that we included, only sex and tumor size were significantly correlated with disease recurrence. Parameters such as age, sex, tumor size, tumor multiplicity, ETE, ENE, pT, pN, ipsilateral central LN metastasis, contralateral central LN metastasis, number of metastatic LNs, and LNR were used as inputs to construct a machine learning prediction model. The performance of five machine learning models for recurrence prediction was compared based on accuracy. The Decision Tree model showed the best accuracy at 95%, and the lightGBM and stacking models each showed 93% accuracy. Conclusions: We confirmed that all machine learning prediction models showed an accuracy of 90% or more for predicting disease recurrence in PTC. Large-scale multicenter clinical studies should be performed to improve the performance of our prediction models and verify their clinical effectiveness.


2013 ◽  
Vol 17 (11) ◽  
pp. 4713-4728 ◽  
Author(s):  
S. Terzer ◽  
L. I. Wassenaar ◽  
L. J. Araguás-Araguás ◽  
P. K. Aggarwal

Abstract. A regionalized cluster-based water isotope prediction (RCWIP) approach, based on the Global Network of Isotopes in Precipitation (GNIP), was demonstrated for the purposes of predicting point- and large-scale spatio-temporal patterns of the stable isotope composition (δ2H, δ18O) of precipitation around the world. Unlike earlier global domain and fixed regressor models, RCWIP predefined 36 climatic cluster domains and tested all model combinations from an array of climatic and spatial regressor variables to obtain the best predictive approach to each cluster domain, as indicated by root-mean-squared error (RMSE) and variogram analysis. Fuzzy membership fractions were thereafter used as the weights to seamlessly amalgamate results of the optimized climatic zone prediction models into a single predictive mapping product, such as global or regional amount-weighted mean annual, mean monthly, or growing-season δ18O/δ2H in precipitation. Comparative tests revealed the RCWIP approach outperformed classical global-fixed regression–interpolation-based models more than 67% of the time, and clearly improved upon predictive accuracy and precision. All RCWIP isotope mapping products are available as gridded GeoTIFF files from the IAEA website (www.iaea.org/water) and are for use in hydrology, climatology, food authenticity, ecology, and forensics.
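
The amalgamation step described above, in which fuzzy membership fractions weight the per-cluster predictions, can be sketched for a single grid cell as follows. The function name and the example values are hypothetical, and normalizing by the sum of the memberships is an assumption made so that the weights add up to one.

```python
def blend_cluster_predictions(memberships, cluster_predictions):
    """Blend per-cluster isotope predictions for one location.

    memberships: fuzzy membership fraction of the location in each cluster.
    cluster_predictions: value predicted for the location by each cluster's
    own regression model (for example, delta-18O in per mil).
    Assumption: weights are normalized by their sum.
    """
    total = sum(memberships)
    if total <= 0:
        raise ValueError("location has no membership in any cluster")
    weighted = sum(w * p for w, p in zip(memberships, cluster_predictions))
    return weighted / total


# Hypothetical cell that belongs 75% to one climatic cluster and 25% to another.
blended = blend_cluster_predictions([0.75, 0.25], [-8.0, -12.0])
```

Because memberships vary smoothly across cluster boundaries, this kind of weighting avoids abrupt jumps where two regional models meet.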


2013 ◽  
Vol 10 (6) ◽  
pp. 7351-7393 ◽  
Author(s):  
S. Terzer ◽  
L. I. Wassenaar ◽  
L. J. Araguás-Araguás ◽  
P. K. Aggarwal

Abstract. A Regionalized Climatic Water Isotope Prediction (RCWIP) approach, based on the Global Network for Isotopes in Precipitation (GNIP), was demonstrated for the purposes of predicting point- and large-scale spatiotemporal patterns of the stable isotope compositions of water (δ2H, δ18O) in precipitation around the world. Unlike earlier global domain and fixed regressor models, RCWIP pre-defined thirty-six climatic cluster domains, and tested all model combinations from an array of climatic and spatial regressor variables to obtain the best predictive approach to each cluster domain, as indicated by RMSE and variogram analysis. Fuzzy membership fractions were thereafter used as the weights to seamlessly amalgamate results of the optimized climatic zone prediction models into a single predictive mapping product, such as global or regional amount-weighted mean annual, mean monthly or growing-season δ18O/δ2H in precipitation. Comparative tests revealed the RCWIP approach outperformed classical global-fixed regression-interpolation based models more than 67% of the time, and significantly improved upon predictive accuracy and precision. All RCWIP isotope mapping products are available as gridded GeoTIFF files from the IAEA website (www.iaea.org/water) and are for use in hydrology, climatology, food authenticity, ecology, and forensics.


2018 ◽  
Vol 57 (3) ◽  
pp. 547-570 ◽  
Author(s):  
Wanli Xing ◽  
Dongping Du

Massive open online courses (MOOCs) show great potential to transform traditional education through the Internet. However, the high attrition rates in MOOCs have often been cited as a scale-efficacy tradeoff. Traditional educational approaches are usually unable to identify such a large number of at-risk students in time to support effective intervention design. While building dropout prediction models using learning analytics is promising for informing intervention design for these at-risk students, the results of current prediction model construction methods do not enable personalized intervention for them. In this study, we take an initial step toward optimizing dropout prediction model performance for intervention personalization for at-risk students in MOOCs. Specifically, based on a temporal prediction mechanism, this study proposes using a deep learning algorithm to construct the dropout prediction model and to produce a predicted dropout probability for each individual student. By taking advantage of the power of deep learning, this approach not only constructs more accurate dropout prediction models than baseline algorithms but also provides a way to personalize and prioritize intervention for at-risk students in MOOCs using individual dropout probabilities. The findings from this study and their implications are then discussed.
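
How individual dropout probabilities might be used to prioritize intervention can be sketched as a simple ranking. This is an illustration only: the function name, the 0.5 threshold, and the student identifiers are hypothetical, and the probabilities are assumed to come from an already trained model.

```python
def prioritize_students(dropout_probabilities, threshold=0.5):
    """Return (student_id, probability) pairs at or above the threshold,
    ordered from highest to lowest predicted dropout risk.

    dropout_probabilities: dict mapping student id to predicted probability.
    threshold: hypothetical cut-off for flagging a student as at risk.
    """
    at_risk = [
        (student_id, prob)
        for student_id, prob in dropout_probabilities.items()
        if prob >= threshold
    ]
    return sorted(at_risk, key=lambda item: item[1], reverse=True)
```

A probability, unlike a binary at-risk label, lets limited intervention resources go first to the students most likely to drop out.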


2015 ◽  
Vol 2015 ◽  
pp. 1-18 ◽  
Author(s):  
Ronald de Vlaming ◽  
Patrick J. F. Groenen

In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N < 10,000) the predictive accuracy of ridge regression is slightly higher than the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved where ridge regression compared to the classical approach improves predictive accuracy substantially.
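
The contrast between ridge regression and repeated simple regression can be made concrete with a small pure-Python sketch. It solves the standard ridge normal equations, (X'X + lambda I) beta = X'y, by Gaussian elimination, and fits the baseline as one no-intercept regression per column. The function names are hypothetical, the data are assumed to be centered, and nothing here reflects the paper's own implementation or simulation settings.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for beta.

    X: list of rows (individuals) by columns (SNPs), assumed centered.
    lam: ridge penalty; lam > 0 keeps the system non-singular.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [
        [sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(p)]
        + [sum(X[i][j] * y[i] for i in range(n))]
        for j in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(A[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (A[j][p] - tail) / A[j][j]
    return beta


def simple_regressions(X, y):
    """Fit one no-intercept regression per column (one regression per SNP)."""
    n, p = len(X), len(X[0])
    return [
        sum(X[i][j] * y[i] for i in range(n)) / sum(X[i][j] ** 2 for i in range(n))
        for j in range(p)
    ]
```

Ridge estimates all SNP effects jointly and shrinks them toward zero as the penalty grows, whereas the per-SNP regressions ignore correlation between markers.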


Author(s):  
Byeong Mun Heo ◽  
Keun Ho Ryu

Hypertension and prehypertension are risk factors for cardiovascular diseases. However, the associations of both prehypertension and hypertension with anthropometry, blood parameters, and spirometry have not been investigated. The purpose of this study was to identify the risk factors for prehypertension and hypertension in middle-aged Korean adults and to study prediction models of prehypertension and hypertension that combine anthropometry, blood parameters, and spirometry. Binary logistic regression analysis was performed to assess the statistical significance of risk factors for prehypertension and hypertension, and prediction models were developed using logistic regression, naïve Bayes, and decision trees. Among all risk factors for prehypertension, body mass index (BMI) was identified as the best indicator in both men [odds ratio (OR) = 1.429, 95% confidence interval (CI) = 1.304–1.462] and women (OR = 1.428, 95% CI = 1.204–1.453). In contrast, among all risk factors for hypertension, BMI (OR = 1.993, 95% CI = 1.818–2.186) was found to be the best indicator in men, whereas the waist-to-height ratio (OR = 2.071, 95% CI = 1.884–2.276) was the best indicator in women. In the prehypertension prediction model, the area under the receiver operating characteristic curve (AUC) was 0.635 for men and 0.777 for women. In the hypertension prediction model, the AUC was 0.700 for men and 0.845 for women. This study identifies various risk factors for prehypertension and hypertension, and our findings can be used as a large-scale screening tool for controlling and managing hypertension.
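
The two quantities reported above can be illustrated with a short sketch. In logistic regression the odds ratio for a predictor is the exponential of its coefficient, and the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names and the example data are hypothetical and unrelated to the study's cohort.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio per unit increase of a predictor, from its logistic
    regression coefficient (log-odds scale)."""
    return math.exp(coefficient)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases, 0 for non-cases.
    scores: predicted risk; higher means more likely to be a case.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical example: three of the four case/non-case pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values between 0.635 and 0.845.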


2021 ◽  
pp. 1-39
Author(s):  
Md Mahfuzer Rahman ◽  
Xiaoqing “Frank” Liu ◽  
Joseph W. Sirrianni ◽  
Douglas Adams

One of the challenging problems in large-scale cyber-argumentation platforms is that users often engage with and focus on only a few issues, leaving other issues under-discussed and under-acknowledged. This kind of non-uniform participation hinders argumentation analysis models from retrieving collective intelligence from the underlying discussion. To resolve this problem, we developed an innovative opinion prediction model for a multi-issue cyber-argumentation environment. Our model predicts users’ opinions on the non-participated issues from similar users’ opinions on related issues using intelligent argumentation techniques and a collaborative filtering method. Based on our detailed experimental results on an empirical dataset collected using our cyber-argumentation platform, our model is 21.7% more accurate and handles data sparsity better than other popular opinion prediction methods. Our model can also predict opinions on multiple issues simultaneously with reasonable accuracy. In contrast to existing opinion prediction models, which only predict whether a user agrees on an issue, our model predicts how much a user agrees on the issue. To our knowledge, this is the first research to attempt multi-issue opinion prediction with partial agreement in a cyber-argumentation platform. With additional data on non-participated issues, our opinion prediction model can help collective intelligence analysis models analyze social phenomena more effectively and accurately in the cyber-argumentation platform.
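
The collaborative filtering idea, predicting a user's degree of agreement on an unvisited issue from similar users, can be sketched generically. This is plain user-based collaborative filtering with cosine similarity, not the authors' model, which also relies on intelligent argumentation techniques that the abstract does not detail. The function names, the use of None for non-participated issues, and the choice to use only positively similar neighbours are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the issues both users took part in.

    a, b: per-issue agreement values; None marks a non-participated issue.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_agreement(target, others, issue):
    """Similarity-weighted mean of other users' agreement on one issue.

    Returns None when no positively similar user has an opinion on the issue.
    """
    numerator = 0.0
    denominator = 0.0
    for other in others:
        if other[issue] is None:
            continue
        similarity = cosine_similarity(target, other)
        if similarity <= 0:
            continue  # simplification: ignore dissimilar users
        numerator += similarity * other[issue]
        denominator += similarity
    return numerator / denominator if denominator > 0 else None
```

Because the prediction is a weighted mean of graded agreement values, it yields a degree of agreement and not just an agree or disagree label.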


2001 ◽  
Vol 10 (2) ◽  
pp. 241 ◽  
Author(s):  
Jon B. Marsden-Smedley ◽  
Wendy R. Catchpole

An experimental program was carried out in Tasmanian buttongrass moorlands to develop fire behaviour prediction models for improving fire management. This paper describes the results of the fuel moisture modelling section of this project. A range of previously developed fuel moisture prediction models is examined, and three empirical dead fuel moisture prediction models are developed. McArthur’s grassland fuel moisture model gave predictions as good as those of a linear regression model using humidity and dew-point temperature. The regression model was preferred as a prediction model because it is inherently more robust. A prediction model based on hazard sticks was found to have strong seasonal effects, which need further investigation before hazard sticks can be used operationally.


2019 ◽  
Vol 17 (6) ◽  
pp. 1519-1530 ◽  
Author(s):  
Yao Luo ◽  
Ranran Zeng ◽  
Qingqing Guo ◽  
Jianrong Xu ◽  
Xiaoou Sun ◽  
...  

G03 is a novel anticancer agent with unusual microtubule-stabilizing effects.

