Effective Search of Triterpenes with Anti-HSV-1 Activity Using a Classification Model by Logistic Regression

2021 ◽  
Vol 9 ◽  
Author(s):  
Keiko Ogawa ◽  
Seikou Nakamura ◽  
Haruka Oguri ◽  
Kaori Ryu ◽  
Taichi Yoneda ◽  
...  

Natural products are an excellent source of skeletons for medicinal seeds. Triterpenes and saponins are representative natural products that exhibit anti-herpes simplex virus type 1 (HSV-1) activity. However, comprehensive information on the anti-HSV-1 activity of triterpenes has been lacking. Therefore, expanding this information and improving the efficiency of exploration are urgently required. To improve the efficiency of the development of anti-HSV-1 active compounds, we used machine learning methods to construct a predictive model for the anti-HSV-1 activity of triterpenes from information obtained in previous studies. In this study, we constructed a binary classification model (i.e., active or inactive) using a logistic regression algorithm. In the evaluation of the predictive model, the accuracy for the test data was 0.79, and the area under the curve (AUC) was 0.86. Additionally, to enrich the information on the anti-HSV-1 activity of triterpenes, a plaque reduction assay was performed on 20 triterpenes. As a result, chikusetsusaponin IVa (11: IC50 = 13.06 μM) was found to have potent anti-HSV-1 activity, along with three potentially anti-HSV-1 active triterpenes. The assay results were further used for external validation of the predictive model. The prediction for the compounds tested in the activity assay showed a high accuracy (0.83) and AUC (0.81). We also found that this predictive model was able to successfully narrow down the active compounds. This study provides more information on the anti-HSV-1 activity of triterpenes. Moreover, the predictive model can improve the efficiency of the development of active triterpenes by integrating many previous studies to clarify potential relationships.
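The two figures reported for the classifier, accuracy and AUC, can be computed from predicted probabilities as in the minimal sketch below. The labels and probabilities are hypothetical placeholders, not data from the study, and the 0.5 decision threshold is an assumption; the abstract does not state which threshold was used.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of compounds whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance that a random active outranks a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = active, 0 = inactive) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(accuracy(labels, probs), roc_auc(labels, probs))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is why the two numbers can differ noticeably on the same predictions.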

Author(s):  
Sandro Radovanović ◽  
Marko Ivić

Research Question: This paper aims at adjusting the logistic regression algorithm to mitigate unwanted discrimination based on race, gender, etc. Motivation: Decades of research in the field of algorithm design have been dedicated to making better prediction models. Many algorithms have been designed and improved, making them better than the judgments of people and even experts. However, in recent years it has been discovered that predictive models can discriminate in unwanted ways. Such unwanted discrimination in a predictive model can lead to legal consequences. In order to mitigate the problem of unwanted discrimination, we propose enforcing equal opportunity between privileged and discriminated groups in the logistic regression algorithm. Idea: Our idea is to add a regularization term to the goal function of the logistic regression. Therefore, our predictive model will solve both the social problem and the predictive problem. More specifically, our model will provide fair and accurate predictions. Data: The data used in this research are U.S. census data describing individuals by personal characteristics, with the goal of providing a binary classification model for predicting whether an individual has an annual salary above $50k. The dataset used is known for disparate impact regarding female individuals. In addition, we used the COMPAS dataset aimed at predicting recidivism. COMPAS is biased against African-Americans. Tools: We developed a novel regularization technique for equal opportunity in the logistic regression algorithm. The proposed regularization is compared against classical logistic regression and fairness-constrained logistic regression, using ten-fold cross-validation. Findings: The results suggest that equal opportunity logistic regression manages to create a fair prediction model. More specifically, our model improved both disparate impact and equal opportunity compared to classical logistic regression, with a minor loss in prediction accuracy. Compared to the disparate impact constrained logistic regression, our approach has higher prediction accuracy and equal opportunity, while having a lower disparate impact. By inspecting the coefficients of our approach and classical logistic regression, one can see that proxy attribute coefficients are reduced to very low values. Contribution: The main contribution of this paper is in the methodological part. More specifically, we implemented equal opportunity in the logistic regression algorithm.
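The idea of adding a fairness regularization term to the logistic regression goal function can be sketched as follows. The abstract does not give the exact form of the regularizer, so this sketch assumes one plausible choice: the squared gap between the two groups' "soft" true-positive rates (the mean predicted probability over actual positives). The function names, the group encoding, and the weight `lam` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fair_loss(weights, bias, X, y, group, lam=1.0):
    """Mean log loss plus lam * (soft TPR gap)^2 (assumed penalty form).

    group[i] is 1 for the privileged group and 0 otherwise (assumed encoding).
    Each group must contain at least one positive example.
    """
    probs = [predict(weights, bias, x) for x in X]
    log_loss = -sum(
        yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi, p in zip(y, probs)
    ) / len(y)
    # Soft true-positive rate per group: mean probability over actual positives.
    tpr = {}
    for g in (0, 1):
        pos = [p for p, yi, gi in zip(probs, y, group) if yi == 1 and gi == g]
        tpr[g] = sum(pos) / len(pos)
    gap = tpr[1] - tpr[0]
    return log_loss + lam * gap ** 2


# Hypothetical toy data: two features, binary label, binary group membership.
X = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.0], [0.9, 1.0]]
y = [1, 1, 0, 0]
group = [1, 0, 1, 0]

print(fair_loss([0.3, -0.2], 0.1, X, y, group, lam=2.0))
```

Setting `lam=0` recovers the classical log loss, so the weight controls the trade-off between accuracy and equal opportunity that the abstract describes.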


Molecules ◽  
2019 ◽  
Vol 24 (10) ◽  
pp. 2006 ◽  
Author(s):  
Liadys Mora Lagares ◽  
Nikola Minovski ◽  
Marjana Novič

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is highly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs/drug candidates and contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in the drug discovery and toxicological assessment process it is advisable to pay attention to whether a compound under development could be transported by P-gp or not. In this study, an in silico multiclass classification model capable of predicting the probability of a compound to interact with P-gp was developed using a counter-propagation artificial neural network (CP ANN) based on a set of 2D molecular descriptors, as well as an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model provided a good classification performance, producing non error rate (NER) values of 0.93 for the training set and 0.85 for the test set, while the average precision (AvPr) was 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model’s performance. On the external validation set the NER and AvPr values were 0.70 for both indices. We believe that this in silico classifier could be effectively used as a reliable virtual screening tool for identifying potential P-gp ligands.
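The two indices quoted for the multiclass model can be illustrated from a confusion matrix. This sketch assumes the usual chemometrics definitions, NER as the mean of the class sensitivities and AvPr as the mean of the class precisions; the abstract does not spell these out, and the counts below are hypothetical, not the study's results.

```python
def ner_and_avpr(cm):
    """Return (NER, AvPr) for a square confusion matrix.

    cm[i][j] = number of class-i compounds predicted as class j.
    """
    k = len(cm)
    # Sensitivity of class i: correct predictions over all true class-i items.
    sens = [cm[i][i] / sum(cm[i]) for i in range(k)]
    # Precision of class i: correct predictions over all items predicted as i.
    prec = [cm[i][i] / sum(cm[r][i] for r in range(k)) for i in range(k)]
    return sum(sens) / k, sum(prec) / k


# Hypothetical counts for (inhibitor, substrate, non-active).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]

ner, avpr = ner_and_avpr(cm)
print(ner, avpr)
```

Because both indices average over classes rather than over compounds, they are not dominated by the largest class, which matters for a dataset with 1178 inhibitors but only 477 substrates.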


2020 ◽  
Author(s):  
Nida Fatima

Abstract Background: Preoperative prognostication of clinical and surgical outcomes in patients with neurosurgical diseases can improve risk stratification and can thus guide targeted treatment to minimize adverse events. Therefore, the author aims to highlight the development and validation of predictive models determining neurosurgical outcomes through machine learning algorithms using logistic regression. Methods: Logistic regression (enter, backward and forward) and the least absolute shrinkage and selection operator (LASSO) method for selecting variables from the chosen database can lead to multiple candidate models. The final model with a set of predictive variables must be selected based upon clinical knowledge and numerical results. Results: The predictive model that performs best on discrimination, calibration, Brier score and decision curve analysis must be selected to develop machine learning algorithms. Logistic regression should be compared with the LASSO model. Usually, for big databases, the predictive model selected through logistic regression gives a higher area under the curve (AUC) than the LASSO model. The predictive probability derived from the best model could be uploaded to an open access web application that patients and surgeons worldwide can easily use to make a risk assessment. Conclusions: Machine learning algorithms provide promising results for the prediction of outcomes following cranial and spinal surgery. These algorithms can provide useful factors for patient counselling, assessing peri-operative risk factors, and predicting post-operative outcomes after neurosurgery.
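Two of the selection criteria named above, the Brier score and the net benefit used in decision curve analysis, have simple closed forms. The sketch below uses the standard definitions (mean squared difference between predicted probability and outcome; true positives minus threshold-weighted false positives, per patient). The outcomes and probabilities are hypothetical placeholders.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as plotted in a decision curve."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and model probabilities.
outcomes = [1, 0, 1, 0]
risks = [0.8, 0.3, 0.6, 0.1]

print(brier_score(outcomes, risks))
print(net_benefit(outcomes, risks, threshold=0.2))
```

A lower Brier score indicates better overall probabilistic accuracy, while a decision curve plots net benefit over a range of thresholds to compare candidate models against treating all or no patients.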


2020 ◽  
Author(s):  
Fangran Xin ◽  
Bowen Yang ◽  
Lingyu Fu ◽  
Haina Liu ◽  
Tingting Wei ◽  
...  

Abstract Background: To develop and validate a nomogram-based model of serum lipids and inflammatory markers for predicting stroke risk in rheumatoid arthritis patients. Methods: This study was conducted among 313 rheumatoid arthritis patients with stroke and 1827 rheumatoid arthritis patients, divided into development and validation cohorts, from the First Affiliated Hospital of China Medical University between January 2011 and December 2018. After comparison with other machine learning algorithms, logistic regression analysis was used to create a nomogram of the predictive model of stroke risk in rheumatoid arthritis patients. The performance of the nomogram was evaluated by discrimination, calibration and decision curve analysis, and was also compared with the Framingham Risk Score in predicting stroke in rheumatoid arthritis patients. Results: The nomogram was built with the logistic regression algorithm. Its predictors included the stratifications of sex, age, systolic blood pressure, C-reactive protein, erythrocyte sedimentation rate, total cholesterol and low density lipoprotein cholesterol, and the distribution of accompanying hy-med, diabetes, atrial fibrillation and coronary heart disease history. The model exhibited good goodness of fit and good agreement. Analysis of the area under the curve, the net reclassification index, the integrated discrimination improvement and clinical use suggested that this is an easy-to-use nomogram compared with the Framingham Risk Score. Conclusion: This study presents a risk nomogram that incorporates traditional risk factors, serum lipids and inflammatory markers and can be used to predict stroke in rheumatoid arthritis patients.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sang Hoon Kim ◽  
Youngbae Hwang ◽  
Dong Jun Oh ◽  
Ji Hyung Nam ◽  
Ki Bae Kim ◽  
...  

Abstract The manual reading of capsule endoscopy (CE) videos in small bowel disease diagnosis is time-intensive. Algorithms introduced to automate this process are premature for real clinical applications, and multi-diagnosis using these methods has not been sufficiently validated. Therefore, we developed a practical binary classification model, which selectively identifies clinically meaningful images including inflamed mucosa, atypical vascularity or bleeding, and tested it with unseen cases. Four hundred thousand CE images were randomly selected from 84 cases, of which 240,000 images were used to train the algorithm to categorize images binarily. The remaining images were utilized for validation and internal testing. The algorithm was externally tested with 256,591 unseen images. The diagnostic accuracy of the trained model applied to the validation set was 98.067%. In contrast, the accuracy of the model when applied to a dataset provided by an independent hospital that did not participate in training was 85.470%. The area under the curve (AUC) was 0.922. Our model showed excellent internal test results, and misreadings were slightly increased when the model was tested in unseen external cases, in which the images classified as ‘insignificant’ contained ambiguous substances. Once this limitation is solved, the proposed CNN-based binary classification will be a promising candidate for developing clinically-ready computer-aided reading methods.


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Yukun Wu ◽  
Binshen Chen ◽  
Chunxiao Liu

Objective. To assess the value of clinically relevant data for predicting the failure of removal of the urinary catheter within 48 hours after TUERP. Materials and Methods. We retrospectively analyzed the medical records of 357 patients who underwent TUERP between January 2015 and July 2018, all of whom had bladder irrigation stopped and the urinary catheter removed within 48 hours after the operation. According to whether the removal of the catheter was successful, the patients were classified into 2 groups: group A (successful) and group B (failure). Univariate analysis was performed to determine the association between the failure of removal of the catheter and the patients’ preoperative clinical characteristics. Logistic regression analysis and receiver operating characteristic (ROC) analysis were conducted to establish the prediction model. Then the area under the curve (AUC) and the cut-off value were calculated. Results. The 357 patients were divided into group A (n = 305, 85.4%) and group B (n = 52, 14.6%). The patients’ drug medication (P=0.006), history of acute urinary retention (AUR) (P≤0.001), smoking (P=0.045), IPSS (P≤0.001), IPP (P=0.006), PSA (P=0.047), residual urine volume (P≤0.001), QoL (P≤0.001), and TPV (P=0.043) were significantly different between the 2 groups. A predictive model using logistic regression was defined as follows: INDEX = 10.862 − 1.376 × (IPSS) − 1.185 × (QoL) − 1.062 × (drug medication) + 1.079 × (history of AUR) + 0.030 × (TPV) − 0.867 × (IPP), with an area under the curve of 0.860 obtained from the ROC curve analysis. The predictive model had a cut-off value of 1.7725; the sensitivity for predicting the failure of removal of the urinary catheter was 74.1% and the specificity was 84.6%. Conclusion. This study demonstrated that IPSS, QoL, drug medication, history of AUR, TPV, and IPP are independent factors associated with the failure of removal of the urethral catheter within 48 hours after TUERP.
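The reported INDEX formula and cut-off can be evaluated as in the sketch below. The coefficients and the cut-off of 1.7725 are taken from the abstract; everything else is an assumption. In particular, the abstract does not state how each predictor is coded or scaled, nor which side of the cut-off corresponds to predicted failure, so the function only reports the value and whether it reaches the cut-off. The input values are hypothetical.

```python
CUT_OFF = 1.7725  # cut-off value reported in the abstract


def catheter_index(ipss, qol, drug_medication, history_of_aur, tpv, ipp):
    """Evaluate the INDEX formula from the abstract.

    The coding of each predictor (raw score, category, 0/1 flag) is not
    given in the abstract; callers must supply values in the study's coding.
    """
    return (
        10.862
        - 1.376 * ipss
        - 1.185 * qol
        - 1.062 * drug_medication
        + 1.079 * history_of_aur
        + 0.030 * tpv
        - 0.867 * ipp
    )


def reaches_cut_off(index):
    """True if INDEX is at or above the cut-off; interpretation is left open."""
    return index >= CUT_OFF


# Hypothetical, illustrative inputs only (coding assumed, not from the study).
index = catheter_index(ipss=1, qol=1, drug_medication=1, history_of_aur=1, tpv=10, ipp=1)
print(index, reaches_cut_off(index))
```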


2021 ◽  
pp. 197140092110123
Author(s):  
Christoph J Maurer ◽  
Irina Mader ◽  
Felix Joachimski ◽  
Ori Staszewski ◽  
Bruno Märkl ◽  
...  

Purpose The aim of this study was the development and external validation of a logistic regression model to differentiate gliosarcoma (GSC) and glioblastoma multiforme (GBM) on standard MR imaging. Methods Univariate and multivariate analyses with a logistic regression model were carried out to discriminate patients histologically diagnosed with primary GSC from an age- and sex-matched group of patients with primary GBM on presurgical MRI, with external validation. Results In total, 56 patients with GSC and 56 patients with GBM were included. Evidence of haemorrhage suggested the diagnosis of GSC, whereas cystic components and pial as well as ependymal invasion were more commonly observed in GBM patients. The logistic regression model yielded a mean area under the curve (AUC) of 0.919 on the training dataset and of 0.746 on the validation dataset. The accuracy in the validation dataset was 0.67 with a sensitivity of 0.85 and a specificity of 0.5. Conclusions Although some imaging criteria suggest the diagnosis of GSC or GBM, differentiation between these two tumour entities on standard MRI alone is not feasible.


Traffic accidents are among the most life-threatening dangers to human beings. Deaths and injuries due to traffic accidents have a great impact on society. Traffic accident information and data provided by the public can be used to classify these accidents according to their type and severity, and consequently to build a predictive model. Detecting and identifying injury severity in traffic accidents in real time is essential for speeding up post-accident protocols as well as for developing general road safety policies. In this project we use the Logistic Regression algorithm to classify accident data. The data to be analysed are collected from various sources, are both structured and unstructured, and have several attributes. In this project we detect and analyse the data together to generate decision trees that give insights into previous accidents.


2020 ◽  
Author(s):  
Sang Hoon Kim ◽  
Youngbae Hwang ◽  
Dong Jun Oh ◽  
Ji Hyung Nam ◽  
Ki Bae Kim ◽  
...  

Abstract Manual reading of capsule endoscopy (CE) video is a time-consuming process in diagnosing small bowel diseases. Although many algorithms have been introduced, multi-diagnosis has not been sufficiently validated. They are promising but still premature for use in clinical practice. Therefore, we developed a practical binary classification model and tested it with unseen cases. A total of 400,000 CE images were randomly selected from 84 cases. Among them, 240,000 were used to train an algorithm to categorize images binarily. The remaining images were utilized for validation and internal testing. The algorithm was externally tested with 256,591 unseen images. Diagnostic accuracy was 98.067% when the trained model was applied to the validation set. It was 97.946% when applied to images for internal testing. When the model was applied to a dataset provided by an independent hospital that did not participate in training, its accuracy was 85.470%. The area under the curve was 0.922. Our binary classification model showed excellent internal test results, and when tested in unseen external cases, misreadings were slightly increased while judging ‘insignificant’ images containing ambiguous substances. Once this problem is overcome, CNN-based binary classification will become the most promising candidate for developing clinically ready computer-aided reading methods.


2020 ◽  
Vol 7 (4) ◽  
pp. 131
Author(s):  
José Miguel Calderón ◽  
Julio Álvarez-Pitti ◽  
Irene Cuenca ◽  
Francisco Ponce ◽  
Pau Redon

Obstructive sleep apnea syndrome is a reduction of the airflow during sleep which not only produces a reduction in sleep quality but also has major health consequences. The prevalence in the obese pediatric population can surpass 50%, and polysomnography is the current gold standard method for its diagnosis. Unfortunately, it is expensive, disturbing and time-consuming for experienced professionals. The objective is to develop a patient-friendly screening tool for the obese pediatric population to identify those children at higher risk of suffering from this syndrome. Three supervised learning classifier algorithms (i.e., logistic regression, support vector machine and AdaBoost) common in the field of machine learning were trained and tested on two very different datasets where the raw oxygen saturation signal was recorded. The first dataset was the Childhood Adenotonsillectomy Trial (CHAT), consisting of 453 individuals aged between 5 and 9 years, one-third of whom were obese. Cross-validation was performed on the second dataset, from an obesity assessment consult at the Pediatric Department of the Hospital General Universitario of Valencia. A total of 27 patients between 5 and 17 years old were recruited; 42% were girls and 63% were obese. The performance of each algorithm was evaluated based on key performance indicators (e.g., area under the curve, accuracy, recall, specificity and positive predictive value). The logistic regression algorithm outperformed (accuracy = 0.79, specificity = 0.96, area under the curve = 0.9, recall = 0.62 and positive predictive value = 0.94) the support vector machine and the AdaBoost algorithm when trained with the CHAT dataset. Cross-validation tests, using the Hospital General de Valencia (HG) dataset, confirmed the higher performance of the logistic regression algorithm in comparison with the others. In addition, only a minor loss of performance (accuracy = 0.75, specificity = 0.88, area under the curve = 0.85, recall = 0.62 and positive predictive value = 0.83) was observed despite the differences between the datasets. The proposed minimally invasive screening tool has shown promising performance when it comes to identifying children at risk of suffering obstructive sleep apnea syndrome. Moreover, it is ideal to be implemented in an outpatient consult in primary and secondary care.
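Most of the key performance indicators listed above follow directly from the four cells of a binary confusion matrix, as in this minimal sketch. The counts are hypothetical and do not reproduce the study's figures; the function and key names are illustrative.

```python
def screening_metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity and PPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),        # sensitivity: affected children detected
        "specificity": tn / (tn + fp),   # unaffected children correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
    }


# Hypothetical counts for a small screening cohort.
metrics = screening_metrics(tp=5, fp=2, tn=10, fn=3)
print(metrics)
```

For a screening tool, a high specificity with a moderate recall, as reported for the logistic regression model, means few children are flagged unnecessarily but some affected children are missed.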

