Sequence-Based Discovery of Antibacterial Peptides Using Ensemble Gradient Boosting

Proceedings ◽  
2020 ◽  
Vol 66 (1) ◽  
pp. 6
Author(s):  
Ehdieh Khaledian ◽  
Shira L. Broschat

Antimicrobial resistance is driving pharmaceutical companies to investigate alternative therapeutic approaches. One approach that has garnered growing attention in drug development is the use of antimicrobial peptides (AMPs). Antibacterial peptides (ABPs), which occur naturally as part of the immune response, can serve as powerful, broad-spectrum antibiotics. However, conventional laboratory procedures for screening and discovering ABPs are expensive and time-consuming, and computational methods can significantly improve their identification. In this paper, we introduce a machine learning method for the fast and accurate prediction of ABPs. We gathered more than 6000 peptides from publicly available datasets and extracted 1209 features (peptide characteristics) from these sequences. We selected an optimal feature subset by applying correlation-based and random forest feature selection techniques. Finally, we designed an ensemble gradient boosting model (GBM) to predict putative ABPs. We evaluated our model using receiver operating characteristic (ROC) curves, comparing its area under the curve (AUC) with that of several other models, including a recurrent neural network, a support vector machine, and iAMPpred. The GBM achieved an AUC of ~0.98, more than 3% higher than any of the other models.
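The abstract does not say how the correlation-based selection step was configured. As a minimal sketch, assuming Pearson correlation and a hypothetical cutoff of 0.9, one common greedy variant keeps a feature only if its absolute correlation with every feature already kept stays below the cutoff. The function names and the cutoff are illustrative, not the authors' settings.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns: List[Sequence[float]], cutoff: float = 0.9) -> List[int]:
    """Return indices of the columns kept by a greedy correlation filter:
    a column is kept only if |r| < cutoff against every previously kept column."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < cutoff for k in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Hypothetical toy features: f1 = 2 * f0 (redundant), f2 is weakly related.
    f0 = [1.0, 2.0, 3.0, 4.0, 5.0]
    f1 = [2.0, 4.0, 6.0, 8.0, 10.0]
    f2 = [5.0, 1.0, 4.0, 2.0, 3.0]
    print(drop_correlated([f0, f1, f2]))  # [0, 2]
```

Greedy filtering depends on column order; ranking features first (for example by random forest importance, which the authors also used) and then filtering in that order is one way to keep the more informative member of each correlated pair.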

2021 ◽  
Vol 9 (Suppl 3) ◽  
pp. A838-A839
Author(s):  
Steven Tran ◽  
Luke Rasmussen ◽  
Jennifer Pacheco ◽  
Carlos Galvez ◽  
Kyle Tegtmeyer ◽  
...  

Background: Immune checkpoint inhibitors (ICIs) are a pillar of cancer therapy with demonstrated efficacy in a variety of malignancies. However, they are associated with immune-related adverse events (irAEs) that affect many organ systems with varying severity, impairing patient quality of life and in some cases the ability to continue immunotherapy. Research into irAEs is nascent, and identifying patients with adverse events poses a critical challenge for future research efforts and patient care. This study's objective was to develop an electronic health record (EHR)-based model to identify and characterize patients with ICI-associated arthritis (checkpoint arthritis). Methods: In a single-center retrospective study, 42 patients with checkpoint arthritis were identified by chart abstraction from a cohort of all patients who received checkpoint therapy for cancer (n=2,612). All EHR clinical codes (N=32,198) were extracted, including International Classification of Diseases (ICD)-9 and ICD-10, Logical Observation Identifiers Names and Codes (LOINC), RxNorm, and Current Procedural Terminology (CPT) codes. Logistic regression, random forest, gradient boosting, support vector machine, K-nearest neighbors, and neural network models were trained on these clinical codes to identify checkpoint arthritis patients. Models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC), and the most important variables were determined from the logistic regression model. Models were retrained on smaller fractions of the important variables to determine the minimum variable set needed for accurate identification of checkpoint arthritis. Results: Logistic regression and random forest were the highest-performing models on the full set of 32,198 clinical codes (AUCs 0.911 and 0.894, respectively; table 1). Retraining the models on smaller fractions of the most important variables showed peak performance with the top 31 clinical codes, or 0.1% of the total variables (figure 1). The most important features included the presence of erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), rheumatoid factor, creatine kinase and thyroid labs, prednisone, joint pain, and immunization, all positively associated with checkpoint arthritis (figure 2). Conclusions: Our study demonstrates that a data-driven, EHR-based approach can robustly identify checkpoint arthritis patients. The high performance of the models using only the 0.1% most important variables suggests that only a small number of clinical attributes are needed to identify these patients. The variables most important for identifying checkpoint arthritis included several unexpected clinical features, such as thyroid labs and immunization, indicating potential underlying irAE associations that warrant further exploration. Finally, given its flexibility and demonstrated effectiveness, this approach could be applied to identify and characterize other irAEs.
Ethics Approval: This study was approved by the Northwestern University Institutional Review Board, ID STU00210502, with a granted waiver of consent.
Abstract 802 Table 1. Model performance metrics. AUC was calculated from the ROC curve. Sensitivity, specificity, PPV, and NPV were determined at the threshold maximizing the F1-score. AUC = area under the curve, ROC = receiver operating characteristic, PPV = positive predictive value, NPV = negative predictive value.
Abstract 802 Figure 1. Model AUC trained on decreasing fractions of the most important variables, determined by the random forest model. 100% = 32,198 clinical codes. LReg = logistic regression, RF = random forest, GB = gradient boosting, NN = neural network, KNN = K-nearest neighbor, SVM = support vector machine, SVMAnom = SVM anomaly detection.
Abstract 802 Figure 2. The 31 most important variables determined by the logistic regression (A, coefficients) and random forest (B, relative importance) models.
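Table 1 reports sensitivity, specificity, PPV, and NPV at the threshold that maximizes the F1-score. A minimal sketch of that procedure, assuming a case is called positive when its model score is at or above the threshold and that the observed scores serve as the candidate thresholds; the helper names and toy scores are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) when 'positive' means score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0  # report 0 for an empty denominator


def metrics_at_best_f1(scores: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Scan every observed score as a threshold, keep the one with the highest F1
    (the lowest such threshold wins ties), and report metrics at that point."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        f1 = _ratio(2 * tp, 2 * tp + fp + fn)
        if best is None or f1 > best[0]:
            best = (f1, t, tp, fp, tn, fn)
    f1, t, tp, fp, tn, fn = best
    return {
        "threshold": t,
        "f1": f1,
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

On the toy scores [0.1, 0.4, 0.35, 0.8, 0.7, 0.2] with labels [0, 0, 1, 1, 1, 0], the best threshold is 0.35 (F1 = 6/7), giving sensitivity 1.0, specificity 2/3, PPV 0.75 and NPV 1.0.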


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar ◽  
Jared M. Peterson

Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as instances from the data exfiltration and keylogging subcategories. Our contribution centers on evaluating how ensemble feature selection techniques (FSTs) affect classification performance for these specific attack instances; an ensemble of FSTs often performs better than the best individual technique. The classifiers we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). Classification performance is evaluated with the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). For the most part, we found that our ensemble FSTs do not affect classification performance, but they are still beneficial because feature reduction eases the computational burden and provides insight through improved data visualization.
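AUC and AUPRC can tell different stories when attack instances are rare. The sketch below, on hypothetical scores, computes ROC AUC as the probability that a positive outscores a negative and approximates AUPRC by average precision, a common estimator; the abstract does not state which AUPRC estimator was used.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average of the precision values at the rank of each positive.
    Assumes distinct scores; tied scores would need to be grouped."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for y in labels if y)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 2 attacks among 10 flows.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(roc_auc(scores, labels), average_precision(scores, labels))  # 0.8125 0.5
```

With the two positives ranked 2nd and 4th, ROC AUC is 0.8125 but average precision is only 0.5, because precision is penalized by every negative ranked above a positive.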


2020 ◽  
Vol 8 (5) ◽  
pp. 252-253
Author(s):  
Stefan Krüger

Background: The study aimed to investigate the predictive value of the quick sequential organ failure assessment (qSOFA) for clinical outcomes in emergency patients with community-acquired pneumonia (CAP). Methods: A total of 742 CAP cases from the emergency department (ED) were enrolled. The qSOFA, SOFA and CURB-65 (confusion, urea, respiratory rate, blood pressure and age) scores were used to predict three outcomes of CAP: ICU admission, acute respiratory distress syndrome (ARDS) and 28-day mortality. The predictive accuracy of each scoring system was assessed by the area under the curve (AUC) of its receiver operating characteristic (ROC) curve. Results: For ICU admission, the AUC values of the qSOFA, SOFA and CURB-65 scores were 0.712 (95% CI: 0.678–0.745, P < 0.001), 0.744 (95% CI: 0.711–0.775, P < 0.001) and 0.705 (95% CI: 0.671–0.738, P < 0.001), respectively. For ARDS, they were 0.730 (95% CI: 0.697–0.762, P < 0.001), 0.724 (95% CI: 0.690–0.756, P < 0.001) and 0.749 (95% CI: 0.716–0.780, P < 0.001), respectively. For 28-day mortality, assessed after 28 days of follow-up, they were 0.602 (95% CI: 0.566–0.638, P < 0.001), 0.587 (95% CI: 0.551–0.623, P < 0.001) and 0.614 (95% CI: 0.577–0.649, P < 0.001), respectively. There were no statistically significant differences between the qSOFA and SOFA scores in predicting ICU admission (Z = 1.482, P = 0.138), ARDS (Z = 0.321, P = 0.748) or 28-day mortality (Z = 0.573, P = 0.567). Likewise, qSOFA and CURB-65 did not differ significantly in predicting ICU admission (Z = 0.370, P = 0.712), ARDS (Z = 0.900, P = 0.368) or 28-day mortality (Z = 0.768, P = 0.442). Conclusion: qSOFA was not inferior to SOFA or CURB-65 in predicting ICU admission, ARDS and 28-day mortality in patients presenting to the ED with CAP.
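The abstract reports Z statistics for pairwise AUC comparisons but does not name the test. Assuming DeLong's nonparametric method for correlated ROC curves (two scores measured on the same patients), a minimal sketch looks like this; the function names are hypothetical, and the two-sided P value uses the normal approximation.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case outscores the negative, 1/2 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def delong_test(score_a: Sequence[float], score_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Compare two AUCs measured on the same cases. Returns (auc_a, auc_b, z, p)."""
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    m, n = len(pos), len(neg)
    v10a, v01a = _components([score_a[i] for i in pos], [score_a[i] for i in neg])
    v10b, v01b = _components([score_b[i] for i in pos], [score_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Var(AUC_a - AUC_b) = (S10_aa + S10_bb - 2 S10_ab)/m + (S01_aa + S01_bb - 2 S01_ab)/n
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0.0:
        raise ValueError("zero variance; the test is undefined for these scores")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P from the standard normal
    return auc_a, auc_b, z, p
```

Because both scores are evaluated on the same patients, the covariance terms matter: ignoring them (treating the AUCs as independent) usually overstates the variance of the difference.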


2021 ◽  
pp. 22-37
Author(s):  
Han Gao ◽  
Pei Shan Fam ◽  
Lea Tien Tay ◽  
Heng Chin Low

Tree-based gradient boosting (TGB) models have gained popularity in various areas because of their strong predictive ability and fast processing speed. This study compares the landslide spatial prediction performance of TGB models and non-tree-based machine learning (NML) models on Penang Island, Malaysia. Two TGB models, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), and two NML models, artificial neural network (ANN) and support vector machine (SVM), are applied to predict landslide susceptibility. Feature selection and oversampling techniques are also evaluated as ways to improve prediction performance. The results are analyzed mainly using receiver operating characteristic (ROC) curves and the area under the curve (AUC). The TGB models outperform the NML models regardless of sample size, and their performance improves further when they are trained on data prepared with either feature selection or oversampling. The highest AUC, 0.9525, is obtained by combining XGBoost with the Synthetic Minority Over-sampling Technique (SMOTE). The landslide susceptibility maps (LSMs) produced by XGBoost and LightGBM can provide valuable information for landslide management and mitigation on Penang Island, Malaysia.
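SMOTE balances classes by synthesizing new minority samples on line segments between a minority sample and one of its k nearest minority neighbors. A minimal sketch, assuming numeric feature vectors, Euclidean distance, a default of k = 5 and a variant that picks base samples at random; the study's actual SMOTE settings are not given in the abstract, and the function names are hypothetical.

```python
import random
from typing import List, Sequence

Point = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority samples. Each one lies at a random point on the
    segment between a randomly chosen minority sample and one of its k nearest
    minority neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic
```

The synthetic points belong in the training split only; oversampling before the train/test split would leak information into the evaluation.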


Children ◽  
2021 ◽  
Vol 8 (5) ◽  
pp. 331
Author(s):  
Ryoji Aoki ◽  
Nobuhiko Nagano ◽  
Aya Okahashi ◽  
Shoko Ohashi ◽  
Yoshinori Fujinaka ◽  
...  

This study aimed to devise a novel physique index and to compare its accuracy with that of the head circumference (HC)/height (HT) ratio in identifying newborns with skeletal dysplasia. The birth weight (W), HT, and HC of 1500 newborns were collected retrospectively. Linear regression equations and coefficients of determination (R²) were calculated. The resulting equation was corrected by the mean weight for gestational age at birth (Wcorr) to give a novel physique index for screening skeletal dysplasia. The index's accuracy was assessed using receiver operating characteristic (ROC) curves in 11 newborns assessed by fetal ultrasound and compared with that of the HC/HT ratio. The R² values between W and HT, (HT)², and (HT)³ were 0.978, 0.990, and 0.993, respectively; those between W and HC, (HC)², and (HC)³ were 0.974, 0.984, and 0.988, respectively. (W/Wcorr) × (HC/HT)³ was adopted as the novel physique index. Seven of the newborns had skeletal dysplasia. The novel physique index had a higher area under the curve (AUC), sensitivity, and specificity than the HC/HT ratio (AUC: 1.00 vs. 0.86; sensitivity: 1.00 vs. 0.86; specificity: 1.00 vs. 0.75). The novel physique index was more accurate than the HC/HT ratio and has the potential to accurately identify newborns with skeletal dysplasia.
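The two computations in this abstract are simple enough to state directly: R² of an ordinary least-squares fit of W on a power of HT (or HC), and the index (W/Wcorr) × (HC/HT)³. A minimal sketch, assuming simple linear regression with an intercept; the function names and toy measurements are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R² of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def physique_index(w: float, w_corr: float, hc: float, ht: float) -> float:
    """(W / Wcorr) * (HC / HT)**3; W and Wcorr in the same unit, HC and HT in the same unit."""
    return (w / w_corr) * (hc / ht) ** 3
```

If weights were exactly proportional to HT³ (a hypothetical idealization), regressing W on HT³ would give R² = 1 while regressing on HT would give slightly less, which illustrates why a cubic term can fit birth weight better than a linear one.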


Biomolecules ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 1059
Author(s):  
Sarah Atef Fahim ◽  
Mahmoud Salah Abdullah ◽  
Nancy A. Espinoza-Sánchez ◽  
Hebatallah Hassan ◽  
Ayman M. Ibrahim ◽  
...  

Inflammatory breast cancer (IBC) is a rare yet aggressive breast cancer variant associated with a poor prognosis. The major challenge for IBC is misdiagnosis due to the lack of molecular biomarkers. We profiled dysregulated expression of microRNAs (miRNAs) in primary IBC and non-IBC tumor samples using a human breast cancer miRNA PCR array. We found 28 miRNAs dysregulated in IBC vs. non-IBC tumors (10 upregulated and 18 downregulated). We identified 128 hub genes that are putative targets of the differentially expressed miRNAs and modulate important cancer biological processes. Furthermore, independent qPCR analysis verified significant upregulation of miR-181b-5p and significant downregulation of miR-200b-3p, miR-200c-3p, and miR-203a-3p in IBC tumors. Receiver operating characteristic (ROC) curves indicated that each of the four miRNAs individually discriminated patients with IBC from those with non-IBC, with miR-203a-3p showing the highest diagnostic value (area under the curve (AUC) 0.821). A combination of miR-181b-5p, miR-200b-3p, and miR-200c-3p improved diagnostic accuracy further (AUC 0.897). In addition, qPCR revealed that the expression of zinc finger E-box-binding homeobox 2 (ZEB2) mRNA, the putative target of miR-200b-3p, miR-200c-3p, and miR-203a-3p, was upregulated in IBC tumors. Overall, this study identified a set of miRNAs serving as potential biomarkers with diagnostic relevance for IBC.
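The abstract reports qPCR up- and downregulation without stating the quantification method. A widely used approach is the comparative Ct (2^−ΔΔCt) method; the sketch below assumes it, with each target miRNA normalized to a reference RNA measured in the same sample. The reference choice, function names and Ct values are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean ΔCt = Ct(target miRNA) - Ct(reference RNA), paired by sample."""
    return mean([t - r for t, r in zip(target_ct, reference_ct)])


def fold_change(case_target: Sequence[float], case_ref: Sequence[float],
                control_target: Sequence[float], control_ref: Sequence[float]) -> float:
    """Relative expression 2**(-ΔΔCt) of the case group (e.g. IBC) versus the control group."""
    ddct = delta_ct(case_target, case_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies 2 cycles earlier in cases after normalization.
    print(fold_change([25.0, 25.5], [20.0, 20.5], [27.0], [20.0]))  # 4.0
```

A fold change above 1 indicates upregulation in the case group and a value below 1 indicates downregulation; because Ct is inversely related to template amount, a lower normalized Ct means higher expression.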


2019 ◽  
Vol 58 (1) ◽  
pp. 50-58 ◽  
Author(s):  
Lorenz Kuessel ◽  
Heinrich Husslein ◽  
Eliana Montanari ◽  
Michael Kundi ◽  
Gottfried Himmler ◽  
...  

Background: We investigated the dynamics and predictive value of soluble syndecan-1 (Sdc-1), a biomarker of endothelial dysfunction, in uneventful pregnancies and pregnancies complicated by preeclampsia (PE). Methods: Serum Sdc-1 levels were measured at sequential time points during and after uneventful pregnancies (control, n = 95) and pregnancies developing PE (PE_long, n = 12). Levels were also measured at a single time point in women with symptomatic PE (PE_state, n = 46). Results: Sdc-1 levels increased consistently throughout pregnancy. In the PE_long group, Sdc-1 levels were lower at all visits throughout pregnancy, and the difference reached significance in weeks 18–22 (p = 0.019), 23–27 (p = 0.009), 28–32 (p = 0.006) and 33–36 (p = 0.008). After delivery, Sdc-1 levels dropped sharply in all pregnancies but were significantly higher in the PE_long group. The predictive power of Sdc-1 was evaluated by analyzing receiver operating characteristic (ROC) curves. Significant predictive power was reached at weeks 14–17 (area under the curve [AUC] 0.65, p = 0.025), 23–27 (AUC 0.73, p = 0.004) and 33–36 (AUC 0.75, p = 0.013). Conclusions: Sdc-1 levels were lower in women developing PE than in women with uneventful pregnancies, and Sdc-1 might be useful to predict PE. After delivery, Sdc-1 levels remained higher in women with PE. Additional studies investigating the link between glycocalyx degradation, Sdc-1 levels, and placental and endothelial dysfunction in pregnancies affected by PE are warranted.
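The p values attached to these AUCs presumably test whether each AUC differs from 0.5; the abstract does not state the method. One common choice, assumed here, is the normal approximation to the Mann-Whitney U statistic, since AUC = U/(m·n). Because Sdc-1 was lower in women developing PE, the sketch negates the levels so that higher scores point toward PE; the function name and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc_vs_chance(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (AUC, two-sided p) for H0: AUC = 0.5, where AUC = U / (m * n) and U is
    the Mann-Whitney statistic (ties count 1/2). Uses the normal approximation
    without a tie correction, so it is rough for very small groups."""
    m, n = len(pos), len(neg)
    u = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    mean_u = m * n / 2.0
    sd_u = math.sqrt(m * n * (m + n + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u / (m * n), math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical Sdc-1 levels; PE cases are lower, so negate to make "higher = PE".
    pe = [1.0, 2.0, 3.0]
    control = [4.0, 5.0, 6.0, 7.0]
    auc, p = auc_vs_chance([-x for x in pe], [-x for x in control])
    print(auc, round(p, 4))  # 1.0 0.0339
```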


1996 ◽  
Vol 37 (1P1) ◽  
pp. 204-207
Author(s):  
M. Murakami ◽  
H. Watanabe ◽  
H. Nakata

Purpose: To test the clinical usefulness of computed radiography (CR) with a storage phosphor plate in upper gastrointestinal radiographic examinations, a newly devised phantom gastric mucosa was used. Material and Methods: Simulated small elevated and depressed lesions were created on a phantom gastric mucosa made from a styrofoam “plate”. Twenty-four sets each of CR and screen-film radiographs (SR) were obtained using phototimed exposures, and a receiver operating characteristic (ROC) study and a visual ranking study were performed with these images. Results: There was no significant difference between the ROC curves of CR and SR. In the visual ranking, CR was equal to or better than SR in most cases, and in no case was SR definitely superior to CR. Conclusion: CR can be safely applied in upper gastrointestinal roentgenologic examinations.
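The abstract reports an ROC study on CR versus SR images but not the rating scale or curve-fitting method. In a typical observer study, readers rate each image on an ordinal confidence scale; a minimal sketch, assuming a 5-point scale, builds the empirical ROC points by sweeping the rating cutoff and integrates them with the trapezoidal rule. Classical observer studies often fit a binormal model instead, which this sketch does not do; the function names and ratings are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_roc(ratings: Sequence[int], truth: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPF, TPF) operating points from ordinal confidence ratings: a case is called
    'lesion present' when its rating is >= cutoff, sweeping from strictest to most lenient."""
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, t in zip(ratings, truth) if t and r >= cutoff)
        fp = sum(1 for r, t in zip(ratings, truth) if not t and r >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical 5-point ratings: the first four images contain a simulated lesion.
    ratings = [5, 4, 3, 2, 4, 1, 2, 1]
    truth = [True, True, True, True, False, False, False, False]
    pts = empirical_roc(ratings, truth)
    print(pts[-1], trapezoid_auc(pts))  # (1.0, 1.0) 0.8125
```

The trapezoidal area equals the Mann-Whitney estimate with ties counted as 1/2, so the 0.8125 here can be checked by comparing every lesion rating against every lesion-free rating.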


2011 ◽  
Vol 42 (3) ◽  
pp. 545-555 ◽  
Author(s):  
S. Strand ◽  
T. E. McEwan

Background: Female stalkers account for 10–25% of all stalking cases, yet little is known about risk factors for female stalking violence. This study identifies risk factors for female stalking violence and contrasts these with risk factors for male stalking violence. Method: Seventy-one female and 479 male stalkers presenting to police in Sweden and to a specialist stalking clinic in Australia were investigated. Univariate comparisons of behaviour were made by gender and between violent and non-violent female stalkers. Logistic regression was then used to develop a predictive model for stalking violence based on demographic, offence and clinical characteristics. Results: Rates of violence did not differ significantly between genders (31% of males and 23% of females). For both men and women, violence was associated with a combination of a prior intimate relationship with the victim, threats and approach behaviour. This model produced receiver operating characteristic (ROC) curves with an area under the curve (AUC) of 0.80 for female stalkers and 0.78 for male stalkers. The most notable gender difference was a significantly higher rate of personality disorder among women. High rates of psychotic disorder were found in both genders. Stalking violence was directly related to psychotic symptoms for a small number of women. Conclusions: Similar risk factors generally predict stalking violence across genders, providing initial support for a similar approach to risk assessment for all stalkers. The most notable gender difference was the prevalence of personality and psychotic disorders among female stalkers, supporting an argument for routine psychiatric assessment of women charged with stalking.


2020 ◽  
Vol 62 (1) ◽  
pp. 155-162
Author(s):  
Yuya Miyasaka ◽  
Noriyuki Kadoya ◽  
Rei Umezawa ◽  
Yoshiki Takayama ◽  
Kengo Ito ◽  
...  

We compared the predictive performance of dose volume histogram (DVH) parameter addition and deformable image registration (DIR) addition for gastrointestinal (GI) toxicity in cervical cancer patients. A total of 59 patients receiving brachytherapy and external beam radiotherapy were analyzed retrospectively. The cumulative dose was calculated by three methods: conventional DVH parameter addition, full DIR addition and partial DIR addition. $D_{2\,\mathrm{cm}^3}$, $D_{1\,\mathrm{cm}^3}$ and $D_{0.1\,\mathrm{cm}^3}$ (the minimum doses to the most exposed 2 cm³, 1 cm³ and 0.1 cm³ of tissue, respectively) of the rectum and sigmoid were calculated by each method. V50, V60 and V70 (the volumes irradiated with more than 50, 60 and 70 Gy, respectively) were calculated for full DIR addition. The DVH parameters were compared between the toxicity (≥ grade 1) and non-toxicity groups, and the areas under the receiver operating characteristic (ROC) curves (AUCs) were compared to evaluate the predictive performance of each method. The differences in $D_{2\,\mathrm{cm}^3}$ between the toxicity and non-toxicity groups were 0.2, 5.7 and 3.1 Gy for DVH parameter addition, full DIR addition and partial DIR addition, respectively, and the corresponding AUCs were 0.51, 0.67 and 0.57. Full DIR addition thus gave both the largest dose difference between the toxicity and non-toxicity groups and the highest AUC. The AUCs of V50, V60 and V70 were 0.51, 0.63 and 0.62, respectively; those of V60 and V70 were close to the AUC of $D_{2\,\mathrm{cm}^3}$ under full DIR addition. Our results suggest that full DIR addition may predict toxicity more accurately than conventional DVH parameter addition, and that accumulating the dose over all pelvic irradiation by DIR could be more effective.
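The DVH parameters in this abstract (the minimum dose to the most exposed 2, 1 or 0.1 cm³, and V50/V60/V70) can be read directly from per-voxel doses. A minimal sketch, assuming equal-sized voxels; it contrasts summing per-fraction D values (parameter addition) with summing dose voxel by voxel first (what DIR accumulation aims for). The registration itself is not implemented: the toy fractions are assumed to be already voxel-aligned, and the function names and doses are hypothetical.

```python
from typing import Sequence


def d_volume(doses_gy: Sequence[float], voxel_cc: float, volume_cc: float) -> float:
    """D_v: minimum dose to the most exposed `volume_cc` of an organ made of equal
    voxels, i.e. the dose of the last voxel needed to reach that volume.
    Volumes that are not a whole number of voxels are rounded to the nearest voxel."""
    hottest_first = sorted(doses_gy, reverse=True)
    n_voxels = max(1, round(volume_cc / voxel_cc))
    if n_voxels > len(hottest_first):
        raise ValueError("requested volume exceeds the organ volume")
    return hottest_first[n_voxels - 1]


def v_dose(doses_gy: Sequence[float], voxel_cc: float, threshold_gy: float) -> float:
    """V_x: volume (cc) receiving at least threshold_gy."""
    return voxel_cc * sum(1 for d in doses_gy if d >= threshold_gy)


if __name__ == "__main__":
    # Hypothetical per-voxel doses (Gy) for two fractions on six 0.5 cc voxels,
    # assumed already mapped voxel-to-voxel (the role DIR plays in the study).
    frac1 = [8.0, 7.0, 6.0, 2.0, 1.0, 1.0]
    frac2 = [1.0, 2.0, 6.0, 7.0, 8.0, 1.0]
    param_sum = d_volume(frac1, 0.5, 1.0) + d_volume(frac2, 0.5, 1.0)
    voxel_sum = d_volume([a + b for a, b in zip(frac1, frac2)], 0.5, 1.0)
    print(param_sum, voxel_sum)  # 14.0 9.0
```

In this toy case, summing the per-fraction 1 cm³ values (14 Gy) overstates the dose to the hottest 1 cm³ of the accumulated distribution (9 Gy) because the two fractions' hot spots fall on different voxels; with real anatomy, registration quality determines how closely the voxel-wise sum tracks the delivered dose.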

