Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dimitri Boeckaerts ◽  
Michiel Stock ◽  
Bjorn Criel ◽  
Hans Gerstmans ◽  
Bernard De Baets ◽  
...  

Abstract Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
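As an illustration of the PR-AUC metric reported above, the following minimal sketch computes average precision, one common estimator of the area under the precision-recall curve, for a single binary task. It is not the authors' pipeline: the function name, the toy labels and the toy scores are hypothetical, and a multi-host setting would need this computed per host and then averaged.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true-positive rank.

    labels: list of 0/1 ground-truth values (1 = phage infects the host).
    scores: list of predicted scores, higher meaning more likely positive.
    Ties between scores are broken by input order (an assumption of this sketch).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Hypothetical toy data: two true hosts ranked 1st and 3rd.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

Unlike ROC-AUC, this quantity depends on the proportion of positives, which is one reason it is often preferred when true hosts are rare.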

2020 ◽  
Vol 35 (Supplement_3) ◽  
Author(s):  
Jerry Yu ◽  
Andrew Long ◽  
Maria Hanson ◽  
Aleetha Ellis ◽  
Michael Macarthur ◽  
...  

Abstract Background and Aims: Performing dialysis at home has many benefits, including more flexibility and more frequent treatments. A possible barrier to election of home therapy (HT) by in-center patients is a lack of adequate HT education. To aid efficient education efforts, a predictive model was developed to help identify patients who are more likely to switch from in-center treatment and succeed on HT. Method: We developed a model using machine learning to predict which patients who are treated in-center without prior HT history are most likely to switch to HT in the next 90 days and stay on HT for at least 90 days. Training data was extracted from 2016–2019 for approximately 300,000 patients. We randomly sampled one in-center treatment date per patient and determined if the patient would switch and succeed on HT. The input features consisted of treatment vitals, laboratories, absence history, comprehensive assessments, facility information, county-level housing, and patient characteristics. Patients with fewer than 30 days on dialysis were excluded due to lack of data. A machine learning model (XGBoost classifier) was deployed monthly in a pilot with a team of HT educators to investigate the model’s utility for identifying HT candidates. Results: There were approximately 1,200 patients starting a home therapy per month in a large dialysis provider, with approximately one-third being in-center patients. The prevalence of switching and succeeding to HT in this population was 2.54%. The predictive model achieved an area under the curve of 0.87, sensitivity of 0.77, and a specificity of 0.80 on a hold-out test dataset. The pilot was successfully executed for several months and two major lessons were learned: 1) some patients who reappeared on each month’s list should be removed from the list after expressing no interest in HT, and 2) a data collection mechanism should be put in place to capture the reasons why patients are not interested in HT.
Conclusion: This quality-improvement initiative demonstrates that predictive modeling can be used to identify patients likely to switch and succeed on home therapy. Integration of the model in existing workflows requires creating a feedback loop which can help improve future worklists.
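To put the reported figures in context, the sketch below applies Bayes' rule to the prevalence (2.54%), sensitivity (0.77) and specificity (0.80) quoted in the abstract. It assumes those three numbers describe the same population and a single decision threshold, which the abstract does not state explicitly; the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of flagged patients who are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figures taken from the abstract; combining them this way is an assumption.
ppv = positive_predictive_value(prevalence=0.0254, sensitivity=0.77, specificity=0.80)
```

Under these assumptions roughly 9% of flagged patients would switch and succeed, compared with 2.54% in the unselected population, so a worklist built this way would still consist mostly of patients who do not switch.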


2020 ◽  
Vol 21 (7) ◽  
pp. 546-557
Author(s):  
Rahul Semwal ◽  
Pritish Kumar Varadwaj

Aims: To develop a tool that can annotate subcellular localization of human proteins. Background: With the progression of high-throughput human proteomics projects, an enormous amount of protein sequence data has been discovered in the recent past. All these raw sequence data require precise mapping and annotation for their respective biological role and functional attributes. The functional characteristics of protein molecules are highly dependent on the subcellular localization/compartment. Therefore, a fully automated and reliable protein subcellular localization prediction system would be very useful for current proteomic research. Objective: To develop a machine learning-based predictive model that can annotate the subcellular localization of human proteins with high accuracy and precision. Methods: In this study, we used the PSI-CD-HIT homology criterion and utilized the sequence-based features of protein sequences to develop a powerful subcellular localization predictive model. The dataset used to train the HumDLoc model was extracted from a reliable data source, the UniProt Knowledgebase, which helps the model to generalize on the unseen dataset. Result: The proposed model, HumDLoc, was compared with two of the most widely used techniques, CELLO and DeepLoc, and with other machine learning-based tools. The results demonstrated promising predictive performance of the HumDLoc model based on various machine learning metrics such as accuracy (≥97.00%), precision (≥0.86), recall (≥0.89), MCC score (≥0.86), ROC curve (0.98 square unit), and precision-recall curve (0.93 square unit). Conclusion: HumDLoc was able to outperform several alternative tools for correctly predicting subcellular localization of human proteins. HumDLoc is hosted as a web-based tool at https://bioserver.iiita.ac.in/HumDLoc/.
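The precision, recall and MCC figures above can be related through a confusion matrix. The following sketch shows the standard binary definitions; the abstract does not say how the scores were aggregated over compartments, so treating one compartment as a one-vs-rest binary task is an assumption, and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall and Matthews correlation coefficient from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical one-vs-rest counts for a single compartment.
example = binary_metrics(tp=90, fp=10, fn=10, tn=90)
```

MCC uses all four cells of the matrix, so it stays informative when a compartment is rare and accuracy alone would look high.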


2020 ◽  
Author(s):  
Nida Fatima

Abstract Background: Preoperative prognostication of clinical and surgical outcomes in patients with neurosurgical diseases can improve risk stratification and thus guide targeted treatment to minimize adverse outcomes. Therefore, the author aims to highlight the development and validation of predictive models determining neurosurgical outcomes through machine learning algorithms using logistic regression. Methods: Logistic regression (enter, backward and forward) and the least absolute shrinkage and selection operator (LASSO) method for selection of variables from a selected database can eventually lead to multiple candidate models. The final model with a set of predictive variables must be selected based upon clinical knowledge and numerical results. Results: The predictive model which performed best on discrimination, calibration, Brier score and decision curve analysis must be selected to develop machine learning algorithms. Logistic regression should be compared with the LASSO model. Usually for big databases, the predictive model selected through logistic regression gives a higher Area Under the Curve (AUC) than the LASSO model. The predictive probability derived from the best model could be uploaded to an open access web application that patients and surgeons worldwide can easily use to make a risk assessment. Conclusions: Machine learning algorithms provide promising results for the prediction of outcomes following cranial and spinal surgery. These algorithms can provide useful factors for patient-counselling, assessing peri-operative risk factors, and predicting post-operative outcomes after neurosurgery.


2015 ◽  
Vol 22 (2) ◽  
pp. 111-118 ◽  
Author(s):  
Fahad M.A. Al-Hemaid ◽  
M. Ajmal Ali ◽  
Joongku Lee ◽  
Soo-Yong Kim ◽  
Md. Oliur Rahman

The present study explored molecular phylogenetic analysis of 28 species of Euphorbia L. for the identification and establishment of molecular evolutionary relationships of Euphorbia scordifolia Jacq. within the genus based on the internal transcribed spacers (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA (nrDNA). The sequence similarity search using Basic Local Alignment Search Tool (BLAST) of the ITS sequence of E. scordifolia showed the closest sequence similarity to E. supina Raf. The analysis of ITS sequence data revealed four major clades consistent with subgeneric classifications of the genus. Molecular data support placement of E. scordifolia in the subgenus Chamaesyce. Bangladesh J. Plant Taxon. 22(2): 111-118, 2015 (December)


2013 ◽  
Vol 20 (2) ◽  
pp. 233-238 ◽  
Author(s):  
M. Ajmal Ali ◽  
Fahad M. Al-Hemaid ◽  
Ritesh K. Choudhary ◽  
Joongku Lee ◽  
Soo-Yong Kim ◽  
...  

The present study focuses on the status of Reseda pentagyna Abdallah & A.G. Miller (Resedaceae). The internal transcribed spacer (ITS) region of nuclear ribosomal DNA and the chloroplast trnL-F gene of the species in question were sequenced. The Basic Local Alignment Search Tool (BLAST) search showed maximum identity with R. stenostachya. Parsimony analyses of the ITS, trnL-F and combined sequence data revealed grouping of Reseda species consistent with established taxonomic sections of the genus; R. pentagyna showed proximity with R. stenostachya (100% bootstrap support), nested within the clade of section Reseda. DOI: http://dx.doi.org/10.3329/bjpt.v20i2.17397 Bangladesh J. Plant Taxon. 20(2): 233-238, 2013


2020 ◽  
Author(s):  
Janani Durairaj ◽  
Elena Melillo ◽  
Harro J Bouwmeester ◽  
Jules Beekwilder ◽  
Dick de Ridder ◽  
...  

Abstract Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Author summary Predicting enzyme function is a popular problem in the bioinformatics field that grows more pressing with the increase in protein sequences, and more attainable with the increase in experimentally characterized enzymes. Terpenes and terpenoids form the largest classes of natural products and find use in many drugs, flavouring agents, and perfumes. Terpene synthases catalyze the biosynthesis of terpenes via multiple cyclizations and carbocation rearrangements, generating a vast array of product skeletons. In this work, we present a three-pronged computational approach to predict carbocation specificity in sesquiterpene synthases, a subset of terpene synthases with one of the highest diversities of products. Using homology modelling, machine learning and co-evolutionary analysis, our approach combines sparse structural data, large amounts of uncharacterized sequence data, and the current set of experimentally characterized enzymes to provide insight into residues and structural regions that likely play a role in determining product specificity. Similar techniques can be repurposed for function prediction and enzyme engineering in many other classes of enzymes.


Diagnostics ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 307 ◽  
Author(s):  
Chih-Min Tsai ◽  
Chun-Hung Richard Lin ◽  
Huan Zhang ◽  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
...  

Blood culture is frequently used to detect bacteremia in febrile children. However, a high rate of negative or false-positive blood culture results is common at the pediatric emergency department (PED). The aim of this study was to use machine learning to build a model that could predict bacteremia in febrile children. We conducted a retrospective case-control study of febrile children who presented to the PED from 2008 to 2015. We adopted machine learning methods and cost-sensitive learning to establish a predictive model of bacteremia. We enrolled 16,967 febrile children with blood culture tests during the eight-year study period. Only 146 febrile children had true bacteremia, and more than 99% of febrile children had a contaminant or negative blood culture result. The maximum area under the curve of logistic regression and support vector machines to predict bacteremia were 0.768 and 0.832, respectively. Using the predictive model, we can categorize febrile children by risk value into five classes. Class 5 had the highest probability of having bacteremia, while class 1 had no risk. Obtaining blood cultures in febrile children at the PED rarely identifies a causative pathogen. Prediction models can help physicians determine whether patients have bacteremia and may reduce unnecessary expenses.
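With 146 true bacteremia cases among 16,967 children, an unweighted classifier can score well by always predicting "negative". The sketch below shows one simple form of cost-sensitive learning: weighting each class inversely to its frequency inside the log loss. The abstract does not specify the cost scheme the authors used, so this weighting, the function names and the printed-out example values are assumptions for illustration only.

```python
import math


def balanced_class_weights(n_pos, n_neg):
    """Inverse-frequency weights so both classes carry equal total weight."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Weighted average of per-example log loss; probs are P(label == 1)."""
    loss = 0.0
    total_weight = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        loss -= w * (math.log(p) if y == 1 else math.log(1.0 - p))
        total_weight += w
    return loss / total_weight


# Counts from the abstract: 146 positives out of 16,967 children.
weights = balanced_class_weights(n_pos=146, n_neg=16967 - 146)
```

With these counts a missed bacteremia case costs about 58 times as much as an error on a negative case, which pushes a model away from the trivial all-negative solution.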


2021 ◽  
pp. 47-51
Author(s):  
Ikechi – Nwogu, Chinyerum Gloria ◽  
B. A. Odogwu ◽  
O. G. Obiakoeze

Broccoli (Brassica oleracea var. italica) is a nutritional vegetable that looks like a small tree. Although it is rich in vitamins, minerals, fiber and antioxidants, it has a short shelf life of no more than 2-5 days due to post-harvest deterioration. A study was conducted to isolate and identify the common fungal pathogens causing post-harvest deterioration of broccoli crown. Diseased broccoli crowns were collected from Ogunabali Fruit Garden Market in D-Line, Port Harcourt Local Government Area of Rivers State. Fungal isolates were collected and morphologically identified. The DNA of the most common fungal isolate, BC-3B, was molecularly characterized using Internal Transcribed Spacer 4 and 5 (ITS-4 and 5) molecular markers. The morphological studies revealed that the BC-3B isolate was an Aspergillus niger. The BC-3B isolate DNA sequence was aligned using Basic Local Alignment Search Tool for Nucleotide (BLASTN) 2.8.0 version of National Center for Biotechnology Information (NCBI) database. The molecular weight of the DNA of the isolates was over 600 base pairs. Based on sequence similarity, it was observed that the broccoli isolate BC-3B was 93% identical to Aspergillus niger. These findings indicate that Aspergillus niger is the causal fungal pathogen of post-harvest rot of broccoli. A phylogenetic tree was constructed to assess the relationship between the isolates obtained from this study. This study has provided information on some of the fungal organisms found in broccoli. It is anticipated that this result will provide information for disease control approaches for alleviating the post-harvest losses of broccoli caused by Aspergillus niger and provide a foundation for further study of the possible harm of consuming diseased broccoli.


Author(s):  
Timothy L. Bailey

We are in the midst of an explosive increase in the number of DNA and protein sequences available for study, as various genome projects come on line. This wealth of information offers important opportunities for understanding many biological processes and developing new plant and animal models, and ultimately drugs, for human diseases, in addition to other applications of modern biotechnology. Unfortunately, sequences are accumulating at a pace that strains present methods for extracting significant biological information from them. A consequence of this explosion in the sequence databases is that there is much interest and effort in developing tools that can efficiently and automatically extract the relevant biological information in sequence data and make it available for use in biology and medicine. In this chapter, we describe one such method that we have developed based on algorithms from artificial intelligence research. We call this software tool MEME (Multiple Expectation-maximization for Motif Elicitation). It has the attractive property that it is an “unsupervised” discovery tool: it can identify motifs, such as regulatory sites in DNA and functional domains in proteins, from large or small groups of unaligned sequences. As we show below, motifs are a rich source of information about a dataset; they can be used to discover other homologs in a database, to identify protein subsets that contain one or more motifs, and to provide information for mutagenesis studies to elucidate structure and function in the protein family as well as its evolution. Learning tools are used to extract higher level biological patterns from lower level DNA and protein sequence data. In contrast, search tools such as BLAST (Basic Local Alignment Search Tool) take a given higher level pattern and find all items in a database that possess the pattern. 
Searching for items that have a certain pattern is a problem intrinsically easier than discovering what the pattern is from items that possess it. The patterns considered here are motifs, which for DNA data can be subsequences that interact with transcription factors, polymerases, and other proteins.
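The "search" side of the distinction drawn above can be sketched in a few lines: given a motif that is already known, scan a sequence for windows that match it. This is not MEME, which addresses the harder discovery problem with expectation-maximization; the position weight matrix, its log-odds values and the function name below are all hypothetical.

```python
def find_motif_hits(sequence, pwm, threshold):
    """Return (start, score) for each window scoring at or above threshold.

    pwm: list of dicts, one per motif column, mapping base -> log-odds score.
    Bases missing from a column make the window score negative infinity.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i].get(base, float("-inf")) for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def column(preferred):
    """Toy column: +1.0 for the preferred base, -1.0 for the others."""
    return {base: (1.0 if base == preferred else -1.0) for base in "ACGT"}


# Hypothetical 4-column matrix that favours the pattern TATA.
toy_pwm = [column("T"), column("A"), column("T"), column("A")]
toy_hits = find_motif_hits("GGTATACC", toy_pwm, threshold=4.0)
```

Scanning costs one pass over the sequence per motif, whereas discovery must estimate the matrix itself from unaligned sequences, which is why the text describes search as the intrinsically easier problem.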


2015 ◽  
Vol 54 ◽  
pp. 58-64 ◽  
Author(s):  
Aisling O’Driscoll ◽  
Vladislav Belogrudov ◽  
John Carroll ◽  
Kai Kropp ◽  
Paul Walsh ◽  
...  
