Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine

2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Masayuki Yarimizu ◽  
Cao Wei ◽  
Yusuke Komiyama ◽  
Kokoro Ueki ◽  
Shugo Nakamura ◽  
...  

Receptor tyrosine kinases are essential proteins involved in cellular differentiation and proliferation in vivo and are heavily involved in allergic diseases, diabetes, and the onset and proliferation of cancerous cells. Identifying the interacting partner of such a protein, a growth factor ligand, will provide a deeper understanding of cellular proliferation, differentiation, and other cell processes. In this study, we developed a method for predicting tyrosine kinase ligand-receptor pairs from their amino acid sequences. We collected tyrosine kinase ligand-receptor pairs from the Database of Interacting Proteins (DIP) and UniProtKB, filtered them by removing sequence redundancy, and used them as a dataset for machine learning and assessment of predictive performance. Our prediction method is based on support vector machines (SVMs); we evaluated several input features suited to tyrosine kinases and compared the results. Using sequence pattern information and domain information extracted from sequences as input features, we obtained an area under the receiver operating characteristic curve of 0.996. This accuracy is higher than that obtained from general protein-protein interaction pair predictions.

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3827
Author(s):  
Gemma Urbanos ◽  
Alberto Martín ◽  
Guillermo Vázquez ◽  
Marta Villanueva ◽  
Manuel Villa ◽  
...  

Hyperspectral imaging (HSI) techniques do not require contact with patients and are non-ionizing as well as non-invasive. As a consequence, they have been extensively applied in the medical field. HSI is being combined with machine learning (ML) processes to obtain models to assist in diagnosis. In particular, the combination of these techniques has proven to be a reliable aid in the differentiation of healthy and tumor tissue during brain tumor surgery. ML algorithms such as support vector machine (SVM), random forest (RF) and convolutional neural networks (CNN) are used to make predictions and provide in-vivo visualizations that may assist neurosurgeons in being more precise, hence reducing damage to healthy tissue. In this work, thirteen in-vivo hyperspectral images from twelve different patients with high-grade gliomas (grade III and IV) have been selected to train SVM, RF and CNN classifiers. Five different classes have been defined during the experiments: healthy tissue, tumor, venous blood vessel, arterial blood vessel and dura mater. Overall accuracy (OACC) results vary from 60% to 95% depending on the training conditions. Finally, as far as the contribution of each band to the OACC is concerned, the results obtained in this work are 3.81 times greater than those reported in the literature.


2001 ◽  
Vol 21 (15) ◽  
pp. 5109-5121 ◽  
Author(s):  
Yann-Gaël Gangloff ◽  
Jean-Christophe Pointud ◽  
Sylvie Thuault ◽  
Lucie Carré ◽  
Christophe Romier ◽  
...  

ABSTRACT The RNA polymerase II transcription factor TFIID comprises the TATA binding protein (TBP) and a set of TBP-associated factors (TAFIIs). TFIID has been extensively characterized for yeast, Drosophila, and humans, demonstrating a high degree of conservation of both the amino acid sequences of the constituent TAFIIs and overall molecular organization. In recent years, it has been assumed that all the metazoan TAFIIs have been identified, yet no metazoan homologues of yeast TAFII47 (yTAFII47) and yTAFII65 are known. Both of these yTAFIIs contain a histone fold domain (HFD) which selectively heterodimerizes with that of yTAFII25. We have cloned a novel mouse protein, TAFII140, containing an HFD and a plant homeodomain (PHD) finger, which we demonstrated by immunoprecipitation to be a mammalian TFIID component. TAFII140 shows extensive sequence similarity to Drosophila BIP2 (dBIP2) (dTAFII155), which we also show to be a component of Drosophila TFIID. These proteins are metazoan homologues of yTAFII47 as their HFDs selectively heterodimerize with dTAFII24 and human TAFII30, metazoan homologues of yTAFII25. We further show that yTAFII65 shares two domains with the Drosophila Prodos protein, a recently described potential dTAFII. These conserved domains are critical for yTAFII65 function in vivo. Our results therefore identify metazoan homologues of yTAFII47 and yTAFII65.


2018 ◽  
Vol 26 (1) ◽  
pp. 141-155 ◽  
Author(s):  
Li Luo ◽  
Fengyi Zhang ◽  
Yao Yao ◽  
RenRong Gong ◽  
Martina Fu ◽  
...  

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used for the identification of surgeries with high risks of cancellation. The optimal performances of the identification models were as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. Models and sampling methods significantly affect the performance of identification. This study is a new application of machine learning for the identification of surgeries with high risks of cancellation and facilitation of surgery resource management.
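
The metrics listed in this abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those definitions; the helper name `confusion_metrics` and the counts are hypothetical illustrations and are not taken from the study.

```python
from __future__ import annotations


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, one predicted positive, one predicted negative).
    """
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = confusion_metrics(tp=40, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all five values together, as the abstract does, is useful because accuracy alone can hide a weak sensitivity when the positive class (here, cancelled surgeries) is rare.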


2020 ◽  
Author(s):  
Murad Megjhani ◽  
Kalijah Terilli ◽  
Ayham Alkhachroum ◽  
David J. Roh ◽  
Sachin Agarwal ◽  
...  

Objective: To develop a machine learning based tool, using routine vital signs, to assess delayed cerebral ischemia (DCI) risk over time. Methods: In this retrospective analysis, physiologic data for 540 consecutive acute subarachnoid hemorrhage patients were collected and annotated as part of a prospective observational cohort study between May 2006 and December 2014. Patients were excluded if (i) no physiologic data was available, (ii) they expired prior to the DCI onset window (< post bleed day 3) or (iii) early angiographic vasospasm was detected on admitting angiogram. DCI was prospectively labeled by consensus of treating physicians. Occurrence of DCI was classified using various machine learning approaches including logistic regression, random forest, support vector machine (linear and kernel), and an ensemble classifier, trained on vitals and subject characteristic features. Hourly risk scores were generated as the posterior probability at time t. We performed five-fold nested cross validation to tune the model parameters and to report the accuracy. All classifiers were evaluated for good discrimination using the area under the receiver operating characteristic curve (AU-ROC) and confusion matrices. Results: Of 310 patients included in our final analysis, 101 (32.6%) patients developed DCI. We achieved maximal classification of 0.81 [0.75-0.82] AU-ROC. We also predicted 74.7% of all DCI events 12 hours before typical clinical detection with a ratio of 3 true alerts for every 2 false alerts. Conclusion: A data-driven machine learning based detection tool offered hourly assessments of DCI risk and incorporated new physiologic information over time.
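
Nested cross-validation, as mentioned in this abstract, keeps parameter tuning (inner loop) separate from performance reporting (outer loop). The sketch below shows only how the index splits could be organized, in plain Python. The function names, the random seed handling, and the choice of five inner folds are assumptions for illustration; the abstract does not describe the authors' implementation. The sample count of 310 matches the number of patients in the final analysis.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs built only from
    outer_train, so the outer test fold is never seen while tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# 310 samples, mirroring the cohort size reported in the abstract.
splits = list(nested_cv_splits(n_samples=310))
print(len(splits), "outer folds;", len(splits[0][2]), "inner folds each")
```

In use, a model would be tuned on the inner splits, refit on `outer_train` with the selected parameters, and scored once on `outer_test`; the reported accuracy is then the summary over the outer folds.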


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10884
Author(s):  
Xin Yu ◽  
Qian Yang ◽  
Dong Wang ◽  
Zhaoyang Li ◽  
Nianhang Chen ◽  
...  

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.


mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.


Author(s):  
Sunday Olakunle Idowu ◽  
Amos Akintayo Fatokun

Oxidative stress induced by excessive levels of reactive oxygen species (ROS) underlies several diseases. Therapeutic strategies to combat oxidative damage are, therefore, a subject of intense scientific investigation to prevent and treat such diseases, with the use of phytochemical antioxidants, especially polyphenols, being a major part. Polyphenols, however, exhibit structural diversity that determines different mechanisms of antioxidant action, such as hydrogen atom transfer (HAT) and single-electron transfer (SET). They also suffer from inadequate in vivo bioavailability, with their antioxidant bioactivity governed by permeability, gut-wall and first-pass metabolism, and HAT-based ROS trapping. Unfortunately, no current antioxidant assay captures these multiple dimensions to be sufficiently “biorelevant,” because the assays tend to be unidimensional, whereas biorelevance requires integration of several inputs. Finding a method to reliably evaluate the antioxidant capacity of these phytochemicals, therefore, remains an unmet need. To address this deficiency, we propose using artificial intelligence (AI)-based machine learning (ML) to relate a polyphenol’s antioxidant action as the output variable to molecular descriptors (factors governing in vivo antioxidant activity) as input variables, in the context of a biomarker selectively produced by lipid peroxidation (a consequence of oxidative stress), for example F2-isoprostanes. Support vector machines, artificial neural networks, and Bayesian probabilistic learning are some key algorithms that could be deployed. Such a model will represent a robust predictive tool in assessing biorelevant antioxidant capacity of polyphenols, and thus facilitate the identification or design of antioxidant molecules. The approach will also help to fulfill the principles of the 3Rs (replacement, reduction, and refinement) in using animals in biomedical research.


2020 ◽  
pp. 009385482096975
Author(s):  
Mehdi Ghasemi ◽  
Daniel Anvari ◽  
Mahshid Atapour ◽  
J. Stephen Wormith ◽  
Keira C. Stockdale ◽  
...  

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.


2019 ◽  
Vol 11 (16) ◽  
pp. 1943 ◽  
Author(s):  
Omid Rahmati ◽  
Saleh Yousefi ◽  
Zahra Kalantari ◽  
Evelyn Uuemaa ◽  
Teimur Teimurian ◽  
...  

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under the receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.
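
The two validation criteria named in this abstract can be illustrated with a short Python sketch. TSS is commonly defined as sensitivity + specificity − 1, and AUC can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function names, labels, scores, and the 0.45 threshold below are hypothetical and are not taken from the study.

```python
def true_skill_statistic(tp, fp, tn, fn):
    """TSS = sensitivity + specificity - 1; ranges from -1 to 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties contribute 0.5. Uses a simple O(n_pos * n_neg) pairwise comparison,
    which is fine for small validation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = hazard event) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)

# Confusion counts at an arbitrary, hypothetical threshold of 0.45.
threshold = 0.45
tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
tss = true_skill_statistic(tp, fp, tn, fn)
print(f"AUC = {auc:.3f}, TSS = {tss:.3f}")
```

AUC is threshold-free, while TSS depends on the cut-off chosen to turn scores into hazard and non-hazard classes, which is one reason the two criteria are often reported side by side.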


2010 ◽  
Vol 2010 ◽  
pp. 1-9 ◽  
Author(s):  
Seizi Someya ◽  
Masanori Kakuta ◽  
Mizuki Morita ◽  
Kazuya Sumikoshi ◽  
Wei Cao ◽  
...  

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. When the leave-one-out test was applied to these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated the performance of these methods. When we applied our method in combination with the homology-based prediction method to the annotated human genome database, H-invDB, we found that the true positive rate of prediction was improved.

