KCF-Convoy: efficient Python package to convert KEGG Chemical Function and Substructure fingerprints

2018 ◽  
Author(s):  
Masayuki Sato ◽  
Hirotaka Suetake ◽  
Masaaki Kotera

Abstract
Motivation: In silico methods for assessing pharmaceutical activity and toxicity are increasingly important in QSAR (quantitative structure-activity relationship) studies, and many chemical fingerprints have been developed to tackle this problem. Among them, KEGG Chemical Function and Substructure (KCF-S) has been shown to perform well in some pharmaceutical and metabolic studies. However, the existing software that generates KCF-S fingerprints has limited usability: the input must be in Molfile or SDF format, and the output is only a text file.
Results: We developed a new Python package, KCF-Convoy, that seamlessly generates KCF-format data and KCF-S fingerprints from Molfile, SDF, SMILES, and InChI inputs. The resulting KCF-S fingerprints were used with several supervised machine-learning methods to distinguish herbicides from other pesticides and to find characteristic substructures in taxonomic groups.
Availability: KCF-Convoy is implemented as a Python package freely available at https://github.com/KCF-Convoy; users can install it with the package management system "pip" or use it via Docker.
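The abstract does not show KCF-Convoy's own API, so the sketch below does not call it. As a hedged illustration of the downstream step (feeding KCF-S fingerprints to supervised machine-learning methods), it assumes each compound's fingerprint has already been obtained as a mapping from substructure label to occurrence count, and turns those mappings into fixed-length vectors with a shared column order. The substructure labels and the helpers `build_vocabulary`, `to_count_vector`, and `to_binary_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

def build_vocabulary(fingerprints: Sequence[Dict[str, int]]) -> List[str]:
    """Collect every substructure label seen in any compound, sorted for a stable column order."""
    labels = set()
    for fp in fingerprints:
        labels.update(fp)  # iterating a dict yields its keys (the labels)
    return sorted(labels)

def to_count_vector(fp: Dict[str, int], vocabulary: List[str]) -> List[int]:
    """Map one compound's {label: count} dict onto the shared columns; absent labels become 0."""
    return [fp.get(label, 0) for label in vocabulary]

def to_binary_vector(fp: Dict[str, int], vocabulary: List[str]) -> List[int]:
    """Presence/absence variant of the same fingerprint."""
    return [1 if fp.get(label, 0) > 0 else 0 for label in vocabulary]

# Hypothetical substructure labels; real KCF-S labels would come from KCF-Convoy.
compounds = {
    "compound_A": Counter(["C1a", "C1a", "O1a"]),
    "compound_B": Counter(["C1a", "N1a"]),
}
vocab = build_vocabulary(list(compounds.values()))
matrix = [to_count_vector(fp, vocab) for fp in compounds.values()]
print(vocab)   # ['C1a', 'N1a', 'O1a']
print(matrix)  # [[2, 0, 1], [1, 1, 0]]
```

Building the vocabulary once over all compounds keeps every row aligned, which most learners require; whether counts or presence/absence bits work better is an empirical choice.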

RSC Advances ◽  
2017 ◽  
Vol 7 (11) ◽  
pp. 6697-6703 ◽  
Author(s):  
Qin Wang ◽  
Xiao Li ◽  
Hongbin Yang ◽  
Yingchun Cai ◽  
Yinyin Wang ◽  
...  

Chemical fingerprints combined with machine learning methods were used to build binary classification models for predicting the potential EC/EI of compounds.
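The entry does not say which fingerprints or learners were used, so the following is only a generic sketch of one common pairing: binary fingerprints compared with Tanimoto similarity, classified by a k-nearest-neighbour majority vote. It is a swapped-in illustration, not the authors' method; the fingerprints, labels (1 for the positive class), and function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set to 1

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |a & b| / |a | b| of two binary fingerprints."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def knn_predict(query: Fingerprint, training: List[Tuple[Fingerprint, int]], k: int = 3) -> int:
    """Return the majority label among the k training compounds most similar to query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set of (fingerprint, label) pairs.
training = [
    (frozenset({1, 2, 3}), 1),
    (frozenset({1, 2, 4}), 1),
    (frozenset({2, 3, 5}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({7, 8, 10}), 0),
    (frozenset({8, 9, 10}), 0),
]
print(knn_predict(frozenset({1, 2, 5}), training))   # 1
print(knn_predict(frozenset({7, 9, 10}), training))  # 0
```

An odd `k` avoids tied votes in a two-class problem.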


2014 ◽  
Vol 14 (16) ◽  
pp. 1913-1922 ◽  
Author(s):  
Dimitar Dobchev ◽  
Girinath Pillai ◽  
Mati Karelson

2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 598.2-598
Author(s):  
E. Myasoedova ◽  
A. Athreya ◽  
C. S. Crowson ◽  
R. Weinshilboum ◽  
L. Wang ◽  
...  

Background: Methotrexate (MTX) is the most common anchor drug for rheumatoid arthritis (RA), but given the delayed onset of MTX action and its 30-40% inadequate-response rate, the risk of missing the opportunity for early effective treatment with alternative medications is substantial. There is a compelling need to predict MTX response accurately before treatment initiation, so that patients who are likely to respond to MTX can be identified at RA onset.
Objectives: To test the ability of machine-learning approaches using clinical and genomic biomarkers to predict MTX response, with replication in independent samples.
Methods: Age, sex, clinical, serological, and genome-wide association study (GWAS) data from 647 patients of European ancestry with early RA (336 recruited in the United Kingdom [UK]; 307 recruited across Europe; 70% female; 72% rheumatoid factor [RF] positive; mean age 54 years; mean baseline Disease Activity Score with 28-joint count [DAS28] 5.65) in the PhArmacogenetics of Methotrexate in RA (PAMERA) consortium were used in this study. The genomic data comprised 160 genome-wide significant single nucleotide polymorphisms (SNPs) with p < 1×10⁻⁵ associated with RA risk and MTX metabolism. DAS28 scores were available at baseline and at the 3-month follow-up visit. Response to MTX monotherapy at a dose of ≥15 mg/week was defined as a good or moderate response by the EULAR response criteria at the 3-month follow-up visit. Supervised machine-learning methods were trained with 5-times-repeated 10-fold cross-validation using data from PAMERA's 336 UK patients. Class imbalance in training (a higher percentage of MTX responders) was accounted for using the synthetic minority oversampling technique (SMOTE). Prediction performance was validated in PAMERA's 307 European patients, who were not used in training.
Results: Age, sex, RF positivity, and baseline DAS28 data predicted MTX response with 58% accuracy in UK and European patients (p = 0.7). However, supervised machine-learning methods that combined demographics, RF positivity, baseline DAS28, and genomic SNPs predicted EULAR response at 3 months with an area under the receiver operating characteristic curve (AUC) of 0.83 (p = 0.051) in UK patients, and achieved a prediction accuracy (fraction of correctly predicted outcomes) of 76.2% (p = 0.054) in European patients, with a sensitivity of 72% and a specificity of 77%. Adding genomic data improved the accuracy of MTX response prediction by 19% and achieved cross-site replication. Baseline DAS28 scores and the SNPs rs12446816, rs13385025, rs113798271, and rs2372536 were among the top predictors of MTX response.
Conclusion: Pharmacogenomic biomarkers combined with DAS28 scores predicted MTX response in patients with early RA more reliably than demographics and DAS28 scores alone. Using pharmacogenomic biomarkers to identify MTX responders at early stages of RA may help guide effective RA treatment choices, including timely escalation of RA therapies. Further studies on personalized prediction of response to MTX and other anti-rheumatic treatments are warranted to optimize control of RA and improve outcomes in patients with RA.
Disclosure of Interests: Elena Myasoedova: None declared; Arjun Athreya: None declared; Cynthia S. Crowson: Grant/research support from Pfizer (research grant); Richard Weinshilboum: Shareholder (co-founder and stockholder in OneOme); Liewei Wang: None declared; Eric Matteson: Grant/research support from Pfizer; Consultant for Boehringer Ingelheim, Gilead, TympoBio, Arena Pharmaceuticals; Speakers bureau: Simply Speaking
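The abstract names the oversampling technique but not its implementation. Below is a minimal pure-Python sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names, parameters, and toy points are hypothetical; real analyses usually rely on an established library implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def _k_nearest(points: Sequence[Vector], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    dists = [(math.dist(points[idx], other), j) for j, other in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]

def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment, in [0, 1)
        base, other = minority[i], minority[j]
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy minority class (hypothetical feature values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=10, k=2, seed=42)
print(len(synthetic))  # 10
```

Within repeated cross-validation, oversampling of this kind is normally applied only to the training folds, so that synthetic points derived from a held-out sample never leak into its evaluation.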


2021 ◽  
Vol 13 (5) ◽  
pp. 974
Author(s):  
Lorena Alves Santos ◽  
Karine Ferreira ◽  
Michelle Picoli ◽  
Gilberto Camara ◽  
Raul Zurita-Milla ◽  
...  

The use of satellite image time series analysis and machine learning methods brings new opportunities and challenges for land use and cover change (LUCC) mapping over large areas. One of these challenges is the need for samples that properly represent the high variability of land use and cover classes over large areas, both to train supervised machine learning methods and to produce accurate LUCC maps. This paper addresses this challenge and presents a method that identifies spatiotemporal patterns in land use and cover samples to infer subclasses, using the phenological and spectral information provided by satellite image time series. The proposed method uses self-organizing maps (SOMs) to reduce data dimensionality, creating primary clusters. From these primary clusters, it uses hierarchical clustering to create subclusters that capture the intra-class variability intrinsic to different regions and periods, mainly over large areas and multiple years. To show how the method works, we use MODIS image time series associated with samples of cropland and pasture classes over the Cerrado biome in Brazil. The results show that the proposed method is suitable for identifying spatiotemporal patterns in land use and cover samples that can be used to infer subclasses, mainly for crop types.
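As a hedged illustration of the first stage only, the sketch below trains a small self-organizing map in pure Python: for each sample it finds the best-matching neuron and pulls that neuron and its grid neighbours towards the sample, with a learning rate and Gaussian neighbourhood radius that both shrink over time. The paper's hierarchical clustering of the SOM output is not shown. Grid size, decay schedules, toy data, and the names `train_som` and `best_matching_unit` are assumptions, not the authors' configuration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def best_matching_unit(weights: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the neuron whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(data: Sequence[Vector], rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, sigma0: float = 1.0, seed: int = 0) -> List[Vector]:
    """Train a rows x cols self-organizing map; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every neuron uniformly at random inside the data's bounding box.
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks
            br, bc = grid[best_matching_unit(weights, x)]
            for n, (r, c) in enumerate(grid):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
                # Move the neuron part of the way towards x (a convex step while lr0 <= 1).
                weights[n] = [w + lr * h * (xi - w) for w, xi in zip(weights[n], x)]
            step += 1
    return weights

# Toy 2-D samples standing in for per-sample time-series features (hypothetical values).
data = [[0.10, 0.20], [0.15, 0.25], [0.12, 0.22], [0.90, 0.80], [0.85, 0.95], [0.88, 0.90]]
weights = train_som(data, rows=2, cols=2, epochs=30, seed=1)
print([best_matching_unit(weights, x) for x in data])
```

Each sample's best-matching neuron acts as its primary cluster; a second clustering step over the neuron weight vectors would then group neurons into subclusters.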


2016 ◽  
Vol 8 (4) ◽  
pp. 271 ◽  
Author(s):  
Zhiling Guo ◽  
Xiaowei Shao ◽  
Yongwei Xu ◽  
Hiroyuki Miyazaki ◽  
Wataru Ohira ◽  
...  

PLoS ONE ◽  
2016 ◽  
Vol 11 (12) ◽  
pp. e0166898 ◽  
Author(s):  
Monique A. Ladds ◽  
Adam P. Thompson ◽  
David J. Slip ◽  
David P. Hocking ◽  
Robert G. Harcourt

2021 ◽  
Author(s):  
Mohamed Ibrahim Mohamed ◽  
Dinesh Mehta ◽  
Erdal Ozkan

Abstract
Determining the closure pressure is crucial for optimal hydraulic fracturing design and successful execution of the fracturing treatment. Diagnostic tests run before the main fracturing treatment have advanced significantly, providing more information about fracture propagation patterns and fluid performance with which to optimize designs. The goal of such a test is to inject a small volume of fracturing fluid to break down the formation and create a small fracture; once pumping stops, the pressure decline is analyzed to observe fracture closure. Many analytical methods, such as the G-function and square-root-of-time analyses, have been developed to determine the fracture closure pressure. In some cases, however, the closure pressure is difficult to determine, and personal bias and differing field experience make it challenging to interpret changes in the pressure-derivative slope and identify fracture closure. These conditions include (1) high-permeability reservoirs, where fracture closure occurs very quickly because of rapid fluid leakoff; (2) extremely low-permeability reservoirs, which require a long shut-in time for the fluid to leak off before the fracture closure pressure can be determined; and (3) non-ideal fluid leakoff behavior under complex conditions. The objective of this study is to apply machine learning methods to implement a predesigned algorithm that predicts the fracture closure pressure while minimizing the shortcomings of closure-pressure determination under non-ideal or subjective conditions. This paper demonstrates training different supervised machine learning algorithms to help predict fracture closure pressure. The workflow uses the datasets to train and optimize the models, which are subsequently used to predict the closure pressure of the testing data. The predicted results are then compared with actual results from more than 120 DFIT (diagnostic fracture injection test) data points.
We further propose an integrated approach to feature selection and dataset processing, and we study the effects of data processing on the success of model prediction. The results of this study reduce the subjectivity of interpreting the data and the reliance on the interpreter's personal experience. We speculate that linear regression and multilayer perceptron (MLP) neural network algorithms can yield high scores in predicting fracture closure pressure.
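The abstract does not list the input features, so the sketch below only illustrates one of the two algorithms it names: an ordinary least-squares linear regression fitted by solving the normal equations (XᵀX)b = Xᵀy with Gauss-Jordan elimination. The feature columns are hypothetical stand-ins (for example, quantities extracted from a pressure-decline curve), the toy data are synthetic, and this is not the authors' workflow.

```python
from typing import List, Sequence

def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.
    Returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend a column of ones for the intercept
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # absolute threshold; assumes reasonably scaled features
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Apply a fitted model to one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

# Synthetic check: targets generated exactly from y = 2 + 3*x1 - x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2, 5, 1, 4, 5]
coefs = fit_linear_regression(X, y)
print([round(c, 6) for c in coefs])  # [2.0, 3.0, -1.0]
```

In a setting like the one described, the model would be fitted on the training portion and its predictions compared against held-out test data.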


Author(s):  
Yoshihiro Yamanishi ◽  
Hisashi Kashima

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the drug development process. In this chapter, the authors review several supervised machine learning methods that predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. They examine kernel-based algorithms from two viewpoints: binary classification and dimension reduction. They then demonstrate the usefulness of these methods for predicting drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.
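One widely used way to let a kernel method use chemical structure and protein sequence simultaneously is a tensor-product pairwise kernel. This kernel scores two compound-protein pairs by multiplying a compound kernel and a protein kernel. The sketch below assumes Tanimoto similarity on binary fingerprints and a k-mer spectrum kernel on sequences. These are swapped-in choices, not necessarily the kernels reviewed in the chapter, and all function names and toy inputs are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Tuple

Pair = Tuple[FrozenSet[int], str]  # (binary fingerprint as set bits, amino-acid sequence)

def tanimoto_kernel(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two compounds' binary fingerprints."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """k-mer spectrum kernel: inner product of the two sequences' k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(n * ct[kmer] for kmer, n in cs.items())  # missing k-mers count as 0

def pairwise_kernel(pair1: Pair, pair2: Pair, k: int = 2) -> float:
    """Tensor-product kernel between two compound-protein pairs."""
    (c1, p1), (c2, p2) = pair1, pair2
    return tanimoto_kernel(c1, c2) * spectrum_kernel(p1, p2, k)

# Toy pairs: compound fingerprints and short peptide strings (hypothetical).
pair_a = (frozenset({1, 2, 3}), "MKVL")
pair_b = (frozenset({1, 2, 4}), "MKVA")
print(pairwise_kernel(pair_a, pair_b))  # 0.5 * 2 = 1.0
```

Because the product of two positive semidefinite kernels is itself positive semidefinite, the pairwise kernel can be plugged into standard kernel classifiers such as support vector machines over compound-protein pairs.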

