A novel machine learning-based approach for the computational functional assessment of pharmacogenomic variants

Author(s):  
Maria-Theodora Pandi ◽  
Maria Koromina ◽  
Iordanis Tsafaridis ◽  
Sotirios Patsilinakos ◽  
Evangelos Christoforou ◽  
...  

Abstract Background: The field of pharmacogenomics focuses on the way a person’s genome affects his or her response to a certain dose of a specified medication. The main aim is to utilize this information to guide and personalize the treatment in a way that maximizes the clinical benefits and minimizes the risks for the patients, thus fulfilling the promises of personalized medicine. Technological advances in genome sequencing, combined with the development of improved computational methods for the efficient analysis of the huge amount of generated data, have allowed the fast and inexpensive sequencing of a patient’s genome, hence rendering its incorporation into clinical routine practice a realistic possibility. Results: The potential availability of a vast number of identified genetic variants in a clinical setting highlights the necessity of developing a method to evaluate and prioritize this information towards its exploitation in guiding medication or dosing scheme systematically and effectively. In this direction, the present study examines the development of a computational model that can classify new variants according to their possible effects on protein function, which in turn affects drug response, by using as a training set a dataset of functionally validated single nucleotide variants (SNVs) located in pharmacogenes. Conclusion: Overall, the proposed model holds promise to lead to an extremely useful variant prioritization and scoring tool with interesting clinical applications in pharmacogenomics.

2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Maria-Theodora Pandi ◽  
Maria Koromina ◽  
Iordanis Tsafaridis ◽  
Sotirios Patsilinakos ◽  
Evangelos Christoforou ◽  
...  

Abstract Background: The field of pharmacogenomics focuses on the way a person’s genome affects his or her response to a certain dose of a specified medication. The main aim is to utilize this information to guide and personalize the treatment in a way that maximizes the clinical benefits and minimizes the risks for the patients, thus fulfilling the promises of personalized medicine. Technological advances in genome sequencing, combined with the development of improved computational methods for the efficient analysis of the huge amount of generated data, have allowed the fast and inexpensive sequencing of a patient’s genome, hence rendering its incorporation into clinical routine practice a realistic possibility. Methods: This study used functionally well-characterized SNVs within genes involved in drug metabolism and transport to train a classifier that categorizes novel variants according to their expected effect on protein function. The categorization is based on the available in silico prediction and/or conservation scores, which are selected through recursive feature elimination. To this end, information on 190 pharmacovariants was leveraged together with 4 machine learning algorithms, namely AdaBoost, XGBoost, multinomial logistic regression, and random forest (RF), whose performance was assessed through 5-fold cross-validation. Results: All models achieved similar performance, with the RF model achieving the highest accuracy (85%, 95% CI: 0.79, 0.90) as well as improved overall performance (precision 85%, sensitivity 84%, specificity 94%); it was therefore used for subsequent analyses. When applied to real-world WGS data, the selected RF model identified 2 missense variants expected to lead to decreased-function proteins and 1 expected to lead to an increased-function protein. As expected, a greater number of variants were highlighted when the approach was applied to NGS data derived from targeted resequencing of coding regions. Specifically, of 156 variants with sufficient annotation information, 71 were classified as “Decreased function,” 41 as “No function,” and 1 as “Increased function.” Conclusion: Overall, the proposed RF-based classification model holds promise as an extremely useful variant prioritization and scoring tool with interesting clinical applications in the fields of pharmacogenomics and personalized medicine.
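
The abstract above reports accuracy, precision, sensitivity, and specificity for a classifier with more than two functional classes. As a rough illustration of how such figures are commonly derived, the sketch below computes one-vs-rest metrics from a confusion matrix and macro-averages them. The averaging scheme is an assumption (the abstract does not state which one was used), the function name is hypothetical, and the toy matrix is invented for illustration; it is not the study's data.

```python
from typing import Dict, List


def one_vs_rest_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precisions, sensitivities, specificities = [], [], []
    for k in range(n_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                               # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp  # predicted k, true other
        tn = total - tp - fn - fp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        sensitivities.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n_classes,
        "sensitivity": sum(sensitivities) / n_classes,
        "specificity": sum(specificities) / n_classes,
    }


# Hypothetical toy matrix for three classes (rows: true, columns: predicted).
TOY_CONFUSION = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = one_vs_rest_metrics(TOY_CONFUSION)
```

In a multiclass setting, specificity tends to be higher than sensitivity because each one-vs-rest split has many true negatives, which is consistent with the pattern of the reported figures.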


2021 ◽  
Vol 15 (1) ◽  
pp. 151-160
Author(s):  
Hemant P. Kasturiwale ◽  
Sujata N. Kale

The Autonomic Nervous System (ANS) regulates cardiac activity, and Heart Rate Variability (HRV) can be used as a diagnostic tool to detect heart defects. HRV indices can be classified as linear or nonlinear and are mostly used to measure the efficiency of the model. For the prediction of cardiac diseases, the feature selection and extraction steps of a machine learning model are effective. The models available to date rely on HRV indices to predict cardiac diseases accurately, but they shed little light on the specifics of the indices, the selection process, or the stability of the model. The proposed model is developed considering all facets: electrocardiogram (ECG) amplitude, frequency components, sampling frequency, extraction methods, and acquisition techniques. The machine learning-based model and its performance are tested using the standard BioSignal method, both on available data and on data obtained by the author. This is a unique model developed by considering a vast number of mixture sets and more than four complex cardiac classes. The statistical analysis is performed on a variety of databases, such as MIT/BIH Normal Sinus Rhythm (NSR), MIT/BIH Arrhythmia (AR), MIT/BIH Atrial Fibrillation (AF), and Peripheral Pulse Analyser, using feature compatibility techniques. The classifiers are trained for prediction with approximately 40000 sets of parameters. The proposed model reaches an average accuracy of 97.87 percent and is sensitive and precise. The best features for classification are chosen from the different HRV features. The present model was checked under all possible subject scenarios, such as the raw database and the non-ECG signal. In this sense, robustness is defined not only by the specificity parameter but also by other measured output parameters. Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Ensemble AdaBoost (EAB), and Random Forest (RF) classifiers are tested in a 5% higher precision band and a lower band configuration. Random Forest produced better results, and its robustness has been established.
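
The abstract refers to linear HRV indices without naming them. As a hedged illustration, the sketch below computes two widely used time-domain (linear) indices, SDNN and RMSSD, from a series of RR intervals. Whether the paper uses these particular indices is an assumption; the function name and the toy RR series are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def time_domain_hrv(rr_ms: List[float]) -> Dict[str, float]:
    """Basic time-domain HRV indices from RR intervals given in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    # Differences between successive RR intervals.
    successive_diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    sdnn = stdev(rr_ms)  # sample standard deviation of the RR intervals
    rmssd = math.sqrt(mean([d * d for d in successive_diffs]))
    mean_hr = 60000.0 / mean(rr_ms)  # beats per minute

    return {"sdnn_ms": sdnn, "rmssd_ms": rmssd, "mean_hr_bpm": mean_hr}


# Hypothetical toy RR series (ms); not taken from any of the cited databases.
TOY_RR_MS = [800.0, 810.0, 790.0, 805.0, 795.0]
hrv = time_domain_hrv(TOY_RR_MS)
```

Indices like these would typically form part of the feature vector passed to classifiers such as SVM, KNN, or Random Forest; nonlinear indices would be computed separately.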


Author(s):  
Alessia Napoleone ◽  
Ann-Louise Andersen

Manufacturing companies are currently struggling to deal with ever-changing market requirements and technological advances. To cope with this context, they can develop reconfigurability as a capability in their factories. Moreover, companies can implement shop floor digitalisation to enhance their reconfigurability. This paper advances two arguments: (i) diagnosability, a critical reconfigurability characteristic, can be enhanced through shop floor digitalisation; and (ii) the human role is relevant to reaching diagnosability in a digitalised shop floor. The paper first presents a literature review; based on it, the aspects of shop floor digitalisation that support operators in enhancing diagnosability are identified and synthesized in a 3-e model (error reduction, ergonomics, and easiness). Second, insights from a case study are interpreted through the literature-based model in order to both consolidate the theoretical results and emphasize the implications for practitioners. The findings indicate that the proposed model can support practitioners in taking specific actions on shop floor digitalisation in order to improve operator-dependent diagnosability and, in turn, the reconfigurability capability.


2018 ◽  
Author(s):  
Leandro Radusky ◽  
Carlos Modenutti ◽  
Javier Delgado ◽  
Juan P. Bustamante ◽  
Sebastian Vishnopolska ◽  
...  

Abstract Understanding the functional effect of Single Amino acid Substitutions (SAS), derived from the occurrence of single nucleotide variants (SNVs), and their relation to disease development is a major issue in clinical genomics. Even though there are several bioinformatic algorithms and servers that predict whether a SAS can be pathogenic or not, they give little or no information on the actual effect on protein function. Moreover, many of these algorithms predict an effect that does not necessarily translate directly into pathogenicity. VarQ Web Server is an online tool that, given a UniProt ID, automatically analyzes known and user-provided SAS for their effect on protein activity, folding, aggregation, and protein interactions, among others. VarQ assessment was performed over a set of previously manually curated variants, showing its ability to correctly predict the phenotypic outcome and its underlying cause. This resource is available online at http://varq.qb.fcen.uba.ar/. Contact: [email protected] Information and tutorials may be found on the webpage of the tool.


Author(s):  
Isabelle Bonnet ◽  
Vincent Enouf ◽  
Florence Morel ◽  
Vichita Ok ◽  
Jérémy Jaffré ◽  
...  

The GeneLEAD VIII (Diagenode, Belgium) is a new, fully automated, sample-to-result precision instrument for DNA extraction and PCR detection of Mycobacterium tuberculosis complex (MTBC) directly from clinical samples. The Deeplex Myc-TB® assay (Genoscreen, France) is a diagnostic kit based on the deep sequencing of a 24-plexed amplicon mix, allowing simultaneous detection of resistance to 13 antituberculous (antiTB) drugs and determination of the spoligotype. We evaluated the performance of a strategy combining both tools to detect, directly from clinical samples and in 8 days, MTBC and its resistance to 13 antiTB drugs, and to identify potential patient-to-patient transmission of strains. Using this approach, we screened 112 clinical samples (65 smear-negative) and 94 MTBC cultured strains. The sensitivity and the specificity of the GeneLEAD/Deeplex Myc-TB approach for MTBC detection were 79.3% and 100%, respectively. One hundred forty successful Deeplex Myc-TB results were obtained for 46 clinical samples and 94 strains, 85.4% of which had a Deeplex Myc-TB susceptibility and resistance prediction consistent with phenotypic drug susceptibility testing (DST). Importantly, the Deeplex Myc-TB assay was able to detect 100% of the multidrug-resistant (MDR) MTBC tested. The lowest concordance rates were for pyrazinamide, ethambutol, streptomycin, and ethionamide (84.5%, 81.5%, 73%, and 55%, respectively), for which the determination of susceptibility or resistance is generally difficult with current tools. One of the main difficulties of Deeplex Myc-TB is interpreting the uncharacterized non-synonymous variants, which can represent up to 30% of the detected single nucleotide variants. We observed a good level of concordance between Deeplex Myc-TB spoligotyping and MIRU-VNTR, despite a lower discriminatory power for spoligotyping. The median time to obtain complete results from clinical samples was 8 days (IQR 7–13), provided a high-throughput NGS sequencing platform was available. Our results highlight that the GeneLEAD/Deeplex Myc-TB approach could be a breakthrough in rapid diagnosis of MDR TB in routine practice.


2020 ◽  
Author(s):  
Tair Shauli ◽  
Nadav Brandes ◽  
Michal Linial

Abstract The characterization of human genetic variation in coding regions is fundamental to the understanding of protein function, structure and evolution. Amino-acid (AA) substitution matrices encapsulate the stochastic nature of such proteomic variation and are widely used in studying protein families and evolutionary processes. The conventional substitution matrices, namely BLOSUM and PAM, were constructed to reflect polymorphism across species. In this study, we analyzed the frequencies of >4.8M single nucleotide variants within the healthy human population to accurately represent proteomic variability within the human species, at codon and AA resolution. Our model exposes various AA substitutions which are observed more frequently in one specific direction than in the opposite direction. We further demonstrate that nucleotide substitution rates only partially determine AA substitution rates. Finally, we investigate AA substitutions in post-translational modification and ion-binding sites, exposing purifying selection over a range of residue-based functions. These novel matrices provide a robust baseline for the analysis of protein variation in health and disease.
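
The abstract above notes that some amino-acid substitutions are observed more often in one direction than in the opposite one. A minimal sketch of how that directionality could be quantified from directed (reference, alternate) pairs follows. It uses raw counts only and makes no attempt to reproduce the study's codon-level model or any normalization; the function names and the toy variant list are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Substitution = Tuple[str, str]  # (reference amino acid, alternate amino acid)


def directional_counts(variants: Iterable[Substitution]) -> Counter:
    """Count each directed substitution (ref -> alt) separately."""
    return Counter(variants)


def forward_fraction(counts: Counter, a: str, b: str) -> float:
    """Fraction of all a<->b substitutions that go in the a -> b direction.

    A value of 0.5 means the pair is symmetric; values near 0 or 1
    indicate a strongly preferred direction.
    """
    forward = counts[(a, b)]
    backward = counts[(b, a)]
    if forward + backward == 0:
        raise ValueError(f"no substitutions observed between {a} and {b}")
    return forward / (forward + backward)


# Hypothetical toy data: 6 R->H, 2 H->R, 3 A->V, 3 V->A.
TOY_VARIANTS = (
    [("R", "H")] * 6 + [("H", "R")] * 2 + [("A", "V")] * 3 + [("V", "A")] * 3
)
counts = directional_counts(TOY_VARIANTS)
r_to_h = forward_fraction(counts, "R", "H")
a_to_v = forward_fraction(counts, "A", "V")
```

Symmetric matrices such as BLOSUM and PAM cannot represent this kind of asymmetry, since they assign one score to a pair regardless of direction.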


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Jian-Sheng Wu ◽  
Hai-Feng Hu ◽  
Shan-Cheng Yan ◽  
Li-Hua Tang

Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that the protein function prediction problem is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In practice, by contrast, just a subset of the functions of a protein is known, and whether this protein has other functions is unknown. Protein function prediction tasks evidently suffer from the weak-label problem; thus protein function prediction with incomplete annotation matches well with the MIML with weak-label learning framework. In this paper, we have applied the state-of-the-art MIML with weak-label learning algorithm MIMLwel to predict protein functions in two typical real-world electricigen organisms that have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.


Author(s):  
Yin Li ◽  
Jie Gu ◽  
Fengkai Xu ◽  
Qiaoliang Zhu ◽  
Yiwei Chen ◽  
...  

Abstract N6-methyladenosine (m6A) modification can regulate a variety of biological processes. However, the implications of m6A modification in lung adenocarcinoma (LUAD) remain largely unknown. Here, we systematically evaluated the m6A modification features in more than 2400 LUAD samples by analyzing the multi-omics features of 23 m6A regulators. We depicted the genetic variation features of m6A regulators, and found mutations of FTO and YTHDF3 were linked to worse overall survival. Many m6A regulators were aberrantly expressed in tumors, among which FTO, IGF2BP3, YTHDF1 and RBM15 showed consistent alteration features across 11 independent cohorts. Besides, the regulator-pathway interaction network demonstrated that m6A modification was associated with various biological pathways, including immune-related pathways. The correlation between m6A regulators and tumor microenvironment was also assessed. We found that LRPPRC was negatively correlated with most tumor-infiltrating immune cells. On the other hand, we established a scoring tool named m6Sig, which was positively correlated with PD-L1 expression and could reflect both the tumor microenvironment characterization and prognosis of LUAD patients. Comparison of CNV between high and low m6Sig groups revealed differences on chromosome 7. Application of m6Sig on an anti-PD-L1 immunotherapy cohort confirmed that the high m6Sig group demonstrated therapeutic advantages and clinical benefits. Our study indicated that m6A modification is involved in many aspects of LUAD and contributes to tumor microenvironment formation. A better understanding of m6A modification will provide more insights into the molecular mechanisms of LUAD and facilitate developing more effective personalized treatment strategies. A web application was built along with this study (http://www.bioinfo-zs.com/luadexpress/).


2017 ◽  
Vol 45 (1) ◽  
pp. 275-285 ◽  
Author(s):  
Mingzi M. Zhang ◽  
Howard C. Hang

Reversible protein S-palmitoylation confers spatiotemporal control of protein function by modulating protein stability, trafficking and activity, as well as protein–protein and membrane–protein associations. Enabled by technological advances, global studies revealed S-palmitoylation to be an important and pervasive posttranslational modification in eukaryotes with the potential to coordinate diverse biological processes as cells transition from one state to another. Here, we review the strategies and tools to analyze in vivo protein palmitoylation and interrogate the functions of the enzymes that put on and take off palmitate from proteins. We also highlight palmitoyl proteins and palmitoylation-related enzymes that are associated with cellular differentiation and/or tissue development in yeasts, protozoa, mammals, plants and other model eukaryotes.

