Integrating hypertension phenotype and genotype with hybrid non-negative matrix factorization

2018 ◽  
Vol 35 (8) ◽  
pp. 1395-1403 ◽  
Author(s):  
Yuan Luo ◽  
Chengsheng Mao ◽  
Yiben Yang ◽  
Fei Wang ◽  
Faraz S Ahmad ◽  
...  

Abstract Motivation Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements, with the goal of identifying subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and in presenting an interpretable model to clinicians. We aim to provide informed patient stratification based on phenotype and genotype features. Results In this article, we present a hybrid non-negative matrix factorization (HNMF) method to integrate phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices using a loss function appropriate to each, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss and the genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows that HNMF converges fast and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. We also compared HNMF with three recently published methods for integrative clustering analysis: iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype–genotype interactions that characterize cardiac abnormalities. Availability and implementation Our code is publicly available on GitHub at https://github.com/yuanluo/hnmf.
Supplementary information Supplementary data are available at Bioinformatics online.
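
The hybrid objective described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the code from the authors' repository. It assumes a shared non-negative patient factor W, a Frobenius term for the phenotype matrix Xp, a generalized KL term for the genotype matrix Xg, and a hypothetical weight `lam` between the two. The function names, the backtracking rule and the `EPS` guard are illustrative assumptions as well.

```python
import numpy as np

EPS = 1e-10  # assumed guard against log(0) and division by zero


def hybrid_loss(Xp, Xg, W, Hp, Hg, lam=1.0):
    """Frobenius loss on the phenotype block plus generalized KL on the genotype block."""
    frob = 0.5 * np.sum((Xp - W @ Hp) ** 2)
    R = W @ Hg + EPS
    kl = np.sum(Xg * np.log((Xg + EPS) / R) - Xg + R)
    return frob + lam * kl


def projected_step_W(Xp, Xg, W, Hp, Hg, lam=1.0, step=1.0, shrink=0.5, max_tries=30):
    """One projected gradient step on W with Hp and Hg held fixed.

    The step size is halved until the loss does not increase; if no such
    step is found, W is returned unchanged.
    """
    R = W @ Hg + EPS
    # Gradient of the Frobenius term plus gradient of the KL term.
    grad = (W @ Hp - Xp) @ Hp.T + lam * (1.0 - Xg / R) @ Hg.T
    base = hybrid_loss(Xp, Xg, W, Hp, Hg, lam)
    for _ in range(max_tries):
        W_new = np.maximum(W - step * grad, 0.0)  # project onto W >= 0
        if hybrid_loss(Xp, Xg, W_new, Hp, Hg, lam) <= base:
            return W_new
        step *= shrink
    return W
```

An alternating scheme would apply the same kind of step to Hp and Hg in turn, each time holding the other factors fixed.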

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 540
Author(s):  
Soodabeh Asadi ◽  
Janez Povh

This article uses the projected gradient (PG) method for a non-negative matrix factorization (NMF) problem in which one or both matrix factors must have orthonormal columns or rows. We penalize the orthonormality constraints and apply the PG method via a block coordinate descent approach: at each step one matrix factor is fixed and the other is updated by moving along the steepest descent direction computed from the penalized objective function and projecting onto the space of non-negative matrices. Our method is tested on two sets of synthetic data for various values of the penalty parameters. The performance is compared with the well-known multiplicative update (MU) method from Ding (2006) and with a modified, globally convergent variant of the MU algorithm recently proposed by Mirzal (2014). We provide extensive numerical results coupled with appropriate visualizations, which demonstrate that our method is very competitive and usually outperforms the other two methods.
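
The block coordinate scheme described above can be sketched as follows. This is a rough illustration rather than the authors' algorithm: it assumes a quadratic penalty `mu/2 * ||H H^T - I||_F^2` on the rows of H only, and it swaps in a simple step-halving rule for step-size selection. All function and parameter names are hypothetical.

```python
import numpy as np


def onmf_objective(X, W, H, mu):
    """Least-squares fit plus a penalty pushing the rows of H toward orthonormality."""
    k = H.shape[0]
    fit = 0.5 * np.sum((X - W @ H) ** 2)
    penalty = 0.5 * mu * np.sum((H @ H.T - np.eye(k)) ** 2)
    return fit + penalty


def projected_update(M, grad, objective, step=1.0, shrink=0.5, max_tries=40):
    """Move along -grad, project onto M >= 0, and halve the step until no increase."""
    base = objective(M)
    for _ in range(max_tries):
        candidate = np.maximum(M - step * grad, 0.0)
        if objective(candidate) <= base:
            return candidate
        step *= shrink
    return M  # no acceptable step found; keep the current block


def penalized_onmf(X, k, mu=1.0, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        # Block 1: H fixed, update W.
        grad_W = (W @ H - X) @ H.T
        W = projected_update(W, grad_W, lambda M: onmf_objective(X, M, H, mu))
        # Block 2: W fixed, update H using the penalized gradient.
        grad_H = W.T @ (W @ H - X) + 2.0 * mu * (H @ H.T - np.eye(k)) @ H
        H = projected_update(H, grad_H, lambda M: onmf_objective(X, W, M, mu))
    return W, H
```

Larger values of `mu` weight the orthonormality of H more heavily relative to the reconstruction error.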


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 3115-3115
Author(s):  
Kate E Ridout ◽  
Pauline Robbe ◽  
Doriane Cavalieri ◽  
Jennifer Becq ◽  
Miao He ◽  
...  

Abstract Background Chronic Lymphocytic Leukemia (CLL) is characterised by a highly heterogeneous natural history and treatment response. Indeed, 50% of immunoglobulin heavy chain variable region (IgHV) hypermutated patients have an excellent progression-free survival (PFS) after chemoimmunotherapy. Conversely, 25% of FCR-treated patients relapse within 24 months (high risk CLL). Recent studies have shown that complex karyotype with or without TP53 disruption predicts for relapse after BCL2 therapy and BTK inhibitors. However, TP53 is the only marker for which routine testing is available. Overall, nearly 80% of patients relapsing after frontline FCR do not present a known poor risk genomic marker. Additional candidate genomic predictors of poor outcome including mutations in coding regions of NOTCH1, SF3B1 and RPS15, non-coding regions of NOTCH1 and enhancer regions of PAX5, telomere length, IgHV status, and DNA Damage Repair (DDR) germline mutations including TP53 and ATM have been reported in CLL. Further, the role of mutational signatures and regions of kataegis also merits additional investigation in progressive CLL. Evaluating all candidate predictors requires complex, time-consuming, multi-modality testing outside the scope of routine clinical diagnostic practice; in isolation, however, each has low predictive value. Here, we show preliminary data on a novel patient stratification method based on whole genome sequencing (WGS) data incorporating multiple genomic features in a single test. Patients and Methods Tumor (peripheral blood) and germline (saliva) samples were collected from 321 patients from 6 UK trials via the Genomics England CLL pilot: ARCTIC (n=61), AdMIRe (n=64), CLL 210 (n=30), CLEAR (n=12), RIAltO (n=88) and FLAIR (n=66). We performed WGS on the HiSeqX (Illumina).
After read alignment, we detected somatic variants using Strelka 2.4.7 for small variant detection (SNVs and InDels), Manta 0.28.0 for structural variant (SV) detection, and Canvas 1.3.1 for copy number variant (CNV) detection (Illumina). Non-coding regions were annotated with information from primary CLL, CLL cell lines and B-cell ENCODE databases. Mutational signatures and putative regions of kataegis were calculated based on Alexandrov et al. (Nature, 2013) and Lawrence et al. (Nature, 2013). Telomere lengths were assessed using Telomerecat. Data aggregation was performed using contingency tables combined with non-negative matrix factorization. Results Mean coverage was 94.2X for tumor and 28.5X for germline samples. We found a median of 9172 SNPs/sample after filtering and 2348 indels/sample across 321 patients. High risk CLL was enriched for genomic complexity and poor prognostic mutations. The most frequently mutated genes were SF3B1 (17%), TP53 (13%), NOTCH1 (12%), IGLL5 (12%), and ATM (11%). Analysis of non-coding regions using DNA methylation markers, ATAC-seq and Hi-C revealed potential candidate regions associated with early relapse. Using CNA and SV data, we identified interesting patterns of genomic complexity and structural variants, including a trend towards enrichment of del8p in Relapse/Refractory and FCR non-responders. Additionally, we investigated mutation signatures and kataegis across coding and non-coding regions of the genome. We correlated exonic regions of DDR genes in germline data with clinical outcomes and extended this to genes mutated in both tumor and germline data, termed germline-tumor double-hits. We examined the relationship between the Alexandrov hypermutation signature, IgHV status (determined by % homology to the reference genome) and PFS, and combined mutational density at the Ig locus with mutation signature, aiming to predict IgHV status.
Finally, we produced a binary contingency matrix, using non-negative matrix factorization to cluster the samples. This method highlighted patient groups with shared genomic profiles. Conclusion We present preliminary data on a patient stratification method derived from WGS of 321 paired germline and CLL trial samples. Our predictive signature includes driver gene mutations, CNAs, IgHV status, genomic complexity, telomere length, overall mutation burden and genes with germline-tumor double-hits. Our comprehensive, NGS-based patient stratification attempts to predict patient outcome in a single sequencing run. Disclosures Becq: Illumina: Employment. He: Illumina: Employment. Ross: Illumina: Employment. Bentley: Illumina: Employment. Pettitt: Celgene: Research Funding; Gilead: Research Funding; Roche: Research Funding; GSK/Novartis: Research Funding; Napp: Research Funding; AstraZeneca: Research Funding; Chugai: Research Funding. Hillmen: Novartis: Research Funding; Gilead Sciences, Inc.: Honoraria, Research Funding; Alexion Pharmaceuticals, Inc: Consultancy, Honoraria; F. Hoffmann-La Roche Ltd: Research Funding; Celgene: Research Funding; Acerta: Membership on an entity's Board of Directors or advisory committees; Abbvie: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Pharmacyclics: Research Funding; Janssen: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding. Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.
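
The final clustering step in the abstract above (NMF on a binary sample-by-feature matrix) can be illustrated with a toy sketch. The abstract does not say which NMF algorithm or cluster-assignment rule was used. The version below assumes the standard multiplicative updates for the Frobenius loss and assigns each sample to the factor with the largest loading; both choices, and all names, are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=300, seed=0, eps=1e-10):
    """Factor V (samples x features) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(V, k, **kwargs):
    """Assign each sample (row of V) to its dominant factor."""
    W, _ = nmf_multiplicative(V, k, **kwargs)
    return np.argmax(W, axis=1)


# Toy binary matrix: rows are samples, columns are presence/absence features.
V = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
], dtype=float)
labels = cluster_samples(V, k=2)
```

In this toy case the two groups of samples share no features, so they should land in different clusters. Real genomic matrices are noisier, and the number of factors `k` has to be chosen separately.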


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i154-i160 ◽  
Author(s):  
Xinrui Lyu ◽  
Jean Garret ◽  
Gunnar Rätsch ◽  
Kjong-Van Lehmann

Abstract Motivation Understanding the underlying mutational processes of cancer patients has been a long-standing goal in the community and promises to provide new insights that could improve cancer diagnoses and treatments. Mutational signatures are summaries of the mutational processes, and improving the derivation of mutational signatures can yield new discoveries previously obscured by technical and biological confounders. Results from existing mutational signature extraction methods depend on the size of the available patient cohort and focus solely on the analysis of mutation count data without exploiting metadata. Results Here we present a supervised method that utilizes cancer type as metadata to extract more distinctive signatures. More specifically, we use a negative binomial non-negative matrix factorization and add a support vector machine loss. We show that mutational signatures extracted by our proposed method have a lower reconstruction error and are designed to be more predictive of cancer type than those generated by unsupervised methods. Unlike existing unsupervised signature extraction methods, this design reduces the need for elaborate post-processing strategies in order to recover most of the known signatures. Signatures extracted by a supervised model used in conjunction with cancer-type labels are also more robust, especially when using small and potentially cancer-type limited patient cohorts. Finally, we adapted our model such that molecular features can be utilized to derive a corresponding mutational signature. We used APOBEC expression and MUTYH mutation status to demonstrate the possibilities that arise from this ability. We conclude that our method, which exploits available metadata, improves the quality of mutational signatures and helps derive more interpretable representations. Availability and implementation https://github.com/ratschlab/SNBNMF-mutsig-public.
Supplementary information Supplementary data are available at Bioinformatics online.


Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1757
Author(s):  
Bingjie Li ◽  
Xi Shi ◽  
Zhenyue Zhang

As a special class of non-negative matrix factorization, symmetric non-negative matrix factorization (SymNMF) has been widely used in the machine learning field to mine the hidden non-linear structure of data. Due to the non-negative constraint and non-convexity of SymNMF, the efficiency of existing methods is generally unsatisfactory. To tackle this issue, we propose a two-phase algorithm to solve the SymNMF problem efficiently. In the first phase, we drop the non-negative constraint of SymNMF and propose a new model with penalty terms, in order to control the negative component of the factor. Unlike previous methods, the factor sequence in this phase is not required to be non-negative, allowing fast unconstrained optimization algorithms, such as the conjugate gradient method, to be used. In the second phase, we revisit the SymNMF problem, taking the non-negative part of the solution in the first phase as the initial point. To achieve faster convergence, we propose an interpolation projected gradient (IPG) method for SymNMF, which is much more efficient than the classical projected gradient method. Our two-phase algorithm is easy to implement, with convergence guaranteed for both phases. Numerical experiments show that our algorithm performs better than others on synthetic data and unsupervised clustering tasks.
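
For orientation, the classical projected gradient baseline that the abstract above compares against can be sketched as follows. This is not the proposed IPG method or the penalty model of the first phase. It assumes the objective `||A - H H^T||_F^2` with a symmetric A and a simple step-halving rule. The optional `H0` argument mimics the second-phase initialization by taking the non-negative part of a given starting point. All names are hypothetical.

```python
import numpy as np


def symnmf_loss(A, H):
    return np.sum((A - H @ H.T) ** 2)


def symnmf_projected_gradient(A, k, H0=None, n_iter=200, seed=0,
                              shrink=0.5, max_tries=40):
    """Classical projected gradient for SymNMF with a symmetric matrix A."""
    if H0 is None:
        H = np.random.default_rng(seed).random((A.shape[0], k))
    else:
        H = np.maximum(H0, 0.0)  # non-negative part of a supplied starting point
    for _ in range(n_iter):
        grad = 4.0 * (H @ H.T - A) @ H  # valid because A is symmetric
        base = symnmf_loss(A, H)
        step = 1.0
        for _try in range(max_tries):
            candidate = np.maximum(H - step * grad, 0.0)  # project onto H >= 0
            if symnmf_loss(A, candidate) <= base:
                H = candidate
                break
            step *= shrink
    return H
```

Passing the output of an unconstrained solver as `H0` reproduces the hand-off between the two phases in spirit only; the paper's own update rule and convergence guarantees are not represented here.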

