Hierarchical Non-Negative Matrix Factorization Using Clinical Information for Microbial Communities

2021
Author(s): Ko Abe, Masaaki Hirayama, Kinji Ohno, Teppei Shimamura

Abstract Background: The human microbiome forms very complex communities that consist of hundreds to thousands of different microorganisms that not only affect the host but also participate in disease processes. Several state-of-the-art methods have been proposed for learning the structure of microbial communities and investigating the relationship between microorganisms and host environmental factors. However, these methods were mainly designed to model and analyze single microbial communities that do not interact with or depend on other communities. Such methods therefore cannot capture the properties of interdependent systems in communities that affect host behavior and disease processes. Results: We introduce a novel hierarchical Bayesian framework, called BALSAMICO (BAyesian Latent Semantic Analysis of MIcrobial COmmunities), which uses microbial metagenome data to discover the underlying microbial community structures and the associations between microbiota and their environmental factors. BALSAMICO models mixtures of communities in the framework of nonnegative matrix factorization, taking into account environmental factors. We first propose an efficient procedure for estimating parameters. A simulation then evaluates the accuracy of the estimated parameters. Finally, the method is used to analyze clinical data. In this analysis, we successfully detected bacteria related to colorectal cancer. These results show that the method not only accurately estimates the parameters needed to analyze the connections between communities of microbiota and their environments, but also allows for the effective detection of these communities in real-world circumstances.

BMC Genomics, 2021, Vol 22 (1)
Author(s): Ko Abe, Masaaki Hirayama, Kinji Ohno, Teppei Shimamura

Abstract Background: The human microbiome forms very complex communities that consist of hundreds to thousands of different microorganisms that not only affect the host but also participate in disease processes. Several state-of-the-art methods have been proposed for learning the structure of microbial communities and investigating the relationship between microorganisms and host environmental factors. However, these methods were mainly designed to model and analyze single microbial communities that do not interact with or depend on other communities. Such methods therefore cannot capture the properties of interdependent systems in communities that affect host behavior and disease processes. Results: We introduce a novel hierarchical Bayesian framework, called BALSAMICO (BAyesian Latent Semantic Analysis of MIcrobial COmmunities), which uses microbial metagenome data to discover the underlying microbial community structures and the associations between microbiota and their environmental factors. BALSAMICO models mixtures of communities in the framework of nonnegative matrix factorization, taking into account environmental factors. We propose an efficient procedure for estimating parameters. A simulation then evaluates the accuracy of the estimated parameters. Finally, the method is used to analyze clinical data. In this analysis, we successfully detected bacteria related to colorectal cancer. Conclusions: These results show that the method not only accurately estimates the parameters needed to analyze the connections between communities of microbiota and their environments, but also allows for the effective detection of these communities in real-world circumstances.


2019
Author(s): Ko Abe, Masaaki Hirayama, Kinji Ohno, Teppei Shimamura

Abstract Background: The human microbiome forms very complex communities that consist of hundreds to thousands of different microorganisms that not only affect the host but also participate in disease processes. Several state-of-the-art methods have been proposed for learning the structure of microbial communities and investigating the relationship between microorganisms and host environmental factors. However, these methods were mainly designed to model and analyze single microbial communities that do not interact with or depend on other communities. Such methods therefore cannot capture the properties of interdependent systems in communities that affect host behavior and disease processes. Results: We introduce a novel hierarchical Bayesian framework, called BALSAMICO (BAyesian Latent Semantic Analysis of MIcrobial COmmunities), which uses microbial metagenome data to discover the underlying microbial community structures and the associations between microbiota and their environmental factors. BALSAMICO models mixtures of communities in the framework of nonnegative matrix factorization, taking into account environmental factors. We first propose an efficient procedure for estimating parameters. A simulation then evaluates the accuracy of the estimated parameters. Finally, the method is used to analyze clinical data. In this analysis, we successfully detected bacteria related to colorectal cancer. These results show that the method not only accurately estimates the parameters needed to analyze the connections between communities of microbiota and their environments, but also allows for the effective detection of these communities in real-world circumstances.


2021, Vol 15 (6), pp. 1-18
Author(s): Kai Liu, Xiangyu Li, Zhihui Zhu, Lodewijk Brand, Hua Wang

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extraction, NMF is applied to a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article, we study the factor-bounded NMF problem and provide a solution algorithm with convergence proven by rigorous mathematical analysis, which ensures that both the objective and the matrix factors converge. In addition, we show the relationship between MUA and our solution, followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
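For readers unfamiliar with multiplicative updating, the following is a minimal sketch of the classic Lee-Seung multiplicative updates for the squared Frobenius objective. It is not the factor-bounded algorithm proposed in the article; the function name `nmf_multiplicative`, the parameter defaults, and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative X (n x m) as W @ H with W, H >= 0.

    Illustrative sketch of standard multiplicative updates that
    minimise ||X - W H||_F^2; hypothetical helper, not the article's method.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive initialisation keeps the factors nonnegative,
    # because each update only multiplies by a nonnegative ratio.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    losses = []
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, losses

# Toy data: an exactly rank-4 nonnegative matrix (assumed example).
data_rng = np.random.default_rng(1)
X = data_rng.random((20, 4)) @ data_rng.random((4, 15))
W, H, losses = nmf_multiplicative(X, rank=4)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The recorded losses illustrate the monotone decrease of the objective mentioned above; nothing in this sketch establishes that `W` and `H` themselves converge, which is the gap the article addresses.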


Author(s): Bin Qian, Lei Tong, Zhenmin Tang, Xiaobo Shen

Hyperspectral unmixing is one of the most important techniques in remote sensing image analysis. In recent decades, nonnegative matrix factorization (NMF) has been shown to be effective for hyperspectral unmixing owing to its ability to uncover latent structure. Most NMF methods emphasize the spectral information but ignore the spatial information, which is crucial for analyzing hyperspectral data. In this paper, we propose an improved NMF method, namely NMF with region sparsity learning (RSLNMF), to simultaneously consider both spectral and spatial information. RSLNMF defines a new sparsity learning model based on small homogeneous regions obtained via the graph cut algorithm. RSLNMF is thus able to explore the relationship between spatially neighboring pixels within each region. An efficient optimization scheme is developed for the proposed RSLNMF, and its convergence is theoretically guaranteed. Experiments on both synthetic and real hyperspectral data validate the superiority of the proposed method over several state-of-the-art unmixing approaches.


Author(s): Yuanyuan Ma, Xiaohua Hu, Tingting He, Xingpeng Jiang

Integrating multi-view datasets, which comprise heterogeneous sources or different representations, to understand the subtle and complex relationships in the data is challenging. Data integration methods attempt to efficiently combine the complementary information of multiple data types to construct a comprehensive view of the underlying data. Nonnegative matrix factorization (NMF), an approach that can be used for signal compression and noise reduction, has attracted widespread attention in the last two decades. The Kullback–Leibler divergence (or relative entropy) can be used as the loss function of NMF. In this article, we propose a fast and robust framework (RSNMF) based on symmetric nonnegative matrix factorization (SNMF) and similarity network fusion (SNF) for clustering human microbiome data, including functional, metabolic and phylogenetic profiles. Many existing methods use all the information provided by each view to create a consensus representation, which often suffers from noise in the data and cannot provide a precise representation of the latent data structures. In contrast, RSNMF combines the strengths of SNMF and SNF to form a robust clustering indicator matrix and can thus reduce the influence of noise. We conduct experiments on one synthetic and two real datasets (microbiome data and text data), and the results show that the proposed RSNMF outperforms the baseline and state-of-the-art methods, which demonstrates the potential of RSNMF for microbiome data analysis.
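As a small illustration of the Kullback–Leibler loss mentioned above, the sketch below computes the generalized KL divergence D(X || WH) = sum(X log(X / WH) - X + WH) between a nonnegative matrix and its factorization. The helper name `generalized_kl`, the smoothing constant `eps`, and the toy matrices are assumptions; this is not the RSNMF objective itself.

```python
import numpy as np

def generalized_kl(X, W, H, eps=1e-12):
    """Generalized KL divergence D(X || W H) for nonnegative X, W, H.

    Hypothetical helper for illustration; eps avoids division by zero.
    """
    R = W @ H
    # Where X == 0 the log term is defined as 0, so the ratio is set to 1
    # there and the entry contributes only R.
    ratio = np.where(X > 0, X / (R + eps), 1.0)
    return float(np.sum(X * np.log(ratio) - X + R))

# Toy factors (assumed example); the offset keeps every entry positive.
rng = np.random.default_rng(0)
W = rng.random((6, 2)) + 0.1
H = rng.random((2, 5)) + 0.1
X = W @ H

kl_exact = generalized_kl(X, W, H)          # factorization reproduces X
kl_perturbed = generalized_kl(X, W, H + 0.5)  # deliberately wrong H
print(kl_exact, kl_perturbed)
```

The divergence is essentially zero when `W @ H` reproduces `X` and grows as the reconstruction departs from it, which is what makes it usable as an NMF loss.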

