Gene Expression Analysis through Parallel Non-Negative Matrix Factorization

Gene expression analysis is a principal tool for explaining the behavior of genes in an organism exposed to different experimental conditions. Many clustering algorithms have been proposed in the state of the art, but the overwhelming amount of biological data, with its high-dimensional structure, exceeds the capacity of most current computational architectures. Computational time and memory consumption therefore become decisive factors in choosing a clustering algorithm. We propose a clustering algorithm based on Non-negative Matrix Factorization and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection; it is implemented in parallel GPU-based environments using the CUDA library. A well-known dataset is used in our tests, and the quality of the results is measured through the Rand and Accuracy indices. The results show a speedup of 6.22× compared to the sequential version. The algorithm is competitive in the analysis of biological datasets and is invariant with respect to the number of classes and the size of the gene expression matrix.
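
The pipeline described in the abstract (factorize, cluster in the reduced space, score with the Rand index) can be illustrated with a minimal sequential sketch. It is not the authors' CUDA implementation: the function names `nmf`, `kmeans` and `rand_index`, the Lee-Seung multiplicative updates, the choice to cluster samples rather than genes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (genes x samples) as W @ H."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius objective;
        # they keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a[upper] == same_b[upper]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for an expression matrix: 100 genes x 20 samples,
    # with two sample groups that over-express different gene blocks.
    truth = np.repeat([0, 1], 10)
    V = rng.random((100, 20))
    V[:50, truth == 0] += 3.0
    V[50:, truth == 1] += 3.0
    W, H = nmf(V, rank=2)
    labels = kmeans(H.T, k=2)  # cluster samples in the reduced space
    print("Rand index vs. ground truth:", rand_index(truth, labels))
```

Both update lines in `nmf` are dense matrix products, which is the kind of work a GPU implementation would typically offload; how the paper actually partitions the computation is not stated in the abstract.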

A Bregman-proximal point algorithm for robust non-negative matrix factorization with possible missing values and outliers - application to gene expression analysis

BMC Bioinformatics ◽

10.1186/s12859-016-1120-8 ◽

2016 ◽

Vol 17 (S8) ◽

Cited By ~ 2

Author(s):

Stéphane Chrétien ◽

Christophe Guyeux ◽

Bastien Conesa ◽

Régis Delage-Mouroux ◽

Michèle Jouvenot ◽

...

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Matrix Factorization ◽

Proximal Point Algorithm ◽

Gene Expression Analysis ◽

Missing Values ◽

Proximal Point ◽

Non Negative Matrix Factorization

Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽

10.1587/transfun.e95.a.818 ◽

2012 ◽

Vol E95-A (4) ◽

pp. 818-823 ◽

Cited By ~ 2

Author(s):

Sang Ha PARK ◽

Seokjin LEE ◽

Koeng-Mo SUNG

Keyword(s):

Matrix Factorization ◽

Clustering Algorithm ◽

Musical Sound ◽

Sound Separation ◽

Non Negative Matrix Factorization

A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data

The International Journal of Biostatistics ◽

10.1515/ijb-2020-0039 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Yixin Kong ◽

Ariangela Kozik ◽

Cindy H. Nakatsu ◽

Yava L. Jones-Hall ◽

Hyonho Chun

Keyword(s):

Matrix Factorization ◽

Factor Model ◽

R Package ◽

Biological Data ◽

Superior Performance ◽

Sequencing Data ◽

Fecal Microbiome ◽

Brain Gene Expression ◽

Cell Transcriptome ◽

Non Negative Matrix Factorization

Abstract A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
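
For orientation, the classical multiplicative updates for NMF under the Poisson (generalized Kullback-Leibler) objective, due to Lee and Seung, are shown below. This is the standard baseline without zero inflation, not the rule derived in the paper; the symbols $V$ (non-negative count matrix) and $W$, $H$ (non-negative factors with $V \approx WH$) are notation assumed here.

```latex
\begin{aligned}
H_{kj} &\leftarrow H_{kj}\,
  \frac{\sum_{i} W_{ik}\, V_{ij} / (WH)_{ij}}{\sum_{i} W_{ik}}, \\
W_{ik} &\leftarrow W_{ik}\,
  \frac{\sum_{j} H_{kj}\, V_{ij} / (WH)_{ij}}{\sum_{j} H_{kj}}.
\end{aligned}
```

Each update rescales an entry by a non-negative ratio, so non-negativity is preserved without an explicit projection step. A zero-inflated variant would presumably need to treat observed zeros differently in these ratios, but the abstract does not give that rule.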

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Abstract Background: In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. Results: In the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.

Multi-view Fuzzy Clustering Algorithm Based on Non-Negative Matrix Factorization and Partition Adaptive Fusion

Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence ◽

10.1145/3377713.3377715 ◽

2019 ◽

Author(s):

Xingliu Tao ◽

Lu Yu ◽

Xiaoying Wang

Keyword(s):

Fuzzy Clustering ◽

Matrix Factorization ◽

Clustering Algorithm ◽

Fuzzy Clustering Algorithm ◽

Adaptive Fusion ◽

Non Negative Matrix Factorization

Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization

Oncogene ◽

10.1038/sj.onc.1208858 ◽

2005 ◽

Vol 24 (47) ◽

pp. 7105-7113 ◽

Cited By ~ 57

Author(s):

Kentaro Inamura ◽

Takeshi Fujiwara ◽

Yujin Hoshida ◽

Takayuki Isagawa ◽

Michael H Jones ◽

...

Keyword(s):

Gene Expression ◽

Squamous Cell Carcinoma ◽

Cell Carcinoma ◽

Hierarchical Clustering ◽

Squamous Cell ◽

Matrix Factorization ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Lung Squamous Cell Carcinoma ◽

Non Negative Matrix Factorization

Neural networks for gene expression analysis and gene selection from DNA microarray

Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. ◽

10.1109/ijcnn.2005.1555883 ◽

2006 ◽

Cited By ~ 7

Author(s):

J.C. Patra ◽

Qin Zhen ◽

Ee Luang Ang ◽

Amitabha Das

Keyword(s):

Gene Expression ◽

Neural Networks ◽

Dna Microarray ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Gene Selection

Optimisation of region-specific reference gene selection and relative gene expression analysis methods for pre-clinical trials of Huntington's disease

Molecular Neurodegeneration ◽

10.1186/1750-1326-3-17 ◽

2008 ◽

Vol 3 (1) ◽

pp. 17 ◽

Cited By ~ 36

Author(s):

Caroline L Benn ◽

Helen Fox ◽

Gillian P Bates

Keyword(s):

Gene Expression ◽

Clinical Trials ◽

Reference Gene ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Gene Selection ◽

Specific Reference ◽

Relative Gene Expression ◽

Reference Gene Selection ◽

Relative Gene

A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

10.21203/rs.3.rs-537545/v1 ◽

2021 ◽

Author(s):

Zhihong Zhang ◽

Sai Hu ◽

Wei Yan ◽

Bihai Zhao ◽

Lei Wang

Keyword(s):

Protein Interaction ◽

Matrix Factorization ◽

Biological Data ◽

Protein Domain ◽

Biological Information ◽

Ppi Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Non Negative Matrix Factorization

Abstract Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that a large number of false negatives and false positives exist in PPI data. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we proposed a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage in this process was to construct a weighted PPI network by combining the information of protein domain, protein complex and the topology characteristic of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with whole enough weight of edges. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated with homology and subcellular localization information. In order to verify the effectiveness of the NDM method, we compared the NDM with other state-of-the-art essential protein prediction methods. The comparison of the results obtained from different methods indicated that our NDM model has better performance in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which led to optimization of the performance of essential protein identification. This will also provide a new perspective for other predictions based on protein-protein interaction networks.
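
The final ranking stage described above can be sketched as a PageRank power iteration on a weighted network, seeded with per-protein prior scores. This is a generic personalized PageRank under stated assumptions, not the NDM formula: the function name `personalized_pagerank`, the damping factor of 0.85, and the use of the prior as both the starting vector and the restart distribution are illustrative choices, and the weight matrix is assumed to be already reconstructed (for example by NMF).

```python
import numpy as np


def personalized_pagerank(weights, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank nodes of a weighted network, starting from prior scores.

    weights[i, j] is the non-negative weight of the edge from node j to node i
    (for an undirected PPI network the matrix is symmetric).
    prior holds non-negative initial scores, e.g. derived from annotation data.
    """
    weights = np.asarray(weights, dtype=float)
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()

    col_sums = weights.sum(axis=0)
    has_edges = col_sums > 0
    # Column-normalize the weights; isolated nodes fall back to the prior.
    transition = np.where(
        has_edges,
        weights / np.where(has_edges, col_sums, 1.0),
        prior[:, None],
    )

    scores = prior.copy()
    for _ in range(max_iter):
        updated = damping * (transition @ scores) + (1.0 - damping) * prior
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


if __name__ == "__main__":
    # Toy 4-node star network: node 0 interacts with nodes 1, 2 and 3.
    A = np.array([
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ], dtype=float)
    print(personalized_pagerank(A, prior=np.ones(4)))
```

Because every column of the transition matrix sums to one and the prior is normalized, the scores remain a probability distribution at every iteration, which makes rankings comparable across networks of different sizes.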

Bioinformatics in otolaryngology research. Part one: concepts in DNA sequencing and gene expression analysis

The Journal of Laryngology & Otology ◽

10.1017/s002221511400200x ◽

2014 ◽

Vol 128 (10) ◽

pp. 848-858 ◽

Cited By ~ 1

Author(s):

T J Ow ◽

K Upadhyay ◽

T J Belbin ◽

M B Prystowsky ◽

H Ostrer ◽

...

Keyword(s):

Gene Expression ◽

Data Storage ◽

Expression Analysis ◽

High Throughput ◽

Gene Expression Analysis ◽

Biological Data ◽

Biological Research ◽

Nucleotide Sequencing ◽

New Era ◽

Crucial Component

Abstract Background: Advances in high-throughput molecular biology, genomics and epigenetics, coupled with exponential increases in computing power and data storage, have led to a new era in biological research and information. Bioinformatics, the discipline devoted to storing, analysing and interpreting large volumes of biological data, has become a crucial component of modern biomedical research. Research in otolaryngology has evolved along with these advances. Objectives: This review highlights several modern high-throughput research methods, and focuses on the bioinformatics principles necessary to carry out such studies. Several examples from recent literature pertinent to otolaryngology are provided. The review is divided into two parts; this first part discusses the bioinformatics approaches applied in nucleotide sequencing and gene expression analysis. Conclusion: This paper demonstrates how high-throughput nucleotide sequencing and transcriptomics are changing biology and medicine, and describes how these changes are affecting otorhinolaryngology. Sound bioinformatics approaches are required to obtain useful information from the vast new sources of data.
