Dimensionality reduction using non-negative matrix factorization for information retrieval

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Advances in office technology have made documents far easier to create, and the resulting volume of data has outgrown the manual processes previously used to make relevance judgments. Text mining and information retrieval methods have been applied in eDiscovery to help tame this volume; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools such as classifiers, latent semantic analysis, and non-negative matrix factorization deal with the nuances of the collection process.


Information Retrieval Using Non-negative Matrix Factorization

IEEJ Transactions on Electronics Information and Systems, 2004, Vol 124 (7), pp. 1500-1506
DOI: 10.1541/ieejeiss.124.1500
Cited By ~ 1

Author(s): Satoru Tsuge, Masami Shishibori, Singo Kuroiwa, Kenji Kita

Keyword(s): Information Retrieval, Matrix Factorization, Non Negative Matrix Factorization


A constrained non-negative matrix factorization in information retrieval

Proceedings Fifth IEEE Workshop on Mobile Computing Systems and Applications, 2004
DOI: 10.1109/iri.2003.1251424
Cited By ~ 1

Author(s): Baowen Xu, Jianjiang Lu, Gangshi Huang

Keyword(s): Information Retrieval, Matrix Factorization, Non Negative Matrix Factorization


Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization

NAR Genomics and Bioinformatics, 2020, Vol 2 (3)
DOI: 10.1093/nargab/lqaa064

Author(s): Shuqin Zhang, Liu Yang, Jinwen Yang, Zhixiang Lin, Michael K Ng

Keyword(s): Dimensionality Reduction, Single Cell, RNA Sequencing, Matrix Factorization, Sparse Matrix, Differential Expression Analysis, Data Matrix, Research Attention, Single Cell RNA Sequencing, Non Negative Matrix Factorization

Abstract: Single-cell RNA sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at the single-cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a dimensionality reduction method that addresses both dropouts and the non-negativity constraints in scRNA-seq data. The developed method simultaneously performs dimensionality reduction and dropout imputation under the non-negative matrix factorization (NMF) framework. The dropouts were modeled as a non-negative sparse matrix. The sum of the observed data matrix and the dropout matrix was approximated by NMF. To ensure that the sparsity pattern was maintained, a weighted ℓ1 penalty that took into account the dependency of dropouts on the sequencing depth in each cell was imposed. An efficient algorithm was developed to solve the proposed optimization problem. Experiments using both synthetic and real data showed that dimensionality reduction via the proposed method afforded more robust clustering results than existing methods, and that dropout imputation improved the differential expression analysis.
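For orientation, the NMF framework that the abstract builds on can be sketched in a few lines. The following is a minimal illustration of plain NMF with the standard multiplicative updates for the Frobenius-norm objective, not the constrained robust method of the paper: it does not model the dropout matrix or the weighted ℓ1 penalty. The function name `nmf`, the toy matrix sizes, and the genes-by-cells layout are assumptions made for the example.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (genes x cells) as W @ H.

    Plain NMF with multiplicative updates for the squared Frobenius
    error. Illustrative sketch only; no dropout modeling or penalty.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative without any explicit projection step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (hypothetical): 30 "genes" x 20 "cells" with exact rank 3.
rng = np.random.default_rng(1)
X = rng.random((30, 3)) @ rng.random((3, 20))

W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
print("reduced representation shape (rank x cells):", H.shape)
```

In this layout each column of `H` is a low-dimensional, non-negative representation of one cell, which is what a downstream clustering step would consume. The method in the abstract differs in that it fits the sum of the observed matrix and a sparse non-negative dropout matrix, with a depth-dependent ℓ1 weight per cell.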
