A New Ensemble Method for Detecting Anomalies in Gene Expression Matrices

Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 882
Author(s):  
Laura Selicato ◽  
Flavia Esposito ◽  
Grazia Gargano ◽  
Maria Carmela Vegliante ◽  
Giuseppina Opinto ◽  
...  

One of the main problems in the analysis of real data often stems from the presence of anomalies. Anomalous cases can spoil the resulting analysis and, at the same time, contain valuable information. In both cases, the ability to detect these occurrences is very important. In the biomedical field, correctly identifying outliers could lead to new biological hypotheses that would otherwise be overlooked when examining experimental data. In this work, we address the problem of detecting outliers in gene expression data, focusing on microarray analysis. We propose an ensemble approach for detecting anomalies in gene expression matrices based on Hierarchical Clustering and Robust Principal Component Analysis, which allows us to derive a novel pseudo-mathematical classification of anomalies.
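To make the idea of pairing Hierarchical Clustering with Robust Principal Component Analysis more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' ensemble. Robust PCA is computed as principal component pursuit solved with the inexact augmented Lagrange multiplier method, the clustering detector flags samples that remain singletons when a single-linkage dendrogram is cut at a chosen height, and the two are combined by a simple "both must agree" vote. The function names (`robust_pca`, `rpca_flags`, `single_linkage_singletons`, `ensemble_outliers`), the thresholds `rel_tol` and `factor`, and the genes-as-rows layout are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def robust_pca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S by principal component pursuit,
    solved with the inexact augmented Lagrange multiplier (IALM) method."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_two = np.linalg.norm(M, 2)                  # largest singular value
    Y = M / max(norm_two, np.abs(M).max() / lam)     # dual variable init
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank step: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse step: entrywise shrinkage.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual) / np.linalg.norm(M) < tol:
            break
    return L, S


def rpca_flags(X, rel_tol=0.1):
    """Samples (columns) whose sparse part exceeds rel_tol * median column norm.
    rel_tol is a hypothetical tuning parameter."""
    _, S = robust_pca(X)
    energy = np.linalg.norm(S, axis=0)
    scale = np.median(np.linalg.norm(X, axis=0))
    return set(np.flatnonzero(energy > rel_tol * scale).tolist())


def single_linkage_singletons(X, factor=3.0):
    """Samples left as singletons when a single-linkage dendrogram of the
    columns is cut at factor * median nearest-neighbour distance.
    Under single linkage a point is still isolated at height h exactly
    when its nearest neighbour is farther than h."""
    cols = X.T
    d = np.linalg.norm(cols[:, None, :] - cols[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)
    return set(np.flatnonzero(nn > factor * np.median(nn)).tolist())


def ensemble_outliers(X):
    """Conservative vote: keep samples flagged by both detectors."""
    return sorted(rpca_flags(X) & single_linkage_singletons(X))


# Toy matrix: 30 genes (rows) x 12 samples (columns), rank-1 background.
u = np.linspace(1.0, 2.0, 30)
v = np.linspace(1.0, 2.0, 12)
X = np.outer(u, v)
X[[2, 11, 20], 7] += 10.0        # sample 7 is corrupted in three genes
print(ensemble_outliers(X))
```

In this toy matrix only sample 7 departs from the rank-1 background, so both detectors should single it out. The intersection rule trades sensitivity for fewer false alarms; a union or a weighted vote would be equally defensible design choices.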

2005 ◽  
Vol 03 (02) ◽  
pp. 303-316 ◽  
Author(s):  
ZHENQIU LIU ◽  
DECHANG CHEN ◽  
HALIMA BENSMAIL ◽  
YING XU

Kernel principal component analysis (KPCA) has been applied to data clustering and graph cuts in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is, in general, superior to traditional algorithms.
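A KPCA-then-fuzzy-C-means pipeline can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' algorithm: the RBF kernel and its `gamma`, the fuzzifier `m = 2`, the random initialisation of memberships, and the toy data are all assumptions, and the rows of `X` are treated as the objects being clustered (for example, tissue samples).

```python
import numpy as np


def kernel_pca(X, n_components=2, gamma=0.1):
    """Project the rows of X onto the leading kernel principal components,
    using the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Centre the kernel matrix in feature space.
    Kc = K - K.mean(axis=0)[None, :] - K.mean(axis=1)[:, None] + K.mean()
    vals, vecs = np.linalg.eigh(Kc)                # eigenvalues ascending
    top = np.argsort(vals)[::-1][:n_components]
    # Training-point score on component k is sqrt(lambda_k) * v_k.
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def fuzzy_c_means(Z, c=2, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means; returns memberships U (n x c) and centres."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U**m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
    return U, centers


# Toy data: two well-separated groups of four expression profiles.
base = np.array([[0.0, 0.0, 0.0], [0.3, 0.1, 0.0],
                 [0.1, 0.4, 0.2], [0.2, 0.2, 0.3]])
X = np.vstack([base, base + 5.0])
U, centers = fuzzy_c_means(kernel_pca(X, n_components=2, gamma=0.1))
labels = U.argmax(axis=1)
print(labels)
```

Hard labels are taken as the cluster with the highest membership. The soft memberships in `U` are what distinguish fuzzy C-means from k-means and can highlight samples that sit between groups.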


2005 ◽  
Vol 2005 (2) ◽  
pp. 155-159 ◽  
Author(s):  
Zhenqiu Liu ◽  
Dechang Chen ◽  
Halima Bensmail

One important feature of the gene expression data is that the number of genes M far exceeds the number of samples N. Standard statistical methods do not work well when N < M. Development of new methodologies or modification of existing methodologies is needed for the analysis of the microarray data. In this paper, we propose a novel analysis procedure for classifying the gene expression data. This procedure involves dimension reduction using kernel principal component analysis (KPCA) and classification with logistic regression (discrimination). KPCA is a generalization and nonlinear version of principal component analysis. The proposed algorithm was applied to five different gene expression datasets involving human tumor samples. Comparison with other popular classification methods such as support vector machines and neural networks shows that our algorithm is very promising in classifying gene expression data.
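A compact sketch of KPCA dimension reduction followed by logistic-regression classification, on a synthetic N < M problem (12 training samples, 50 genes). Everything here is assumed for illustration rather than taken from the paper: the RBF kernel and `gamma`, plain unregularised gradient-descent logistic regression, the synthetic data, and the class and function names. One detail worth noting is that new samples must be centred with the training kernel statistics before they are projected.

```python
import numpy as np


def rbf(A, B, gamma):
    """RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)


class KernelPCA:
    def __init__(self, n_components=2, gamma=0.01):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X):
        self.X = X
        K = rbf(X, X, self.gamma)
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        Kc = K - self.col_mean[None, :] - K.mean(axis=1)[:, None] + self.all_mean
        vals, vecs = np.linalg.eigh(Kc)
        top = np.argsort(vals)[::-1][: self.n_components]
        # Scale coefficients so each feature-space axis has unit length.
        self.alphas = vecs[:, top] / np.sqrt(vals[top])
        return self

    def transform(self, X_new):
        Kt = rbf(X_new, self.X, self.gamma)
        # Centre new rows using the *training* kernel statistics.
        Ktc = Kt - self.col_mean[None, :] - Kt.mean(axis=1)[:, None] + self.all_mean
        return Ktc @ self.alphas


def fit_logistic(Z, y, lr=0.5, n_iter=2000):
    """Unregularised logistic regression by batch gradient descent."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])      # append intercept column
    w = np.zeros(Zb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Zb @ w)))
        w -= lr * Zb.T @ (p - y) / len(y)
    return w


def predict(Z, w):
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    return (Zb @ w > 0).astype(int)


rng = np.random.default_rng(0)


def make_data(n_per_class, n_genes=50):
    """Synthetic N < M data: class 1 is up-regulated in the first 10 genes."""
    X = rng.normal(0.0, 0.2, size=(2 * n_per_class, n_genes))
    X[n_per_class:, :10] += 3.0
    return X, np.repeat([0, 1], n_per_class)


X_train, y_train = make_data(6)     # 12 samples x 50 genes
X_test, y_test = make_data(4)
kpca = KernelPCA(n_components=2, gamma=0.01).fit(X_train)
w = fit_logistic(kpca.transform(X_train), y_train)
train_acc = (predict(kpca.transform(X_train), w) == y_train).mean()
test_acc = (predict(kpca.transform(X_test), w) == y_test).mean()
print(train_acc, test_acc)
```

The classifier only ever sees two kernel components instead of 50 genes, which is the point of reducing dimension before fitting when N < M.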


Biometrics ◽  
2005 ◽  
Vol 61 (2) ◽  
pp. 632-634
Author(s):  
Luc Wouters ◽  
Hinrich W. Göhlmann ◽  
Luc Bijnens ◽  
Stefan U. Kass ◽  
Geert Molenberghs ◽  
...  

2017 ◽  
Vol 41 (8) ◽  
pp. 844-865 ◽  
Author(s):  
Mengque Liu ◽  
Xinyan Fan ◽  
Kuangnan Fang ◽  
Qingzhao Zhang ◽  
Shuangge Ma

2018 ◽  
Vol 35 (4) ◽  
pp. 553-559
Author(s):  
Zi Yang ◽  
George Michailidis

Motivation: The diversity of biological omics data provides richness of information, but also presents an analytic challenge. While there has been much methodological and theoretical development on the statistical handling of large volumes of biological data, far less attention has been devoted to characterizing their veracity and variability.
Results: We propose a method of statistically quantifying heterogeneity among multiple groups of datasets, derived from different omics modalities over various experimental and/or disease conditions. It draws upon strategies from analysis of variance and principal component analysis in order to reduce dimensionality of the variability across multiple data groups. The resulting hypothesis-based inference procedure is demonstrated with synthetic and real data from a cell line study of growth factor responsiveness based on a factorial experimental design.
Availability and implementation: Source code and datasets are freely available at https://github.com/yangzi4/gPCA.
Supplementary information: Supplementary data are available at Bioinformatics online.
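The abstract does not spell out the gPCA procedure itself, and its code lives at the repository given in the abstract. As a generic illustration of the two ingredients it names, the sketch below partitions total variability into between-group and within-group sums of squares (the ANOVA identity) and then runs PCA on the between-group component to see how many directions carry the differences among conditions. The function name `anova_pca` and the toy data are assumptions; this is not the authors' test statistic.

```python
import numpy as np


def anova_pca(groups):
    """groups: list of (n_g, p) arrays, one array per experimental condition.
    Returns SS_total, SS_between, SS_within and the share of between-group
    variability captured by each principal direction of the group means."""
    all_x = np.vstack(groups)
    grand = all_x.mean(axis=0)
    ss_total = np.sum((all_x - grand) ** 2)
    ss_within = sum(np.sum((g - g.mean(axis=0)) ** 2) for g in groups)
    # Group-mean deviations, weighted by sqrt(n_g) so their squares sum to SS_between.
    D = np.array([np.sqrt(len(g)) * (g.mean(axis=0) - grand) for g in groups])
    ss_between = np.sum(D**2)
    # PCA of the between-group component via the singular values of D.
    sv = np.linalg.svd(D, compute_uv=False)
    explained = sv**2 / np.sum(sv**2)
    return ss_total, ss_between, ss_within, explained


# Three conditions whose means differ along a single direction.
base = np.array([[0.0, 1.0, 0.0, 2.0],
                 [1.0, 0.0, 1.0, 1.0],
                 [0.5, 0.5, 2.0, 0.0]])
direction = np.array([1.0, -1.0, 0.0, 0.5])
groups = [base + 2.0 * k * direction for k in range(3)]
ss_total, ss_between, ss_within, explained = anova_pca(groups)
print(ss_total, ss_between + ss_within, explained)
```

Because the three condition means lie on one line, SS_between = 3 × 8 × ||direction||² = 54 and the first principal direction carries essentially all of it; the identity SS_total = SS_between + SS_within holds for any grouping.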

