Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby degrading a longitudinal feature selection process into a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) were then optimized by either the coordinate descent method or the threshold gradient descent regularization method. By applying the proposed methods to simulated data and a traumatic injury dataset, we have demonstrated that the proposed methods, especially for the combination of sign average and threshold gradient descent regularization, outperform other competitive algorithms. To conclude, the proposed methods are highly recommended for studies with the objective of carrying out feature selection for longitudinal gene expression data.

Download Full-text

Chaotic Harmony Search based Multi-objective Feature Selection for Classification of Gene Expression Profiles

2021 IEEE 9th International Conference on Bioinformatics and Computational Biology (ICBCB) ◽

10.1109/icbcb52223.2021.9459222 ◽

2021 ◽

Author(s):

Aiguo Wang ◽

Huancheng Liu ◽

Guilin Chen

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Expression Profiles ◽

Harmony Search ◽

Gene Expression Profiles ◽

Multi Objective ◽

Selection For

Download Full-text

Machine Learning Techniques and Chi-Square Feature Selection for Cancer Classification Using SAGE Gene Expression Profiles

Lecture Notes in Computer Science - Data Mining for Biomedical Applications ◽

10.1007/11691730_11 ◽

2006 ◽

pp. 106-115 ◽

Cited By ~ 58

Author(s):

Xin Jin ◽

Anbang Xu ◽

Rongfang Bie ◽

Ping Guo

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Feature Selection ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cancer Classification ◽

Machine Learning Techniques ◽

Chi Square ◽

Learning Techniques ◽

Selection For

Download Full-text

Feature Selection for Gene Expression Data Analysis – A Review

International Journal of Psychosocial Rehabilitation ◽

10.37200/ijpr/v24i5/pr2020695 ◽

2020 ◽

Vol 24 (5) ◽

pp. 6955-6964

Author(s):

Dr. Prema R

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Selection For

Download Full-text

A Robust Gene selection Method for Microarray-based Cancer Classification

Cancer Informatics ◽

10.4137/cin.s3794 ◽

2010 ◽

Vol 9 ◽

pp. CIN.S3794 ◽

Cited By ~ 21

Author(s):

Xiaosheng Wang ◽

Osamu Gotoh

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Information Gain ◽

Expression Profiles ◽

Feature Selection Method ◽

Gene Expression Profiles ◽

Molecular Classification ◽

Selection Method ◽

Chi Square

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is extremely crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods: the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty, and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can exhibit the inherent classification difficulty with respect to different gene expression datasets, indicating the inherent biology of specific cancers.

Download Full-text

P02.10 FocuSCOPE: a single cell, multi-omics solution to simultaneously analyze tumor variants and microenvironment

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-itoc8.22 ◽

2021 ◽

Vol 9 (Suppl 1) ◽

pp. A12.1-A12

Author(s):

Y Arjmand Abbassi ◽

N Fang ◽

W Zhu ◽

Y Zhou ◽

Y Chen ◽

...

Keyword(s):

Gene Expression ◽

Tumor Microenvironment ◽

Single Cell ◽

High Throughput ◽

Immune Cells ◽

Genetic Variants ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Single Cell Sequencing

Recent advances of high-throughput single cell sequencing technologies have greatly improved our understanding of the complex biological systems. Heterogeneous samples such as tumor tissues commonly harbor cancer cell-specific genetic variants and gene expression profiles, both of which have been shown to be related to the mechanisms of disease development, progression, and responses to treatment. Furthermore, stromal and immune cells within tumor microenvironment interact with cancer cells to play important roles in tumor responses to systematic therapy such as immunotherapy or cell therapy. However, most current high-throughput single cell sequencing methods detect only gene expression levels or epigenetics events such as chromatin conformation. The information on important genetic variants including mutation or fusion is not captured. To better understand the mechanisms of tumor responses to systematic therapy, it is essential to decipher the connection between genotype and gene expression patterns of both tumor cells and cells in the tumor microenvironment. We developed FocuSCOPE, a high-throughput multi-omics sequencing solution that can detect both genetic variants and transcriptome from same single cells. FocuSCOPE has been used to successfully perform single cell analysis of both gene expression profiles and point mutations, fusion genes, or intracellular viral sequences from thousands of cells simultaneously, delivering comprehensive insights of tumor and immune cells in tumor microenvironment at single cell resolution.Disclosure InformationY. Arjmand Abbassi: None. N. Fang: None. W. Zhu: None. Y. Zhou: None. Y. Chen: None. U. Deutsch: None.

Download Full-text

Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles

Artificial Neural Networks in Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/11829898_26 ◽

2006 ◽

pp. 286-297

Author(s):

Hans A. Kestler ◽

Wolfgang Lindner ◽

André Müller

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Set Covering

Download Full-text

Improved Feature Selection by Incorporating Gene Similarity into the LASSO

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2012010101 ◽

2012 ◽

Vol 3 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Christopher E. Gillies ◽

Xiaoli Gao ◽

Nilesh V. Patel ◽

Mohammad-Reza Siadat ◽

George D. Wilson

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Personalized Medicine ◽

Objective Function ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Genetic Profile ◽

Data Set ◽

Coordinate Descent Algorithm ◽

Gene Similarity

Personalized medicine is customizing treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data where they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.

Download Full-text

Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures

BioMed Research International ◽

10.1155/2019/2497509 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Suyan Tian ◽

Chi Wang ◽

Bing Wang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Gene Selection ◽

Selection Process ◽

Biological Knowledge ◽

Expression Data ◽

Selection Methods ◽

Its Gene ◽

Active Research

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods being regarded as a special case of pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail and the importance of penalization in such methods. Last, we point out the potential utilizations of pathway-guided gene selection in one active research avenue, namely, to analyze longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.

Download Full-text

Microbial single-cell RNA sequencing by split-pool barcoding

Science ◽

10.1126/science.aba5257 ◽

2020 ◽

Vol 371 (6531) ◽

pp. eaba5257 ◽

Cited By ~ 2

Author(s):

Anna Kuchina ◽

Leandra M. Brettner ◽

Luana Paleologu ◽

Charles M. Roco ◽

Alexander B. Rosenberg ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cell Analysis ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Growth Stages ◽

High Throughput Analysis ◽

Single Cell Rna Sequencing

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.

Download Full-text

Kidney transplant classification with gene expression profiles using L1 feature selection ensemble classifier based on data clustering

2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS) ◽

10.1109/icacsis.2017.8355040 ◽

2017 ◽

Author(s):

M Octaviano Pratama

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Kidney Transplant ◽

Data Clustering ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Ensemble Classifier

Download Full-text