Fully Moderated T-statistic for Small Sample Size Gene Expression Arrays

Author(s):  
Lianbo Yu
Parul Gulati
Soledad Fernandez
Michael Pennell
Lawrence Kirschner
...  

Gene expression microarray experiments with few replicates lead to great variability in the estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods have assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption and extend the method by allowing the CV to vary with gene expression. Our CV-varying method, which we refer to as the fully moderated t-statistic, was compared with three other methods (the ordinary t and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV-varying method had higher power than the other three methods, identified a greater number of true positives in the spike-in data, fit simulated data very well under varying assumptions, and, in a real data set, better identified higher-expressing genes that were consistent with functional pathways associated with the experiments.
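The variance shrinkage behind moderated t-statistics can be sketched for a single gene in a two-group comparison. This is a minimal illustration assuming a limma-style posterior variance that pools the gene's sample variance with a prior variance s0² carried on d0 degrees of freedom. To mimic a CV that varies with expression, the prior variance may be passed as a function of the gene's average expression; `prior_var`, `d0`, and the example trend are hypothetical placeholders, not the authors' estimation procedure.

```python
import math
from statistics import mean, variance
from typing import Callable, Sequence, Union

def moderated_t(x: Sequence[float], y: Sequence[float], d0: float,
                prior_var: Union[float, Callable[[float], float]]) -> float:
    """Two-sample moderated t-statistic for one gene (sketch).

    Assumption: d0 and prior_var are estimated elsewhere (e.g. across all genes).
    If prior_var is callable it is evaluated at the gene's average expression,
    a hypothetical hook for an expression-dependent prior variance.
    """
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2  # residual degrees of freedom
    # Pooled within-group sample variance.
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d
    s0_sq = prior_var(mean(list(x) + list(y))) if callable(prior_var) else prior_var
    # Posterior (shrunken) variance: weighted average of prior and sample variance.
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)
    se = math.sqrt(s2_tilde * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se

# With d0 = 0 there is no shrinkage and the ordinary pooled t is recovered.
print(moderated_t([1, 2, 3], [4, 5, 6], d0=0, prior_var=1.0))
# Hypothetical expression-dependent prior variance.
print(moderated_t([1, 2, 3], [4, 5, 6], d0=4, prior_var=lambda avg: 0.5 * avg))
```

Shrinking toward a larger prior variance pulls the statistic toward zero, which is how moderation stabilizes genes whose sample variance is unusually small by chance.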

Author(s):  
Guro Dørum
Lars Snipen
Margrete Solheim
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation, and more consistent results. However, gene set methods do not use all the available information about gene relations. Genes are arranged in complex networks in which network distances carry detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data, with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and used to smooth gene-wise test statistics. To find the optimal degree of smoothing, we propose a criterion based on the correlation between the network and the data. Network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, smoothing accentuates parts of the network with a high density of differentially expressed genes.
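One simple way to picture smoothing gene-wise statistics over a network is to blend each gene's statistic with the average statistic of its direct neighbors. The sketch below assumes an unweighted adjacency list and a single mixing weight `alpha` (both hypothetical); it is a generic neighbor-averaging smoother chosen for illustration, not the authors' topology-derived dependency weights or their correlation-based criterion for choosing the degree of smoothing.

```python
from typing import Dict, List

def smooth_statistics(stats: Dict[str, float],
                      neighbors: Dict[str, List[str]],
                      alpha: float) -> Dict[str, float]:
    """Blend each gene's test statistic with the mean of its network neighbors.

    Assumption: unweighted neighbors and one global weight alpha.
    alpha = 0 leaves statistics unchanged; alpha = 1 replaces each by its
    neighborhood mean. Genes without neighbors keep their own statistic.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = {}
    for gene, t in stats.items():
        nbrs = [n for n in neighbors.get(gene, []) if n in stats]
        if nbrs:
            nbr_mean = sum(stats[n] for n in nbrs) / len(nbrs)
            smoothed[gene] = (1 - alpha) * t + alpha * nbr_mean
        else:
            smoothed[gene] = t
    return smoothed

# Toy chain A - B - C: a strong signal at A is partly shared with its neighbor B.
stats = {"A": 4.0, "B": 0.0, "C": 0.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(smooth_statistics(stats, neighbors, alpha=0.5))
```

Larger `alpha` lets densely connected groups of moderately significant genes reinforce one another, while an isolated large statistic is damped.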


2009
Vol 27 (15_suppl)
pp. 9009-9009
Author(s):
H. A. Tawbi
S. Buch
P. Pancoska
Y. Lin
M. Saul
...  

9009 Background: Temozolomide and dacarbazine (TMZ and DTIC) remain the mainstay of alkylator-based chemotherapy for MM, despite response rates of 10–15% and the absence of any impact on survival. Classifying patients according to responsiveness can guide the individualization of therapy and inform approaches to overcoming mechanisms of chemotherapy resistance. Epigenetic mechanisms play an important role in regulating genes associated with resistance; they were evaluated in tandem with gene expression profiling in biological samples from MM patients (pts) to refine our understanding of the epigenomic-genomic-phenotypic interplay. Methods: We examined promoter methylation and gene expression in tumor tissues of 21 pts with MM treated with TMZ or DTIC, using high-throughput technologies (Illumina Inc). The cases were divided into responder (R) and non-responder (NR) groups based on clinical response. The data were analyzed using Prediction Analysis of Microarrays (PAM) from BRB-ArrayTools. Results: Differential promoter methylation analysis revealed that 63.6% of promoter sites were hypomethylated in tumors obtained from R pts (p<0.0001). PAM analysis of gene expression data revealed that a classifier set of 82 genes predicted NRs from Rs with 83% sensitivity and 89% specificity. Promoter methylation profiling did not independently correlate with R status. A simultaneous analysis of promoter methylation and gene expression values, first stratified into 3 data-driven categories and then combined into a 3-by-3 matrix, identified a common gene expression/methylation signature of 15 genes that classified both NR and R groups with 100% accuracy. Conclusions: Gene expression signatures independently predict response to chemotherapy in MM; promoter methylation profiling alone does not. Analysis of combined gene expression and promoter methylation in a well-annotated clinical data set dichotomized according to response identified a highly predictive signature. These findings are qualified by the relatively small sample size and are currently being validated in an expanded sample set. Supported in part by the ECOG Paul Carbone, MD, Fellowship Award. No significant financial relationships to disclose.
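The combined analysis stratifies each measurement into 3 data-driven categories and crosses them into a 3-by-3 matrix. A minimal sketch of that bookkeeping for one gene follows, assuming the categories are rank-based tertiles across samples (the abstract does not say how the categories were derived); `tertile_categories`, `joint_cells`, and `cell_table` are hypothetical helpers, and the selection of the 15-gene signature is not reproduced.

```python
from typing import List, Sequence

def tertile_categories(values: Sequence[float]) -> List[int]:
    """Assumption: rank-based tertiles, 0 = low, 1 = middle, 2 = high (ties by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cats = [0] * n
    for rank, i in enumerate(order):
        cats[i] = 3 * rank // n  # rank < n, so this is always 0, 1, or 2
    return cats

def joint_cells(expression: Sequence[float], methylation: Sequence[float]) -> List[int]:
    """Cell index 0..8 in the 3-by-3 (expression x methylation) matrix, per sample."""
    e = tertile_categories(expression)
    m = tertile_categories(methylation)
    return [3 * ei + mi for ei, mi in zip(e, m)]

def cell_table(cells: Sequence[int]) -> List[List[int]]:
    """Count samples per cell; rows are expression tertiles, columns methylation tertiles."""
    table = [[0] * 3 for _ in range(3)]
    for c in cells:
        table[c // 3][c % 3] += 1
    return table

# Toy gene measured in 6 samples: expression rises as promoter methylation falls.
cells = joint_cells([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1])
print(cells)
print(cell_table(cells))
```

With 21 patients each tertile would hold 7 samples; one could then tabulate the cells separately for R and NR patients to look for cells that separate the two groups.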


2012
Vol 2012
pp. 1-18
Author(s):  
Jiajuan Liang

High-dimensional data with a small sample size, such as microarray data and image data, are common in practical problems where many variables must be measured but repeating the measurements many times is too costly or time-consuming. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry that is especially suitable for high-dimensional data with a small sample size. The new graphical method, together with its local acceptance regions, provides a quick visual check of the assumption of spherical symmetry. The performance of the new graphical method is demonstrated by a Monte Carlo study and illustrated with a real data set.
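The abstract does not spell out the graphical method, so the sketch below illustrates only a standard consequence of the hypothesis being tested: if x is spherically symmetric about the origin, its direction x/‖x‖ is uniformly distributed on the unit sphere, so the average direction should be near zero. The Rayleigh-type statistic n·p·‖ū‖², asymptotically χ²_p under uniformity, measures one departure from this. It is a generic necessary-condition check assumed for illustration, not Liang's graphical method or its local acceptance regions.

```python
import math
from typing import List, Sequence

def unit_directions(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Project each observation onto the unit sphere (direction x / ||x||).

    Assumption: the center of symmetry is known and taken to be the origin.
    """
    dirs = []
    for x in samples:
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0:
            raise ValueError("a zero vector has no direction")
        dirs.append([v / norm for v in x])
    return dirs

def rayleigh_statistic(samples: Sequence[Sequence[float]]) -> float:
    """n * p * ||mean direction||^2; large values argue against uniform directions."""
    dirs = unit_directions(samples)
    n, p = len(dirs), len(dirs[0])
    mean_dir = [sum(d[j] for d in dirs) / n for j in range(p)]
    return n * p * sum(c * c for c in mean_dir)

# Directions that cancel give 0; directions that all agree give the maximum n * p.
print(rayleigh_statistic([[1, 0], [-1, 0], [0, 1], [0, -1]]))
print(rayleigh_statistic([[2, 0], [3, 0]]))
```

When p is large and n is small, the χ²_p reference distribution is only a rough guide, so a statistic like this is best read as a heuristic in that setting.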


Author(s):  
Yanming Di
Daniel W Schafer
Jason S Cumbie
Jeff H Chang

We propose a new statistical test for assessing differential gene expression using RNA sequencing (RNA-Seq) data. Commonly used probability distributions, such as binomial or Poisson, cannot appropriately model the count variability in RNA-Seq data due to overdispersion. The small sample size that is typical in this type of data also prevents the uncritical use of tools derived from large-sample asymptotic theory. The test we propose is based on the NBP parameterization of the negative binomial distribution. It extends an exact test proposed by Robinson and Smyth (2007, 2008). In one version of Robinson and Smyth’s test, a constant dispersion parameter is used to model the count variability between biological replicates. We introduce an additional parameter to allow the dispersion parameter to depend on the mean. Our parametric method complements nonparametric regression approaches for modeling the dispersion parameter. We apply the test we propose to an Arabidopsis data set and a range of simulated data sets. The results show that the test is simple, powerful and reasonably robust against departures from model assumptions.
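The difference between a constant dispersion and a mean-dependent one can be sketched numerically. We take the NBP form to be variance μ + φμ^α, equivalently a dispersion of φμ^(α−2), so that α = 2 recovers the constant-dispersion (NB2) model; this parameterization is an assumption that should be checked against the paper. The helper names are hypothetical, and the exact test itself is not implemented here.

```python
import math

def nbp_dispersion(mu: float, phi: float, alpha: float) -> float:
    """Assumed NBP dispersion phi * mu**(alpha - 2); alpha = 2 gives a constant phi."""
    return phi * mu ** (alpha - 2)

def nbp_variance(mu: float, phi: float, alpha: float) -> float:
    """Negative binomial variance mu + kappa * mu**2, with kappa from nbp_dispersion."""
    return mu + nbp_dispersion(mu, phi, alpha) * mu ** 2

def nb_pmf(y: int, mu: float, kappa: float) -> float:
    """Negative binomial P(Y = y) with mean mu and dispersion kappa > 0."""
    r = 1.0 / kappa  # size parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

# Same phi, different alpha: how much extra-Poisson variance at mean 50?
for alpha in (1.0, 2.0):
    print(alpha, nbp_variance(50.0, phi=0.1, alpha=alpha))
```

Under a constant dispersion, the extra-Poisson variance grows with μ² for every gene; letting α differ from 2 lets the data decide how fast overdispersion changes with the mean count.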


2019
Vol 21 (9)
pp. 631-645
Author(s):
Saeed Ahmed
Muhammad Kabir
Zakir Ali
Muhammad Arif
Farman Ali
...  

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have high dimensionality and a small sample size. Therefore, efficient and robust feature selection methods that identify a small set of genes are indispensable for achieving better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation. Results: The experimental results were compared with seven conventional feature selection methods and other methods in the literature, showing that our approach has clear advantages in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimal-sized predictive gene subset.
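The CFS half of the hybrid scores a candidate gene subset by how strongly its genes correlate with the class while penalizing redundancy among the genes themselves. A minimal sketch of the widely used CFS merit (Hall's formulation) follows, assuming gene-class and gene-gene correlations have already been computed; the dictionary layout is a hypothetical choice, and the MOEA search and RBFNN classifier are not shown.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

def cfs_merit(class_corr: Dict[str, float],
              pair_corr: Dict[FrozenSet[str], float],
              subset: Sequence[str]) -> float:
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|).

    Assumption: correlations are precomputed; pair_corr is keyed by frozenset pairs.
    """
    k = len(subset)
    if k == 0:
        raise ValueError("subset must be non-empty")
    r_cf = sum(abs(class_corr[g]) for g in subset) / k  # average gene-class correlation
    pairs = list(combinations(subset, 2))
    # Average gene-gene correlation (redundancy); zero for a single gene.
    r_ff = sum(abs(pair_corr[frozenset(p)]) for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

class_corr = {"g1": 0.6, "g2": 0.6, "g3": 0.6}
pair_corr = {frozenset({"g1", "g2"}): 0.0,   # g2 adds independent information
             frozenset({"g1", "g3"}): 1.0,   # g3 duplicates g1
             frozenset({"g2", "g3"}): 0.0}
print(cfs_merit(class_corr, pair_corr, ["g1"]))
print(cfs_merit(class_corr, pair_corr, ["g1", "g2"]))  # merit rises
print(cfs_merit(class_corr, pair_corr, ["g1", "g3"]))  # merit does not rise
```

In a multi-objective search, a merit like this could serve as one objective alongside subset size, so that the evolutionary algorithm trades predictive relevance against the number of genes.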


2015
Vol 23 (3)
pp. 617-626
Author(s):
Nophar Geifman
Sanchita Bhattacharya
Atul J Butte

Objective: Cytokines play a central role in both health and disease, modulating immune responses and acting as diagnostic markers and therapeutic targets. This work takes a systems-level approach to integrating and examining immune patterns, such as cytokine gene expression, together with information from the biomedical literature, and applies it in the context of disease, with the objective of identifying potentially useful relationships and areas for future research. Results: We present the integration and analysis of immune-related knowledge, namely, information derived from the biomedical literature and from gene expression arrays. Cytokine-disease associations were captured from over 2.4 million PubMed records, in the form of Medical Subject Headings descriptor co-occurrences, as well as from gene expression arrays. Clustering of cytokine-disease co-occurrences from the biomedical literature is shown to reflect current medical knowledge as well as potentially novel relationships between diseases. A correlation analysis of cytokine gene expression in a variety of diseases revealed compelling relationships. Finally, a novel analysis comparing cytokine gene expression in different diseases with parallel associations captured from the biomedical literature was used to examine which associations merit further investigation. Discussion: We demonstrate the usefulness of capturing Medical Subject Headings descriptor co-occurrences from biomedical publications for generating valid and potentially useful hypotheses. Furthermore, integrating and comparing descriptor co-occurrences with gene expression data was shown to be useful for detecting new, potentially fruitful, and unaddressed areas of research. Conclusion: Using integrated large-scale data captured from the scientific literature and experimental data, a better understanding of the immune mechanisms underlying disease can be achieved and applied to research.
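Capturing cytokine-disease associations as MeSH descriptor co-occurrences amounts to counting, for each cytokine-disease pair, the records in which both descriptors appear. A minimal sketch follows, assuming each record has already been reduced to its set of descriptor names and that the cytokine and disease vocabularies are given; the descriptor strings in the example are illustrative, and no PubMed retrieval, normalization, or clustering is shown.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Set

def cooccurrence_counts(records: Iterable[Set[str]],
                        cytokines: Set[str],
                        diseases: Set[str]) -> Counter:
    """Count records annotated with both a cytokine and a disease descriptor.

    Assumption: each record is already a set of descriptor names.
    """
    counts: Counter = Counter()
    for descriptors in records:
        present = set(descriptors)
        # Each record contributes at most once to each (cytokine, disease) pair.
        for pair in product(sorted(present & cytokines), sorted(present & diseases)):
            counts[pair] += 1
    return counts

records = [
    {"Interleukin-6", "Arthritis, Rheumatoid"},
    {"Interleukin-6", "Tumor Necrosis Factor-alpha", "Sepsis"},
    {"Sepsis"},
]
counts = cooccurrence_counts(records,
                             cytokines={"Interleukin-6", "Tumor Necrosis Factor-alpha"},
                             diseases={"Arthritis, Rheumatoid", "Sepsis"})
print(counts.most_common())
```

Arranged as a cytokine-by-disease matrix, counts like these are the kind of input that can be clustered or compared against expression-based correlations.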


Stroke
2014
Vol 45 (suppl_1)
Author(s):
Blake Haas
Nestor R Gonzalez
Elina Nikkola
Mark Connolly
William Hsu
...  

Introduction: Intracranial aneurysm (IA) growth and rupture have been associated with chronic remodeling of the arterial wall. However, the pathobiology of this process remains poorly understood. The objective of the present study was to evaluate the feasibility of analyzing gene expression patterns in the peripheral blood of patients with ruptured and unruptured saccular IAs. Materials and Methods: We analyzed human whole-blood transcriptomes by performing paired-end, 100 bp RNA sequencing (RNAseq) on the Illumina platform. We used STAR to align reads to the genome, HTSeq to count reads, and DESeq to normalize counts across samples. Self-reported patient information was used to correct expression values for ancestry, age, and sex. We used weighted gene co-expression network analysis (WGCNA) to identify gene expression network modules associated with IA size and rupture. The DAVID tool was used to search for Gene Ontology enrichment in relevant modules. Results: Samples from 12 patients (9 females; age 57.6 ± 12 years) with IAs were analyzed. Four had ruptured aneurysms. RNA isolation and the methodology described above were successful in all samples. Although the small sample size prevents us from drawing definite conclusions, we observed promising novel co-expression networks for IAs: WGCNA showed down-regulation of two transcript modules associated with ruptured IA status (r=-0.78, p=0.008 and r=-0.77, p=0.009) and up-regulation of two modules associated with aneurysm size (r=0.86, p=0.002 and r=0.9, p=4e-04). DAVID analyses showed that genes upregulated in an IA size-associated module were enriched for genes involved in cellular respiration and translation, while genes involved in transcription were down-regulated in a module associated with ruptured IAs. Conclusions: Whole-blood RNAseq analysis is a feasible tool for capturing transcriptome dynamics and achieving a better understanding of the pathophysiology of IAs. Further longitudinal studies of patients with IAs using network analysis are justified.
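WGCNA builds its network from pairwise gene correlations raised to a soft-thresholding power β, which keeps strong correlations while suppressing weak ones. The sketch below shows that first step for an unsigned network, assuming a small in-memory expression table (genes mapped to per-sample values) and β = 6 as an illustrative value; module detection, eigengenes, and the module-trait correlations reported above are not reproduced, and this is not the WGCNA R package's API.

```python
import math
from typing import Dict, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr: Dict[str, Sequence[float]],
                             beta: float = 6) -> Dict[Tuple[str, str], float]:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta; self-pairs omitted.

    Assumption: beta = 6 is only an illustrative choice of soft-thresholding power.
    """
    genes = list(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[(g, h)] = adj[(h, g)] = a
    return adj

def connectivity(adj: Dict[Tuple[str, str], float], gene: str) -> float:
    """Whole-network connectivity: sum of a gene's adjacencies to all other genes."""
    return sum(a for (g, _), a in adj.items() if g == gene)

expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],   # perfectly correlated with g1
    "g3": [1, 3, 2, 4],   # r = 0.8 with g1
}
adj = soft_threshold_adjacency(expr)
print(adj[("g1", "g3")], connectivity(adj, "g1"))
```

A correlation of 0.8 shrinks to about 0.26 after soft thresholding, while a perfect correlation stays at 1, which is why modules built on this adjacency emphasize tightly co-expressed genes.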


Author(s):  
WEIXIANG LIU
KEHONG YUAN
JIAN WU
DATIAN YE
ZHEN JI
...  

Classification of gene expression samples is a core task in microarray data analysis. How to reduce thousands of genes and how to select a suitable classifier are two key issues in gene expression data classification. This paper introduces a framework that combines feature extraction and classifier design. Considering the non-negativity, high dimensionality, and small sample size of gene expression data, we apply a discriminative mixture model designed for classifying non-negative gene expression data, using non-negative matrix factorization (NMF) for dimension reduction. To enhance the sparseness of the training data for fast learning of the mixture model, a generalized NMF is also adopted. Experimental results on several real gene expression datasets show that classification accuracy, stability, and decision quality can be significantly improved by using the generalized method, and that the proposed method performs better than some previously reported results on the same datasets.
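NMF, the dimension-reduction step described above, factors a non-negative genes-by-samples matrix V into non-negative W (genes by k) and H (k by samples), so that each sample is summarized by the k values in its column of H. The sketch below uses the standard Lee-Seung multiplicative updates for the squared-error objective; it is the plain algorithm, not the paper's generalized, sparsity-enhancing variant or its mixture-model classifier, and the iteration count, seed, and `eps` guard are arbitrary assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]

def nmf(V: Matrix, k: int, iters: int = 500, eps: float = 1e-12,
        seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Approximate non-negative V (n x m) by W (n x k) times H (k x m).

    Plain multiplicative updates; eps (an arbitrary choice) avoids division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]  # strictly positive start
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

# Toy "expression" matrix: 4 genes x 3 samples, exactly rank 1.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5], [3.0, 6.0, 9.0]]
W, H = nmf(V, k=1, iters=200)
print(H)  # each sample reduced to one non-negative feature
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout; in a classification pipeline the columns of H would serve as the reduced features passed to the classifier.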

