Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes

Author(s):  
Xing Qiu ◽  
Lev Klebanov ◽  
Andrei Yakovlev

Stochastic dependence between gene expression levels in microarray data is of critical importance for the methods of statistical inference that resort to pooling test statistics across genes. The empirical Bayes methodology in the nonparametric and parametric formulations, as well as closely related methods employing a two-component mixture model, represent typical examples. It is frequently assumed that dependence between gene expressions (or associated test statistics) is sufficiently weak to justify the application of such methods for selecting differentially expressed genes. By applying resampling techniques to simulated and real biological data sets, we have studied a potential impact of the correlation between gene expression levels on the statistical inference based on the empirical Bayes methodology. We report evidence from these analyses that this impact may be quite strong, leading to a high variance of the number of differentially expressed genes. This study also pinpoints specific components of the empirical Bayes method where the reported effect manifests itself.
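
The effect described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' resampling procedure: it assumes that all genes are null, that the test statistics are equicorrelated standard normals built from one shared factor, and that a gene is declared significant when |z| > 1.96. The function names and the parameters `m`, `rho`, `threshold`, and `replicates` are hypothetical.

```python
import random
import statistics


def count_rejections(m, rho, threshold, rng):
    """Count |z| > threshold among m equicorrelated N(0, 1) statistics."""
    shared = rng.gauss(0.0, 1.0)  # factor common to all genes in this replicate
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    return sum(
        1
        for _ in range(m)
        if abs(a * shared + b * rng.gauss(0.0, 1.0)) > threshold
    )


def rejection_count_variance(m, rho, threshold=1.96, replicates=300, seed=0):
    """Sample variance of the rejection count across simulated replicates."""
    rng = random.Random(seed)
    counts = [count_rejections(m, rho, threshold, rng) for _ in range(replicates)]
    return statistics.variance(counts)


var_independent = rejection_count_variance(m=500, rho=0.0)
var_correlated = rejection_count_variance(m=500, rho=0.5)
print("variance, independent genes:", var_independent)
print("variance, correlated genes: ", var_correlated)
```

Under these assumptions the expected number of rejections is the same in both settings, but the count varies far more from replicate to replicate when the statistics are correlated.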

2017 ◽  
Author(s):  
John D. Blischak ◽  
Ludovic Tailleux ◽  
Marsha Myrthil ◽  
Cécile Charlois ◽  
Emmanuel Bergot ◽  
...  

ABSTRACT Tuberculosis (TB) is a deadly infectious disease, which kills millions of people every year. The causative pathogen, Mycobacterium tuberculosis (MTB), is estimated to have infected up to a third of the world’s population; however, only approximately 10% of infected healthy individuals progress to active TB. Despite evidence for heritability, it is not currently possible to predict who may develop TB. To explore approaches to classify susceptibility to TB, we used MTB to infect dendritic cells (DCs) from putatively resistant individuals diagnosed with latent TB and from susceptible individuals who had recovered from active TB. We measured gene expression levels in infected and non-infected cells and found hundreds of genes differentially expressed between susceptible and resistant individuals in the non-infected cells. We further found that genetic polymorphisms near these differentially expressed genes are more likely to be associated with TB susceptibility in published GWAS data. Lastly, we trained a classifier based on the gene expression levels in the non-infected cells, and demonstrated decent performance on our data and an independent data set. Overall, our promising results from this small study suggest that training a classifier on a larger cohort may enable us to accurately predict TB susceptibility.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Abstract Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.
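
One simple way to quantify the reproducibility discussed above is the overlap between DEG lists obtained from different sample subsets. The sketch below uses the mean pairwise Jaccard index; the abstract does not say which metric the authors used, so this choice, the function names, and the gene labels `G1` to `G4` are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(deg_sets):
    """Average Jaccard index over all pairs of DEG sets."""
    pairs = list(combinations(deg_sets, 2))
    if not pairs:
        raise ValueError("need at least two DEG sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical DEG lists from three repeated subsamples of the same cohort.
runs = [
    {"G1", "G2", "G3"},
    {"G2", "G3", "G4"},
    {"G1", "G2", "G3"},
]
reproducibility = mean_pairwise_jaccard(runs)
print(reproducibility)
```

A value near 1 means that the subsamples agree on which genes are differentially expressed; values well below 1 indicate sample-specific DEGs.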


Author(s):  
Chris A Glasbey ◽  
Thorsten Forster ◽  
Peter Ghazal

Digital images obtained by the laser scanning of spotted microarrays often include saturated pixel values. These arise when the scan settings are sufficiently high and some pixels exceed the limit L=65535 and are instead set to L. Failure to adjust for this censoring leads to biased estimates of gene expression levels. To impute censored values, we propose a linear model based on the principal components of uncensored spots on the same array. This is computationally fast, flexible to adapt to distinctive spot shapes and profiles on different arrays, and is shown to be more effective than the polynomial-hyperbolic model in correcting for the bias. The application to biological data demonstrates the potential for enhancing the dynamic range of detection. Fortran90 subroutines implementing these methods are available at http://www.bioss.ac.uk/~chris.


2020 ◽  
Author(s):  
Hansapani Rodrigo ◽  
Bryan Martinez ◽  
Roberto De La Garza ◽  
Upal Roy

Abstract Background: HIV Associated Neurological Disorders (HAND) are relatively common among people with HIV-1 infection, even those taking combined antiretroviral treatment (cART). Genome-wide screening of transcription regulation in brain tissue helps to identify substantial abnormalities in patients’ gene transcripts and to discover possible biomarkers for HAND. This study explores the possibility of identifying differentially expressed (DE) genes that can serve as potential biomarkers to detect HAND. We investigated the gene expression levels of three subject groups with different impairment levels of HAND, along with a control group, in three distinct brain sectors: white matter, frontal cortex, and basal ganglia. Methods: Linear models with weighted least squares, along with Benjamini-Hochberg multiple-testing correction, were used to identify DE genes in each brain region. Genes with an adjusted p-value of less than 0.01 were identified as differentially expressed. Principal component analyses (PCA) were performed to detect any groupings among the subject groups. Significance Analysis of Microarrays (SAM) and random forests (RF) methods with two distinct approaches were used to identify DE genes. Results: A total of 710 genes in basal ganglia, 794 genes in the frontal cortex, and 1481 genes in white matter were screened. The highest proportion of DE genes was observed within two brain regions, the frontal neocortex and the basal ganglia. PCA did not exhibit clear groupings among the four subject groups. SAM and RF models identified the genes CIRBP, RBM3, GPNMB, ISG15, IFIT6, IFI6, and IFIT3 as DE in the frontal cortex or basal ganglia among the subject groups. The gene GADD45A, a protein-coding gene whose transcript levels tend to increase under stressful growth arrest conditions, was consistently ranked among the top genes by both RF models within the frontal cortex.
Conclusions: Our study contributes to a comprehensive understanding of the gene expression levels of subjects with different severity levels of HAND. Several genes that appear to play critical roles in the inflammatory response have been found, and they have excellent potential to be used as biomarkers to detect HAND, pending further investigation.
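
The Benjamini-Hochberg step named in the Methods can be sketched as follows. This is a generic implementation of the standard step-up adjustment, not the authors' code; the p-values are made up for illustration, and only the 0.01 cutoff comes from the abstract.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.0001, 0.004, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
is_de = [q < 0.01 for q in adjusted]  # cutoff stated in the abstract
print(adjusted)
print(is_de)
```

With these inputs the first two genes pass the adjusted 0.01 threshold and the last two do not.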


2020 ◽  
Vol 14 ◽  
pp. 117793222090616
Author(s):  
Badreddine Nouadi ◽  
Yousra Sbaoui ◽  
Mariame El Messal ◽  
Faiza Bennis ◽  
Fatima Chegdani

Nowadays, the integration of biological data is a major challenge for bioinformatics. Many studies have examined gene expression in the intestinal epithelial tissue of breastfed infants born at term, generating a large amount of data. The integration of these data is important to understand the biological processes involved during bacterial colonization of the newborn's intestine, particularly through breast milk. This work aims to use bioinformatics approaches to provide a new representation and interpretation of the interactions between differentially expressed genes in the host intestine induced by the microbiota.


Author(s):  
Abdulkerim DİLER

This study was carried out to identify the HSPA1A, TNF, IL1B and IL6 mRNA expression levels of Holstein dairy cattle housed on different floor types. Nineteen Holstein cows were used in this study. The cattle were divided into two groups: concrete (CON; n = 10) and rubber mat (RUB; n = 9). HSPA1A, TNF, IL1B and IL6 mRNA was isolated from milk somatic cells, and gene expression was measured by Real-Time PCR. The HSPA1A (P < 0.01) and IL1B (P < 0.05) gene expression levels differed significantly between the groups. The IL6 and TNF gene expression differences between the groups were not significant, although numerically higher expression was observed in the CON group. Overall, the results of the study suggest that the rubber mat floor type has a positive impact on both animal welfare and udder health.


2017 ◽  
Vol 15 (05) ◽  
pp. 1750020 ◽  
Author(s):  
Na You ◽  
Xueqin Wang

Microarray technology is widely used to identify differentially expressed genes because of its high-throughput capability. The number of replicated microarray chips in each group is usually small. Borrowing information across genes is an efficient way to improve parameter estimation, which otherwise suffers from the limited sample size. In this paper, we use a hierarchical model to describe the dispersion of gene expression profiles and model the variance through the gene expression level via a link function. A heuristic algorithm is proposed to estimate the hyper-parameters and the link function. The differentially expressed genes are identified using a multiple testing procedure. Compared to SAM and LIMMA, our proposed method shows significantly higher detection power while controlling the false discovery rate.
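
The idea of borrowing information across genes can be illustrated with the familiar variance-shrinkage formula used in LIMMA-style moderated statistics, in which each gene's sample variance is pulled toward a prior variance shared by all genes. This is a sketch of that general idea only, not the hierarchical link-function model proposed in the paper; `prior_var` and `prior_df` are treated here as known inputs, whereas in practice they would be estimated from the data.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a per-gene sample variance toward a shared prior variance.

    The result is a weighted average of the two variances, with weights
    given by the residual degrees of freedom (df) and the prior degrees
    of freedom (prior_df).
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Hypothetical per-gene variances estimated from very few replicates (df = 2).
gene_vars = [0.2, 1.0, 4.0]
shrunk = [
    moderated_variance(s2, df=2, prior_var=1.0, prior_df=2) for s2 in gene_vars
]
print(shrunk)
```

Extreme per-gene variances move toward the shared value, which stabilizes test statistics when each gene has only a handful of replicates.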


Genes ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 19 ◽  
Author(s):  
Chao Zhang ◽  
Xiang-Dong Liu

Wing dimorphism is considered an adaptive trait of insects. Brown planthoppers (BPHs) Nilaparvata lugens, a serious pest of rice, are either macropterous or brachypterous. Genetic and environmental factors are both likely to control wing morph determination in BPHs, but the mode of inheritance and the gene network are still unknown. Here, we investigated changes in gene expression levels between macropterous and brachypterous BPHs by creating artificially bred morphotype lines. Nearly pure-bred strains of macropterous and brachypterous BPHs were established, and their transcriptomes and gene expression levels were compared. More than ten thousand differentially expressed genes (DEGs) between macropterous and brachypterous strains were found in the egg, nymph, and adult stages, and the three stages shared 6523 DEGs. The regulation of actin cytoskeleton, focal adhesion, tight junction, and adherens junction pathways were consistently enriched with DEGs across the three stages, whereas insulin signaling pathway, metabolic pathways, vascular smooth muscle contraction, platelet activation, oxytocin signaling pathway, sugar metabolism, and glycolysis/gluconeogenesis were significantly enriched with DEGs in a specific stage. Gene expression trend profiles across the three stages differed between the two strains. Eggs, nymphs, and adults from the macropterous strain were distinguishable from those of the brachypterous strain based on gene expression levels, and genes related to wing morphs were differentially expressed between wing strains or by strain × stage. A model in which genes and environments jointly modulate the wing dimorphism of BPHs is proposed.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jin Wang ◽  
Qinxue Zhang ◽  
Xiong You ◽  
Xilin Hou

Background: Non-heading Chinese cabbage (Brassica rapa ssp. chinensis) is an important leaf vegetable grown worldwide. However, there has so far been too little combined transcriptome and small RNA sequencing analysis of cold tolerance, which hinders further functional genomics research. Results: In this study, 63.43 Gb of clean data was obtained from the transcriptome analysis. The clean data of each sample reached 6.99 Gb, and the basic percentage of Q30 was 93.68% and above. The clean reads of each sample were aligned with the designated reference genome (Brassica rapa, IVFCAASv1), and the efficiency of the alignment varied from 81.54 to 87.24%. According to the comparison results, 1,860 new genes were discovered in Pak-choi, of which 1,613 were functionally annotated. Among them, 13 common differentially expressed genes were detected in all materials, including seven upregulated and six downregulated. At the same time, we used quantitative real-time PCR to confirm the changes in the expression levels of these genes. In addition, we sequenced miRNA of the same material. Our findings revealed a total of 34,182,333 small RNA reads, 88,604,604 kinds of small RNAs, among which the most common size was 24 nt. In all materials, the number of common differential miRNAs was eight. According to the correspondence between miRNAs and their target genes, we carried out Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analysis on the set of target genes of each group of differentially expressed miRNAs. The analysis showed that the distributions of candidate target genes differ between materials. We not only used transcriptome sequencing and small RNA sequencing but also used experiments to confirm the expression levels of differentially expressed genes that were obtained by sequencing.
Sequencing combined with experiments supported the mechanism behind some of the differential gene expression levels after low-temperature treatment. Conclusion: In all, this study provides a resource for genetic and genomic research under abiotic stress in Pak-choi.

