Meta-analysis of crowdsourced data compendia suggests pan-disease transcriptional signatures of autoimmunity

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2884 ◽  
Author(s):  
William W. Lau ◽  
Rachel Sparks ◽  
John S. Tsang ◽  

Background: The proliferation of publicly accessible large-scale biological data, together with the increasing availability of bioinformatics tools, has the potential to transform biomedical research. Here we report a crowdsourcing Jamboree that explored whether a team of volunteer biologists without formal bioinformatics training could use OMiCC, a crowdsourcing web platform that facilitates the reuse and (meta-) analysis of public gene expression data, to compile and annotate gene expression data, and design comparisons between disease and control sample groups. Methods: The Jamboree focused on several common human autoimmune diseases, including systemic lupus erythematosus (SLE), multiple sclerosis (MS), type I diabetes (DM1), and rheumatoid arthritis (RA), and the corresponding mouse models. Meta-analyses were performed in OMiCC using comparisons constructed by the participants to identify 1) gene expression signatures for each disease (disease versus healthy controls at the gene expression and biological pathway levels), 2) conserved signatures across all diseases within each species (pan-disease signatures), and 3) conserved signatures between species for each disease and across all diseases (cross-species signatures). Results: A large number of differentially expressed genes were identified for each disease based on meta-analysis, with observed overlap among diseases both within and across species. Gene set/pathway enrichment of upregulated genes suggested conserved signatures (e.g., interferon) across all human and mouse conditions. Conclusions: Our Jamboree exercise provides evidence that, when enabled by appropriate tools, a "crowd" of biologists can work together to accelerate the pace at which the increasingly large amounts of public data can be reused and meta-analyzed for generating and testing hypotheses. Our encouraging experience suggests that a similar crowdsourcing approach can be used to explore other biological questions.

2015 ◽  
Vol 13 (06) ◽  
pp. 1550019 ◽  
Author(s):  
Alexei A. Sharov ◽  
David Schlessinger ◽  
Minoru S. H. Ko

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
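The standard meta-analysis methods named in this abstract (fixed effects, z-score, and Fisher's method) can be illustrated with a minimal Python sketch. This is a textbook formulation and not ExAtlas code: the function names, the use of one-sided p-values, and the unweighted form of the z-score method are assumptions made for illustration, and the random-effects model is omitted.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function, valid for even df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def zscore_combined_p(p_values):
    """Unweighted z-score (Stouffer) method for one-sided p-values in (0, 1)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)


def fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    estimate = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


# Hypothetical per-study results for one gene (illustrative numbers only).
p_per_study = [0.01, 0.02, 0.03]
print(fisher_combined_p(p_per_study), zscore_combined_p(p_per_study))
print(fixed_effect([1.0, 3.0], [1.0, 1.0]))
```

Fisher's method and the z-score method combine only p-values, so they apply even when studies report effects on different scales. The fixed-effect estimate needs comparable effect sizes and standard errors from each study.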


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes. The GCN model then uses localized filtering to detect local features, such as subnetworks of genes, that contribute to the drug response. We demonstrated the effectiveness of DrugGCN on biological data, where it showed high prediction accuracy among the competing methods.
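To make the idea of graph convolution over a gene graph concrete, here is a minimal pure-Python sketch of one propagation step, using the commonly used first-order rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption that this rule is a fair stand-in: the abstract does not specify the filter DrugGCN uses, and the function name, toy graph, and weights below are hypothetical.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adjacency)
    # Add self-loops so each gene keeps its own expression signal.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric degree normalization of the self-looped adjacency matrix.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

    def matmul(x, y):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
                for row in x]

    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy example: two genes linked by one PPI edge, one expression feature each.
ppi = [[0.0, 1.0], [1.0, 0.0]]
expression = [[1.0], [3.0]]
print(gcn_layer(ppi, expression, [[1.0]]))
```

Each output row mixes a gene's own features with those of its direct neighbors in the network. This neighbor averaging is what lets stacked layers pick up signals at the level of gene subnetworks.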


Genomics ◽  
2020 ◽  
Vol 112 (2) ◽  
pp. 1761-1767 ◽  
Author(s):  
Konstantina E. Vennou ◽  
Daniele Piovani ◽  
Panagiota I. Kontou ◽  
Stefanos Bonovas ◽  
Pantelis G. Bagos

Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 931 ◽  
Author(s):  
Mok ◽  
Kim ◽  
Lee ◽  
Choi ◽  
Lee ◽  
...  

Although there have been several analyses for identifying cancer-associated pathways, based on gene expression data, most of these are based on single pathway analyses, and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real biological data analysis of pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.


2012 ◽  
Vol 132 (8) ◽  
pp. 2050-2059 ◽  
Author(s):  
Marloes S. van Kester ◽  
Martin K. Borg ◽  
Willem H. Zoutman ◽  
Jacoba J. Out-Luiting ◽  
Patty M. Jansen ◽  
...  

2021 ◽  
Author(s):  
Mariann Koel ◽  
Urmo Võsa ◽  
Maarja Lepamets ◽  
Hannele Laivuori ◽  
Susanna Lemmelä ◽  
...  

Background: The uterine cervix has an important role in female reproductive health, but not much is known about the genetic determinants of cervical biology and pathology. Genome-wide association studies (GWAS) with increasing sample sizes have reported a few genetic associations for cervical cancer. However, GWAS is only the first step in mapping genetic susceptibility, and thus the underlying biology of cervical cancer and other cervical phenotypes is still not entirely understood. Here, we use data from large biobanks to characterise the genetics of cervical phenotypes (including cervical cancer) and leverage the latest computational methods and gene expression data to refine the association signals for cervical cancer. Methods: Using Estonian Biobank and FinnGen data, we characterise the genetic signals associated with cervical ectropion (10,162 cases/151,347 controls), cervicitis (19,285/185,708) and cervical dysplasia (14,694/150,563). We present the results from the largest trans-ethnic GWAS meta-analysis of cervical cancer, including up to 9,229 cases and 490,304 controls from Estonian Biobank, the FinnGen study, the UK Biobank and Biobank Japan. We combine GWAS results with gene expression data and chromatin regulatory annotations in HeLa cervical carcinoma cells to propose the most likely candidate genes and causal variants for every locus associated with cervical cancer. We further dissect the HLA association with cervical pathology using imputed data on alleles and amino acid polymorphisms. Results: We report a single associated locus on 2q13 for both cervical ectropion (rs3748916, p=5.1 x 10^-16) and cervicitis (rs1049137, p=3.9 x 10^-10), and five signals for cervical dysplasia: 6p21.32 (rs1053726, p=9.1 x 10^-9; rs36214159, p=1.6 x 10^-22), 2q24.1 (rs12611652, p=3.2 x 10^-9) near DAPL1, 2q13 (rs1049137, p=6.4 x 10^-9) near PAX8, and 5p15.33 (rs6866294, p=2.1 x 10^-9), downstream of CLPTM1L. We identify five loci associated with cervical cancer in the trans-ethnic meta-analysis: 1p36.12 (rs2268177, p=3.1 x 10^-8), 2q13 (rs4849177, p=9.4 x 10^-15), 5p15.33 (rs27069, p=1.3 x 10^-14), 17q12 (rs12603332, p=1.2 x 10^-9), and 6p21.32 (rs35508382, p=1.0 x 10^-39). Joint analysis of dysplasia and cancer datasets revealed an association on chromosome 19 (rs425787, p=3.5 x 10^-8), near CD70. Conclusions: Our results map PAX8/PAX8-AS1, LINC00339, CDC42, CLPTM1L, HLA-DRB1, HLA-B, and GSDMB as the most likely candidate genes for cervical cancer, which provides novel insight into cervical cancer pathogenesis and supports the role of genes involved in reproductive tract development, immune response and cellular proliferation/apoptosis. We further show that PAX8/PAX8-AS1 has a central role in cervical biology and pathology, as it was associated with all analysed phenotypes. The detailed characterisation of association signals, together with mapping of causal variants and genes, offers valuable leads for further functional studies.

