Integrated analysis of numerous heterogeneous gene expression profiles for detecting robust disease-specific biomarkers and proposing drug targets

2015 ◽  
Vol 43 (16) ◽  
pp. 7779-7789 ◽  
Author(s):  
David Amar ◽  
Tom Hait ◽  
Shai Izraeli ◽  
Ron Shamir
2021 ◽  
Author(s):  
Taguchi Y-h. ◽  
Turki Turki

Abstract The integrated analysis of multiple gene expression profiles measured in distinct studies is always problematic. In particular, the lack of sample matching and of common labeling between distinct studies prevents the integration of multiple studies in a fully data-driven and unsupervised manner. In this study, we propose a strategy that enables the integration of multiple gene expression profiles from multiple independent studies without either labeling or sample matching, using tensor decomposition-based unsupervised feature extraction. As an example, we applied this strategy to Alzheimer’s disease (AD)-related gene expression profiles that lack exact correspondence among samples, as well as to AD single-cell RNA-seq (scRNA-seq) data. We found that the integrated analysis selected biologically reasonable genes. Overall, integrated gene expression profiles can function analogously to prior learning and/or transfer learning strategies in other machine learning applications. For scRNA-seq, the proposed approach drastically reduced the required computational memory.


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Harpreet Kaur ◽  
Sherry Bhalla ◽  
Dilraj Kaur ◽  
Gajendra PS Raghava

Abstract Liver cancer is the fourth major lethal malignancy worldwide. To understand the development and progression of liver cancer, biomedical research has generated a tremendous amount of transcriptomics and disease-specific biomarker data. However, because this information is dispersed, it is difficult in practice to delineate the significant markers for the disease. Hence, a dedicated resource for liver cancer is required that integrates scattered datasets in multiple formats with information on disease-specific biomarkers. Liver Cancer Expression Resource (CancerLivER) is a database that maintains gene expression datasets of liver cancer along with the putative biomarkers defined for the disease in the literature. It manages 115 datasets that include gene-expression profiles of 9611 samples. Each of the incorporated datasets was manually curated to remove artefacts; subsequently, a standard and uniform pipeline specific to each technique was employed for processing. Additionally, it contains comprehensive information on 594 liver cancer biomarkers, which include mainly 315 gene biomarkers or signatures and 178 protein- and 46 miRNA-based biomarkers. To explore the full potential of data on liver cancer, a web-based interactive platform was developed for searching, browsing and analysis. Analysis tools were also integrated to explore and visualize the expression patterns of desired genes among different types of samples based on individual genes, Gene Ontology (GO) terms and pathways. Furthermore, a dataset matrix download facility is provided so that users can carry out more extensive analyses to elucidate more robust disease-specific signatures. Overall, CancerLivER is a comprehensive resource that is highly useful for the scientific community working in the field of liver cancer. Availability: CancerLivER can be accessed on the web at https://webs.iiitd.edu.in/raghava/cancerliver.


2019 ◽  
Author(s):  
Kulwadee Thanamit ◽  
Franziska Hoerhold ◽  
Marcus Oswald ◽  
Rainer Koenig

Abstract Finding drug targets for antimicrobial treatment is a central focus in biomedical research. To discover new drug targets, we developed a method to identify which nutrients are essential for microorganisms. Using 13C-labeled metabolites is the most informative way to infer metabolic fluxes to date. However, the data can be difficult to acquire in complicated environments, for example, if the pathogen resides in host cells. Although data from gene expression profiling are less informative than metabolic tracer-derived data, they are less laborious to generate and may still provide the relevant information. In addition, metabolic fluxes have been successfully predicted by flux balance analysis (FBA). We developed an FBA-based approach that combines stoichiometric knowledge of the metabolic reactions of a cell with expression profiles of the coding genes. We aimed to identify essential drug targets for specific nutritional uptakes of microorganisms. As a case study, we predicted each single carbon source out of a pool of eight different carbon sources for B. subtilis based on gene expression profiles. The models were in good agreement with models based on 13C metabolic flux data of the same conditions. Every carbon source was predicted well. We then successfully applied the model to unseen data from a study in which the carbon source was shifted from glucose to malate and vice versa. Technically, we present a new and fast method to reduce thermodynamically infeasible loops, which is a necessary preprocessing step for such model-building algorithms.
Significance Identifying metabolic fluxes using 13C-labeled tracers is the most informative way to gain insight into metabolic fluxes. However, obtaining the data can be laborious and challenging in a complex environment. Though transcriptional data are an indirect means of estimating the fluxes, they can help to identify them. Here, we developed a new method employing constraint-based modeling to predict metabolic fluxes by embedding gene expression profiles in a linear regression model. As a case study, we used data from Bacillus subtilis grown under different carbon sources. The correct carbon source was predicted well. Additionally, we established a novel and fast method to remove thermodynamically infeasible loops.


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Carl Grant Mangleburg ◽  
Timothy Wu ◽  
Hari K. Yalamanchili ◽  
Caiwei Guo ◽  
Yi-Chen Hsieh ◽  
...  

Abstract Background Tau neurofibrillary tangle pathology characterizes Alzheimer’s disease and other neurodegenerative tauopathies. Brain gene expression profiles can reveal mechanisms; however, few studies have systematically examined both the transcriptome and proteome or differentiated Tau- versus age-dependent changes. Methods Paired, longitudinal RNA-sequencing and mass-spectrometry were performed in a Drosophila model of tauopathy, based on pan-neuronal expression of human wildtype Tau (TauWT) or a mutant form causing frontotemporal dementia (TauR406W). Tau-induced, differentially expressed transcripts and proteins were examined cross-sectionally or using linear regression and adjusting for age. Hierarchical clustering was performed to highlight network perturbations, and we examined overlaps with human brain gene expression profiles in tauopathy. Results TauWT induced 1514 and 213 differentially expressed transcripts and proteins, respectively. TauR406W had a substantially greater impact, causing changes in 5494 transcripts and 697 proteins. There was a ~70% overlap between age- and Tau-induced changes, and our analyses reveal pervasive bi-directional interactions. Strikingly, 42% of Tau-induced transcripts were discordant in the proteome, showing opposite direction of change. Tau-responsive gene expression networks strongly implicate innate immune activation. Cross-species analyses pinpoint human brain gene perturbations specifically triggered by Tau pathology and/or aging, and further differentiate between disease-amplifying and protective changes. Conclusions Our results comprise a powerful, cross-species functional genomics resource for tauopathy, revealing Tau-mediated disruption of gene expression, including dynamic, age-dependent interactions between the brain transcriptome and proteome.
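
The "discordant" fraction reported in the abstract above can be illustrated with a minimal sketch. It assumes that each gene has a signed change (for example a log fold change) at both the transcript and the protein level, and that a gene counts as discordant when the two signs differ. The abstract does not give the authors' exact definition or thresholds, so the function name, the input layout and the gene labels below are hypothetical.

```python
def discordant_fraction(rna_change, protein_change):
    """Fraction of genes, measured in both layers, whose changes have opposite signs.

    Both arguments map a gene identifier to a signed change (e.g. a log fold
    change). Genes missing from either layer, or with a zero change in either
    layer, are left out. This is an illustrative definition, not necessarily
    the one used in the study.
    """
    compared = [
        gene
        for gene in rna_change
        if gene in protein_change
        and rna_change[gene] != 0
        and protein_change[gene] != 0
    ]
    if not compared:
        raise ValueError("no genes with a non-zero change in both layers")
    discordant = sum(
        1 for gene in compared
        if (rna_change[gene] > 0) != (protein_change[gene] > 0)
    )
    return discordant / len(compared)


# Hypothetical toy data: gene labels and values are made up for illustration.
rna = {"geneA": 1.2, "geneB": -0.5, "geneC": 0.8, "geneD": 2.0}
protein = {"geneA": 0.4, "geneB": 0.3, "geneC": -1.0, "geneE": 1.0}
fraction = discordant_fraction(rna, protein)  # geneB and geneC disagree in sign
```

Only genes detected in both layers enter the denominator, which matters because proteomics typically covers fewer genes than RNA-sequencing.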


2018 ◽  
Author(s):  
Friederike Ehrhart ◽  
Susan L. Coort ◽  
Lars Eijssen ◽  
Elisa Cirillo ◽  
Eric E. Smeets ◽  
...  

Abstract Rett syndrome (RTT) is a rare disorder causing severe intellectual and physical disability. The cause is a mutation in the gene coding for the methyl-CpG binding protein 2 (MECP2), a multifunctional regulator protein. The purpose of the study was the integration and investigation of multiple gene expression profiles in human cells with an impaired MECP2 gene to obtain data-driven insight into downstream effects. Information about changed gene expression was extracted from five previously published studies. We identified a set of genes that are significantly changed in several, though not all, transcriptomics datasets and that had not been mentioned in the context of RTT before. Using overrepresentation analysis of molecular pathways and gene ontology, we found that these genes are involved in several processes and molecular pathways known to be affected in RTT. By integrating transcription factors, we identified a possible link by which MECP2 regulates cytoskeleton organization via MEF2C and CAPG. Integrative analysis of omics data and prior knowledge databases is a powerful approach to identify links between mutation and phenotype, especially in rare disease research where little data is available.
Abbreviations Rett syndrome (RTT), embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), fold change (FC), Gene Ontology (GO), eukaryotic initiation of transcription factor (EIF). Gene symbols follow the HGNC nomenclature.
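
Overrepresentation analysis, as mentioned in the abstract above, is commonly implemented as a one-sided hypergeometric test. The sketch below shows that general technique under the assumption of a fixed gene universe; it is not the study's actual pipeline, and the function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def overrepresentation_p(universe, pathway, selected, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe: number of genes that could have been selected
    pathway:  number of universe genes annotated to the pathway or GO term
    selected: number of genes in the list of interest
    overlap:  number of selected genes that are annotated to the pathway
    """
    if not (0 <= pathway <= universe and 0 <= selected <= universe):
        raise ValueError("pathway and selected must lie between 0 and universe")
    largest_overlap = min(pathway, selected)
    if not 0 <= overlap <= largest_overlap:
        raise ValueError("overlap is outside the possible range")
    # Count the ways to draw `selected` genes with exactly k pathway members,
    # summed over every k at least as extreme as the observed overlap.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return tail / comb(universe, selected)


# Hypothetical toy numbers: 10 genes in total, 4 in the pathway,
# 3 genes selected, 2 of which belong to the pathway.
p_value = overrepresentation_p(universe=10, pathway=4, selected=3, overlap=2)
```

In practice the test is repeated for every pathway or GO term, and the resulting p-values are corrected for multiple testing before any term is reported as enriched.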


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Chien-Yueh Lee ◽  
Amrita Chattopadhyay ◽  
Li-Mei Chiang ◽  
Jyh-Ming Jimmy Juang ◽  
Liang-Chuan Lai ◽  
...  

Abstract Integrated analysis of DNA variants and gene expression profiles may facilitate precise identification of gene regulatory networks involved in disease mechanisms. Despite the widespread availability of public resources, we lack databases that are capable of simultaneously providing gene expression profiles, variant annotations, functional prediction scores and pathogenic analyses. VariED is the first web-based querying system that integrates an annotation database and expression profiles for genetic variants. The database offers a user-friendly platform and locates gene/variant names in the literature by connecting to established online querying tools, biological annotation tools and records from free-text literature. VariED acts as a central hub for organized genome information consisting of gene annotation, variant allele frequency, functional prediction, clinical interpretation and gene expression profiles in three species: human, mouse and zebrafish. VariED also provides a novel scoring scheme to predict the functional impact of a DNA variant. With one single entry, all results regarding queried DNA variants can be downloaded. VariED can potentially serve as an efficient way to obtain comprehensive variant knowledge for clinicians and scientists around the world working on important drug discoveries and precision treatments.

