Mining influential genes based on deep learning

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Fengjiao Xu ◽  
Mingmin Xu ◽  
Zutan Li ◽  
...  

Abstract Background: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to deep mine the influential genes in the genome. Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationship between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The result shows that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics [mean absolute error (MAE) and Pearson correlation coefficient (PCC)]. This reveals that the landmark genes detected by our method contain more genomic information. Conclusions: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.
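The two evaluation metrics named in this abstract, MAE and PCC, can be illustrated for a single target gene with the minimal sketch below. The function names and the toy expression values are illustrative assumptions and do not come from the paper; the paper's own evaluation pipeline may differ.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted expression."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted expression."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Toy values (hypothetical): one target gene measured across four samples.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

mae = mean_absolute_error(measured, predicted)
pcc = pearson_correlation(measured, predicted)
```

In this toy case the prediction is offset by a constant, so the correlation is perfect while the MAE is not zero, which is one reason to report both metrics together.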

2020 ◽  
Author(s):  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Cong Pian ◽  
Mingmin Xu ◽  
Zutan Li ◽  
...  

Abstract Background: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1,000 carefully chosen landmark genes that can capture ~80% of information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to deep mine the influential genes in the genome. Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationship between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The result shows that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics (mean absolute error (MAE) and Pearson correlation coefficient (PCC)). This reveals that the landmark genes detected by our method contain more genomic information. Conclusions: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.


2021 ◽  
Author(s):  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Fengjiao Xu ◽  
Mingmin Xu ◽  
Zutan Li ◽  
...  

Abstract Background: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1,000 carefully chosen landmark genes that can capture ~80% of information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to deep mine the influential genes in the genome. Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationship between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The result shows that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics (mean absolute error (MAE) and Pearson correlation coefficient (PCC)). This reveals that the landmark genes detected by our method contain more genomic information. Conclusions: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.


2015 ◽  
Author(s):  
Yifei Chen ◽  
Yi Li ◽  
Rajiv Narayan ◽  
Aravind Subramanian ◽  
Xiaohui Xie

Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ~1,000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression, limiting its accuracy since it does not capture complex nonlinear relationships between the expression of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based GEO dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms linear regression with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than linear regression in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2,921 expression profiles. Deep learning still outperforms linear regression with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability: D-GEX is available at https://github.com/uci-cbcl/D-GEX.
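The two comparison figures quoted in this abstract (relative improvement in mean absolute error, and the share of target genes with lower error) can be sketched as follows. This assumes the usual definition of relative improvement, (baseline error minus model error) divided by baseline error; the abstract does not spell out the formula. The function names and per-gene error values are hypothetical and are not data from the paper.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


def fraction_genes_improved(baseline_errors, model_errors):
    """Share of genes whose model error is strictly lower than the baseline error."""
    if len(baseline_errors) != len(model_errors) or not baseline_errors:
        raise ValueError("inputs must be non-empty and of equal length")
    wins = sum(1 for b, m in zip(baseline_errors, model_errors) if m < b)
    return wins / len(baseline_errors)


# Hypothetical per-gene MAE values for a baseline (e.g. linear regression)
# and a competing model, for four target genes.
baseline = [0.40, 0.50, 0.30, 0.20]
model = [0.30, 0.45, 0.35, 0.10]

# Overall figure: compare the errors averaged across all genes.
overall = relative_improvement(sum(baseline) / len(baseline), sum(model) / len(model))

# Gene-wise figure: how many individual genes the model wins on.
share = fraction_genes_improved(baseline, model)
```

The two numbers answer different questions: the first summarizes the average gain, while the second shows whether that gain is spread across most genes or concentrated in a few.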


2004 ◽  
Vol 33 (3) ◽  
pp. 609-622 ◽  
Author(s):  
Knut R Steffensen ◽  
Soek Ying Neo ◽  
Thomas M Stulnig ◽  
Vinsensius B Vega ◽  
Safia S Rahman ◽  
...  

The liver X receptors α and β (LXRα and LXRβ ) are members of the nuclear receptor superfamily of proteins which are highly expressed in metabolically active tissues. They regulate gene expression of critical genes involved in cholesterol catabolism and transport, lipid and triglyceride biosynthesis and carbohydrate metabolism in response to distinct oxysterols and intermediates in the cholesterol metabolic pathway. The biological roles of the LXRs in tissues other than liver, intestine and adipose tissue are poorly elucidated. In this study we used global gene-expression profiling analysis to detect differences in expression patterns in several tissues from mice fed an LXR agonist or vehicle. Our results show that LXR plays an important role in the kidney, lung, adrenals, brain, testis and heart where several putative LXR target genes were found. The effects of the LXRs were further analysed in adrenals where treatment with an LXR agonist induced expression of adrenocorticotrophic hormone receptor, suppressed expression of uncoupling protein (UCP)-1 and UCP-3 as well as several glycolytic enzymes and led to increased serum corticosterone levels. These results indicate novel biological roles of the LXR including regulation of energy metabolism, glycolysis and steroidogenesis in the adrenals via alteration of expression profiles of putative target genes.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shengqiao Gao ◽  
Lu Han ◽  
Dan Luo ◽  
Gang Liu ◽  
Zhiyong Xiao ◽  
...  

Abstract Background: Querying drug-induced gene expression profiles with machine learning methods is an effective way to reveal drug mechanisms of action (MOAs), and it is strongly supported by the growth of large-scale, high-throughput gene expression databases. However, due to the lack of code-free and user-friendly applications, it is not easy for biologists and pharmacologists to model MOAs with state-of-the-art deep learning approaches. Results: In this work, a newly developed online collaborative tool, Genetic profile-activity relationship (GPAR), was built to help model and predict MOAs easily via deep learning. Users can use GPAR to customize their training sets to train self-defined MOA prediction models, to evaluate model performance and to make further predictions automatically. Cross-validation tests show that GPAR outperforms Gene set enrichment analysis in predicting MOAs. Conclusion: GPAR can serve as a better approach to MOA prediction, which may help researchers generate more reliable MOA hypotheses.


2021 ◽  
Author(s):  
Alexandre Laverré ◽  
Eric Tannier ◽  
Anamaria Necsulea

Abstract Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements is rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve regulatory element sequences and their interactions with target genes. The extent of evolutionary conservation is remarkable for long-range promoter-enhancer contacts, illustrating how the structure of regulatory interactions constrains large-scale genome evolution. Notably, we show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny or contacts with target genes, is tightly linked to gene expression evolution.


2021 ◽  
pp. gr.275901.121
Author(s):  
Alexandre Laverre ◽  
Eric Tannier ◽  
Anamaria Necsulea

Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements is rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve not only regulatory element sequences but also their chromatin contacts with target genes. The extent of evolutionary conservation is remarkable for long-range promoter-enhancer contacts, illustrating how the structure of regulatory landscapes constrains large-scale genome evolution. We show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny or contacts with target genes, is significantly associated with gene expression evolution.


Blood ◽  
2004 ◽  
Vol 104 (11) ◽  
pp. 4351-4351
Author(s):  
Wei-Feng Dong ◽  
Naoto Takahashi ◽  
Matthew N. Bainbridge ◽  
Andrea R. Hull ◽  
Stuart A. Scott ◽  
...  

Abstract RIZ1 (PRDM2) is a tumor suppressor gene on 1p36 that frequently undergoes deletion, rearrangements, and loss of heterozygosity in a broad spectrum of tumors. RIZ1 is a member of the nuclear protein methyltransferase superfamily involved in chromatin remodeling. RIZ1 contains a ~130 amino acid conserved domain (PR or SET) that is important in chromatin-mediated regulation of gene expression and in the development of cancer. RIZ1 methylates Histone H3 on K9 and this activity may play a role in transcription repression as H3-K9 methylation is known to be associated with repression. Aberrant activities or mistargeting of chromatin modifying activities are proving to have unexpected links to cancer. We and others have shown that RIZ1 expression is down regulated in human leukemias and in the human erythroleukemia cell line K562. Expression of RIZ1 in K562 reduced proliferation, increased apoptosis, and promoted erythroid differentiation. To understand how RIZ1’s DNA binding, methyltransferase, and transcription repressor functions are related to its tumor suppressor activity it is necessary to characterize RIZ1 target genes. We used DNA microarrays to globally monitor how RIZ1 affects gene expression profiles. We constructed a K562 cell line with RIZ1 stably integrated under the control of a CMV promoter and analyzed the gene expression profiles of K562 and K562 + RIZ1 using a 42K Stanford human gene microarray. By comparing the gene expression profiles of these cell lines, we identified potential RIZ1 gene targets that are up and down regulated in the presence of RIZ1. In total, we identified 5 upregulated genes and 20 down regulated genes using significance analysis of microarrays (SAM) and standard deviation filter analysis of the gene expression data.
RIZ1-mediated changes in gene expression profiling indicate that RIZ1 is potentially involved in the regulation and connection of the IGF-1 (IGF-1, IGFBP2) and integrin (LMS1) pathways, and in the activation of the TGF-β (SPARC) pathway. The genes perturbed by RIZ1 expression suggest that the tumor suppressor properties of RIZ1 arise from its control of proliferation, apoptosis and differentiation using these pathways. Finally, we observed an overrepresentation of the SP-1 transcription factor binding sites in genes that are upregulated in the absence of RIZ1. This correlates with the ability of RIZ1 to recognize SP1 sequences.


2019 ◽  
Author(s):  
Lu Zhang ◽  
Jia Xing Chen ◽  
Shuai Cheng Li

ABSTRACT The fast accumulation of high-throughput gene expression data provides us an unprecedented opportunity to understand the gene interactions and prioritize disease candidate genes. However, these data are typically noisy and highly heterogeneous, complicating their use in constructing large expression compendium. Recent studies suggest that the collective expression pattern can be better modeled by Gaussian mixtures. This motivates our present work, which applies a Multimodal framework (MMF) to depict the gene expression profiles. MMF introduces two new statistics: Multimodal Mutual Information and Multimodal Direct Information. Through extensive simulations, MMF outperforms other approaches for detecting gene co-expressions or gene regulatory interactions, regardless of the level of noise or strength of interactions. In the principal component analysis for very large collections of expression data, the use of MMI enables more biologically meaningful spaces to be extracted than the use of Pearson correlation. The practical use of MMF is further demonstrated with three biological applications: 1. Prioritizing KIF1A as the candidate causal gene of hereditary spastic paraparesis from familial exome sequencing data; 2. Detecting ANK2 as the ‘hot genes’ for autism spectrum disorders, derived from exome sequencing family based study; 3. Predicting the microRNA target genes based on both sequence and expression information.


2021 ◽  
Author(s):  
Eric de Bony ◽  
Fien Gysens ◽  
Nurten Yigit ◽  
Jasper Anckaert ◽  
Celine Everaert ◽  
...  

Abstract Molecular phenotyping through shallow 3’-end RNA-sequencing workflows is increasingly applied in the context of large-scale chemical or genetic perturbation screens to study disease biology or support drug discovery. While these workflows enable accurate quantification of the most abundant genes, they are less effective for applications that require expression profiling of low abundant transcripts, like long non-coding RNAs (lncRNAs), or selected gene panels. To tackle these issues, we describe a workflow combining 3’-end library preparation with 3’-end hybrid capture probes and shallow RNA-sequencing for cost-effective, targeted quantification of subsets of (low abundant) genes across hundreds to thousands of samples. To assess the performance of the method, we designed a capture probe set for more than 100 mRNA and lncRNA target genes and applied the workflow to a cohort of 360 samples. When compared to standard 3’-end RNA-sequencing, 3’-end capture sequencing resulted in a more than 100-fold enrichment of target gene abundance while conserving relative inter-gene and inter-sample abundances. 3’-end RNA capture sequencing enables accurate targeted gene expression profiling at extremely shallow sequencing depth.

