Large-scale analysis of human gene expression variability associates highly variable drug targets with lower drug effectiveness and safety

2019 ◽  
Vol 35 (17) ◽  
pp. 3028-3037 ◽  
Author(s):  
Eyal Simonovsky ◽  
Ronen Schuster ◽  
Esti Yeger-Lotem

Abstract Motivation The effectiveness of drugs tends to vary between patients. One well-known reason for this phenomenon is genetic polymorphism in drug target genes among patients. Here, we propose that differences in expression levels of drug target genes across individuals can also contribute to this phenomenon. Results To explore this hypothesis, we analyzed the expression variability of protein-coding genes, and particularly drug target genes, across individuals. For this, we developed a novel variability measure, termed local coefficient of variation (LCV), which ranks the expression variability of each gene relative to genes with similar expression levels. Unlike commonly used methods, LCV neutralizes expression-level biases without imposing any distribution over the variation and is robust to data incompleteness. Application of LCV to RNA-sequencing profiles of 19 human tissues and to target genes of 1076 approved drugs revealed that drug target genes were significantly more variable than protein-coding genes. Analysis of 113 drugs with available effectiveness scores showed that drugs targeting highly variable genes tended to be less effective in the population. Furthermore, comparison of approved drugs to drugs that were withdrawn from the market showed that withdrawn drugs targeted significantly more variable genes than approved drugs. Last, upon analyzing gender differences, we found that the variability of drug target genes was similar between men and women. Altogether, our results suggest that expression variability of drug target genes could contribute to the variable responsiveness and effectiveness of drugs, and is worth considering during drug treatment and development. Availability and implementation LCV is available as a Python script on GitHub (https://github.com/eyalsim/LCV). Supplementary information Supplementary data are available at Bioinformatics online.
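The idea of ranking a gene's variability only against genes with similar expression levels can be illustrated with a minimal sketch. This is not the authors' implementation (their script is in the GitHub repository linked above); the function names, the `half_window` parameter, the use of mean expression to order genes, and the percentage-of-neighbours score are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def local_cv(expression, half_window=1):
    """Hypothetical local-CV score, assumed for illustration only.

    expression: dict mapping gene name -> expression values across individuals.
    Genes are ordered by mean expression; each gene's CV is then compared with
    the CVs of up to `half_window` genes on either side in that ordering.
    Returns gene -> percentage of those neighbours that have a lower CV.
    """
    genes = sorted(expression, key=lambda g: mean(expression[g]))
    cvs = [coefficient_of_variation(expression[g]) for g in genes]
    scores = {}
    for i, gene in enumerate(genes):
        start = max(0, i - half_window)
        stop = min(len(genes), i + half_window + 1)
        neighbours = [cvs[j] for j in range(start, stop) if j != i]
        if not neighbours:
            scores[gene] = 0.0
            continue
        lower = sum(1 for c in neighbours if c < cvs[i])
        scores[gene] = 100.0 * lower / len(neighbours)
    return scores


# Toy data (invented): three lowly expressed genes and two highly expressed genes.
example = {
    "A": [1, 1, 1],
    "B": [1, 2, 3],
    "C": [3, 3, 3],
    "D": [100, 100, 100],
    "E": [90, 100, 113],
}
scores = local_cv(example, half_window=1)
```

In this toy data, gene E has a much smaller raw coefficient of variation than gene B, yet both rank as the most variable gene within their own expression neighbourhood, which is the kind of expression-level bias a local ranking is meant to neutralize.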

2017 ◽  
Author(s):  
Morgan N. Price ◽  
Adam P. Arkin

Abstract Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.


2019 ◽  
Author(s):  
Deepank R Korandla ◽  
Jacob M Wozniak ◽  
Anaamika Campeau ◽  
David J Gonzalez ◽  
Erik S Wright

Abstract Motivation A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. Results Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88–95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. Availability and implementation AssessORF is available as an R package via the Bioconductor package repository. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 116 (44) ◽  
pp. 22020-22029 ◽  
Author(s):  
Aritro Nath ◽  
Eunice Y. T. Lau ◽  
Adam M. Lee ◽  
Paul Geeleher ◽  
William C. S. Cho ◽  
...  

Large-scale cancer cell line screens have identified thousands of protein-coding genes (PCGs) as biomarkers of anticancer drug response. However, systematic evaluation of long noncoding RNAs (lncRNAs) as pharmacogenomic biomarkers has so far proven challenging. Here, we study the contribution of lncRNAs as drug response predictors beyond spurious associations driven by correlations with proximal PCGs, tissue lineage, or established biomarkers. We show that, as a whole, the lncRNA transcriptome is as potent as the PCG transcriptome at predicting response to hundreds of anticancer drugs. Analysis of individual lncRNA transcripts associated with drug response reveals that nearly half of the significant associations are in fact attributable to proximal cis-PCGs. However, adjusting for effects of cis-PCGs revealed significant lncRNAs that augment drug response predictions for most drugs, including those with well-established clinical biomarkers. In addition, we identify lncRNA-specific somatic alterations associated with drug response by adopting a statistical approach to determine lncRNAs carrying somatic mutations that undergo positive selection in cancer cells. Lastly, we experimentally demonstrate that 2 lncRNAs, EGFR-AS1 and MIR205HG, are functionally relevant predictors of anti-epidermal growth factor receptor (EGFR) drug response.


Genetics ◽  
1999 ◽  
Vol 153 (1) ◽  
pp. 179-219 ◽  
Author(s):  
M Ashburner ◽  
S Misra ◽  
J Roote ◽  
S E Lewis ◽  
R Blazej ◽  
...  

Abstract A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized “Adh region.” A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.


2020 ◽  
Author(s):  
Xiao Ma ◽  
Shuangshuang Cen ◽  
Luming Wang ◽  
Chao Zhang ◽  
Limin Wu ◽  
...  

Abstract Background: The gonad is the major factor affecting animal reproduction. The regulatory mechanism of the expression of protein-coding genes involved in reproduction remains to be elucidated. Increasing evidence has shown that ncRNAs play key regulatory roles in gene expression in many life processes. The roles of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in reproduction have been investigated in some species. However, the regulatory patterns of miRNAs and lncRNAs in the sex-biased expression of protein-coding genes remain to be elucidated. In this study, we performed an integrated analysis of miRNA, messenger RNA (mRNA), and lncRNA expression profiles to explore their regulatory patterns in the female ovary and male testis of the Chinese soft-shelled turtle, Pelodiscus sinensis. Results: We identified 10 446 mature miRNAs, 20 414 mRNAs and 28 500 lncRNAs in the ovaries and testes, and 633 miRNAs, 11 319 mRNAs, and 10 495 lncRNAs showed differential expression. A total of 2 814 target genes were identified for miRNAs. The predicted target genes of these differentially expressed (DE) miRNAs and lncRNAs included abundant genes related to reproductive regulation. Furthermore, we found that 189 DEmiRNAs and 5 408 DElncRNAs showed sex-specific expression. Of these, 3 DEmiRNAs and 917 DElncRNAs were testis-specific, and 186 DEmiRNAs and 4 491 DElncRNAs were ovary-specific. We further constructed competing endogenous lncRNA-miRNA-mRNA networks using bioinformatics, including 103 DEmiRNAs, 636 DEmRNAs, and 1 622 DElncRNAs. The target genes for the differentially expressed miRNAs and lncRNAs included abundant genes involved in gonadal development, including Wt1, Creb3l2, Gata4, Wnt2, Nr5a1, Hsd17, Igf2r, H2afz, Lin52, Trim71, Zar1, and Jazf1. Conclusions: In animals, miRNAs and lncRNAs act as master regulators of reproductive processes by controlling the expression of mRNAs. Considering their importance, the identified miRNAs, lncRNAs, and their targets in P. sinensis might be useful for studying the molecular processes involved in sexual reproduction and for genome editing to produce higher quality aquaculture animals. A thorough understanding of ncRNA-based cellular regulatory networks will aid in the improvement of P. sinensis reproductive traits for aquaculture.


2019 ◽  
Author(s):  
Xiao Ma ◽  
Shuangshuang Cen ◽  
Luming Wang ◽  
Chao Zhang ◽  
Limin Wu ◽  
...  

Abstract Background: The gonad is the major factor affecting animal reproduction. The regulatory mechanism of the expression of protein-coding genes involved in reproduction remains to be elucidated. Increasing evidence has shown that ncRNAs play key regulatory roles in gene expression in many life processes. The roles of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in reproduction have been investigated in some species. However, the regulatory patterns of miRNAs and lncRNAs in the sex-biased expression of protein-coding genes remain to be elucidated. In this study, we performed an integrated analysis of miRNA, messenger RNA (mRNA), and lncRNA expression profiles to explore their regulatory patterns in the female ovary and male testis of the soft-shelled turtle, Pelodiscus sinensis. Results: We identified 10 796 mature miRNAs, 44 678 mRNAs, and 58 923 lncRNAs in the testis and ovary. A total of 16 817 target genes were identified for miRNAs. Of these, 11 319 mRNAs, 10 495 lncRNAs, and 633 miRNAs were differentially expressed. The predicted target genes of these differentially expressed (DE) miRNAs and lncRNAs included genes related to the regulation of reproduction. Furthermore, we found that 5 408 DElncRNAs and 186 DEmiRNAs showed sex-specific expression. Of these, 3 miRNAs and 917 lncRNAs were testis-specific, and 186 DEmiRNAs and 4 491 DElncRNAs were ovary-specific. We constructed competing endogenous lncRNA-miRNA-mRNA networks using bioinformatics, including 273 DEmRNAs, 5 730 DEmiRNAs, and 2 945 DElncRNAs. The target genes of the differentially expressed miRNAs and lncRNAs included Wt1, Creb3l2, Gata4, Wnt2, Nr5a1, Hsd17, Igf2r, H2afz, Lin52, Trim71, Zar1, and Jazf1, among others. Conclusions: In animals, miRNAs and lncRNAs regulate the reproductive process, including oocyte maturation and spermatogenesis. Considering their importance, the identified miRNAs, lncRNAs, and their targets in P. sinensis might be useful for genome editing to produce higher quality aquaculture animals. A thorough understanding of ncRNA-based cellular regulatory networks will aid in the improvement of P. sinensis reproductive traits for aquaculture.


2021 ◽  
Vol 16 ◽  
Author(s):  
Yuan Quan ◽  
Hong-Yu Zhang

Background: Genome-wide association studies (GWAS) have opened the door to unprecedented large-scale identification of susceptibility loci for human diseases and traits. However, it is still a great challenge to validate these loci and elucidate how these sequence variants give rise to genetic and phenotypic changes. Because many drug targets are genetic disease genes and the general drug mode of action (MoA, agonist or antagonist) is in line with the consequence of target gene mutations (loss-of-function (LOF) or gain-of-function (GOF)), here we propose a chemical genetic method to address the above issues of GWAS. Objective: This study intends to use chemical genetics information to validate GWAS-derived disease loci and interpret their underlying pathogenesis. Method: We conducted a comprehensive comparative analysis of GWAS data and drug/target information (chemical genetics information). Results: We identified hundreds of GWAS-derived disease loci that are linked to drug target genes and have matched disease traits and drug indications. Notably, more than 40% of the genes have been recognized as disorder factors, indicating the potential power of chemical genetic validation. The pathogenesis of these loci was inferred from the corresponding drug MoA. Some inferences were supported by prior experimental observations; some were interpreted in terms of microRNA regulation, codon usage bias, and transcriptional regulation, in particular the variation in transcription factor-binding affinity induced by disease-causing mutations. Conclusion: In summary, chemical genetics information is useful for validating GWAS-derived disease loci and for interpreting their underlying pathogenesis, which has important implications not only for medical genetics but also for the methodological evaluation of GWAS.


Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 2507-2507
Author(s):  
Shuangli Mi ◽  
Fuhong He ◽  
Jun Wu ◽  
Jing Zhou ◽  
George Wu ◽  
...  

Abstract 2507 The MLL (mixed-lineage leukemia) gene encodes a histone methyltransferase that is critical in maintaining gene expression during hematopoiesis. Chromosomal translocations disrupting MLL often lead to the creation of MLL fusion genes that act as potent drivers of acute leukemia. MLL fusion proteins are oncogenic transcription factors that activate the expression of downstream target genes. Expression profiling of patient primary samples and established mouse models has revealed hundreds of protein-coding genes that are either up-regulated or down-regulated in MLL-associated leukemias. Persistent coexpression of two of those genes, HoxA9 and Meis1, is essential for the initiation and maintenance of MLL leukemia. Our studies have also shown a strong association of a microRNA (miRNA) expression signature with MLL-rearranged leukemia, and the expression of several miRNAs was under the control of MLL wild type and fusion proteins. Although profiling of miRNA expression has been reported, the mechanisms underlying deregulated miRNA expression in MLL-associated leukemia are poorly understood. Given the role of miRNA as a global suppressor of mRNA gene expression, we hypothesized that the expression of miRNAs could be directly activated by MLL fusion and/or wild type proteins upon MLL gene rearrangement and subsequently down-regulate pertinent target mRNAs to contribute to leukemogenesis. To test our hypothesis in a systematic way, we examined an inducible MLL-ENL-ER transformed mouse cell line, which grows as myeloblastic cells in the presence of MLL-ENL and differentiates into neutrophils upon inactivation of the fusion protein. Using chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq), we determined the whole-genome MLL binding pattern in this cellular model. Upon activation of MLL-ENL, 24 miRNAs showed a significant increase in the level of MLL binding (FDR<0.25), suggesting that those genes are directly bound by the MLL-ENL fusion protein. To explore the impact of the MLL fusion protein on miRNA and mRNA gene regulation, we performed whole-genome expression analysis using an Affymetrix mouse microarray in the presence or absence of MLL-ENL. Upon induction of MLL-ENL, the expression levels of 38 miRNAs (out of 609 tiled on the array) were increased, and 57 of 7858 expressed protein-coding genes were down-regulated. An integrative analysis of MLL binding and mRNA/miRNA expression profiling data showed that transcription of three miRNAs was activated upon binding of MLL-ENL, and ten protein-coding genes are potential targets of these miRNAs. We are currently exploring the role of these three miRNAs and their respective mRNA target genes in leukemogenesis using in vitro and in vivo models. Taken together, our data suggest that the MLL fusion protein may play an important role in leukemogenesis by promoting miRNA transcription, which subsequently inhibits the expression of critical mRNA target genes. Our study provides a basis to further explore the regulatory network involving the MLL fusion protein and its key miRNA target genes in the leukemic genome. Disclosures: No relevant conflicts of interest to declare.


2019 ◽  
Vol 36 (7) ◽  
pp. 2025-2032
Author(s):  
Yuwei Zhang ◽  
Tianfei Yi ◽  
Huihui Ji ◽  
Guofang Zhao ◽  
Yang Xi ◽  
...  

Abstract Motivation Long noncoding RNA (lncRNA) has been verified to interact with other biomolecules, especially protein-coding genes (PCGs), thus playing essential regulatory roles in life activities and disease development. However, the inner mechanisms of most lncRNA–PCG relationships are still unclear. Our study investigated the characteristics of true lncRNA–PCG relationships and constructed a novel predictor with machine learning algorithms. Results We obtained 307 true lncRNA–PCG pairs from a database and found significant differences in multiple characteristics between true and random lncRNA–PCG sets. In addition, 3-fold cross-validation and prediction results on independent test sets show high AUC values for LR, SVM and RF, among which RF has the best performance, with an average AUC of 0.818 for cross-validation and 0.823 and 0.853 for two independent test sets, respectively. In a case study, some candidate lncRNA–PCG relationships in colorectal cancer were found, and the HOTAIR–COMP interaction was specifically exemplified. The proportion of the reported pairs in the predicted positive results was significantly higher than that in negative results (P < 0.05). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3938-3940 ◽  
Author(s):  
Da Kuang ◽  
Jochen Weile ◽  
Roujia Li ◽  
Tom W Ouellette ◽  
Jarry A Barber ◽  
...  

Abstract Summary Fully realizing the promise of personalized medicine will require rapid and accurate classification of pathogenic human variation. Multiplexed assays of variant effect (MAVEs) can experimentally test nearly all possible variants in selected gene targets. Planning a MAVE study involves identifying target genes with clinical impact, and identifying scalable functional assays for that target. Here, we describe MaveQuest, a web-based resource enabling systematic variant effect mapping studies by identifying potential functional assays, disease phenotypes and clinical relevance for nearly all human protein-coding genes. Availability and implementation MaveQuest service: https://mavequest.varianteffect.org/. MaveQuest source code: https://github.com/kvnkuang/mavequest-front-end/. Supplementary information Supplementary data are available at Bioinformatics online.

