Discovering novel long non-coding RNA predictors of anticancer drug sensitivity beyond protein-coding genes

Large-scale cancer cell line screens have identified thousands of protein-coding genes (PCGs) as biomarkers of anticancer drug response. However, systematic evaluation of long non-coding RNAs (lncRNAs) as pharmacogenomic biomarkers has so far proven challenging. Here, we study the contribution of lncRNAs as drug response predictors beyond spurious associations driven by correlations with proximal PCGs, tissue lineage, or established biomarkers. We show that, as a whole, the lncRNA transcriptome is as potent as the PCG transcriptome at predicting response to hundreds of anticancer drugs. Analysis of individual lncRNA transcripts associated with drug response reveals that nearly half of the significant associations are in fact attributable to proximal cis-PCGs. However, adjusting for the effects of cis-PCGs revealed significant lncRNAs that augment drug response predictions for most drugs, including those with well-established clinical biomarkers. In addition, we identify lncRNA-specific somatic alterations associated with drug response by adopting a statistical approach to determine lncRNAs carrying somatic mutations that undergo positive selection in cancer cells. Lastly, we experimentally demonstrate that two novel lncRNAs, EGFR-AS1 and MIR205HG, are functionally relevant predictors of anti-EGFR drug response.

Discovering long noncoding RNA predictors of anticancer drug sensitivity beyond protein-coding genes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1909998116 ◽

2019 ◽

Vol 116 (44) ◽

pp. 22020-22029 ◽

Cited By ~ 9

Author(s):

Aritro Nath ◽

Eunice Y. T. Lau ◽

Adam M. Lee ◽

Paul Geeleher ◽

William C. S. Cho ◽

...

Keyword(s):

Anticancer Drug ◽

Noncoding RNA ◽

Large Scale ◽

Drug Response ◽

Cancer Cell Line ◽

Systematic Evaluation ◽

Protein Coding ◽

Protein Coding Genes ◽

Clinical Biomarkers ◽

Response Predictors

Large-scale cancer cell line screens have identified thousands of protein-coding genes (PCGs) as biomarkers of anticancer drug response. However, systematic evaluation of long noncoding RNAs (lncRNAs) as pharmacogenomic biomarkers has so far proven challenging. Here, we study the contribution of lncRNAs as drug response predictors beyond spurious associations driven by correlations with proximal PCGs, tissue lineage, or established biomarkers. We show that, as a whole, the lncRNA transcriptome is as potent as the PCG transcriptome at predicting response to hundreds of anticancer drugs. Analysis of individual lncRNA transcripts associated with drug response reveals that nearly half of the significant associations are in fact attributable to proximal cis-PCGs. However, adjusting for the effects of cis-PCGs revealed significant lncRNAs that augment drug response predictions for most drugs, including those with well-established clinical biomarkers. In addition, we identify lncRNA-specific somatic alterations associated with drug response by adopting a statistical approach to determine lncRNAs carrying somatic mutations that undergo positive selection in cancer cells. Lastly, we experimentally demonstrate that 2 lncRNAs, EGFR-AS1 and MIR205HG, are functionally relevant predictors of anti-epidermal growth factor receptor (EGFR) drug response.
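The idea of separating a genuine lncRNA association from one driven by a proximal cis-PCG can be illustrated with a small regression sketch. This is not the authors' pipeline: the simulated data, the effect sizes, and the helper name `lncrna_coefficient` are assumptions chosen only to show how adding the cis-PCG as a covariate can remove a spurious lncRNA effect. Tissue lineage could be handled the same way by appending one-hot indicator columns.

```python
import numpy as np

def lncrna_coefficient(response, lncrna, covariates=None):
    """Ordinary least squares coefficient of lncRNA expression on drug response,
    optionally adjusted for covariates (hypothetical helper, for illustration)."""
    columns = [np.ones_like(lncrna), lncrna]          # intercept + lncRNA
    if covariates is not None:
        columns.extend(np.atleast_2d(covariates))     # one row per covariate
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return beta[1]                                    # coefficient of the lncRNA

# Simulated cell lines (assumption): response depends only on the cis-PCG,
# while the lncRNA is merely co-expressed with that PCG.
rng = np.random.default_rng(0)
n = 500
cis_pcg = rng.normal(size=n)
lncrna = 0.8 * cis_pcg + 0.6 * rng.normal(size=n)
response = 1.0 * cis_pcg + 0.5 * rng.normal(size=n)

unadjusted = lncrna_coefficient(response, lncrna)
adjusted = lncrna_coefficient(response, lncrna, covariates=[cis_pcg])
print(f"unadjusted: {unadjusted:.2f}, adjusted for cis-PCG: {adjusted:.2f}")
```

In this toy setting the unadjusted coefficient is large while the adjusted one shrinks toward zero, which is the pattern the abstract describes for associations attributable to cis-PCGs.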

PaperBLAST: Text-mining papers for information about homologs

10.1101/133041 ◽

2017 ◽

Author(s):

Morgan N. Price ◽

Adam P. Arkin

Keyword(s):

Text Mining ◽

Genome Sequencing ◽

Full Text ◽

Large Scale ◽

Scientific Literature ◽

Protein Sequences ◽

Protein Coding ◽

Link Protein ◽

Protein Coding Genes ◽

Link Type

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.

An Exploration of the Sequence of a 2.9-Mb Region of the Genome of Drosophila melanogaster: The Adh Region

Genetics ◽

10.1093/genetics/153.1.179 ◽

1999 ◽

Vol 153 (1) ◽

pp. 179-219 ◽

Cited By ~ 15

Author(s):

M Ashburner ◽

S Misra ◽

J Roote ◽

S E Lewis ◽

R Blazej ◽

...

Keyword(s):

Drosophila Melanogaster ◽

Transposable Element ◽

Large Scale ◽

Chromosome Region ◽

Complete Sequence ◽

Test Methods ◽

P Element ◽

cDNA Libraries ◽

Protein Coding ◽

Protein Coding Genes

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized “Adh region.” A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.
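The densities and percentages quoted above can be checked with a few lines of arithmetic. The sketch assumes a region length of 2.9 Mb (the figure in the title; the abstract says "nearly 3 Mb") and assumes the percentages are taken over the 218 predicted protein-coding genes.

```python
# Values from the abstract; region length taken from the title (assumption).
region_bp = 2_900_000
protein_coding_genes = 218
transposable_elements = 17

kb_per_gene = region_bp / protein_coding_genes / 1000
kb_per_te = region_bp / transposable_elements / 1000

# Percentages, assuming 218 genes as the denominator.
est_match_pct = 100 * 95 / protein_coding_genes
homolog_pct = 100 * 144 / protein_coding_genes

print(f"one gene every {kb_per_gene:.1f} kb")
print(f"one transposable element every {kb_per_te:.1f} kb")
print(f"EST match: {est_match_pct:.1f}%, protein similarity: {homolog_pct:.1f}%")
```

Under these assumptions the computed values round to the figures stated in the abstract (13 kb, 171 kb, 44%, and 66%).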

Abstract B28: Identification of drug-response biomarkers for combined mTORC1/2 and MEK1/2 investigational agents using a large-scale cancer cell line screen.

10.1158/1535-7163.targ-13-b28 ◽

2013 ◽

Author(s):

Hyunjin Shin ◽

Derek Blair ◽

Bin Li ◽

Greg Hather ◽

William L. Trepicchio ◽

...

Keyword(s):

Cell Line ◽

Cancer Cell ◽

Large Scale ◽

Drug Response ◽

Cancer Cell Line ◽

Drug Response Biomarkers ◽

Investigational Agents ◽

Response Biomarkers

Large-Scale Parsimony Analysis of Metazoan Indels in Protein-Coding Genes

Molecular Biology and Evolution ◽

10.1093/molbev/msp263 ◽

2009 ◽

Vol 27 (2) ◽

pp. 441-451 ◽

Cited By ~ 32

Author(s):

F. Belinky ◽

O. Cohen ◽

D. Huchon

Keyword(s):

Large Scale ◽

Parsimony Analysis ◽

Protein Coding ◽

Protein Coding Genes

High‐quality genomes reveal significant genetic divergence and cryptic speciation in the model organism Folsomia candida (Collembola)

10.22541/au.164018558.87095695/v1 ◽

2021 ◽

Author(s):

Yun-Xia Luan ◽

Yingying Cui ◽

Wan-Jun Chen ◽

Jianfeng Jin ◽

Ai-Min Liu ◽

...

Keyword(s):

Large Scale ◽

Test Organism ◽

Gene Families ◽

Species Differentiation ◽

Folsomia Candida ◽

Cryptic Speciation ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Soil Arthropod

The collembolan Folsomia candida Willem, 1902, is an important representative soil arthropod that is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, its suitability as an ideal “standard” has been questioned because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we present two high-quality chromosome-level genomes of F. candida, for the parthenogenetic Danish strain (FCDK, 219.08 Mb, N50 of 38.47 Mb, 25,139 protein-coding genes) and the sexual Shanghai strain (FCSH, 153.09 Mb, N50 of 25.75 Mb, 21,609 protein-coding genes). The seven chromosomes of FCDK are each 25–54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to its broader environmental adaptation. In addition, the overall sequence identity of the two mitogenomes is only 78.2%, and FCDK has fewer strain-specific microRNAs than FCSH. In conclusion, FCDK and FCSH have accumulated independent genetic changes and evolved into distinct species since diverging 10 Mya. Our work shows that F. candida represents a good model of rapid cryptic speciation. Moreover, it provides important genomic resources for studying the mechanisms of species differentiation, soil arthropod adaptation to soil ecosystems, and Wolbachia-induced parthenogenesis as well as the evolution of Collembola, a pivotal phylogenetic clade between Crustacea and Insecta.
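For readers unfamiliar with the N50 statistic reported for both assemblies, the following sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The toy lengths are invented for illustration; only the two genome sizes come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy scaffold lengths (hypothetical, not from the paper).
toy_n50 = n50([50, 40, 30, 20, 10])

# Whole-genome size difference implied by the reported assembly sizes (Mb).
fcdk_mb, fcsh_mb = 219.08, 153.09
size_excess_pct = 100 * (fcdk_mb / fcsh_mb - 1)
print(toy_n50, f"{size_excess_pct:.1f}%")
```

The overall size excess of FCDK over FCSH computed this way falls inside the 25–54% per-chromosome range given in the abstract.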

Abstract 3611: Identifying gene expression markers of anticancer drug response using large scale genomic and drug response databases established from patient derived tumors

10.1158/1538-7445.am2012-3611 ◽

2012 ◽

Author(s):

Thomas Broudy ◽

Kesavan Praveen Nair ◽

Erica I. Livingston ◽

Steve Hoffmaster ◽

Martin Vo ◽

...

Keyword(s):

Gene Expression ◽

Anticancer Drug ◽

Large Scale ◽

Drug Response ◽

Anticancer Drug Response

Large-scale analysis of human gene expression variability associates highly variable drug targets with lower drug effectiveness and safety

Bioinformatics ◽

10.1093/bioinformatics/btz023 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3028-3037 ◽

Cited By ~ 8

Author(s):

Eyal Simonovsky ◽

Ronen Schuster ◽

Esti Yeger-Lotem

Keyword(s):

Drug Target ◽

Large Scale ◽

Target Genes ◽

Supplementary Information ◽

Protein Coding ◽

Expression Levels ◽

Expression Variability ◽

Protein Coding Genes ◽

Approved Drugs ◽

Variable Genes

Motivation: The effectiveness of drugs tends to vary between patients. One of the well-known reasons for this phenomenon is genetic polymorphisms in drug target genes among patients. Here, we propose that differences in expression levels of drug target genes across individuals can also contribute to this phenomenon.

Results: To explore this hypothesis, we analyzed the expression variability of protein-coding genes, and particularly drug target genes, across individuals. For this, we developed a novel variability measure, termed local coefficient of variation (LCV), which ranks the expression variability of each gene relative to genes with similar expression levels. Unlike commonly used methods, LCV neutralizes expression-level biases without imposing any distribution over the variation and is robust to data incompleteness. Application of LCV to RNA-sequencing profiles of 19 human tissues and to target genes of 1076 approved drugs revealed that drug target genes were significantly more variable than protein-coding genes. Analysis of 113 drugs with available effectiveness scores showed that drugs targeting highly variable genes tended to be less effective in the population. Furthermore, comparison of approved drugs to drugs that were withdrawn from the market showed that withdrawn drugs targeted significantly more variable genes than approved drugs. Last, upon analyzing gender differences we found that the variability of drug target genes was similar between men and women. Altogether, our results suggest that expression variability of drug target genes could contribute to the variable responsiveness and effectiveness of drugs, and is worth considering during drug treatment and development.

Availability and implementation: LCV is available as a Python script on GitHub (https://github.com/eyalsim/LCV).

Supplementary information: Supplementary data are available at Bioinformatics online.
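The description of LCV as ranking each gene's variability "relative to genes with similar expression levels" can be made concrete with a rough sketch. This is an interpretation of that one sentence, not the authors' script (which is at the GitHub URL above): the function name `local_cv_rank`, the window size, the use of a percentile rank, and the simulated data are all assumptions.

```python
import numpy as np

def local_cv_rank(expr, window=50):
    """Percentile (0-100] of each gene's coefficient of variation among the
    `window` genes closest to it in mean expression (hypothetical sketch).

    expr: array of shape (genes, samples).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    cv = expr.std(axis=1, ddof=1) / mean
    order = np.argsort(mean)               # genes sorted by mean expression
    n = len(order)
    half = window // 2
    ranks = np.empty(n)
    for pos, gene in enumerate(order):
        lo = max(0, pos - half)
        hi = min(n, pos + half + 1)
        neighbours = cv[order[lo:hi]]      # includes the gene itself
        ranks[gene] = 100.0 * np.mean(neighbours <= cv[gene])
    return ranks

# Simulated data (assumption): 200 genes, 30 samples, similar relative noise,
# except gene 100, which is made five times noisier than the rest.
rng = np.random.default_rng(1)
means = np.linspace(10, 1000, 200)
noise = 0.1 * rng.normal(size=(200, 30))
noise[100] *= 5
demo_expr = means[:, None] * (1 + noise)
demo_ranks = local_cv_rank(demo_expr, window=50)
print(demo_ranks[100])
```

Because each gene is compared only with neighbours of similar mean expression, the rank does not depend on any assumed global relationship between mean and variance, which is the property the abstract highlights.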

Personalized cancer therapy prioritization based on driver alteration co-occurrence patterns

Genome Medicine ◽

10.1186/s13073-020-00774-x ◽

2020 ◽

Vol 12 (1) ◽

Author(s):

Lidia Mateo ◽

Miquel Duran-Frigola ◽

Albert Gris-Oliver ◽

Marta Palafox ◽

Maurizio Scaltriti ◽

...

Keyword(s):

Survival Data ◽

Large Scale ◽

Drug Response ◽

Progression Free Survival ◽

Driver Gene ◽

Precision Oncology ◽

Response Predictors ◽

Personalized Cancer Therapy ◽

Personalized Cancer ◽

Occurrence Patterns

Identification of actionable genomic vulnerabilities is key to precision oncology. Utilizing a large-scale drug screening in patient-derived xenografts, we uncover driver gene alteration connections, derive driver co-occurrence (DCO) networks, and relate these to drug sensitivity. Our collection of 53 drug-response predictors attains an average balanced accuracy of 58% in a cross-validation setting, rising to 66% for a subset of high-confidence predictions. We experimentally validated 12 out of 14 predictions in mice and adapted our strategy to obtain drug-response models from patients’ progression-free survival data. Our strategy reveals links between oncogenic alterations, increasing the clinical impact of genomic profiling.
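Balanced accuracy, the metric quoted above, is conventionally the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below uses that standard definition; the labels are invented and do not come from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = responder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical example: 4 responders, 6 non-responders.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
ba = balanced_accuracy(y_true, y_pred)
print(ba)
```

Here sensitivity is 3/4 and specificity is 3/6, so the balanced accuracy is 0.625, compared with a plain accuracy of 0.6.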

Functional and transcriptional profiling of non-coding RNAs in yeast reveal context-dependent phenotypes and in trans effects on the protein regulatory network

PLoS Genetics ◽

10.1371/journal.pgen.1008761 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008761

Author(s):

Laura Natalia Balarezo-Cisneros ◽

Steven Parker ◽

Marcin G. Fraczek ◽

Soukaina Timouma ◽

Ping Wang ◽

...

Keyword(s):

Large Scale ◽

Transcriptional Profiling ◽

Growth Conditions ◽

Phenotypic Data ◽

Protein Coding ◽

Protein Coding Genes ◽

Genome Wide ◽

In Trans ◽

Non Coding RNAs ◽

The Impact

Non-coding RNAs (ncRNAs), including the more recently identified Stable Unannotated Transcripts (SUTs) and Cryptic Unstable Transcripts (CUTs), are increasingly being shown to play pivotal roles in the transcriptional and post-transcriptional regulation of genes in eukaryotes. Here, we carried out a large-scale screening of ncRNAs in Saccharomyces cerevisiae, and provide evidence for SUT and CUT function. Phenotypic data on 372 ncRNA deletion strains in 23 different growth conditions were collected, identifying ncRNAs responsible for significant cellular fitness changes. Transcriptome profiles were assembled for 18 haploid ncRNA deletion mutants and 2 essential ncRNA heterozygous deletants. Guided by the resulting RNA-seq data, we analysed the genome-wide dysregulation of protein-coding genes and non-coding transcripts. Novel functional ncRNAs, SUT125, SUT126, SUT035 and SUT532, that act in trans by modulating transcription factors were identified. Furthermore, we described the impact of SUTs and CUTs in modulating coding gene expression in response to different environmental conditions, regulating important biological processes such as respiration (SUT125, SUT126, SUT035, SUT432), steroid biosynthesis (CUT494, SUT053, SUT468) or rRNA processing (SUT075 and snR30). Overall, these data capture and integrate the regulatory and phenotypic network of ncRNAs and protein-coding genes, providing genome-wide evidence of the impact of ncRNAs on cellular homeostasis.
