Mutation severity spectrum of rare alleles in the human genome is predictive of disease type

2019 ◽  
Author(s):  
Jimin Pei ◽  
Lisa Kinch ◽  
Nick V. Grishin

The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs and has predictive power similar to that of some of the best available. We transformed DeepSAV scores of rare SAVs observed in the general population into a mutation severity measure of protein-coding genes. This measure reflects a gene’s tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. By this measure, genes implicated in cancer, autism, and viral interaction are intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.

2020 ◽  
Author(s):  
Anyou Wang ◽  
Rong Hai

Eukaryotic genomes gradually gained noncoding regions over the course of evolution, and the human genome actively transcribes >90% of its noncoding regions [1], suggesting that they are critical to the evolved human genome. Yet <1% of them have been functionally characterized [2], leaving most of the human genome in the dark. Here we systematically decode endogenous lncRNAs located in unannotated regions of the human genome and decipher a distinctive functional regime of lncRNAs hidden in massive RNA-seq data. LncRNAs are distributed divergently across chromosomes, independent of protein-coding regions. Their transcription rarely initiates at promoters through polymerase II, but mostly at enhancers. Yet conventional enhancer activators (e.g. H3K4me1) account for only a small proportion of lncRNA activation, suggesting that alternative, unknown mechanisms initiate the majority of lncRNAs. Meanwhile, lncRNA self-regulation also contributes notably to lncRNA activation. LncRNAs trans-regulate broad bioprocesses, including transcription and RNA processing, cell cycle, respiration, response to stress, chromatin organization, post-translational modification, and development. Overall, lncRNAs govern their own regime, distinct from that of proteins.


2021 ◽  
Vol 12 ◽  
Author(s):  
Fabien Degalez ◽  
Frédéric Jehl ◽  
Kévin Muret ◽  
Maria Bernard ◽  
Frédéric Lecerf ◽  
...  

Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied lies in protein-coding regions because potential impacts on proteins are relatively easy to predict with popular tools such as the Variant Effect Predictor. These tools annotate variants independently, without considering the potential effect of grouped or haplotypic variations, often called “multi-nucleotide variants” (MNVs). Here, we used a large RNA-seq dataset to survey MNVs, comprising 382 chicken samples originating from 11 populations analyzed in the companion paper, in which 9.5M SNPs (including 3.3M SNPs with reliable genotypes) were detected. We focused our study on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs observed in at least five individuals and located in 1,792 genes. We found that 41.1% of them showed a novel impact when compared to the effect of their constituent SNPs analyzed separately. The largest shift in annotated impact concerns the originally annotated stop-gained consequences, of which around 95% were rescued; this is followed by the missense consequences, of which 37% were reannotated with a different amino acid. We then present in more depth the rescued stop-gained MNVs and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and of considering MNVs in the coding region, particularly when searching for severe functional consequences such as stop-gained variants.
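The difference between per-SNP and haplotype-aware annotation of an in-codon MNV can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names (`apply_snps`, `consequence`, `annotate`) and the example codon are hypothetical, the consequence labels only loosely follow Variant Effect Predictor terminology, and the SNPs are assumed to be already phased onto the same haplotype.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_snps(codon, snps):
    """Return the codon after applying SNPs given as {position in codon (0-2): alt base}."""
    bases = list(codon)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify the protein-level effect of changing ref_codon into alt_codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return f"missense ({ref_aa}>{alt_aa})"


def annotate(ref_codon, snps):
    """Annotate each SNP on its own, then all SNPs jointly as one MNV."""
    separate = {
        pos: consequence(ref_codon, apply_snps(ref_codon, {pos: alt}))
        for pos, alt in snps.items()
    }
    joint = consequence(ref_codon, apply_snps(ref_codon, snps))
    return separate, joint


# Hypothetical example: reference codon CAG (Gln) carrying two phased SNPs.
separate, joint = annotate("CAG", {0: "T", 2: "C"})
print(separate)  # {0: 'stop_gained', 2: 'missense (Q>H)'}
print(joint)     # missense (Q>Y)
```

In this made-up case the first SNP alone would be annotated as stop-gained (CAG to TAG), whereas the two SNPs together yield TAC (Tyr), so the apparent stop codon is "rescued" to a missense change.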


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Noriko Ichino ◽  
MaKayla R Serres ◽  
Rhianna M Urban ◽  
Mark D Urban ◽  
Anthony J Treichel ◽  
...  

One key bottleneck in understanding the human genome is the relative under-characterization of 90% of protein coding regions. We report a collection of 1200 transgenic zebrafish strains made with the gene-break transposon (GBT) protein trap to simultaneously report and reversibly knock down the tagged genes. Protein trap-associated mRFP expression shows previously undocumented expression of 35% and 90% of cloned genes at 2 and 4 days post-fertilization, respectively. Further, investigated alleles regularly show 99% gene-specific mRNA knockdown. Homozygous GBT animals in ryr1b, fras1, tnnt2a, edar and hmcn1 phenocopied established mutants. 204 cloned lines trapped diverse proteins, including 64 orthologs of human disease-associated genes with 40 as potential new disease models. Severely reduced skeletal muscle Ca2+ transients in GBT ryr1b homozygous animals validated the ability to explore molecular mechanisms of genetic diseases. This GBT system facilitates novel functional genome annotation towards understanding cellular and molecular underpinnings of vertebrate biology and human disease.


PLoS ONE ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. e8949 ◽  
Author(s):  
Danny A. Bitton ◽  
Duncan L. Smith ◽  
Yvonne Connolly ◽  
Paul J. Scutt ◽  
Crispin J. Miller

2021 ◽  
Author(s):  
Noah Dukler ◽  
Mehreen R Mughal ◽  
Ritika Ramani ◽  
Yi-Fei Huang ◽  
Adam Siepel

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
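The idea of λs as a fractional depletion of rare variants can be sketched as a naive ratio of rates. This is only an illustration under strong assumptions, not the ExtRaINSIGHT estimator: the function name and the counts are hypothetical, and the sketch ignores the controls for local variation and neighbor-dependence in mutation rate that the abstract describes.

```python
def fractional_depletion(rare_target, sites_target, rare_neutral, sites_neutral):
    """Naive point estimate of depletion: 1 - (target rare-variant rate / neutral rate).

    rare_*  : number of sites carrying a rare variant (e.g. MAF < 0.1%)
    sites_* : total number of sites considered in that set
    """
    if sites_target <= 0 or sites_neutral <= 0:
        raise ValueError("site counts must be positive")
    neutral_rate = rare_neutral / sites_neutral
    if neutral_rate == 0:
        raise ValueError("neutral set has no rare variants; depletion is undefined")
    target_rate = rare_target / sites_target
    return 1.0 - target_rate / neutral_rate


# Made-up counts for illustration only: 60 rare variants per 1,000 target sites
# versus 100 per 1,000 matched neutral sites.
lam = fractional_depletion(60, 1000, 100, 1000)
print(round(lam, 3))
```

With these invented counts the target set shows roughly a 40% depletion of rare variants relative to the neutral set; a value near zero would indicate no detectable depletion.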


2021 ◽  
Author(s):  
Roberta Esposito ◽  
Andres Lanzos ◽  
Taisia Polidori ◽  
Hugo Guillen-Ramirez ◽  
Bernard Merlin ◽  
...  

Tumour DNA contains thousands of single nucleotide variants (SNVs) in non-protein-coding regions, yet it remains unclear which are driver mutations that promote cell fitness. Amongst the most highly mutated non-coding elements are long noncoding RNAs (lncRNAs), which can promote cancer and may be targeted therapeutically. Here we searched for evidence that driver mutations may act through alteration of lncRNA function. Using an integrative driver discovery algorithm, we analysed SNVs from 2583 primary tumours and 3527 metastases to reveal 54 candidate driver lncRNAs (FDR<0.1). Their relevance is supported by enrichment for previously-reported cancer genes and by clinical and genomic features. Using knockdown and transgene overexpression, we show that tumour SNVs in two novel lncRNAs can boost cell fitness. Researchers have noted particularly high yet unexplained mutation rates in the iconic cancer lncRNA, NEAT1. We apply in cellulo mutagenesis by CRISPR-Cas9 to identify vulnerable regions of NEAT1 where SNVs reproducibly increase cell fitness in both transformed and normal backgrounds. In particular, mutations in the 5-prime region of NEAT1 alter ribonucleoprotein assembly and boost the population of subnuclear paraspeckles. Together, this work reveals function-altering somatic lncRNA mutations as a new route to enhanced cell fitness during transformation and metastasis.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8025 ◽  
Author(s):  
Shan-Shan Sun ◽  
Xiao-Jun Zhou ◽  
Zhi-Zhong Li ◽  
Hong-Yang Song ◽  
Zhi-Cheng Long ◽  
...  

Chloroplasts are typically inherited from the female parent and are haploid in most angiosperms, but rare intra-individual heteroplasmy in plastid genomes has been reported in plants. Here, we report an example of plastome heteroplasmy and its characteristics in Gentiana tongolensis (Gentianaceae). The plastid genome of G. tongolensis is 145,757 bp in size and is missing parts of the petD gene when compared with other Gentiana species. A total of 112 single nucleotide polymorphisms (SNPs) and 31 indels with frequencies of more than 2% were detected in the plastid genome, and most were located in protein coding regions. Most sites with SNP frequencies of more than 10% were located in six genes in the LSC region. After verification via cloning and Sanger sequencing at three loci, heteroplasmy was identified in different individuals. The cause of heteroplasmy at the nucleotide level in the plastome of G. tongolensis is unclear from the present data, although biparental plastid inheritance and transfer of plastid DNA seem to be the most likely explanations. This study implies that botanists should reconsider the heredity and evolution of chloroplasts and be cautious about using chloroplasts as genetic markers, especially in Gentiana.
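The frequency thresholds mentioned above (more than 2% to report a site, more than 10% for the high-frequency subset) can be illustrated with a short Python sketch. The read counts, positions, and function names are hypothetical, and the sketch does no error modelling; the study itself confirmed heteroplasmy by cloning and Sanger sequencing.

```python
def variant_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the alternative allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def call_heteroplasmic_sites(pileup, min_freq=0.02, high_freq=0.10):
    """Keep sites whose alt-allele frequency exceeds min_freq; flag those above high_freq.

    pileup: iterable of (position, alt_reads, total_reads) tuples.
    """
    calls = []
    for position, alt_reads, total_reads in pileup:
        freq = variant_frequency(alt_reads, total_reads)
        if freq > min_freq:
            calls.append(
                {"position": position, "frequency": freq, "high": freq > high_freq}
            )
    return calls


# Invented read counts for three plastome positions.
example_pileup = [(1200, 5, 1000), (45310, 40, 1000), (88021, 150, 1000)]
calls = call_heteroplasmic_sites(example_pileup)
for call in calls:
    print(call["position"], call["high"])
```

Here the first site (0.5%) falls below the reporting threshold, the second (4%) is reported, and the third (15%) is reported and flagged as high-frequency.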


BMC Genomics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 141 ◽  
Author(s):  
Jainab Khatun ◽  
Yanbao Yu ◽  
John A Wrobel ◽  
Brian A Risk ◽  
Harsha P Gunawardena ◽  
...  
