Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data

Abstract: Pigs (Sus scrofa) exhibit diverse phenotypes in different breeds, shaped by the combined effects of local adaptation and artificial selection. To comprehensively characterize the genetic diversity of pigs, we construct a pig pan-genome by comparing genome assemblies of 11 representative pig breeds with the reference genome (Sscrofa11.1). Approximately 72.5 Mb of non-redundant sequences were identified as pan-sequences that are absent from Sscrofa11.1. On average, 41.7 kb of spurious heterozygous SNPs per individual are removed and 12.9 kb of novel SNPs per individual are recovered when the pan-genome is used as the reference for SNP calling, thereby providing enhanced resolution for genetic diversity in pigs. Homolog annotation and analysis using RNA-seq and Hi-C data indicate that these pan-sequences contain protein-coding regions and regulatory elements. These pan-sequences can further improve the interpretation of local 3D structure. The pan-genome, together with the accompanying web-based database, will serve as a primary resource for the exploration of genetic diversity and will promote pig breeding and biomedical research.


G-quadruplex-forming sequences as potential drivers of genetic diversity in primate protein coding genes

10.1101/2020.08.28.272971 ◽

2020 ◽

Author(s):

Manuel Jara-Espejo ◽

Sergio Roberto Peres Line

Keyword(s):

Genetic Diversity ◽

Amino Acid ◽

Protein Function ◽

Evolutionary Dynamics ◽

Regulatory Elements ◽

Evolutionary Trend ◽

Protein Coding ◽

Coding Regions ◽

Mutational Pattern ◽

Species Specific

Abstract: While non-coding G-quadruplexes (G4s) act as conserved regulatory elements when located in gene promoters and splice sites, the evolutionary conservation of G4s in protein-coding regions has been little explored. To address the evolutionary dynamics acting on coding G4s, we mapped and characterized potential G4-forming sequences across orthologous genes from twenty-four primates. We found that potentially more stable G4 motifs exist in coding regions following a species-specific trend. Moreover, these motifs were the least conserved sites across primates at both the DNA and amino acid levels and are characterized by an indel-rich mutational pattern. This trend was not observed for less stable G4 motifs. A deeper analysis revealed that [G>=3N1]4 motifs, representing potentially the most stable G4s, were associated with the lowest conservation and highest indel frequencies. This mutational pattern was more evident when G4-associated amino acid regions were analyzed. We discuss the possibility of an overall conservation of less stable and moderately stable G4s, while more stable G4s may be preserved or arise in a species-specific manner, which may explain their low conservation. Since structure-prone motifs, including G4s, have the potential to induce genomic instability, this evolutionary trend may help avoid broad deleterious effects of stable G4s on protein function while promoting genetic diversity across closely related species.
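As a rough illustration of how such motifs can be located, the sketch below scans a DNA string with a regular expression. It assumes one reading of the [G>=3N1]4 notation: four runs of at least three guanines separated by single-nucleotide loops. The function name `find_g4_motifs` and the pattern are hypothetical and are not taken from the study, which may have used a different search tool and definition.

```python
import re

# Assumed reading of [G>=3N1]4: four G-runs (each >= 3 G) joined by
# three loops of exactly one nucleotide. This is an illustration only.
G4_SHORT_LOOP = re.compile(r"G{3,}[ACGT]G{3,}[ACGT]G{3,}[ACGT]G{3,}")


def find_g4_motifs(sequence):
    """Return (start, motif) pairs for non-overlapping matches on one strand."""
    sequence = sequence.upper()
    return [(m.start(), m.group(0)) for m in G4_SHORT_LOOP.finditer(sequence)]


if __name__ == "__main__":
    print(find_g4_motifs("ATGGGAGGGTGGGCGGGTA"))
```

The scan covers only the given strand; a fuller search would also scan the reverse complement (equivalently, look for C-runs) and would allow longer loops for the less stable motif classes.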


Dysregulated Transcriptional Control in Prostate Cancer

International Journal of Molecular Sciences ◽

10.3390/ijms20122883 ◽

2019 ◽

Vol 20 (12) ◽

pp. 2883 ◽

Cited By ~ 8

Author(s):

Simon J. Baumgart ◽

Ekaterina Nevedomskaya ◽

Bernard Haendler

Keyword(s):

Prostate Cancer ◽

Drug Targets ◽

Transcriptional Control ◽

Regulatory Elements ◽

Protein Coding ◽

Coding Regions ◽

Super Enhancer ◽

Position Coding ◽

Transcription Dysregulation ◽

Dna Regulatory Elements

Recent advances in whole-genome and transcriptome sequencing of prostate cancer at different stages indicate that a large number of mutations found in tumors are present in non-protein coding regions of the genome and lead to dysregulated gene expression. Single nucleotide variations and small mutations affecting the recruitment of transcription factor complexes to DNA regulatory elements are observed in an increasing number of cases. Genomic rearrangements may position coding regions under the novel control of regulatory elements, as exemplified by the TMPRSS2-ERG fusion and the amplified enhancer identified upstream of the androgen receptor (AR) gene. Super-enhancers are increasingly found to play important roles in aberrant oncogenic transcription. Several players involved in these processes are currently being evaluated as drug targets and may represent new vulnerabilities that can be exploited for prostate cancer treatment. They include factors involved in enhancer and super-enhancer function such as bromodomain proteins and cyclin-dependent kinases. In addition, non-coding RNAs with an important gene regulatory role are being explored. The rapid progress made in understanding the influence of the non-coding part of the genome and of transcription dysregulation in prostate cancer could pave the way for the identification of novel treatment paradigms for the benefit of patients.


Human to yeast pathway transplantation: cross-species dissection of the adenine de novo pathway regulatory node

10.1101/147579 ◽

2017 ◽

Cited By ~ 4

Author(s):

Neta Agmon ◽

Jasmine Temple ◽

Zuojian Tang ◽

Tobias Schraink ◽

Maayan Baron ◽

...

Keyword(s):

Yeast Strain ◽

Transcriptional Control ◽

De Novo ◽

Enzyme Level ◽

Yeast Cells ◽

Protein Coding ◽

Coding Regions ◽

Human Proteins ◽

Human Ortholog ◽

Yeast Genes

Abstract: Pathway transplantation from one organism to another represents a means to a more complete understanding of a biochemical or regulatory process. The purine biosynthesis pathway, a core metabolic function, was transplanted from human to yeast. We replaced the entire Saccharomyces cerevisiae adenine de novo pathway with the cognate human pathway components. A yeast strain was “humanized” for the full pathway by deleting all relevant yeast genes completely and then providing the human pathway in trans using a neochromosome expressing the human protein coding regions under the transcriptional control of their cognate yeast promoters and terminators. The “humanized” yeast strain grows in the absence of adenine, indicating complementation of the yeast pathway by the full set of human proteins. While the strain with the neochromosome is indeed prototrophic, it grows slowly in the absence of adenine. Dissection of the phenotype revealed that the human ortholog of ADE4, PPAT, shows only partial complementation. We have used several strategies to understand this phenotype, which point to PPAT/ADE4 as the central regulatory node. Pathway metabolites are responsible for regulating PPAT’s protein abundance through transcription and proteolysis as well as its enzymatic activity by allosteric regulation in these yeast cells. Extensive phylogenetic analysis of PPATs from diverse organisms hints at adaptations of the enzyme-level regulation to the metabolite levels in the organism. Finally, we isolated specific mutations in PPAT as well as in other genes involved in the purine metabolic network that alleviate incomplete complementation by PPAT and provide further insight into the complex regulation of this critical metabolic pathway.


Pandoravirus celtis illustrates the microevolution processes at work in the giant Pandoraviridae genomes

10.1101/500207 ◽

2018 ◽

Cited By ~ 1

Author(s):

Matthieu Legendre ◽

Jean-Marie Alempic ◽

Nadège Philippe ◽

Audrey Lartigue ◽

Sandra Jeudy ◽

...

Keyword(s):

De Novo ◽

Gene Repertoire ◽

Protein Coding ◽

Genomic Changes ◽

Coding Regions ◽

Protein Coding Genes ◽

Intergenic Regions ◽

Mere Existence ◽

Increasing Functions ◽

Similar Gene

Abstract: With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.


Reference Genome for the Highly Transformable Setaria viridis ME034V

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401345 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3467-3478 ◽

Cited By ~ 2

Author(s):

Peter M. Thielen ◽

Amanda L. Pendleton ◽

Robert A. Player ◽

Kenneth V. Bowden ◽

Thomas J. Lawton ◽

...

Keyword(s):

De Novo ◽

Gene Families ◽

Model Organisms ◽

Phylogenomic Analysis ◽

Setaria Viridis ◽

Sequencing Technology ◽

Protein Coding ◽

Genotype Frequencies ◽

Green Foxtail ◽

Genome Assemblies

Setaria viridis (green foxtail) is an important model system for improving cereal crops due to its diploid genome, ease of cultivation, and use of C4 photosynthesis. The S. viridis accession ME034V is exceptionally transformable, but the lack of a sequenced genome for this accession has limited its utility. We present a 397 Mb highly contiguous de novo assembly of ME034V using ultra-long nanopore sequencing technology (read N50 = 41 kb). We estimate that this genome is largely complete based on our updated k-mer based genome size estimate of 401 Mb for S. viridis. Genome annotation identified 37,908 protein-coding genes and >300k repetitive elements comprising 46% of the genome. We compared the ME034V assembly with two other previously sequenced Setaria genomes as well as to a diversity panel of 235 S. viridis accessions. We found the genome assemblies to be largely syntenic, but numerous unique polymorphic structural variants were discovered. Several ME034V deletions may be associated with recent retrotransposition of copia and gypsy LTR repeat families, as evidenced by their low genotype frequencies in the sampled population. Lastly, we performed a phylogenomic analysis to identify gene families that have expanded in Setaria, including those involved in specialized metabolism and plant defense response. The high continuity of the ME034V genome assembly validates the utility of ultra-long DNA sequencing to improve genetic resources for emerging model organisms. Structural variation present in Setaria illustrates the importance of obtaining the proper genome reference for genetic experiments. Thus, we anticipate that the ME034V genome will be of significant utility for the Setaria research community.
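For readers unfamiliar with the two summary figures quoted in this abstract, the sketch below shows the standard definition of N50 (the length at which reads or contigs of that length or longer hold at least half of all bases) and the simple ratio behind the completeness claim (397 Mb assembled against a 401 Mb estimate). The function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of read or contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy lengths for illustration only.
    print(n50([2, 3, 4, 5, 6]))
    # Assembly size over estimated genome size, both in Mb, from the abstract.
    print(round(397 / 401, 3))
```

The ratio only compares total lengths; it says nothing about which sequences are present, so it is a coarse indicator of completeness.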


Mutations in gene regulatory elements linked to human limb malformations

Journal of Medical Genetics ◽

10.1136/jmedgenet-2019-106369 ◽

2019 ◽

Vol 57 (6) ◽

pp. 361-370

Author(s):

Karol Nowosad ◽

Ewa Hordyjewska-Kowalczyk ◽

Przemko Tylzanowski

Keyword(s):

Regulatory Elements ◽

Regulatory Function ◽

Protein Coding ◽

Coding Regions ◽

Technological Advances ◽

Non Coding Rna ◽

Limb Malformations ◽

Gene Regulatory Elements ◽

Chromatin Organisation ◽

Human Limb

Most of the human genome has a regulatory function in gene expression. The technological progress made in recent years has permitted the revision of old mutations and the discovery of new ones outside of the protein-coding regions that affect human limb morphology. The steadily increasing discovery rate of such mutations suggests that this until now largely neglected part of the genome is rising to its well-deserved prominence. In this review, we describe the recent technological advances permitting this unprecedented progress in identifying non-coding mutations. We especially focus on the mutations in cis-regulatory elements such as enhancers, and trans-regulatory elements such as miRNA and long non-coding RNA, linked to hereditary or inborn limb defects. We also discuss the role of chromatin organisation and enhancer–promoter interactions in the aetiology of limb malformations.


RaGOO: fast and accurate reference-guided scaffolding of draft genomes

Genome Biology ◽

10.1186/s13059-019-1829-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 56

Author(s):

Michael Alonge ◽

Sebastian Soyk ◽

Srividya Ramakrishnan ◽

Xingang Wang ◽

Sara Goodwin ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Open Source ◽

Genome Analysis ◽

De Novo ◽

Structural Variants ◽

Tomato Genome ◽

Pan Genome ◽

Link Type ◽

Genome Assemblies

Abstract: We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO.


De novo, systemic, deleterious amino acid substitutions are common in large cytoskeleton-related protein coding regions

Biomedical Reports ◽

10.3892/br.2016.826 ◽

2016 ◽

Vol 6 (2) ◽

pp. 211-216

Author(s):

Rebecca J. Stoll ◽

Grace R. Thompson ◽

Mohammad D. Samy ◽

George Blanck

Keyword(s):

Amino Acid ◽

De Novo ◽

Amino Acid Substitutions ◽

Related Protein ◽

Protein Coding ◽

Coding Regions


Identification of novel CSNK2A1 variants and the genotype–phenotype relationship in patients with Okur–Chung neurodevelopmental syndrome: a case report and systematic literature review

Journal of International Medical Research ◽

10.1177/03000605211017063 ◽

2021 ◽

Vol 49 (5) ◽

pp. 030006052110170

Author(s):

Ruo-hao Wu ◽

Wen-ting Tang ◽

Kun-yin Qiu ◽

Xiao-juan Li ◽

Dan-xia Tang ◽

...

Keyword(s):

De Novo ◽

Facial Dysmorphism ◽

Comprehensive Overview ◽

Protein Coding ◽

Coding Regions ◽

Gtp Binding ◽

Phenotypic Spectrum ◽

Whole Exome ◽

First Time ◽

Binding Loop

De novo germline variants of the casein kinase 2α subunit (CK2α) gene (CSNK2A1) have been reported in individuals with the congenital neuropsychiatric disorder Okur–Chung neurodevelopmental syndrome (OCNS). Here, we report on two unrelated children with OCNS and review the literature to explore the genotype–phenotype relationship in OCNS. Both children showed facial dysmorphism, growth retardation, and neuropsychiatric disorders. Using whole-exome sequencing, we identified two novel de novo CSNK2A1 variants: c.479A>G p.(H160R) and c.238C>T p.(R80C). A search of the literature identified 12 studies that provided information on 35 CSNK2A1 variants in various protein-coding regions of CK2α. By quantitatively analyzing data related to these CSNK2A1 variants and their corresponding phenotypes, we showed for the first time that mutations in protein-coding CK2α regions appear to influence the phenotypic spectrum of OCNS. Mutations altering the ATP/GTP-binding loop were more likely to cause the widest range of phenotypes. Therefore, any assessment of clinical spectra for this disorder should be extremely thorough. This study not only expands the mutational spectrum of OCNS, but also provides a comprehensive overview to improve our understanding of the genotype–phenotype relationship in OCNS.


Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
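The idea of a "fractional depletion" can be made concrete with a deliberately simplified calculation: compare the number of rare variants observed in a target set of sites with the number expected if those sites carried rare variants at the same per-site rate as matched neutral sites. The sketch below is an assumption-laden illustration of that ratio only. It is not the ExtRaINSIGHT estimator, which is described as additionally controlling for local variation and neighbor-dependence in mutation rate, and the function name and counts are hypothetical.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive depletion of rare variants in target sites relative to neutral sites.

    All four arguments are counts: rare variants observed and sites examined,
    for the target set and for the matched neutral set respectively.
    """
    if target_sites <= 0 or neutral_sites <= 0 or neutral_rare <= 0:
        raise ValueError("site counts and neutral variant count must be positive")
    neutral_rate = neutral_rare / neutral_sites   # rare variants per neutral site
    expected = neutral_rate * target_sites        # expected count without selection
    return 1.0 - target_rare / expected


if __name__ == "__main__":
    # Made-up counts: 80 rare variants seen where 100 would be expected.
    print(fractional_depletion(80, 1000, 1000, 10000))
```

With these made-up counts the depletion is about 0.2, meaning roughly one fifth of the rare variants expected under neutrality are missing from the target set.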
