New genomic signals underlying the emergence of human proto-genes

2022 ◽  
Author(s):  
Anna Grandchamp ◽  
Katrin Berk ◽  
Elias Dohmen ◽  
Erich Bornberg-Bauer

De novo genes are novel genes that emerge from non-coding DNA. Until now, little has been known about the properties of de novo genes and how these correlate with their age and mechanisms of emergence. In this study, we investigate four properties (introns, upstream regulatory motifs, 5' UTRs, and protein domains) in 23135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and that 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic ones are more enriched in enhancers, even though the TATA motif is the most represented upstream of these genes. The 5' UTRs of intergenic and intronic proto-genes have a lower potential to stabilise mRNA structures than those of exonic proto-genes and established human genes. Finally, we confirm that proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our paper demonstrates that introns, 5' UTRs, and domains have specific properties in proto-genes. We also show the importance of studying proto-genes in relation to their genomic position, as it strongly impacts these properties.

2014 ◽  
Author(s):  
John Stewart Taylor

In 2009, Knowles and McLysaght reported the discovery of three human genes derived from non-coding DNA. They provided evidence that these genes, CLLU1, C22orf45, and DNAH10OS, were transcribed and translated; they identified orthologous non-coding DNA in chimpanzee (Pan troglodytes) and macaque (Macaca mulatta); and for each gene they located the critical "enabler" mutations that extended the open reading frames (ORFs), allowing the production of a protein. These genes had no BLASTp hits in any other genome and were considered to be novel human genes, possibly responsible for human-specific traits. Since the discovery of these genes, new high-quality Denisovan and Neanderthal genomes have been reported. I used these resources in an effort to determine whether or not CLLU1, C22orf45, and DNAH10OS were truly human-specific.


2018 ◽  
Vol 39 (11) ◽  
pp. 1505-1516 ◽  
Author(s):  
Isabelle Thiffault ◽  
Maxime Cadieux‐Dion ◽  
Emily Farrow ◽  
Raymond Caylor ◽  
Neil Miller ◽  
...  

2014 ◽  
Vol 24 (5) ◽  
pp. 869-884 ◽  
Author(s):  
P. J. Balwierz ◽  
M. Pachkov ◽  
P. Arnold ◽  
A. J. Gruber ◽  
M. Zavolan ◽  
...  

2016 ◽  
Vol 4 (5) ◽  
Author(s):  
Jihua Wang ◽  
Li Wang ◽  
Gan Cao ◽  
Muqing Zhang ◽  
Ying Guo

Here, we report the draft genome sequence of Leifsonia xyli subsp. xyli strain gdw1, which causes ratoon stunting disease of sugarcane. The strain was isolated from the stem of Badila sugarcane located at the Guangdong Key Laboratory for Crops Genetic Improvement (Guangzhou, China). The de novo genome of Leifsonia xyli subsp. xyli was assembled into 48 scaffolds with a G+C content of 67.68%, and contained 2.6 Mb and 2,838 coding sequences.


2019 ◽  
Author(s):  
Thomas F. Martinez ◽  
Qian Chu ◽  
Cynthia Donaldson ◽  
Dan Tan ◽  
Maxim N. Shokhirev ◽  
...  

Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that smORF translation prediction is noisier than for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs, providing an approach to select smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validate the translation of hundreds of smORFs and position them as a source of novel antigens. Thus, smORFs represent a significant number of important, yet unexplored human genes.


2019 ◽  
Vol 36 (8) ◽  
pp. 1701-1710 ◽  
Author(s):  
Donate Weghorn ◽  
Daniel J Balick ◽  
Christopher Cassa ◽  
Jack A Kosmicki ◽  
Mark J Daly ◽  
...  

The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein-truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation–selection balance, as has been proposed earlier (Cassa et al. 2017). Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC data set (Lek et al. 2016). Additionally, we compared the African and European subsets of the ExAC study and analyzed de novo PTVs. We show that the mutation–selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.
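The deterministic mutation–selection balance mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the textbook first-order recurrence x' = x(1 - s_het) + U, where U is a per-gene PTV mutation rate and s_het is the selection coefficient against heterozygous carriers, and it ignores drift and demography entirely. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math


def equilibrium_frequency(mutation_rate: float, s_het: float) -> float:
    """Closed-form balance point of x' = x * (1 - s_het) + U, i.e. x = U / s_het."""
    if s_het <= 0:
        raise ValueError("s_het must be positive for a balance point to exist")
    return mutation_rate / s_het


def iterate_frequency(mutation_rate: float, s_het: float,
                      generations: int, start: float = 0.0) -> float:
    """Iterate the deterministic recurrence for a number of generations.

    Each generation, selection removes a fraction s_het of the existing
    PTV alleles and mutation adds new ones at rate U (first-order
    approximation, valid while the allele frequency stays small).
    """
    x = start
    for _ in range(generations):
        x = x * (1.0 - s_het) + mutation_rate
    return x


# Illustrative (assumed) values: U = 1e-6 per generation, s_het = 0.05.
U = 1e-6
S_HET = 0.05
x_eq = equilibrium_frequency(U, S_HET)
x_sim = iterate_frequency(U, S_HET, generations=10_000)
```

Under these assumptions the iterated frequency settles at the closed-form value U / s_het. The approach to equilibrium takes on the order of 1 / s_het generations, which gives some intuition for why a deterministic approximation could be least reliable for genes under the weakest selection, where drift and recent demography have more time to act.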


DNA Research ◽  
2019 ◽  
Vol 26 (4) ◽  
pp. 341-352
Author(s):  
Michal Růžička ◽  
Přemysl Souček ◽  
Petr Kulhánek ◽  
Lenka Radová ◽  
Lenka Fajkusová ◽  
...  

Mutations can be induced by environmental factors but also arise spontaneously during DNA replication or due to deamination of methylated cytosines at CpG dinucleotides. Sites where mutations occur with higher frequency than would be expected by chance are termed hotspots, while sites that rarely contain mutations are termed coldspots. Mutations are permanently scanned and repaired by repair systems. Among them, mismatch repair targets base pair mismatches, which are discriminated from canonical base pairs by probing the altered elasticity of DNA. Using biased molecular dynamics simulations, we investigated the elasticity of coldspot and hotspot motifs detected in human genes associated with inherited disorders, and also of motifs with Czech population hotspots and de novo mutations. Particular attention was paid to mutations leading to G/T and A+/C pairs. We observed that hotspots without CpG/CpHpG sequences are less flexible than coldspots, which indicates that flexible sequences are more effectively repaired. In contrast, hotspots with CpG/CpHpG sequences exhibited increased flexibility, similar to coldspots. Their mutability is more likely related to spontaneous deamination of methylated cytosines leading to C > T mutations, which are primarily targeted by base excision repair. We corroborated conclusions based on computer simulations by measuring melting curves of hotspots and coldspots containing a G/T mismatch.

