Understanding the factors that shape patterns of nucleotide diversity in the house mouse genome

2018 ◽  
Author(s):  
Tom R. Booker ◽  
Peter D. Keightley

Abstract: A major goal of population genetics has been to determine the extent to which selection at linked sites influences patterns of neutral nucleotide diversity in the genome. Multiple lines of evidence suggest that diversity is influenced by both positive and negative selection. For example, in many species there are troughs in diversity surrounding functional genomic elements, consistent with the action of either background selection (BGS) or selective sweeps. In this study, we investigated the causes of the diversity troughs that are observed in the wild house mouse genome. Using the unfolded site frequency spectrum (uSFS), we estimated the strength and frequencies of deleterious and advantageous mutations occurring in different functional elements in the genome. We then used these estimates to parameterize forward-in-time simulations of chromosomes, using realistic distributions of functional elements and recombination rate variation in order to determine if selection at linked sites can explain the observed patterns of nucleotide diversity. The simulations suggest that BGS alone cannot explain the dips in diversity around either exons or conserved non-coding elements (CNEs). A combination of BGS and selective sweeps, however, can explain the troughs in diversity around CNEs. This is not the case for protein-coding exons, where observed dips in diversity cannot be explained by parameter estimates obtained from the uSFS. We discuss the extent to which our results provide evidence of sweeps playing a role in shaping patterns of nucleotide diversity and the limitations of using the uSFS for obtaining inferences of the frequency and effects of advantageous mutations.

Author Summary: We present a study examining the causes of variation in nucleotide diversity across the mouse genome. The status of mice as a model organism in the life sciences makes them an excellent model system for studying molecular evolution in mammals. In our study, we analyse how natural selection acting on new mutations can affect levels of nucleotide diversity through the processes of background selection and selective sweeps. To perform our analyses, we first estimated the rate and strengths of selected mutations from a sample of wild mice and then used our estimates in realistic population genetic simulations. Analysing simulations, we find that both harmful and beneficial mutations are required to explain patterns of nucleotide diversity in regions of the genome close to gene regulatory elements. For protein-coding genes, however, our approach is not able to fully explain observed patterns and we think that this is because there are strongly advantageous mutations that occur in protein-coding genes that we were not able to detect.
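As a rough illustration of the two summary statistics named in this abstract, the sketch below builds an unfolded site frequency spectrum from per-site derived-allele counts and derives nucleotide diversity from it. It is not the authors' pipeline: the function names, the input format (one derived-allele count per site, already polarised against an outgroup) and the toy data are assumptions made for the example.

```python
def unfolded_sfs(derived_counts, n):
    """Tabulate the unfolded SFS for a sample of n chromosomes.

    derived_counts holds, for each site, the number of sampled chromosomes
    carrying the derived allele (hypothetical input format).
    Returns a list where entry i is the number of sites with i derived copies.
    """
    sfs = [0] * (n + 1)
    for count in derived_counts:
        if not 0 <= count <= n:
            raise ValueError("derived allele count outside 0..n")
        sfs[count] += 1
    return sfs


def pi_from_sfs(sfs):
    """Mean pairwise differences per site, computed from an unfolded SFS."""
    n = len(sfs) - 1
    n_pairs = n * (n - 1) / 2
    # A site with i derived copies differs in i * (n - i) of the pairs.
    differences = sum(i * (n - i) * sites for i, sites in enumerate(sfs))
    return differences / n_pairs / sum(sfs)


if __name__ == "__main__":
    toy_counts = [0, 0, 1, 2, 0, 3]  # six sites, four sampled chromosomes
    spectrum = unfolded_sfs(toy_counts, n=4)
    print(spectrum, pi_from_sfs(spectrum))
```

Monomorphic sites (entries 0 and n) contribute no differences but still count toward the number of sites, so the result is a per-site diversity.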

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joonhyung Jung ◽  
Changkyun Kim ◽  
Joo-Hwan Kim

Abstract Background: Commelinaceae (Commelinales) comprise 41 genera and are widely distributed in both the Old and New Worlds, except in Europe. The relationships among genera in this family have been suggested in several morphological and molecular studies. However, it is difficult to explain their relationships due to high morphological variations and low support values. Currently, many researchers have been using complete chloroplast genome data for inferring the evolution of land plants. In this study, we completed 15 new plastid genome sequences of subfamily Commelinoideae using the MiSeq platform. We utilized genome data to reveal the structural variations and reconstruct the problematic positions of genera for the first time. Results: All examined species of Commelinoideae have three pseudogenes (accD, rpoA, and ycf15), and the former two might be a synapomorphy within Commelinales. Only four species in tribe Commelineae presented IR expansion, which affected duplication of the rpl22 gene. We identified inversions that range from approximately 3 to 15 kb in four taxa (Amischotolype, Belosynapsis, Murdannia, and Streptolirion). The phylogenetic analysis using 77 chloroplast protein-coding genes with maximum parsimony, maximum likelihood, and Bayesian inference suggests that Palisota is most closely related to tribe Commelineae, supported by high support values. This result differs significantly from the current classification of Commelinaceae. Also, we resolved the unclear position of Streptoliriinae and the monophyly of Dichorisandrinae. Among the ten CDS (ndhH, rpoC2, ndhA, rps3, ndhG, ndhD, ccsA, ndhF, matK, and ycf1), which have high nucleotide diversity values (Pi > 0.045) and over 500 bp length, four CDS (ndhH, rpoC2, matK, and ycf1) show that they are congruent with the topology derived from 77 chloroplast protein-coding genes. Conclusions: In this study, we provide detailed information on the 15 complete plastid genomes of Commelinoideae taxa. We identified characteristic pseudogenes and nucleotide diversity, which can be used to infer the family evolutionary history. Also, further research is needed to revise the position of Palisota in the current classification of Commelinaceae.


2017 ◽  
Vol 114 (34) ◽  
pp. 9158-9163 ◽  
Author(s):  
Steven Timmermans ◽  
Marc Van Montagu ◽  
Claude Libert

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Chen Xie ◽  
Cemalettin Bekpen ◽  
Sven Künzel ◽  
Maryam Keshavarz ◽  
Rebecca Krebs-Wheaton ◽  
...  

The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.


Genetics ◽  
1999 ◽  
Vol 151 (1) ◽  
pp. 343-357 ◽  
Author(s):  
F Liu ◽  
D Charlesworth ◽  
M Kreitman

Abstract: To test the theoretical prediction that highly inbreeding populations should have low neutral genetic diversity relative to closely related outcrossing populations, we sequenced portions of the cytosolic phosphoglucose isomerase (PgiC) gene in the plant genus Leavenworthia, which includes both self-incompatible and inbreeding taxa. On the basis of sequences of intron 12 of this gene, the expected low diversity was seen in both populations of the selfers Leavenworthia uniflora and L. torulosa and in three highly inbreeding populations of L. crassa, while high diversity was found in self-incompatible L. stylosa, and moderate diversity in L. crassa populations with partial or complete self-incompatibility. In L. stylosa, the nucleotide diversity was strongly structured into three haplotypic classes, differing by several insertion/deletion sequences, with linkage disequilibrium between sequences of the three types in intron 12, but not in the adjacent regions. Differences between the three kinds of haplotypes are larger than between sequences of this gene region from different species. The haplotype divergence suggests the presence of a balanced polymorphism at this locus, possibly predating the split between L. stylosa and its two inbreeding sister taxa, L. uniflora and L. torulosa. It is therefore difficult to distinguish between different potential causes of the much lower sequence diversity at this locus in inbreeding than outcrossing populations. Selective sweeps during the evolution of these populations are possible, or background selection, or merely loss of a balanced polymorphism maintained by overdominance in the populations that evolved high selfing rates.


2020 ◽  
Author(s):  
Yura Kim ◽  
Mariam Naghavi ◽  
Ying-Tao Zhao

Abstract: The human genome contains more than 4000 genes that are longer than 100 kb. These long genes require more time and resources to make a transcript than shorter genes do. Long genes have also been linked to various human diseases. Specific mechanisms are utilized by long genes to facilitate their transcription and co-transcriptional processes. This results in unique features in their multi-omics profiles. Although these unique profiles are important to understand long genes, a database that provides an integrated view and easy access to the multi-omics profiles of long genes does not exist. We leveraged the publicly accessible multi-omics data and systematically analyzed the genomic conservation, histone modifications, chromatin organization, tissue-specific transcriptome, and single cell transcriptome of 992 protein-coding genes that are longer than 200 kb in the mouse genome. We also examined the evolution history of their gene lengths in 15 species that belong to six Classes and 11 Orders. To share the multi-omics profiles of long genes, we developed a user-friendly and easy-to-use database, LongGeneDB (https://longgenedb.com), for users to search, browse, and download these profiles. LongGeneDB will be a useful data hub for the biomedical research community to understand long genes.


2020 ◽  
Author(s):  
Yuan Hua ◽  
Ning Li ◽  
Jie Chen ◽  
Bao-Zhen Hua ◽  
Shi-Heng Tao

Abstract Background: Mitochondrial genomes play a significant role in reconstructing phylogenetic relationships and revealing molecular evolution in insects. However, only two species of Panorpidae have been documented for mitochondrial genomes in Mecoptera to date. Results: We obtained complete mitochondrial genomes of 17 species of Panorpidae. The results show that the complete mitogenome sequences of Panorpidae all contain 37 genes (13 protein-coding genes (PCGs), two rRNAs, 22 tRNAs) and one control region. The mitogenomes exhibit a strong AT bias. The AT-skew can either be slightly positive or slightly negative, while the GC-skew is usually negative. The 22 tRNA genes can fold into a common cloverleaf secondary structure except trnS1. The sliding window and genetic distance analyses demonstrate highly variable nucleotide diversity among the 13 protein-coding genes, with comparatively low evolutionary rates of cox1, cox2 and nad1, and high variability of nad2 and nad6. The phylogeny of Panorpidae can be presented as (Neopanorpa + Furcatopanorpa) + (Dicerapanorpa + (Panorpa debilis + (Sinopanorpa + (Cerapanorpa + Panorpa)))). Conclusions: Our analyses indicate that the genes nad2 and nad6 can be regarded as potential markers for population genetics and species delimitation in Panorpidae. Panorpa is reconfirmed as a paraphyletic group.
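The AT bias and the AT- and GC-skews mentioned in this abstract are simple base-composition ratios. The sketch below uses the conventional definitions, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the abstract does not state which formulas the authors applied, so treat that as an assumption, along with the function name and the toy sequence.

```python
def base_skews(sequence):
    """Return (AT content, AT-skew, GC-skew) for one strand of a sequence.

    Only unambiguous A, C, G and T are counted; other symbols are ignored.
    A skew is reported as 0.0 when its denominator would be zero.
    """
    seq = sequence.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    at_content = (a + t) / total if total else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_content, at_skew, gc_skew


if __name__ == "__main__":
    # Toy sequence, not taken from any Panorpidae mitogenome.
    print(base_skews("AAATTGCC"))
```

A positive AT-skew means more A than T on the analysed strand, and a negative GC-skew means more C than G, which is the pattern the abstract describes as usual.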


Plants ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1497
Author(s):  
Cai-Yun Zhang ◽  
Tong-Jian Liu ◽  
Xiao-Lu Mo ◽  
Hui-Run Huang ◽  
Gang Yao ◽  
...  

Pogostemon Desf., the largest genus of the tribe Pogostemoneae (Lamiaceae), consists of ca. 80 species distributed mainly from South and Southeast Asia to China. The genus contains many patchouli plants, which are of great economic importance but taxonomically difficult. Therefore, it is necessary to characterize more chloroplast (cp) genomes for infrageneric phylogeny analyses and species identification of Pogostemon, especially for patchouli plants. In this study, we newly generated four cp genomes for three patchouli plants (i.e., Pogostemon plectranthoides Desf., P. septentrionalis C. Y. Wu et Y. C. Huang, and two cultivars of P. cablin (Blanco) Benth.). Comparison of all samples (including online available cp genomes of P. yatabeanus (Makino) Press and P. stellatus (Lour.) Kuntze) suggested that Pogostemon cp genomes are highly conserved in terms of genome size and gene content, with a typical quadripartite circle structure. Interspecific divergence of cp genomes has been maintained at a relatively low level, though seven divergence hotspot regions were identified by stepwise window analysis. The nucleotide diversity (Pi) value was significantly positively correlated with gap proportion (indels), but significantly negatively correlated with GC content. Our phylogenetic analyses based on 80 protein-coding genes yielded high-resolution backbone topologies for the Lamiaceae and Pogostemon. For the overall mean substitution rates, the synonymous (dS) and nonsynonymous (dN) substitution rate values of protein-coding genes varied approximately threefold, while the dN values among different functional gene groups showed a wider variation range. Overall, the cp genomes of Pogostemon will be useful for phylogenetic reconstruction, species delimitation and identification in the future.
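A stepwise (sliding) window scan of nucleotide diversity, as referred to in this abstract, can be sketched as follows. This is a minimal illustration rather than the tool or settings the authors used: the window and step sizes, the function names, the pairwise exclusion of gaps and ambiguous bases, and the toy alignment are all assumptions.

```python
from itertools import combinations


def window_pi(aligned_seqs, start, end):
    """Mean proportion of differing sites over all sequence pairs in [start, end).

    Positions where either sequence of a pair has a gap or ambiguous base
    are skipped for that pair (an assumed convention).
    """
    total, usable_pairs = 0.0, 0
    for seq1, seq2 in combinations(aligned_seqs, 2):
        compared = differences = 0
        for x, y in zip(seq1[start:end], seq2[start:end]):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                differences += x != y
        if compared:
            total += differences / compared
            usable_pairs += 1
    return total / usable_pairs if usable_pairs else float("nan")


def sliding_pi(aligned_seqs, window, step):
    """Return (window start, Pi) tuples along an alignment of equal-length sequences."""
    length = len(aligned_seqs[0])
    return [
        (start, window_pi(aligned_seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]  # hypothetical data
    for start, pi in sliding_pi(toy_alignment, window=4, step=4):
        print(start, round(pi, 4))
```

Windows whose Pi stands out from the rest of the alignment are the kind of region an analysis like this would flag as a divergence hotspot.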


2011 ◽  
Vol 21 (5) ◽  
pp. 756-767 ◽  
Author(s):  
M. Brosch ◽  
G. I. Saunders ◽  
A. Frankish ◽  
M. O. Collins ◽  
L. Yu ◽  
...  
