Protein-Coding Genes of Helicobacter pylori Predominantly Present Purifying Selection though Many Membrane Proteins Suffer from Selection Pressure: A Proposal to Analyze Bacterial Pangenomes

Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 377
Author(s):  
Alejandro Rubio ◽  
Antonio Pérez-Pulido

The current availability of complete genome sequences has revealed that bacterial genomes can carry genes that are not present in the genomes of all strains of a given species. The genes shared by all strains thus comprise the core of the species, but the pangenome can be much larger and usually includes genes that appear in only one strain. Once the pangenome of a species has been estimated, further studies can be undertaken to generate new knowledge, such as the study of evolutionary selection on protein-coding genes. Most of the genes of a pangenome, especially those in the core group, are expected to be subject to purifying selection, which ensures conservation of function. However, some genes can be subject to selection pressure, such as genes involved in virulence that need to escape the host immune system; this is more common in the accessory group of the pangenome. We analyzed 180 strains of Helicobacter pylori, a bacterium that colonizes the gastric mucosa of half the world population and has a low number of genes (around 1500 in a strain and 3000 in the pangenome). After estimating the pangenome, we calculated the evolutionary selection for each gene and found that 85% of them are subject to purifying selection, while the remaining genes show some degree of selection pressure. As expected, the latter group is enriched in genes encoding membrane proteins putatively involved in interaction with host tissues. In addition, this group contains a high number of uncharacterized genes and genes encoding putative spurious proteins, which suggests that they could be false positives from the gene finders used to identify them. Together, these results suggest that this kind of analysis can be useful to validate gene predictions and functionally characterize proteins in complete genomes.
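
The core/accessory distinction described in this abstract can be illustrated with a minimal Python sketch. It assumes that gene families have already been clustered and that each strain is represented by the set of families it carries. The function name, strain labels and gene names are hypothetical, and the sketch is not the pipeline used by the authors.

```python
from collections import Counter

def partition_pangenome(strain_genes):
    """Split gene families into core, accessory and strain-specific sets.

    strain_genes maps a strain name to the set of gene families it carries.
    Core = present in every strain; accessory = everything else;
    unique = accessory families found in exactly one strain.
    """
    n_strains = len(strain_genes)
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n_strains}
    accessory = set(counts) - core
    unique = {g for g, c in counts.items() if c == 1}
    return core, accessory, unique

# Toy, made-up presence data (hypothetical strain and gene family names).
toy = {
    "strain_A": {"geneA", "geneB", "geneC"},
    "strain_B": {"geneA", "geneB", "geneD"},
    "strain_C": {"geneA", "geneB"},
}
core, accessory, unique = partition_pangenome(toy)
pangenome_size = len(core) + len(accessory)
```

In this toy input the pangenome (four families) is larger than any single strain's gene set, which mirrors the situation the abstract describes for H. pylori at a much larger scale.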

2021 ◽  
Vol 22 (4) ◽  
pp. 1876
Author(s):  
Frida Belinky ◽  
Ishan Ganguly ◽  
Eugenia Poliakov ◽  
Vyacheslav Yurchenko ◽  
Igor B. Rogozin

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
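
As a rough illustration of what an in-frame stop codon is, the following Python sketch scans a coding sequence codon by codon and reports internal stops. It assumes the sequence starts in frame and treats the last complete codon as the expected terminator; the function name and the example sequence are hypothetical, and this is not the detection procedure used in the study.

```python
# Stop codons shared by the standard and bacterial genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def internal_stop_positions(cds):
    """Return 0-based codon indices of in-frame stops before the last codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Exclude the final codon, which is expected to be the terminator.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

# Made-up sequence: ATG AAA TAG GGC TAA. The out-of-frame "TGA" that spans
# bases 2-4 is not reported, because only the reading frame is scanned.
example = "ATGAAATAGGGCTAA"
stops = internal_stop_positions(example)
```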


2019 ◽  
Author(s):  
Wei Fang ◽  
Yi Wen ◽  
Xiangyun Wei

Tissue-specific or cell type-specific transcription of protein-coding genes is controlled by both trans-regulatory elements (TREs) and cis-regulatory elements (CREs). However, it is challenging to identify TREs and CREs, which are unknown for most genes. Here, we describe a protocol for identifying two types of transcription-activating CREs (core promoters and enhancers) of zebrafish photoreceptor type-specific genes. This protocol is composed of three phases: bioinformatic prediction, experimental validation, and characterization of the CREs. To better illustrate the principles and logic of this protocol, we exemplify it with the discovery of the core promoter and enhancer of the mpp5b apical polarity gene (also known as ponli), whose red, green, and blue (RGB) cone-specific transcription requires its enhancer, a member of the rainbow enhancer family. While exemplified with an RGB cone-specific gene, this protocol is general and can be used to identify the core promoters and enhancers of other protein-coding genes.


2021 ◽  
Author(s):  
Noah Dukler ◽  
Mehreen R Mughal ◽  
Ritika Ramani ◽  
Yi-Fei Huang ◽  
Adam Siepel

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious.
Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
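
The idea of a "fractional depletion" of rare variants can be sketched naively as one minus the ratio of rare-variant densities in target versus putatively neutral sites. The Python sketch below is a deliberate simplification: it ignores the matching and the mutation-rate controls that the abstract attributes to ExtRaINSIGHT, and the function name, parameters and counts are all hypothetical.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive depletion estimate: 1 - (target density / neutral density).

    A value near 0 means rare variants are as common in the target set as in
    the neutral set; a value near 1 means they are almost entirely missing.
    This is only a sketch and does not model mutation-rate heterogeneity.
    """
    if target_sites <= 0 or neutral_sites <= 0 or neutral_rare <= 0:
        raise ValueError("site counts and neutral variant count must be positive")
    target_density = target_rare / target_sites
    neutral_density = neutral_rare / neutral_sites
    return 1.0 - target_density / neutral_density

# Made-up counts: 60 rare variants per 1000 target sites versus
# 100 rare variants per 1000 matched neutral sites.
depletion = fractional_depletion(60, 1000, 100, 1000)
```

With these invented counts the target set carries 40% fewer rare variants per site than the neutral set.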


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Jingyao Ping ◽  
Jing Hao ◽  
Jinye Li ◽  
Yiqing Yang ◽  
Yingjuan Su ◽  
...  

2019 ◽  
Author(s):  
Guillaume Dumas ◽  
Simon Malesys ◽  
Thomas Bourgeron

The human brain differs from that of other primates, but the genetic basis of these differences remains unclear. We investigated the evolutionary pressures acting on almost all human protein-coding genes (N=11,667; 1:1 orthologs in primates) on the basis of their divergence from those of early hominins, such as Neanderthals, and non-human primates. We confirm that genes encoding brain-related proteins are among the most strongly conserved protein-coding genes in the human genome. Combining our evolutionary pressure metrics for the protein-coding genome with recent datasets, we found that this conservation applied to genes functionally associated with the synapse and expressed in brain structures such as the prefrontal cortex and the cerebellum. Conversely, several of the protein-coding genes that diverge most in hominins relative to other primates are associated with brain-associated diseases, such as micro/macrocephaly, dyslexia, and autism. We also showed that cerebellum granule neurons express a set of divergent protein-coding genes that may have contributed to the emergence of fine motor skills and social cognition in humans. This resource is available from http://neanderthal.pasteur.fr and can be used to estimate evolutionary constraints acting on a set of genes and to explore their relative contributions to human traits.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Ryosuke Kakehashi ◽  
Atsushi Kurabayashi

There are two distinct lungless groups in caudate amphibians (salamanders and newts): the family Plethodontidae and the genus Onychodactylus (family Hynobiidae). Lunglessness is considered to have evolved in response to environmental and/or ecological adaptation with respect to oxygen requirements. We performed selection analyses on lungless salamanders to elucidate the selective patterns of mitochondrial protein-coding genes associated with lunglessness. The branch model and RELAX analyses revealed the occurrence of relaxed selection (an increase of the dN/dS ratio = ω value) in most mitochondrial protein-coding genes of plethodontid salamander branches but not in those of Onychodactylus. Additional branch model and RELAX analyses indicated that direct-developing plethodontids showed the relaxed pattern for most mitochondrial genes, although metamorphosing plethodontids had fewer relaxed genes. Furthermore, aBSREL analysis detected positively selected codons in three plethodontid branches but not in Onychodactylus. One of these three branches corresponded to the most recent common ancestor, and the others corresponded to the most recent common ancestors of direct-developing branches within Hemidactyliinae. The positive selection of mitochondrial protein-coding genes in Plethodontidae is probably associated with the evolution of direct development.
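
The ω value mentioned in this abstract is the ratio dN/dS, and its conventional reading can be shown with a short Python sketch. The tolerance band around 1 is an arbitrary assumption made for illustration, and the function names and rate values are hypothetical. Tools such as RELAX and aBSREL rely on likelihood-based tests rather than on a simple threshold like this one.

```python
def omega(dn, ds):
    """Return omega = dN/dS; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds

def selection_label(omega_value, tol=0.05):
    """Conventional reading of omega; tol is an arbitrary illustrative band."""
    if omega_value < 1 - tol:
        return "purifying"
    if omega_value > 1 + tol:
        return "positive"
    return "neutral-like"

# Made-up rates: a strongly constrained gene and a fast-evolving one.
constrained = selection_label(omega(0.02, 0.20))
fast = selection_label(omega(0.30, 0.20))
```

Under this reading, relaxed selection appears as ω moving toward 1 from below, which by itself does not imply positive selection.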


Author(s):  
Nina Moravčíková ◽  
Radovan Kasarda ◽  
Ondrej Kadlečík ◽  
Anna Trakovická ◽  
Marko Halo ◽  
...  

The aim of this study was to analyse the genome-wide distribution of runs of homozygosity (ROH) segments in the genome of the Norik of Muran horse and to identify regions under strong selection pressure. Overall, 25 animals genotyped with the GGP Equine70k chip were included in the study. After SNP pruning, 54479 SNPs (75.72%) covering 2.25 Gb of the autosomal genome were retained for scanning the distribution of ROH segments. ROHs were present in the genomes of all animals and covered on average 13.17% (295.29 Mb) of the autosomal genome expressed by the SNP loci. The highest number of ROHs was identified on autosome 1 (404), while ECA31 showed the lowest proportion of the autosome residing in ROH (38). The footprints of selection, characterized by SNPs with extreme frequency in ROHs across specific genomic regions, were defined by the top 0.01 percentile of signals. Overall, nine genomic regions located on seven autosomes (3, 6, 9, 11, 15, 23) were identified. The strongest signals of selection were found on three autosomes: ECA3, ECA9 and ECA11. The protein-coding genes located within these regions suggest that the identified footprints of selection are most likely consequences of intensive breeding for traits of interest during the grading-up process of the Norik of Muran horse.
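
The proportion of the genome covered by ROH, as reported in this abstract, is the summed length of ROH segments divided by the length of the genome covered by SNPs. A minimal Python sketch follows. It assumes non-overlapping segments given as (start, end) base-pair coordinates; the function name and the coordinates are hypothetical and are not taken from the study.

```python
def roh_coverage(segments, covered_genome_bp):
    """Fraction of the covered genome that lies in ROH segments.

    segments: iterable of (start_bp, end_bp) pairs, assumed non-overlapping.
    covered_genome_bp: length of the genome covered by the SNP panel.
    """
    if covered_genome_bp <= 0:
        raise ValueError("covered genome length must be positive")
    total_roh_bp = sum(end - start for start, end in segments)
    return total_roh_bp / covered_genome_bp

# Made-up example: two segments totalling 2.5 Mb on a 25 Mb covered region.
toy_segments = [(1_000_000, 3_000_000), (10_000_000, 10_500_000)]
coverage = roh_coverage(toy_segments, 25_000_000)
```

Averaging this per-animal fraction across all genotyped animals gives a population-level figure of the kind quoted in the abstract.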


2020 ◽  
Author(s):  
Xiaoting Yao ◽  
Ming Pang ◽  
Tianxing Wang ◽  
Xi Chen ◽  
Xidian Tang ◽  
...  

Parapoxvirus (PPV) has been identified in most mammals and poses a great threat to both livestock production and public health. However, the prevalence and evolution of PPV coding sequences are still not fully understood. Here, we used a comparative approach integrating viral genetics, molecular selection pressure and genomic structure to investigate the genomic features and evolution of PPVs. We noticed that although there were significant differences in GC content between ORFV and the other three species of PPVs, all PPVs showed almost identical nucleotide bias, that is, GC richness. This reflects a common mechanism that determines GC composition for viruses with similar life cycles. The structural analysis of PPV genomes showed divergence among PPV species, which may be due to specific adaptation to their natural hosts. Additionally, we estimated the phylogenetic diversity of the segmented genome of PPV. Our results suggested that during the 2010–2018 outbreak, the orf virus was the dominant species under the selective pressure of the optimal gene patterns. Furthermore, we found that the mean substitution rates were between 3.56×10⁻⁵ and 4.21×10⁻⁴ in different PPV segments, and that the PPV VIR gene evolved at the highest substitution rate. In these protein-coding regions, purifying selection was the major evolutionary pressure, while the GIF and VIR genes were under the greatest positive selection pressure. These results may provide useful knowledge on the genetic evolution of the virus from a new perspective, which could help in designing prevention and control strategies.

