Complete overview of protein-inactivating sequence variations in 36 sequenced mouse inbred strains

2017 ◽  
Vol 114 (34) ◽  
pp. 9158-9163 ◽  
Author(s):  
Steven Timmermans ◽  
Marc Van Montagu ◽  
Claude Libert

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.

2020 ◽  
Vol 21 (11) ◽  
pp. 1068-1077
Author(s):  
Xiaochao Sun ◽  
Bin Yang ◽  
Qunye Zhang

Many studies have shown that the spatial distribution of genes within a single chromosome exhibits distinct patterns. However, little is known about the characteristics of inter-chromosomal distribution of genes (including protein-coding genes, processed transcripts and pseudogenes) in different genomes. In this study, we explored these issues using the available genomic data of both human and model organisms. We also analyzed the distribution pattern of protein-coding genes that have been associated with 14 common diseases and the insertion/deletion mutations and single nucleotide polymorphisms detected by whole genome sequencing in an acute promyelocytic leukemia patient. We obtained the following novel findings. Firstly, inter-chromosomal distribution of genes displays a nonstochastic pattern and the gene densities in different chromosomes are heterogeneous. This kind of heterogeneity is observed in genomes of both lower and higher species. Secondly, protein-coding genes involved in certain biological processes tend to be enriched in one or a few chromosomes. Our findings add new insights into the spatial distribution of genes, including disease-related genes, across chromosomes. These results could be useful in improving the efficiency of disease-associated gene screening studies by targeting specific chromosomes.


GigaScience ◽  
2019 ◽  
Vol 8 (10) ◽  
Author(s):  
Bo Song ◽  
Yue Song ◽  
Yuan Fu ◽  
Elizabeth Balyejusa Kizito ◽  
Sandra Ndagire Kamenya ◽  
...  

Background: The African eggplant (Solanum aethiopicum) is a nutritious traditional vegetable used in many African countries, including Uganda and Nigeria. It is thought to have been domesticated in Africa from its wild relative, Solanum anguivi. S. aethiopicum has been routinely used as a source of disease resistance genes for several Solanaceae crops, including Solanum melongena. A lack of genomic resources has meant that breeding of S. aethiopicum has lagged behind other vegetable crops. Results: We assembled a 1.02-Gb draft genome of S. aethiopicum, which contained predominantly repetitive sequences (78.9%). We annotated 37,681 gene models, including 34,906 protein-coding genes. Expansion of disease resistance genes was observed via 2 rounds of amplification of long terminal repeat retrotransposons, which may have occurred ∼1.25 and 3.5 million years ago, respectively. By resequencing 65 S. aethiopicum and S. anguivi genotypes, 18,614,838 single-nucleotide polymorphisms were identified, of which 34,171 were located within disease resistance genes. Analysis of domestication and demographic history revealed active selection for genes involved in drought tolerance in both “Gilo” and “Shum” groups. A pan-genome of S. aethiopicum was assembled, containing 51,351 protein-coding genes; 7,069 of these genes were missing from the reference genome. Conclusions: The genome sequence of S. aethiopicum enhances our understanding of its biotic and abiotic resistance. The single-nucleotide polymorphisms identified are immediately available for use by breeders. The information provided here will accelerate selection and breeding of the African eggplant, as well as other crops within the Solanaceae family.


2021 ◽  
Vol 22 (4) ◽  
pp. 1876
Author(s):  
Frida Belinky ◽  
Ishan Ganguly ◽  
Eugenia Poliakov ◽  
Vyacheslav Yurchenko ◽  
Igor B. Rogozin

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
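To make the notion of an in-frame stop codon concrete, the following minimal sketch scans a coding sequence codon by codon and reports any stop codon that occurs before the final codon. It is an illustration only, not the authors' pipeline: the function name `inframe_stops` is hypothetical, and it assumes the input is a DNA coding sequence that already starts in reading frame and uses the three standard stop codons (TAA, TAG, TGA).

```python
# Standard stop codons (assumption: DNA alphabet, standard/bacterial code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def inframe_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of stop codons found before the last codon.

    Assumes `cds` begins at the first base of the start codon. Trailing bases
    that do not fill a complete codon are ignored.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    hits = []
    # The last complete codon is the expected terminator, so it is skipped.
    for i in range(n_codons - 1):
        codon = seq[3 * i : 3 * i + 3]
        if codon in STOP_CODONS:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # ATG AAA TAG GGC TAA: internal TAG at codon index 2.
    print(inframe_stops("ATGAAATAGGGCTAA"))
```

A gene with a non-empty result would traditionally be flagged as a candidate pseudogene; the abstract argues that in bacteria such a flag alone is not sufficient evidence.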


2020 ◽  
Author(s):  
Zhong-Yin Zhou ◽  
Hang Liu ◽  
Yue-Dong Zhang ◽  
Yin-Qiao Wu ◽  
Min-Sheng Peng ◽  
...  

Understanding the mutational and evolutionary dynamics of SARS-CoV-2 is essential for treating COVID-19 and the development of a vaccine. Here, we analyzed 15,818 publicly available assembled SARS-CoV-2 genome sequences, along with 2,350 raw sequence datasets sampled worldwide. We investigated the distribution of inter-host single nucleotide polymorphisms (inter-host SNPs) and intra-host single nucleotide variations (iSNVs). Mutations have been observed at 35.6% (10,649/29,903) of the bases in the genome. The substitution rate in some protein-coding regions is higher than the average in SARS-CoV-2 viruses, and the high substitution rate in some regions might be driven by diversifying selection to escape immune recognition. Both recurrent mutations and human-to-human transmission are mechanisms that generate fitness-advantageous mutations. Furthermore, the frequency of three mutations (S protein, F400L; ORF3a protein, T164I; and ORF1a protein, Q6383H) has gradually increased over time on lineages, which provides new clues for the early detection of fitness-advantageous mutations. Our study provides theoretical support for vaccine development and the optimization of treatment for COVID-19. We call on researchers to submit raw sequence data to public databases.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
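One way to picture how overlapping genes can be grouped into clusters from annotation coordinates is the sweep sketched below. It is a simplified illustration under stated assumptions, not the method used in the study: the function name `overlap_clusters` and the tuple layout `(name, chrom, start, end)` are hypothetical, coordinates are assumed to be 1-based and inclusive, and strand is ignored, so the four paired subtypes described above are not distinguished.

```python
from collections import defaultdict


def overlap_clusters(genes):
    """Group genes whose coordinate ranges overlap on the same chromosome.

    `genes` is an iterable of (name, chrom, start, end) tuples with inclusive
    coordinates. Returns a list of clusters (lists of gene names) that contain
    at least two genes; non-overlapping genes are omitted.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        # Sweep genes in order of start position, extending the open cluster
        # while the next gene starts at or before the cluster's furthest end.
        for start, end, name in sorted(by_chrom[chrom]):
            if current and start <= current_end:
                current.append(name)
                current_end = max(current_end, end)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current, current_end = [name], end
        if len(current) > 1:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    toy = [
        ("A", "1", 100, 200), ("B", "1", 150, 300), ("C", "1", 400, 500),
        ("D", "2", 10, 50), ("E", "2", 40, 60), ("F", "2", 55, 70),
    ]
    print(overlap_clusters(toy))
```

In this toy input, gene C stands alone, A and B form a pair, and D, E and F form a three-gene cluster because F overlaps E even though it does not overlap D.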


2019 ◽  
Vol 8 (35) ◽  
Author(s):  
A. R. Stahlke ◽  
A. Z. Ozsoy ◽  
D. W. Bean ◽  
P. A. Hohenlohe

We announce the complete circularized mitochondrial genome assemblies of Diorhabda carinata and Diorhabda carinulata, beetle species introduced to North America for the biological control of invasive shrubs of the genus Tamarix L. (Tamaricaceae). The assemblies (16,232 and 16,298 bp, respectively) each comprise 13 protein-coding genes, 22 tRNAs, two rRNAs, and a noncoding region.


eLife ◽  
2013 ◽  
Vol 2 ◽  
Author(s):  
Hume Stroud ◽  
Bo Ding ◽  
Stacey A Simon ◽  
Suhua Feng ◽  
Maria Bellizzi ◽  
...  

Most transgenic crops are produced through tissue culture. The impact of utilizing such methods on the plant epigenome is poorly understood. Here we generated whole-genome, single-nucleotide resolution maps of DNA methylation in several regenerated rice lines. We found that all tested regenerated plants had significant losses of methylation compared to non-regenerated plants. Loss of methylation was largely stable across generations, and certain sites in the genome were particularly susceptible to loss of methylation. Loss of methylation at promoters was associated with deregulated expression of protein-coding genes. Analyses of callus and untransformed plants regenerated from callus indicated that loss of methylation is stochastically induced at the tissue culture step. These changes in methylation may explain a component of somaclonal variation, a phenomenon in which plants derived from tissue culture manifest phenotypic variability.


2006 ◽  
Vol 188 (12) ◽  
pp. 4453-4463 ◽  
Author(s):  
Patrick S. G. Chain ◽  
Ping Hu ◽  
Stephanie A. Malfatti ◽  
Lyndsay Radnedge ◽  
Frank Larimer ◽  
...  

Yersinia pestis, the causative agent of bubonic and pneumonic plagues, has undergone detailed study at the molecular level. To further investigate the genomic diversity among this group and to help characterize lineages of the plague organism that have no sequenced members, we present here the genomes of two isolates of the “classical” antiqua biovar, strains Antiqua and Nepal516. The genomes of Antiqua and Nepal516 are 4.7 Mb and 4.5 Mb and encode 4,138 and 3,956 open reading frames, respectively. Though both strains belong to one of the three classical biovars, they represent separate lineages defined by recent phylogenetic studies. We compare all five currently sequenced Y. pestis genomes and the corresponding features in Yersinia pseudotuberculosis. There are strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. We found 453 single nucleotide polymorphisms in protein-coding regions, which were used to assess the evolutionary relationships of these Y. pestis strains. Gene reduction analysis revealed that the gene deletion processes are under selective pressure, and many of the inactivations are probably related to the organism's interaction with its host environment. The results presented here clearly demonstrate the differences between the two biovar antiqua lineages and support the notion that grouping Y. pestis strains based strictly on the classical definition of biovars (predicated upon two biochemical assays) does not accurately reflect the phylogenetic relationships within this species. A comparison of four virulent Y. pestis strains with the human-avirulent strain 91001 provides further insight into the genetic basis of virulence to humans.


2016 ◽  
Vol 4 (2) ◽  
Author(s):  
Lesibana A. Malinga ◽  
Thomas Abeel ◽  
Christopher A. Desjardins ◽  
Talent C. Dlamini ◽  
Gail Cassell ◽  
...  

We report the whole-genome sequencing of two extensively drug-resistant tuberculosis strains belonging to the Euro-American S lineage. The RSA 114 strain showed single-nucleotide polymorphisms predicted to have drug efflux activity.

