Complete overview of protein-inactivating sequence variations in 36 sequenced mouse inbred strains

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.

Analysis of Inter-Chromosomal Distribution of Disease-Related Genes in Human Genome

Current Protein and Peptide Science ◽

10.2174/1389203721666200426233158 ◽

2020 ◽

Vol 21 (11) ◽

pp. 1068-1077

Author(s):

Xiaochao Sun ◽

Bin Yang ◽

Qunye Zhang

Keyword(s):

Spatial Distribution ◽

Model Organisms ◽

Nucleotide Polymorphisms ◽

Chromosomal Distribution ◽

Single Nucleotide ◽

Protein Coding ◽

Single Chromosome ◽

Deletion Mutations ◽

Protein Coding Genes ◽

Disease Related Genes

Many studies have shown that the spatial distribution of genes within a single chromosome exhibits distinct patterns. However, little is known about the characteristics of inter-chromosomal distribution of genes (including protein-coding genes, processed transcripts and pseudogenes) in different genomes. In this study, we explored these issues using the available genomic data of both human and model organisms. Moreover, we also analyzed the distribution pattern of protein-coding genes that have been associated with 14 common diseases and the insertion/deletion mutations and single nucleotide polymorphisms detected by whole genome sequencing in an acute promyelocytic leukemia patient. We obtained the following novel findings. Firstly, inter-chromosomal distribution of genes displays a nonstochastic pattern and the gene densities in different chromosomes are heterogeneous. This kind of heterogeneity is observed in genomes of both lower and higher species. Secondly, protein-coding genes involved in certain biological processes tend to be enriched in one or a few chromosomes. Our findings have added new insights into our understanding of the spatial distribution of genome and disease-related genes across chromosomes. These results could be useful in improving the efficiency of disease-associated gene screening studies by targeting specific chromosomes.

Draft genome sequence of Solanum aethiopicum provides insights into disease resistance, drought tolerance, and the evolution of the genome

GigaScience ◽

10.1093/gigascience/giz115 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 3

Author(s):

Bo Song ◽

Yue Song ◽

Yuan Fu ◽

Elizabeth Balyejusa Kizito ◽

Sandra Ndagire Kamenya ◽

...

Keyword(s):

Disease Resistance ◽

Single Nucleotide Polymorphisms ◽

Drought Tolerance ◽

Resistance Genes ◽

Draft Genome ◽

Disease Resistance Genes ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Protein Coding Genes

Abstract. Background: The African eggplant (Solanum aethiopicum) is a nutritious traditional vegetable used in many African countries, including Uganda and Nigeria. It is thought to have been domesticated in Africa from its wild relative, Solanum anguivi. S. aethiopicum has been routinely used as a source of disease resistance genes for several Solanaceae crops, including Solanum melongena. A lack of genomic resources has meant that breeding of S. aethiopicum has lagged behind other vegetable crops. Results: We assembled a 1.02-Gb draft genome of S. aethiopicum, which contained predominantly repetitive sequences (78.9%). We annotated 37,681 gene models, including 34,906 protein-coding genes. Expansion of disease resistance genes was observed via 2 rounds of amplification of long terminal repeat retrotransposons, which may have occurred ∼1.25 and 3.5 million years ago, respectively. By resequencing 65 S. aethiopicum and S. anguivi genotypes, 18,614,838 single-nucleotide polymorphisms were identified, of which 34,171 were located within disease resistance genes. Analysis of domestication and demographic history revealed active selection for genes involved in drought tolerance in both “Gilo” and “Shum” groups. A pan-genome of S. aethiopicum was assembled, containing 51,351 protein-coding genes; 7,069 of these genes were missing from the reference genome. Conclusions: The genome sequence of S. aethiopicum enhances our understanding of its biotic and abiotic resistance. The single-nucleotide polymorphisms identified are immediately available for use by breeders. The information provided here will accelerate selection and breeding of the African eggplant, as well as other crops within the Solanaceae family.

Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events

International Journal of Molecular Sciences ◽

10.3390/ijms22041876 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1876

Author(s):

Frida Belinky ◽

Ishan Ganguly ◽

Eugenia Poliakov ◽

Vyacheslav Yurchenko ◽

Igor B. Rogozin

Keyword(s):

Stop Codon ◽

Purifying Selection ◽

Protein Product ◽

Intermediate Step ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Synonymous Sites ◽

Prokaryotic Protein ◽

Sense Codon

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
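
To make the term "in-frame stop codon" concrete, the following is a minimal illustrative sketch and not the authors' pipeline. It assumes the three standard stop codons (TAA, TAG, TGA) and a coding sequence that starts in frame at position 0; the function name `in_frame_stops` and the example sequence are hypothetical.

```python
# Illustrative only: assumes the standard stop codons and a sequence
# that is already in the correct reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of stop codons found before the final codon."""
    seq = cds.upper()
    n_codons = len(seq) // 3  # ignore any trailing partial codon
    hits = []
    for i in range(n_codons - 1):  # the terminal codon is the expected stop
        codon = seq[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            hits.append(i)
    return hits


# Hypothetical toy sequence: ATG AAA TAG GGC TAA
print(in_frame_stops("ATGAAATAGGGCTAA"))  # [2]
```

A gene flagged this way would conventionally be treated as a candidate pseudogene; the abstract's argument is that conservation (dN/dS below 1) suggests many such genes remain functional.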

Worldwide tracing of mutations and the evolutionary dynamics of SARS-CoV-2

10.1101/2020.08.07.242263 ◽

2020 ◽

Author(s):

Zhong-Yin Zhou ◽

Hang Liu ◽

Yue-Dong Zhang ◽

Yin-Qiao Wu ◽

Min-Sheng Peng ◽

...

Keyword(s):

Substitution Rate ◽

Evolutionary Dynamics ◽

Vaccine Development ◽

Sequence Data ◽

Immune Recognition ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Theoretical Support ◽

Recurrent Mutations

Abstract Understanding the mutational and evolutionary dynamics of SARS-CoV-2 is essential for treating COVID-19 and the development of a vaccine. Here, we analyzed 15,818 publicly available assembled SARS-CoV-2 genome sequences, along with 2,350 raw sequence datasets sampled worldwide. We investigated the distribution of inter-host single nucleotide polymorphisms (inter-host SNPs) and intra-host single nucleotide variations (iSNVs). Mutations have been observed at 35.6% (10,649/29,903) of the bases in the genome. The substitution rate in some protein coding regions is higher than the average in SARS-CoV-2 viruses, and the high substitution rate in some regions might be driven to escape immune recognition by diversifying selection. Both recurrent mutations and human-to-human transmission are mechanisms that generate fitness advantageous mutations. Furthermore, the frequency of three mutations (S protein, F400L; ORF3a protein, T164I; and ORF1a protein, Q6383H) has gradually increased over time on lineages, which provides new clues for the early detection of fitness advantageous mutations. Our study provides theoretical support for vaccine development and the optimization of treatment for COVID-19. We call on researchers to submit raw sequence data to public databases.
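
The reported fraction of mutated sites can be checked directly from the two counts given in the abstract. This is only an arithmetic sketch; the constant and function names are hypothetical.

```python
# Counts taken from the abstract above; names are illustrative.
GENOME_LENGTH = 29_903   # bases in the SARS-CoV-2 genome as stated
MUTATED_SITES = 10_649   # bases at which mutations were observed


def mutated_fraction(mutated: int, total: int) -> float:
    """Fraction of genome positions with at least one observed mutation."""
    return mutated / total


pct = 100 * mutated_fraction(MUTATED_SITES, GENOME_LENGTH)
print(f"{pct:.1f}%")  # 35.6%
```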

Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Scientific Reports ◽

10.1038/s41598-019-49802-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Chao-Hsin Chen ◽

Chao-Yu Pan ◽

Wen-chang Lin

Keyword(s):

Human Genome ◽

Expression Profiles ◽

Tissue Expression ◽

Human Protein ◽

Clear Understanding ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Overlapping Gene

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.

Multiple single-nucleotide polymorphisms in the methylenetetrahydrofolate reductase and its truncated pseudogene of 23 inbred strains of mice

Biochemical and Biophysical Research Communications ◽

10.1016/j.bbrc.2003.10.139 ◽

2003 ◽

Vol 312 (2) ◽

pp. 480-486

Author(s):

Chiaki Takeya ◽

Mariko Esumi ◽

Toshihiko Shiroishi ◽

Ryohei Hishida ◽

Tatsuo Yamamoto

Keyword(s):

Single Nucleotide Polymorphisms ◽

Methylenetetrahydrofolate Reductase ◽

Inbred Strains ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Inbred Strains Of Mice

Mitochondrial Genome Sequences of Diorhabda carinata and Diorhabda carinulata, Two Beetle Species Introduced to North America for Biological Control

Microbiology Resource Announcements ◽

10.1128/mra.00690-19 ◽

2019 ◽

Vol 8 (35) ◽

Cited By ~ 1

Author(s):

A. R. Stahlke ◽

A. Z. Ozsoy ◽

D. W. Bean ◽

P. A. Hohenlohe

Keyword(s):

Biological Control ◽

North America ◽

Mitochondrial Genome ◽

Noncoding Region ◽

Beetle Species ◽

Genome Sequences ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes ◽

Genome Assemblies

We announce the complete circularized mitochondrial genome assemblies of Diorhabda carinata and Diorhabda carinulata, beetle species introduced to North America for the biological control of invasive shrubs of the genus Tamarix L. (Tamaricaceae). The assemblies (16,232 and 16,298 bp, respectively) each comprise 13 protein-coding genes, 22 tRNAs, two rRNAs, and a noncoding region.

Plants regenerated from tissue culture contain stable epigenome changes in rice

eLife ◽

10.7554/elife.00354 ◽

2013 ◽

Vol 2 ◽

Cited By ~ 131

Author(s):

Hume Stroud ◽

Bo Ding ◽

Stacey A Simon ◽

Suhua Feng ◽

Maria Bellizzi ◽

...

Keyword(s):

Tissue Culture ◽

Phenotypic Variability ◽

Whole Genome ◽

Single Nucleotide ◽

Protein Coding ◽

Protein Coding Genes ◽

Regenerated Plants ◽

Nucleotide Resolution ◽

The Impact ◽

Single Nucleotide Resolution

Most transgenic crops are produced through tissue culture. The impact of utilizing such methods on the plant epigenome is poorly understood. Here we generated whole-genome, single-nucleotide resolution maps of DNA methylation in several regenerated rice lines. We found that all tested regenerated plants had significant losses of methylation compared to non-regenerated plants. Loss of methylation was largely stable across generations, and certain sites in the genome were particularly susceptible to loss of methylation. Loss of methylation at promoters was associated with deregulated expression of protein-coding genes. Analyses of callus and untransformed plants regenerated from callus indicated that loss of methylation is stochastically induced at the tissue culture step. These changes in methylation may explain a component of somaclonal variation, a phenomenon in which plants derived from tissue culture manifest phenotypic variability.

Complete Genome Sequence of Yersinia pestis Strains Antiqua and Nepal516: Evidence of Gene Reduction in an Emerging Pathogen

Journal of Bacteriology ◽

10.1128/jb.00124-06 ◽

2006 ◽

Vol 188 (12) ◽

pp. 4453-4463 ◽

Cited By ~ 114

Author(s):

Patrick S. G. Chain ◽

Ping Hu ◽

Stephanie A. Malfatti ◽

Lyndsay Radnedge ◽

Frank Larimer ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Yersinia Pestis ◽

Yersinia Pseudotuberculosis ◽

Open Reading Frames ◽

Genomic Diversity ◽

Avirulent Strain ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Definition Of

ABSTRACT Yersinia pestis, the causative agent of bubonic and pneumonic plagues, has undergone detailed study at the molecular level. To further investigate the genomic diversity among this group and to help characterize lineages of the plague organism that have no sequenced members, we present here the genomes of two isolates of the “classical” antiqua biovar, strains Antiqua and Nepal516. The genomes of Antiqua and Nepal516 are 4.7 Mb and 4.5 Mb and encode 4,138 and 3,956 open reading frames, respectively. Though both strains belong to one of the three classical biovars, they represent separate lineages defined by recent phylogenetic studies. We compare all five currently sequenced Y. pestis genomes and the corresponding features in Yersinia pseudotuberculosis. There are strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. We found 453 single nucleotide polymorphisms in protein-coding regions, which were used to assess the evolutionary relationships of these Y. pestis strains. Gene reduction analysis revealed that the gene deletion processes are under selective pressure, and many of the inactivations are probably related to the organism's interaction with its host environment. The results presented here clearly demonstrate the differences between the two biovar antiqua lineages and support the notion that grouping Y. pestis strains based strictly on the classical definition of biovars (predicated upon two biochemical assays) does not accurately reflect the phylogenetic relationships within this species. A comparison of four virulent Y. pestis strains with the human-avirulent strain 91001 provides further insight into the genetic basis of virulence to humans.

Draft Genome Sequences of Two Extensively Drug-Resistant Strains of Mycobacterium tuberculosis Belonging to the Euro-American S Lineage

Genome Announcements ◽

10.1128/genomea.01771-15 ◽

2016 ◽

Vol 4 (2) ◽

Cited By ~ 2

Author(s):

Lesibana A. Malinga ◽

Thomas Abeel ◽

Christopher A. Desjardins ◽

Talent C. Dlamini ◽

Gail Cassell ◽

...

Keyword(s):

Draft Genome ◽

Drug Efflux ◽

Drug Resistant ◽

Nucleotide Polymorphisms ◽

Genome Sequences ◽

Single Nucleotide ◽

Resistant Tuberculosis ◽

Extensively Drug Resistant ◽

Resistant Strains ◽

Drug Resistant Strains

We report the whole-genome sequencing of two extensively drug-resistant tuberculosis strains belonging to the Euro-American S lineage. The RSA 114 strain showed single-nucleotide polymorphisms predicted to have drug efflux activity.
