gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

We report the draft genome sequence of a novel Rhodospirillales bacterium strain, TMPK1, isolated from a micropore-filtered soil suspension. This strain has a genome of 4,249,070 bp, comprising 4,151 protein-coding sequences. The genome sequence data further suggest that strain TMPK1 is an alphaproteobacterium capable of carotenoid production.


Draft Genome Sequence of the UV-Resistant Antarctic Bacterium Sphingomonas sp. Strain UV9

Microbiology Resource Announcements ◽

10.1128/mra.01651-18 ◽

2019 ◽

Vol 8 (7) ◽

Cited By ~ 3

Author(s):

Juan J. Marizcurrena ◽

Danilo Morales ◽

Pablo Smircich ◽

Susana Castro-Sowinski

Keyword(s):

Genome Size ◽

Genome Sequence ◽

Draft Genome ◽

GC Content ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences ◽

Antarctic Bacterium ◽

A Genome ◽

The Antarctic

We report the draft genome sequence of the Antarctic UV-resistant bacterium Sphingomonas sp. strain UV9. The strain has a genome size of 4.25 Mb, a 65.62% GC content, and 3,879 protein-coding sequences.


Complete Genome Sequence of Leuconostoc kimchii Strain NKJ218, Isolated from Homemade Kimchi

Microbiology Resource Announcements ◽

10.1128/mra.00367-19 ◽

2019 ◽

Vol 8 (27) ◽

Author(s):

Ji Young Jung ◽

Jin-Woo Jeong ◽

Seung-Young Lee ◽

Hyun Mi Jin ◽

Hee Won Choi ◽

...

Keyword(s):

South Korea ◽

Genome Sequence ◽

Complete Genome Sequence ◽

Complete Genome ◽

Whole Genome ◽

Protein Coding ◽

Coding Sequences ◽

Content Type ◽

PacBio RS II ◽

A Genome

ABSTRACT Leuconostoc kimchii strain NKJ218 was isolated from homemade kimchi in South Korea. The whole genome was sequenced using the PacBio RS II and Illumina NovaSeq 6000 platforms. Here, we report a genome sequence of strain NKJ218, which consists of a 1.9-Mbp chromosome and three plasmid contigs. A total of 2,005 coding sequences (CDS) were predicted, including 1,881 protein-coding sequences.


Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

Genome Research ◽

10.1101/gr.108753.110 ◽

2011 ◽

Vol 21 (11) ◽

pp. 1916-1928 ◽

Cited By ~ 60

Author(s):

M. F. Lin ◽

P. Kheradpour ◽

S. Washietl ◽

B. J. Parker ◽

J. S. Pedersen ◽

...

Keyword(s):

Protein Coding ◽

Coding Sequences ◽

Mammalian Genomes ◽

Selection For


Study of triplet periodicity differences inside and between genomes

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2013-0063 ◽

2015 ◽

Vol 14 (2) ◽

Cited By ~ 2

Author(s):

Yulia M. Suvorova ◽

Eugene V. Korotkov

Keyword(s):

Distinctive Feature ◽

Protein Coding ◽

Coding Sequences ◽

A Genome ◽

Prokaryotic Genomes ◽

Triplet Periodicity ◽

Eukaryotic Genomes ◽

Artificial Datasets ◽

Genome Dataset

Abstract Triplet periodicity (TP) is a distinctive feature of the protein coding sequences of both prokaryotic and eukaryotic genomes. In this work, we explored the TP difference inside and between 45 prokaryotic genomes. We constructed two hypotheses of TP distribution on a set of coding sequences and generated artificial datasets that correspond to the hypotheses. We found that TP is more similar inside a genome than between genomes and that TP distribution inside a real genome dataset corresponds to the hypothesis which implies that a common TP pattern exists for the majority of sequences inside a genome. Additionally, we performed gene classification based on TP matrices. This classification showed that TP allows identification of the genome to which a given gene belongs with more than 85% accuracy.
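As a rough illustration of what a TP matrix can look like, the sketch below counts how often each nucleotide occurs at each of the three codon positions of a coding sequence. This 4×3 count matrix is one common way to represent triplet periodicity; it is an assumption here, not necessarily the exact matrix or normalization used by the authors. The function name `triplet_periodicity_matrix` is hypothetical.

```python
NUCLEOTIDES = "ACGT"  # row order of the matrix


def triplet_periodicity_matrix(cds: str) -> list[list[int]]:
    """Return a 4x3 matrix of counts: rows are A, C, G, T; columns are codon positions 1-3.

    Assumes the sequence starts in frame (index 0 is codon position 1).
    Characters other than A, C, G, T (e.g. N) are skipped.
    """
    counts = [[0, 0, 0] for _ in NUCLEOTIDES]
    for index, base in enumerate(cds.upper()):
        row = NUCLEOTIDES.find(base)
        if row >= 0:
            counts[row][index % 3] += 1
    return counts


if __name__ == "__main__":
    # Toy in-frame sequence: ATG GCC ATG
    for base, row in zip(NUCLEOTIDES, triplet_periodicity_matrix("ATGGCCATG")):
        print(base, row)
```

Comparing such matrices between genes, for example after normalizing each column to frequencies, is one simple way to ask whether two sequences share a TP pattern.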


A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers

GigaScience ◽

10.1093/gigascience/giz159 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 4

Author(s):

Nikolai Hecker ◽

Michael Hiller

Keyword(s):

Comparative Genomic ◽

Evolutionary Constraint ◽

Genome Alignment ◽

Protein Coding ◽

Multiple Alignments ◽

A Genome ◽

Key Factor ◽

Genomic Studies ◽

Mammalian Genomes ◽

Enhancer Sequences

Abstract Background Multiple alignments of mammalian genomes have been the basis of many comparative genomic studies aiming at annotating genes, detecting regions under evolutionary constraint, and studying genome evolution. A key factor that affects the power of comparative analyses is the number of species included in a genome alignment. Results To utilize the increased number of sequenced genomes and to provide an accessible resource for genomic studies, we generated a mammalian genome alignment comprising 120 species. We used this alignment and the CESAR method to provide protein-coding gene annotations for 119 non-human mammals. Furthermore, we illustrate the utility of this alignment by 2 exemplary analyses. First, we quantified how variable ultraconserved elements (UCEs) are among placental mammals. Leveraging the high taxonomic coverage in our alignment, we estimate that UCEs contain on average 4.7%–15.6% variable alignment columns. Furthermore, we show that the center regions of UCEs are generally most constrained. Second, we identified enhancer sequences that are only conserved in placental mammals. We found that these enhancers are significantly associated with placenta-related genes, suggesting that some of these enhancers may be involved in the evolution of placental mammal-specific aspects of the placenta. Conclusion The 120-mammal alignment and all other data are available for analysis and visualization in a genome browser at https://genome-public.pks.mpg.de/ and for download at https://bds.mpi-cbg.de/hillerlab/120MammalAlignment/.
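To make the notion of "variable alignment columns" concrete, here is a minimal sketch that computes the fraction of columns in a multiple alignment containing more than one distinct base. The treatment of gaps and `N` as missing data, and the function name `variable_column_fraction`, are assumptions for illustration; the abstract does not state how the authors defined or filtered variable columns.

```python
def variable_column_fraction(alignment: list[str]) -> float:
    """Fraction of alignment columns with more than one distinct base.

    Assumption: gaps ('-') and ambiguous bases ('N') are treated as missing
    data and do not make a column variable on their own.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    variable = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column} - {"-", "N"}
        if len(bases) > 1:
            variable += 1
    return variable / length


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "AC-T"]  # only the last column varies
    print(variable_column_fraction(toy))
```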


Stolbur Phytoplasma Genome Survey Achieved Using a Suppression Subtractive Hybridization Approach with High Specificity

Applied and Environmental Microbiology ◽

10.1128/aem.72.5.3274-3283.2006 ◽

2006 ◽

Vol 72 (5) ◽

pp. 3274-3283 ◽

Cited By ~ 21

Author(s):

Agnès Cimerman ◽

Guillaume Arnaud ◽

Xavier Foissac

Keyword(s):

Suppression Subtractive Hybridization ◽

Plant Pathogens ◽

Surface Protein ◽

High Specificity ◽

Subtractive Hybridization ◽

Protein Coding ◽

Coding Sequences ◽

Genome Survey ◽

A Genome ◽

Stolbur Phytoplasma

ABSTRACT Phytoplasmas are unculturable bacterial plant pathogens transmitted by phloem-feeding hemipteran insects. DNA of phytoplasmas is difficult to purify because of their exclusive phloem location and low abundance in plants. To overcome this constraint, suppression subtractive hybridization (SSH) was modified and used to selectively amplify DNA of the stolbur phytoplasma infecting a periwinkle plant. Plasmid libraries were constructed, and the origins of the DNA inserts were verified by hybridization and PCR screenings. After a single round of SSH, there was still a significant level of contamination with plant DNA (around 50%). However, the modified SSH, which included a second round of subtraction (double SSH), resulted in an increased phytoplasma DNA purity (97%). Results validated double SSH as an efficient way to produce a genome survey for microbial agents unavailable in culture. Assembly of 266 insert sequences revealed 181 phytoplasma genetic loci which were annotated. Comparative analysis of 113 kbp indicated that among 217 protein coding sequences, 83% were homologous to “Candidatus Phytoplasma asteris” (OY-M strain) genes, with hits widely distributed along the chromosome. Most of the stolbur-specific SSH sequences were orphan genes, with the exception of two partial coding sequences encoding proteins homologous to a mycoplasma surface protein and riboflavin kinase.


VESPA: Very large-scale Evolutionary and Selective Pressure Analyses

PeerJ Computer Science ◽

10.7717/peerj-cs.118 ◽

2017 ◽

Vol 3 ◽

pp. e118 ◽

Cited By ~ 10

Author(s):

Andrew E. Webb ◽

Thomas A. Walsh ◽

Mary J. O’Connell

Keyword(s):

Phylogenetic Trees ◽

Large Scale ◽

Selective Pressure ◽

Gene Families ◽

Pressure Variation ◽

Phylogeny Reconstruction ◽

Protein Coding ◽

Coding Sequences ◽

A Genome ◽

Pressure Analysis

Background Large-scale molecular evolutionary analyses of protein coding sequences require a number of preparatory inter-related steps, from finding gene families to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results We have benchmarked VESPA and our results show that the method is consistent, performs well on both large scale and smaller scale datasets, and produces results in line with previously published datasets. Discussion Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.


Faculty Opinions recommendation of Role of low-complexity sequences in the formation of novel protein coding sequences.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718030532.793494763 ◽

2014 ◽

Author(s):

Erich Bornberg-Bauer ◽

Magdalena Heberlein

Keyword(s):

Low Complexity ◽

Protein Coding ◽

Coding Sequences ◽

Novel Protein


Draft Genome Sequence of Urease-Producing Pseudorhodobacter sp. Strain E13, Isolated from the Yellow Sea in Gunsan, South Korea

Microbiology Resource Announcements ◽

10.1128/mra.00189-19 ◽

2019 ◽

Vol 8 (23) ◽

Author(s):

Si Chul Kim ◽

Hyo Jung Lee

Keyword(s):

South Korea ◽

Genome Sequence ◽

Yellow Sea ◽

Draft Genome ◽

The Yellow Sea ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences ◽

Gram Negative ◽

Content Type

Here, we report the draft genome sequence of Pseudorhodobacter sp. strain E13, a Gram-negative, aerobic, nonflagellated, and rod-shaped bacterium which was isolated from the Yellow Sea in South Korea. The assembled genome sequence is 3,878,578 bp long with 3,646 protein-coding sequences in 159 contigs.
