gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

Database ◽  
2016 ◽  
Vol 2016 ◽  
pp. baw087 ◽  
Author(s):  
So Nakagawa ◽  
Mahoko Ueda Takahashi
2021 ◽  
Vol 10 (28) ◽  
Author(s):  
Ryosuke Nakai ◽  
Hiroyuki Kusada ◽  
Fumihiro Sassa ◽  
Susumu Morigasaki ◽  
Hisayoshi Hayashi ◽  
...  

We report the draft genome sequence of a novel Rhodospirillales bacterium strain, TMPK1, isolated from a micropore-filtered soil suspension. This strain has a genome of 4,249,070 bp, comprising 4,151 protein-coding sequences. The genome sequence data further suggest that strain TMPK1 is an alphaproteobacterium capable of carotenoid production.


2019 ◽  
Vol 8 (7) ◽  
Author(s):  
Juan J. Marizcurrena ◽  
Danilo Morales ◽  
Pablo Smircich ◽  
Susana Castro-Sowinski

We report the draft genome sequence of the Antarctic UV-resistant bacterium Sphingomonas sp. strain UV9. The strain has a genome size of 4.25 Mb, a 65.62% GC content, and 3,879 protein-coding sequences.


2019 ◽  
Vol 8 (27) ◽  
Author(s):  
Ji Young Jung ◽  
Jin-Woo Jeong ◽  
Seung-Young Lee ◽  
Hyun Mi Jin ◽  
Hee Won Choi ◽  
...  

ABSTRACT Leuconostoc kimchii strain NKJ218 was isolated from homemade kimchi in South Korea. The whole genome was sequenced using the PacBio RS II and Illumina NovaSeq 6000 platforms. Here, we report a genome sequence of strain NKJ218, which consists of a 1.9-Mbp chromosome and three plasmid contigs. A total of 2,005 coding sequences (CDS) were predicted, including 1,881 protein-coding sequences.


2011 ◽  
Vol 21 (11) ◽  
pp. 1916-1928 ◽  
Author(s):  
M. F. Lin ◽  
P. Kheradpour ◽  
S. Washietl ◽  
B. J. Parker ◽  
J. S. Pedersen ◽  
...  

Author(s):  
Yulia M. Suvorova ◽  
Eugene V. Korotkov

Abstract Triplet periodicity (TP) is a distinctive feature of the protein coding sequences of both prokaryotic and eukaryotic genomes. In this work, we explored the TP difference inside and between 45 prokaryotic genomes. We constructed two hypotheses of TP distribution on a set of coding sequences and generated artificial datasets that correspond to the hypotheses. We found that TP is more similar inside a genome than between genomes and that TP distribution inside a real genome dataset corresponds to the hypothesis which implies that a common TP pattern exists for the majority of sequences inside a genome. Additionally, we performed gene classification based on TP matrices. This classification showed that TP allows identification of the genome to which a given gene belongs with more than 85% accuracy.
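
The abstract above does not define its TP matrices, so the following is only a minimal sketch under a stated assumption: that triplet periodicity can be summarized as a 3 x 4 table of nucleotide frequencies at each codon position. The function name `triplet_periodicity_matrix` and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # column order of the matrix


def triplet_periodicity_matrix(seq):
    """Return a 3 x 4 list of nucleotide frequencies per codon position.

    Assumption: the sequence starts in frame, and a position-specific
    frequency table is an acceptable stand-in for a TP matrix.
    """
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in NUCLEOTIDES:  # skip ambiguous symbols such as N
            counts[i % 3][base] += 1
    matrix = []
    for position in range(3):
        total = sum(counts[position].values())
        row = [counts[position][b] / total if total else 0.0 for b in NUCLEOTIDES]
        matrix.append(row)
    return matrix


# Toy in-frame sequence: ATG GCC ATG GCG ATG GCA
tp_matrix = triplet_periodicity_matrix("ATGGCCATGGCGATGGCA")
for position, row in enumerate(tp_matrix, start=1):
    print(position, [round(value, 3) for value in row])
```

A uniform row would indicate no positional preference, while rows that differ strongly from one another reflect the period-3 signal that the abstract describes.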


GigaScience ◽  
2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Nikolai Hecker ◽  
Michael Hiller

Abstract Background Multiple alignments of mammalian genomes have been the basis of many comparative genomic studies aiming at annotating genes, detecting regions under evolutionary constraint, and studying genome evolution. A key factor that affects the power of comparative analyses is the number of species included in a genome alignment. Results To utilize the increased number of sequenced genomes and to provide an accessible resource for genomic studies, we generated a mammalian genome alignment comprising 120 species. We used this alignment and the CESAR method to provide protein-coding gene annotations for 119 non-human mammals. Furthermore, we illustrate the utility of this alignment by 2 exemplary analyses. First, we quantified how variable ultraconserved elements (UCEs) are among placental mammals. Leveraging the high taxonomic coverage in our alignment, we estimate that UCEs contain on average 4.7%–15.6% variable alignment columns. Furthermore, we show that the center regions of UCEs are generally most constrained. Second, we identified enhancer sequences that are only conserved in placental mammals. We found that these enhancers are significantly associated with placenta-related genes, suggesting that some of these enhancers may be involved in the evolution of placental mammal-specific aspects of the placenta. Conclusion The 120-mammal alignment and all other data are available for analysis and visualization in a genome browser at https://genome-public.pks.mpg.de/ and for download at https://bds.mpi-cbg.de/hillerlab/120MammalAlignment/.
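
The "variable alignment columns" statistic mentioned above can be illustrated with a small sketch. The abstract does not say how a variable column is defined, so this assumes that a column is variable whenever its rows do not all carry the same character, with gaps counted as a difference. The function name `fraction_variable_columns` and the toy alignment are hypothetical.

```python
def fraction_variable_columns(alignment):
    """Return the share of alignment columns that are not identical in all rows.

    Assumption: `alignment` is a list of equal-length aligned sequences,
    and a gap character counts as a difference from a nucleotide.
    """
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if length == 0:
        return 0.0
    variable = 0
    for column in zip(*alignment):  # iterate column by column
        if len({symbol.upper() for symbol in column}) > 1:
            variable += 1
    return variable / length


# Toy alignment of three sequences; columns 4 and 8 differ between rows.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
toy_fraction = fraction_variable_columns(toy_alignment)
print(f"{toy_fraction:.1%} variable columns")
```

With more species in the alignment, more columns have a chance to show a difference, which is consistent with the abstract's point that taxonomic coverage affects this estimate.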


2006 ◽  
Vol 72 (5) ◽  
pp. 3274-3283 ◽  
Author(s):  
Agnès Cimerman ◽  
Guillaume Arnaud ◽  
Xavier Foissac

ABSTRACT Phytoplasmas are unculturable bacterial plant pathogens transmitted by phloem-feeding hemipteran insects. DNA of phytoplasmas is difficult to purify because of their exclusive phloem location and low abundance in plants. To overcome this constraint, suppression subtractive hybridization (SSH) was modified and used to selectively amplify DNA of the stolbur phytoplasma infecting a periwinkle plant. Plasmid libraries were constructed, and the origins of the DNA inserts were verified by hybridization and PCR screenings. After a single round of SSH, there was still a significant level of contamination with plant DNA (around 50%). However, the modified SSH, which included a second round of subtraction (double SSH), resulted in an increased phytoplasma DNA purity (97%). Results validated double SSH as an efficient way to produce a genome survey for microbial agents unavailable in culture. Assembly of 266 insert sequences revealed 181 phytoplasma genetic loci which were annotated. Comparative analysis of 113 kbp indicated that among 217 protein coding sequences, 83% were homologous to “Candidatus Phytoplasma asteris” (OY-M strain) genes, with hits widely distributed along the chromosome. Most of the stolbur-specific SSH sequences were orphan genes, with the exception of two partial coding sequences encoding proteins homologous to a mycoplasma surface protein and riboflavin kinase.


2017 ◽  
Vol 3 ◽  
pp. e118 ◽  
Author(s):  
Andrew E. Webb ◽  
Thomas A. Walsh ◽  
Mary J. O’Connell

Background Large-scale molecular evolutionary analyses of protein coding sequences require a number of preparatory inter-related steps from finding gene families, to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results We have benchmarked VESPA and our results show that the method is consistent, performs well on both large scale and smaller scale datasets, and produces results in line with previously published datasets. Discussion Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.


2019 ◽  
Vol 8 (23) ◽  
Author(s):  
Si Chul Kim ◽  
Hyo Jung Lee

Here, we report the draft genome sequence of Pseudorhodobacter sp. strain E13, a Gram-negative, aerobic, nonflagellated, and rod-shaped bacterium which was isolated from the Yellow Sea in South Korea. The assembled genome sequence is 3,878,578 bp long with 3,646 protein-coding sequences in 159 contigs.

