NCBI RefSeq
Recently Published Documents


TOTAL DOCUMENTS

27
(FIVE YEARS 22)

H-INDEX

7
(FIVE YEARS 5)

2021 ◽  
Vol 17 (11) ◽  
pp. e1009581
Author(s):  
Michael S. Robeson ◽  
Devon R. O’Rourke ◽  
Benjamin D. Kaehler ◽  
Michal Ziemski ◽  
Matthew R. Dillon ◽  
...  

Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.


2021 ◽  
Author(s):  
Guangcai Liang ◽  
Jia Chang ◽  
Tung On Yau ◽  
Xin Li ◽  
Bingjun He ◽  
...  

In the present study, we performed precise annotation of the Drosophila melanogaster, D. simulans, D. grimshawi, and Bactrocera oleae mitochondrial (mt) genomes by pan RNA-seq analysis. Our new annotations corrected or modified some of the previous annotations, and two important findings were reported for the first time: the discovery of conserved polyA(+) and polyA(-) motifs in the control regions (CRs) of insect mt genomes, and the addition of CCAs to the 3' ends of two antisense tRNAs in the D. melanogaster mt genome. Using PacBio cDNA-seq data from D. simulans, we precisely annotated the Transcription Initiation Sites (TISs) of the mt Heavy and Light strands in Drosophila mt genomes and reported that the polyA(+) and polyA(-) motifs in the CRs are associated with TISs. The discovery of the conserved polyA(+) and polyA(-) motifs provides insights into the many polyA and polyT sequences in the CRs of insect mt genomes, which should help reveal mt transcription and its regulation in invertebrates. In addition, we provided a high-quality, well-curated and precisely annotated D. simulans mt genome (GenBank: MN611461), which should be included in the NCBI RefSeq database to replace the current reference genome NC_005781.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Samuel M. Gerner ◽  
Alexandra B. Graf ◽  
Thomas Rattei

Abstract Background: Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions about their underlying ecosystems. Conclusions from benchmark studies are therefore limited to the ecosystems they mimic. Ideally, simulations are therefore based on genomes that realistically resemble particular metagenomic communities. Results: We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes. Conclusions: Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application, providing extensive additional information along with the simulated benchmark data. Resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for a metagenomic habitat or ecosystem of a metagenomic study. Availability: Source code, documentation and install instructions are freely available at GitHub (https://github.com/gerners/tamock).


2021 ◽  
Author(s):  
Xia Zhou ◽  
Xiaolan Huang ◽  
Zhihua Du

Abstract −1 programmed ribosomal frameshifting (−1 PRF) is a translational recoding mechanism used by many viral and cellular mRNAs. −1 PRF occurs at a heptanucleotide slippery sequence and is stimulated by a downstream RNA structure, most often in the form of a pseudoknot. The utilization of −1 PRF to produce proteins encoded by the −1 reading frame is widespread in RNA viruses, but relatively rare in cellular mRNAs. In humans, only three such cases of −1 PRF events have been reported, all involving retroviral-like genes and protein products. To evaluate the extent of −1 PRF utilization in the human transcriptome, we have developed a computational scheme for identifying putative pseudoknot-dependent −1 PRF events and applied the method to a collection of 43,191 human mRNAs in the NCBI RefSeq database. In addition to the three reported cases, our study identified more than two dozen putative −1 PRF cases. The genes involved in these cases are genuine cellular genes without a viral origin. Moreover, in more than half of these cases, the frameshift site is located far upstream (>250 nt) from the stop codon of the 0 reading frame, which is nonviral-like. Using dual luciferase assays in HEK293T cells, we confirmed that the −1 PRF signals in the mRNAs of CDK5R2 and SEMA6C are functional in inducing efficient frameshifting. Our findings have significant implications in expanding the repertoire of the −1 PRF phenomenon and the protein-coding capacity of the human transcriptome.
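As a rough illustration of the first step such a screen might take, the sketch below scans an mRNA sequence for the canonical heptanucleotide slippery motif X XXY YYZ (three identical bases, then AAA or UUU, then any base except G). This is not the authors' computational scheme, which the abstract does not describe: the motif definition is the textbook one, the function name `find_slippery_sites` is hypothetical, and the scan ignores both the reading frame and the downstream pseudoknot that a real −1 PRF search would also need to check.

```python
import re

# Canonical slippery-site pattern X XXY YYZ, written in the DNA alphabet
# (T instead of U), as RefSeq mRNA records usually are:
#   ([ACGT])\2\2  -> three identical bases (X XX)
#   (AAA|TTT)     -> Y YY, where Y is A or U/T
#   [ACT]         -> Z, any base except G
# The lookahead lets overlapping candidate sites all be reported.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2(AAA|TTT)[ACT]))")


def find_slippery_sites(seq):
    """Return (0-based start, heptamer) for every candidate slippery site.

    Assumption: frame and downstream structure are NOT checked here; this
    only lists sequence-level candidates for later filtering.
    """
    seq = seq.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence (made up for illustration) containing one site.
    print(find_slippery_sites("GGTTTAAACGG"))
```

In practice each hit would then be filtered by whether it lies in the 0 reading frame of the annotated coding sequence and whether a stable structure can form a few nucleotides downstream.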


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Dinesh Subedi ◽  
Jeremy J. Barr

ABSTRACT T-series phages have been model organisms for molecular biology since the 1940s. Given that these phages have been stocked, distributed, and propagated for decades across the globe, there exists the potential for genetic drift to accumulate between stocks over time. Here, we compared the temporal stability and genetic relatedness of laboratory-maintained phage stocks with a T-series collection from 1972. Only the T-even phages produced viable virions. We obtained complete genomes of these T-even phages, along with two contemporary T4 stocks. Performing comparative genomics, we found 12 and 16 nucleotide variations, respectively, in the genomes of T2 and T6, whereas there were ∼172 nucleotide variations between T4 sublines compared with the NCBI RefSeq genome. To account for the possibility of artifacts in NCBI RefSeq, we used the 1972 T4 stock as a reference and compared genetic and phenotypic variations between T4 sublines. Genomic analysis predicted nucleotide variations in genes associated with DNA metabolism and structural proteins. We did not, however, observe any differences in growth characteristics or host range between the T4 sublines. Our study highlights the potential for genetic drift between individually maintained T-series phage stocks, yet after 48 years, this has not resulted in phenotypic alterations in these important model organisms. IMPORTANCE T-series bacteriophages have been used throughout the world for a wide range of molecular biology research that was critical for establishing the fundamentals of molecular biology, from the structure of DNA to advanced gene-editing tools. These model bacteriophages help keep research data consistent and comparable between laboratories. However, we observed genetic variability when we compared contemporary sublines of T4 phages to a 48-year-old stock of T4. This may have effects on the comparability of results obtained using T4 phage. 
Here, we highlight the genomic differences between T4 sublines and examine phenotypic differences in phage replication parameters. We observed limited genomic changes but no phenotypic variations between T4 sublines. Our research highlights the possibility of genetic drift in model bacteriophages.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1046-D1057 ◽  
Author(s):  
Jairo Navarro Gonzalez ◽  
Ann S Zweig ◽  
Matthew L Speir ◽  
Daniel Schmelter ◽  
Kate R Rosenbloom ◽  
...  

Abstract For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. Striving to keep data up-to-date, new updates to gene annotations include GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the outbreak of coronavirus, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.


2020 ◽  
Vol 49 (D1) ◽  
pp. D380-D388 ◽  
Author(s):  
Marie A Brunet ◽  
Jean-François Lucier ◽  
Maxime Levesque ◽  
Sébastien Leblanc ◽  
Jean-Francois Jacques ◽  
...  

Abstract OpenProt (www.openprot.org) is the first proteogenomic resource supporting a polycistronic annotation model for eukaryotic genomes. It provides a deeper annotation of open reading frames (ORFs) while mining experimental data for supporting evidence using cutting-edge algorithms. This update presents the major improvements since the initial release of OpenProt. All species support recent NCBI RefSeq and Ensembl annotations, with changes in annotations being reported in OpenProt. Using the 131 ribosome profiling datasets re-analysed by OpenProt to date, non-AUG initiation starts are reported alongside a confidence score of the initiating codon. From the 177 mass spectrometry datasets re-analysed by OpenProt to date, the unicity of the detected peptides is controlled at each implementation. Furthermore, to guide the users, detectability statistics and protein relationships (isoforms) are now reported for each protein. Finally, to foster access to deeper ORF annotation independently of one’s bioinformatics skills or computational resources, OpenProt now offers a data analysis platform. Users can submit their dataset for analysis and receive the results from the analysis by OpenProt. All data on OpenProt are freely available and downloadable for each species, the release-based format ensuring a continuous access to the data. Thus, OpenProt enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.


2020 ◽  
Author(s):  
Dinesh Subedi ◽  
Jeremy J. Barr

Abstract T-series phages have been model organisms for molecular biology since the 1940s. Given that these phages have been stocked, distributed, and propagated for decades across the globe, there exists the potential for genetic drift to accumulate between stocks over time. Here we compared the temporal stability and genetic relatedness of laboratory-maintained phage stocks with a T-series collection from 1972. Only the T-even phages produced viable virions. We obtained complete genomes of these T-even phages, along with two contemporary T4 stocks. Performing comparative genomics, we found 12 and 16 nucleotide variations, respectively, in the genomes of T2 and T6, whereas there were ~172 nucleotide variations between T4 sublines when compared with the NCBI RefSeq genome. To account for the possibility of artefacts in NCBI RefSeq, we used the 1972 T4 stock as a reference and compared genetic and phenotypic variations between T4 sublines. Genomic analysis predicted nucleotide variations in genes associated with DNA metabolism and structural proteins. We did not, however, observe any differences in growth characteristics or host range between the T4 sublines. Our study highlights the potential for genetic drift between individually maintained T-series phage stocks, yet after 48 years, this has not resulted in phenotypic alterations in these important model organisms. Importance T-series bacteriophages have been used throughout the world for a wide range of molecular biology research that was critical for establishing the fundamentals of molecular biology, from the structure of DNA to advanced gene-editing tools. These model bacteriophages help keep research data consistent and comparable between laboratories. However, we observed genetic variability when we compared contemporary sublines of T4 phages to a 48-year-old stock of T4. This may have effects on the comparability of results obtained using T4 phage. 
Here, we highlighted the genomic differences between T4 sublines and examined phenotypic differences in phage replication parameters. We observed limited genomic changes but no phenotypic variations between T4 sublines. Our research highlights the possibility of genetic drift in model bacteriophages.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 758
Author(s):  
Taebum Lee ◽  
Hee Young Na ◽  
Sun-ju Byeon ◽  
Kyoung-Mee Kim ◽  
Hey Seung Lee ◽  
...  

Background: Fungal organisms are frequently observed in surgical pathological diagnosis. In order to more accurately identify fungi in formalin-fixed and paraffin-embedded (FFPE) tissues, it is necessary to use genomic information. The purpose of our pilot study is to identify the factors to be considered for the identification of pathogenic fungi using mycobiome analysis in FFPE tissues. Methods: We selected 49 cases in five hospitals. In each case, FFPE tissue was cut into 50 µm and DNA was extracted. Multiplex PCR with four primers (ITS1, ITS2, ITS3 and ITS4) was performed. Multiplex sequencing was performed using a MinION device according to the manufacturer’s protocol. Sequences of each case were searched using BLASTN with an ITS database from NCBI RefSeq Targeted Loci Project with default parameters. Results: A total of 2,526 DNA sequences were obtained. We were able to identify 342 fungal sequences in 24 (49.0%, 24/49) cases. The median number of detected fungal sequences per case was 3 (1Q: 1 and 3Q: 14.25). Of the fungal DNA sequences, 215 (62.87%) contained the entire region of ITS1 or ITS2. The remaining 127 fungal DNA sequences were identified as fungi using a partial sequence of ITS1, ITS2, 5.8S, LSU or SSU. Conclusion: We have identified the possibility of finding pathogenic fungi through mycobiome analysis in fungal-infected FFPE tissues using nanopore sequencing. However, we have also found several limitations to be solved in further studies. If a follow-up study develops a method to characterize pathogenic fungi in FFPE tissues, we think it will help patients to use appropriate antifungal agents.


2020 ◽  
Vol 31 (7-8) ◽  
pp. 240-251
Author(s):  
Saki Aoto ◽  
Mayu Fushimi ◽  
Kei Yura ◽  
Kohji Okamura

Abstract While CpG dinucleotides are significantly reduced compared to other dinucleotides in mammalian genomes, they can congregate and form CpG islands, which localize around the 5ʹ regions of genes, where they function as promoters. CpG-island promoters are generally unmethylated and are often found in housekeeping genes. However, their nucleotide sequences and existence per se are not conserved between humans and mice, which may be due to evolutionary gain and loss of the regulatory regions. In this study, human and rhesus monkey genomes, with moderately conserved sequences, were compared at base resolution. Using transcription start site data, we first validated our methods’ ability to identify orthologous promoters and indicated a limitation of using the 5ʹ end of curated gene models, such as NCBI RefSeq, as their transcription start sites. We found that, in addition to deamination mutations, insertions and deletions of bases, repeats, and long fragments contributed to the mutations of CpG dinucleotides. We also observed that the G + C contents tended to change in CpG-poor environments, while CpG content was altered in G + C-rich environments. While loss of CpG islands can be caused by gradual decreases in CpG sites, gain of these islands appears to require two distinct nucleotide-altering steps. Taken together, our findings provide novel insights into the process of acquisition and diversification of CpG-island promoters in vertebrates.
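The two quantities this abstract contrasts, G + C content and CpG content, are commonly summarized with the G + C fraction and the observed/expected CpG ratio. The sketch below computes both for a single sequence window. It uses the widely cited formula obs/exp = (number of CpG × window length) / (number of C × number of G); whether the study used this exact definition is an assumption, and the function name `cpg_stats` is hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, cpg_obs_exp) for one sequence window.

    gc_fraction  = (C + G) / N
    cpg_obs_exp  = (count of "CG" * N) / (count of C * count of G)
    Windows that are empty or lack C or G get an obs/exp of 0.0
    (a convention chosen here to avoid dividing by zero).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    # Toy windows (made up for illustration): CpG-rich vs. CpG-free.
    for window in ("CGCGCG", "ACGT", "CCAAGG"):
        print(window, cpg_stats(window))
```

Tracking these two numbers for orthologous windows in two genomes is one simple way to see whether a region is changing in overall G + C content, in CpG density, or in both.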

