sequence read archive
Recently Published Documents


TOTAL DOCUMENTS

70
(FIVE YEARS 41)

H-INDEX

10
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Ruth E Timme ◽  
Emma Griffiths ◽  
Lee Katz ◽  
Duncan MacCannell ◽  
Michael Weigand

This is a SARS-CoV-2-specific protocol that covers the steps needed to submit SARS-CoV-2 consensus sequences to GenBank. If you need a pipeline for frequent or large-volume submissions, follow Step 1 in the SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject to get your NCBI submission environment established, then contact [email protected] to set up an account for submitting through the API. This protocol assumes (and requires) that the user has a BioProject and BioSample(s) already registered. Complete in order:
1. Populate your templates first.
2. SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject. Step-by-step instructions for establishing a new NCBI laboratory submission account and for creating and linking a new BioProject to an existing umbrella effort. Covers SARS-CoV-2 raw data submission to SRA (Sequence Read Archive) and metadata submission to BioSample. Users can modify this protocol to create only a BioSample with no linked raw data.
3. SARS-CoV-2 NCBI consensus submission protocol: GenBank (included protocol). Required: established BioProject and BioSamples. Submit SARS-CoV-2 assemblies to NCBI GenBank, linking to existing BioProject, BioSamples, and raw data.
Version history: V3: Direct links provided to download metadata templates (instead of hosting duplicate files). Minor edits throughout the protocol.


2021 ◽  
Author(s):  
Ruth E Timme ◽  
Emma Griffiths ◽  
Lee Katz ◽  
Michael Weigand

PURPOSE: This protocol explains the metadata requirements for the following two protocols:
1. SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject. Step-by-step instructions for establishing a new NCBI laboratory submission account and for creating and linking a new BioProject to an existing umbrella effort. Covers SARS-CoV-2 raw data submission to SRA (Sequence Read Archive) and metadata submission to BioSample. Users can modify this protocol to create only a BioSample with no linked raw data.
2. SARS-CoV-2 NCBI consensus submission protocol: GenBank. Required: established BioProject and BioSamples. Submit SARS-CoV-2 assemblies to NCBI GenBank, linking to existing BioProject, BioSamples, and raw data.
Version history: V4: Updated metadata templates to reflect updated PHA4GE templates (V3), plus minor text edits.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kenneth S. Katz ◽  
Oleg Shutov ◽  
Richard Lapoint ◽  
Michael Kimelman ◽  
J. Rodney Brister ◽  
...  

Abstract: Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.
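
For readers unfamiliar with the MinHash idea the abstract refers to, the following is a minimal, generic sketch of bottom-s MinHash over k-mers. It is not STAT's implementation: the function names, the choice of BLAKE2b as the hash, and the parameters `k=8` and `size=64` are all assumptions made for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=8, size=64):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(a, b, size=64):
    """Estimate k-mer Jaccard similarity from two bottom-s sketches."""
    # The smallest `size` hashes of the union of both sketches.
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    # Fraction of those union minima that occur in both sketches.
    return sum(1 for h in merged if h in shared) / len(merged)
```

Comparing two read sets then reduces to comparing two short lists of integers, for example `jaccard_estimate(sketch(reads_a), sketch(reads_b))`, which is what makes this family of methods attractive for archive-scale data.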


Author(s):  
Jean-Michel Hily ◽  
Véronique Komar ◽  
Nils Poulicard ◽  
Amandine Velt ◽  
Lauriane Renault ◽  
...  

Abstract: Since its identification in 2003, grapevine Pinot gris virus (GPGV, Trichovirus) has been detected in most grape-growing countries. So far, little is known about the epidemiology of this newly emerging virus. In this work, we used data mining as a tool to monitor in silico the sanitary status of three vineyards in Italy. All data used in the study were recovered from previously published work whose data were publicly available as SRA (Sequence Read Archive, NCBI) files. While incomplete, the knowledge gathered from this work was still important, with evidence of differential accumulation of the virus in grapevine according to year, location, and variety-rootstock association. Additional data regarding GPGV genetic diversity were collected. Some advantages and pitfalls of data mining are discussed.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ali Tajabadi ◽  
Ali Esmailizadeh

Abstract: Objectives: The genus Pistacia belongs to the cashew family of flowering plants and contains at least 11 species. The whole-genome resequencing data of different species from the genus Pistacia are described herein. The data reported here will be useful for better understanding the adaptive evolution, demographic history, genetic diversity, population structure, and domestication of pistachio. Data description: Genomic DNA was isolated from fresh leaves and used to construct libraries with an insert size of 350 bp. Sequence libraries were made and sequenced on the Illumina HiSeq 4000 platform to produce 150 bp paired-end reads. A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples. We produced a total of 727.67 Gbp of data, which have been deposited in the Genome Sequence Archive (GSA) database under accession CRA000978. All of the data are also available in Sequence Read Archive (SRA) format at the National Center for Biotechnology Information (NCBI) under identifier SRP189222, mirroring our deposited data in GSA.


2021 ◽  
Author(s):  
Nicolas Bejerman ◽  
Humberto Debat

Tymovirales is an order of viruses with positive-sense, single-stranded RNA genomes that mostly infect plants, but also fungi and insects. The number of tymovirid sequences has been growing in the last few years with the extensive use of high-throughput sequencing platforms. Here we report the discovery of 31 novel tymovirid genomes associated with 27 different host plant species, which were hidden in public databases. These viral sequences were identified through homology searches in more than 3,000 plant transcriptomes from the NCBI Sequence Read Archive (SRA) using known tymovirid sequences as queries. Identification, assembly and curation of raw SRA reads resulted in 29 viral genome sequences with full-length coding regions, and two partial genomes. Highlights of the obtained sequences include viruses with unique and novel genome organizations among known tymovirids. Phylogenetic analysis showed that six of the novel viruses were related to alphaflexiviruses, seventeen to betaflexiviruses, two to deltaflexiviruses and six to tymoviruses. These findings resulted in the most complete phylogeny of tymovirids to date and shed new light on the phylogenetic relationships and evolutionary landscape of this group of viruses. Furthermore, this study illustrates the complexity and diversity of tymovirid genomes and demonstrates that analyzing public SRA data provides an invaluable tool to accelerate virus discovery and refine virus taxonomy.


2021 ◽  
Author(s):  
Jesse D Bloom

The origin and early spread of SARS-CoV-2 remain shrouded in mystery. Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH's Sequence Read Archive. I recover the deleted files from the Google Cloud and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic. Instead, the progenitor of known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2's bat coronavirus relatives.


2021 ◽  
Author(s):  
Ruth E Timme ◽  
Emma Griffiths ◽  
Duncan MacCannell ◽  
Lee Katz ◽  
Michael Weigand

PURPOSE: This is a SARS-CoV-2-specific protocol that covers the steps needed to establish a new NCBI submission environment for your laboratory, including the creation of new BioProject(s) and submission groups. Once these are set up, the protocol walks through the process of submitting raw reads to SRA and sample metadata to BioSample through the Submission Portal. For new submitters, quite a bit of groundwork needs to be established before a laboratory can start its first data submission. We recommend that one person in the laboratory take a few days to get everything set up in advance of your first expected data submission. If you need a pipeline for frequent or large-volume submissions, follow Step 1 in the SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject to get your NCBI submission environment established, then contact [email protected] to set up an account for submitting through the API. These protocols cover submission using NCBI's Submission Portal web interface. Complete in order:
1. Populate your templates first.
2. SARS-CoV-2 NCBI submission protocol: SRA, BioSample, and BioProject (included protocol). Step-by-step instructions for establishing a new NCBI laboratory submission account and for creating and linking a new BioProject to an existing umbrella effort. Covers SARS-CoV-2 raw data submission to SRA (Sequence Read Archive) and metadata submission to BioSample. Users can modify this protocol to create only a BioSample with no linked raw data.
3. SARS-CoV-2 NCBI consensus submission protocol: GenBank. Required: established BioProject and BioSamples. Submit SARS-CoV-2 assemblies to NCBI GenBank, linking to existing BioProject, BioSamples, and raw data.
Version history: V4: Direct links provided to download metadata templates (instead of hosting duplicate files). Other minor edits throughout the protocol.


2021 ◽  
Author(s):  
Daniel J Rawle ◽  
Thuy Le ◽  
Troy Dumenil ◽  
Cameron Bishop ◽  
Kexin Yan ◽  
...  

Granzyme A (GzmA) is a serine protease secreted by cytotoxic lymphocytes, with GzmA-/- mouse studies informing our understanding of GzmA's physiological function. We show herein that GzmA-/- mice have a mixed C57BL/6J and C57BL/6N background and retain the full-length Nicotinamide Nucleotide Transhydrogenase (Nnt) gene, whereas Nnt is truncated in C57BL/6J mice. Chikungunya viral arthritis was substantially ameliorated in GzmA-/- mice; however, the presence of Nnt, rather than loss of GzmA, was responsible for this phenotype by constraining lymphocyte infiltration. A new CRISPR active site mutant C57BL/6J GzmAS211A mouse provided the first insights into GzmA's bioactivity free of background issues, with circulating proteolytically active GzmA promoting immune-stimulating and pro-inflammatory signatures. Remarkably, k-mer mining of the Sequence Read Archive illustrated that ~27% of Run Accessions and ~38% of BioProjects listing C57BL/6J as the mouse strain had Nnt sequencing reads inconsistent with a C57BL/6J background. The Nnt issue has clearly complicated our understanding of GzmA and may similarly have influenced studies across a broad range of fields.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huiguang Yi ◽  
Yanling Lin ◽  
Chengqi Lin ◽  
Wenfei Jin

Abstract: Here, we develop k-mer substring space decomposition (Kssd), a sketching technique which is significantly faster and more accurate than current sketching methods. We show that it is the only method that can be used for large-scale dataset comparisons at population resolution on simulated and real data. Using Kssd, we prioritize references for all 1,019,179 bacterial whole-genome sequencing (WGS) runs from the NCBI Sequence Read Archive and find misidentification or contamination in 6164 of these. Additionally, we analyze WGS and exome runs of samples from the 1000 Genomes Project.

