sequence quality
Recently Published Documents


TOTAL DOCUMENTS

90
(FIVE YEARS 36)

H-INDEX

17
(FIVE YEARS 3)

2022 ◽  
Author(s):  
Jasmine Amirzadegan ◽  
tunc.kayikcioglu not provided ◽  
hugh.rand not provided ◽  
Ruth E Timme ◽  
Maria Balkey

PURPOSE: Step-by-step instructions for checking sequence quality for SARS-CoV-2 wastewater samples using SSQuAWK: SARS-CoV-2 Sequence Quality Assurance Workflow and Kontraption. The SSQuAWK workflow, implemented in a custom Galaxy instance, will produce quality assessments for raw reads (Illumina MiSeq paired-end FASTQ files). SCOPE: This protocol covers the following tasks: 1. Set up an account in GalaxyTrakr. 2. Create a new history. 3. Upload data. 4. Execute the SSQuAWK workflow. 5. Interpret the results. Version history: V1: Basic protocol steps with screenshots. V2: Addition of a detailed 12-minute video tutorial.


2021 ◽  
Author(s):  
C. Gary Olds ◽  
Jessie W. Berta-Thompson ◽  
Justin J. Loucks ◽  
Richard A. Levy ◽  
Andrew W. Wilson

Premise: Fungaria are a largely untapped source for understanding fungal biodiversity. The effort and cost of producing DNA barcode sequence data for large numbers of fungal specimens can be prohibitive. This study applies a modified metabarcoding approach that provides a labor- and cost-effective solution for sequencing the fungal DNA barcode from hundreds of specimens at once. Methods: A two-step PCR approach uses nested barcoded primers to generate nrITS2 sequence data. We applied this to 766 macrofungal specimens that represent a broad taxonomic sampling of the Dikarya, of which 382 Lactarius specimens are used to identify molecular operational taxonomic units (MOTUs) through a phylogenetic approach. Scripts in Python and R were used to organize sequence data and to run the packages CutAdapt and DADA2 for primer removal and sequence quality assessment. Sequences were compared to NCBI and UNITE databases and Sanger-produced sequences. Results: Specimen taxonomic identities from nrITS2 sequence data are >90% accurate across all specimens sampled. Phylogenetic analysis of Lactarius sequences identified 20 MOTUs. Discussion: The results demonstrate the capacity of these methods to produce nrITS2 sequences from large numbers of fungarium specimens. This provides an opportunity to more effectively use fungarium collections in advancing fungal diversity identification and documentation.


2021 ◽  
Vol 5 ◽  
Author(s):  
Adriana E. Radulovici ◽  
Pedro E. Vieira ◽  
Sofia Duarte ◽  
Marcos A. L. Teixeira ◽  
Luisa M. S. Borges ◽  
...  

The accuracy of specimen identification through DNA barcoding and metabarcoding relies on reference libraries containing records with reliable taxonomy and sequence quality. The considerable growth in barcode data requires stringent data curation, especially in taxonomically difficult groups such as marine invertebrates. A major effort in curating marine barcode data in the Barcode of Life Data Systems (BOLD) was undertaken during the 8th International Barcode of Life Conference (Trondheim, Norway, 2019). Major taxonomic groups (crustaceans, echinoderms, molluscs, and polychaetes) were reviewed to identify those which had disagreement between Linnaean names and Barcode Index Numbers (BINs). The records with disagreement were annotated with four tags: a) MIS-ID (misidentified, mislabeled, or contaminated records), b) AMBIG (ambiguous records unresolved with the existing data), c) COMPLEX (species names occurring in multiple BINs), and d) SHARE (barcodes shared between species). A total of 83,712 specimen records corresponding to 7,576 species were reviewed and 39% of the species were tagged (7% MIS-ID, 17% AMBIG, 14% COMPLEX, and 1% SHARE). High percentages (>50%) of AMBIG tags were recorded in gastropods, whereas COMPLEX tags dominated in crustaceans and polychaetes. The high proportion of tagged species reflects either flaws in the barcoding workflow (e.g., misidentification, cross-contamination) or taxonomic difficulties (e.g., synonyms, undescribed species). Although data curation is essential for barcode applications, such manual attempts to examine large datasets are unsustainable and automated solutions are extremely desirable.
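Two of the four tags above (COMPLEX and SHARE) follow mechanically from the mapping between species names and BINs. The Python sketch below illustrates only those two definitions. It is not the curation procedure used in the study, which was manual, and MIS-ID and AMBIG need expert judgment that code like this cannot capture. The function name `tag_discordance`, the record layout, and the BIN identifiers are hypothetical.

```python
from collections import defaultdict


def tag_discordance(records):
    """Tag species by name/BIN disagreement.

    records: iterable of (species_name, bin_id) pairs (hypothetical layout).
    Returns a dict mapping each species to a set of tags:
      COMPLEX - the species name occurs in more than one BIN
      SHARE   - at least one of its BINs also contains another species
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)

    tags = {}
    for species, bins in bins_by_species.items():
        species_tags = set()
        if len(bins) > 1:
            species_tags.add("COMPLEX")
        if any(len(species_by_bin[b]) > 1 for b in bins):
            species_tags.add("SHARE")
        tags[species] = species_tags
    return tags


# Made-up example records; the BIN identifiers are placeholders.
records = [
    ("Species a", "BIN:1"),
    ("Species a", "BIN:2"),
    ("Species b", "BIN:3"),
    ("Species c", "BIN:3"),
    ("Species d", "BIN:4"),
]
tags = tag_discordance(records)
```

In this toy input, "Species a" spans two BINs, "Species b" and "Species c" share one BIN, and "Species d" is concordant and receives no tag.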


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Daniel P. Dacey ◽  
Frédéric J. J. Chain

Background: Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases. Results: The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes. Conclusions: Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations.
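The Python sketch below shows the difference between merging and concatenating a read pair. It is a simplified illustration, not the pipeline evaluated in the study: it needs an exact-match overlap, ignores quality scores, and the parameters `min_overlap` and `spacer` are assumptions chosen for the example. Real merging tools tolerate mismatches and weigh base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_or_concatenate(forward, reverse, min_overlap=10, spacer="N" * 10):
    """Merge a read pair if the ends overlap exactly, else concatenate.

    Returns (sequence, mode), where mode is "merged" or "concatenated".
    min_overlap and spacer are illustrative defaults, not values from the study.
    """
    rev_rc = reverse_complement(reverse)
    max_len = min(len(forward), len(rev_rc))
    # Try the longest candidate overlap first:
    # a suffix of the forward read must equal a prefix of the reoriented reverse read.
    for k in range(max_len, min_overlap - 1, -1):
        if forward[-k:] == rev_rc[:k]:
            return forward + rev_rc[k:], "merged"
    # No sufficient overlap: join the two reads with a spacer instead of discarding them.
    return forward + spacer + rev_rc, "concatenated"


# Made-up 40-base amplicon for demonstration.
amplicon = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCATGCAA"

# Reads that overlap by 10 bases are merged back into the full amplicon.
merged_seq, merged_mode = merge_or_concatenate(
    amplicon[:25], reverse_complement(amplicon[15:])
)

# Reads that leave a 10-base gap cannot be merged and are concatenated.
concat_seq, concat_mode = merge_or_concatenate(
    amplicon[:15], reverse_complement(amplicon[25:])
)
```

Concatenation keeps sequence information from both ends of the amplicon for classification. A merge-only pipeline would drop the second pair entirely.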


2021 ◽  
Author(s):  
Avika Dixit ◽  
Luca Freschi ◽  
Roger Vargas ◽  
Matthias I Groeschel ◽  
Sabira Tahseen ◽  
...  

Background: Global tuberculosis (TB) drug resistance (DR) surveillance is largely focused on the drug rifampicin. We leveraged public and surveillance M. tuberculosis (Mtb) whole genome sequencing (WGS) data to generate more comprehensive country-level resistance prevalence estimates (antibiograms) using in silico resistance prediction. Methods: We curated and quality-controlled Mtb WGS data. We used a validated random forest model to predict phenotypic resistance to twelve drugs and bias-corrected for model performance, outbreak sampling, and resistance oversampling. We validated our estimates using a national DR survey conducted in South Africa. Results: Mtb isolates from 29 countries (n=19,149) met sequence quality criteria. Marginal genotypic resistance estimates overlapped with the South African DR survey for all drugs except isoniazid and second-line injectables, which were underestimated (n=3,134); among multi-drug resistant (MDR) TB, estimates overlapped for pyrazinamide and the fluoroquinolones. Globally, mono-resistance to isoniazid was estimated at 10.9% (95% CI: 10.2-11.7%, n=14,012). Mono-levofloxacin resistance rates were highest in South Asia (Pakistan 3.4% [0.1-11%], n=111 and India 2.8% [0.08-9.4%], n=114). Rates of resistance discordance between isoniazid and ethionamide were high, with 74.4% (IQR: 64.5-79.7%) of isoniazid-resistant isolates predicted to be ethionamide susceptible. The global susceptibility rate to pyrazinamide and levofloxacin among MDR was 15.1% (95% CI: 10.2-19.9%, n=3,964). Conclusions: This is the first attempt at global Mtb antibiogram estimation. DR prevalence in Mtb can be reliably estimated using public WGS and phenotypic resistance prediction for key antibiotics. Our results raise concerns about the empiric use of short-course fluoroquinolone regimens for drug-susceptible TB in South Asia and suggest that ethionamide is an under-utilized drug in MDR treatment.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ryohei Nakao ◽  
Ryutei Inui ◽  
Yoshihisa Akamatsu ◽  
Masuji Goto ◽  
Hideyuki Doi ◽  
...  

Environmental DNA (eDNA) analysis is a method of detecting DNA from environmental samples and is used as a biomonitoring tool. In recent studies, Illumina MiSeq has been the most extensively used tool for eDNA metabarcoding. The Illumina iSeq 100 (hereafter, iSeq), one of the high-throughput sequencers (HTS), has a relatively simple workflow and is potentially more affordable than other HTS. However, its utility in eDNA metabarcoding has not yet been investigated. In the present study, we applied fish eDNA metabarcoding to 40 water samples from river and lake ecosystems to assess the difference in species detectability and composition between iSeq and MiSeq. To check differences in sequence quality and errors, we also assessed differences in read changes between the two HTS. Sequence qualities were similar between iSeq and MiSeq. A significant difference was observed in the number of species between the two HTS, but no difference was observed in species composition. Additionally, the species compositions in common with the conventional method were the same between the two HTS. According to the results, when the same amplicon library is used for sequencing, the two HTS would exhibit similar performance in fish species detection using eDNA metabarcoding.


Author(s):  
Damien Jacot ◽  
Trestan Pillonel ◽  
Gilbert Greub ◽  
Claire Bertelli

Although many laboratories worldwide have developed their sequencing capacities in response to the need for SARS-CoV-2 genome-based surveillance of variants, only a few have reported quality criteria to ensure sequence quality before lineage assignment and submission to public databases. Hence, we aimed here to provide simple quality control criteria for SARS-CoV-2 sequencing to prevent erroneous interpretation of low-quality or contaminated data. We retrospectively investigated 647 SARS-CoV-2 genomes obtained over ten tiled-amplicon sequencing runs. We extracted 26 potentially relevant metrics covering the entire workflow from sample selection to bioinformatics analysis. Based on data distribution, critical values were established for eleven selected metrics to prompt further quality investigations for problematic samples, in particular those with a low viral RNA quantity. Low-frequency variants (<70% of supporting reads) can result from PCR amplification errors, sample cross-contamination, or the presence of distinct SARS-CoV-2 genomes in the sequenced sample. The number and the prevalence of low-frequency variants can be used as a robust quality criterion to identify possible sequencing errors or contamination. Overall, we propose eleven metrics with fixed cutoff values as a simple tool to evaluate the quality of SARS-CoV-2 genomes, including cycle thresholds, mean depth, the proportion of the genome covered at least 10x, and the number of low-frequency variants combined with mutation prevalence data.
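Three of the metrics named above can be computed directly from per-position depths and variant allele frequencies, as the Python sketch below shows. Only the 10x depth threshold and the 70% supporting-read threshold come from the abstract. The function name, input layout, and example values are hypothetical, and the study's pass/fail cutoffs for the eleven metrics are not given here, so the sketch applies none.

```python
def qc_metrics(depths, variant_freqs, min_depth=10, low_freq_cutoff=0.70):
    """Compute simple per-genome quality metrics.

    depths: read depth at each genome position (hypothetical input layout).
    variant_freqs: fraction of supporting reads for each called variant.
    min_depth and low_freq_cutoff follow the 10x and 70% values in the abstract.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    # Proportion of positions covered by at least min_depth reads.
    fraction_covered = sum(1 for d in depths if d >= min_depth) / len(depths)
    # Variants supported by fewer than 70% of reads count as low frequency.
    low_frequency_variants = sum(1 for f in variant_freqs if f < low_freq_cutoff)
    return {
        "mean_depth": mean_depth,
        "fraction_covered": fraction_covered,
        "low_frequency_variants": low_frequency_variants,
    }


# Made-up toy values; a real genome has roughly 30,000 positions.
metrics = qc_metrics(
    depths=[0, 5, 10, 20, 40, 45],
    variant_freqs=[0.98, 0.65, 0.30, 0.70],
)
```

Here four of the six positions reach 10x, and two variants (0.65 and 0.30) fall below the 70% threshold. A variant at exactly 0.70 is not counted as low frequency.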


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Shiferaw Garoma Wayessa ◽  
Ayana Abera Beyene

Road construction in Ethiopia is increasingly in demand to meet the country's medium- and long-term development programs. Most internal roads of Oromia cities and towns are cobblestone or gravel. Some portions along the proposed and existing road alignments traverse subgrade of low resistance, which affects the stability of the upper cobblestone layers. Structural failures are observed on cobblestone roads whether they are constructed with good-quality or low-quality materials. The Nekemte cobblestone projects, which started widely in 2014, have now failed in most areas, as we observed; this needs to be addressed and corresponding remedial measures must be drawn up. Possible remedial measures were organized for every observed failure or damage in order to restore normal road condition in the study area. An assessment was made using observation, interviews, and laboratory and field tests to determine the adequacy of the cobblestones and the underlying material to serve as a subgrade for road construction, based on project specifications and the Ethiopian Road Authority (ERA) low-volume road specification. From the field and laboratory tests carried out, it is observed that the cobblestone road failures in this road section are mainly due to the construction steps/sequence, quality of materials, road construction time, lack of proper design and quality control, absence of drainage structures, insufficient compaction, inaccurate filling of fine aggregate, and the sudden application of heavy vehicle loads on the cobblestone road.


2021 ◽  
Author(s):  
Bryan Thornlow ◽  
Angie S. Hinrichs ◽  
Miten Jain ◽  
Namrita Dhillon ◽  
Scott La ◽  
...  

We report a SARS-CoV-2 lineage that shares N501Y, P681H, and other mutations with known variants of concern, such as B.1.1.7. This lineage, which we refer to as B.1.x (COG-UK sometimes references similar samples as B.1.324.1), is present in at least 20 states across the USA and in at least six countries. However, a large deletion causes the sequence to be automatically rejected from repositories, suggesting that the frequency of this new lineage is underestimated using public data. Recent dynamics based on 339 samples obtained in Santa Cruz County, CA, USA suggest that B.1.x may be increasing in frequency at a rate similar to that of B.1.1.7 in Southern California. At present the functional differences between this variant B.1.x and other circulating SARS-CoV-2 variants are unknown, and further studies on secondary attack rates, viral loads, immune evasion and/or disease severity are needed to determine if it poses a public health concern. Nonetheless, given what is known from well-studied circulating variants of concern, it seems unlikely that the lineage could pose larger concerns for human health than many already globally distributed lineages. Our work highlights a need for rapid turnaround time from sequence generation to submission and improved sequence quality control that removes submission bias. We identify promising paths toward this goal.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0242294
Author(s):  
Julie Haendiges ◽  
Karen Jinneman ◽  
Narjol Gonzalez-Escalona

Whole genome sequencing (WGS) provides essential public health information and is used worldwide for pathogen surveillance, epidemiology, and source tracking. Foodborne pathogens are often sequenced using rapid library preparation chemistries based on transposon technology; however, this method may miss random segments of genomes that can be important for accurate downstream analyses. As new technologies become available, it may become possible to achieve better overall coverage. Here we compare the sequence quality obtained using libraries prepared from the Nextera XT and Nextera DNA Prep (Illumina, San Diego, CA) chemistries for 31 Shiga toxin-producing Escherichia coli (STEC) O121:H19 strains, which had been isolated from flour during a 2016 outbreak. The Nextera DNA Prep gave superior performance metrics, including sequence quality, assembly quality, uniformity of genome coverage, and virulence gene identification. Comprehensive detection of virulence genes is essential for making educated assessments of the virulence potential of STECs. The phylogenetic SNP analysis did not show any differences in the variants detected by either library preparation method, which allows isolates prepared with either method to be analysed together. Our comprehensive comparison of these chemistries should assist researchers wishing to improve their sequencing workflow for STECs and other genomic risk assessments.

