libqcpp: A C++14 sequence quality control library

Abstract: We report a SARS-CoV-2 lineage that shares N501Y, P681H, and other mutations with known variants of concern, such as B.1.1.7. This lineage, which we refer to as B.1.x (COG-UK sometimes references similar samples as B.1.324.1), is present in at least 20 states across the USA and in at least six countries. However, a large deletion causes the sequence to be automatically rejected from repositories, suggesting that the frequency of this new lineage is underestimated using public data. Recent dynamics based on 339 samples obtained in Santa Cruz County, CA, USA suggest that B.1.x may be increasing in frequency at a rate similar to that of B.1.1.7 in Southern California. At present the functional differences between this variant B.1.x and other circulating SARS-CoV-2 variants are unknown, and further studies on secondary attack rates, viral loads, immune evasion and/or disease severity are needed to determine if it poses a public health concern. Nonetheless, given what is known from well-studied circulating variants of concern, it seems unlikely that the lineage could pose larger concerns for human health than many already globally distributed lineages. Our work highlights a need for rapid turnaround time from sequence generation to submission and improved sequence quality control that removes submission bias. We identify promising paths toward this goal.

how_are_we_stranded_here: Quick determination of RNA-Seq strandedness

10.1101/2021.03.10.434861 ◽

2021 ◽

Cited By ~ 1

Author(s):

Beth Signal ◽

Tim Kahlke

Keyword(s):

Quality Control ◽

Rna Sequencing ◽

Published Data ◽

Sequencing Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Sequence Quality ◽

Quick Determination

Abstract: Quality control checks are the first step in RNA-Sequencing analysis and enable the identification of common issues in the sequenced reads. Checks for sequence quality, contamination, and complexity are commonplace and allow users to implement downstream steps that account for these issues. Strand-specificity of reads is frequently overlooked and is often unavailable even in published data, yet when it is unknown or incorrectly specified it can have detrimental effects on the reproducibility and accuracy of downstream analyses. We present how_are_we_stranded_here, a Python library that helps to quickly infer strandedness of paired-end RNA-Sequencing data.
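
To make the idea of inferring strandedness concrete, here is a minimal sketch. It is not the how_are_we_stranded_here implementation or API. It assumes that read pairs have already been counted by the orientation that explains them relative to annotated transcripts; the function name, the labels, and the 0.9 threshold are all hypothetical choices for illustration.

```python
def classify_strandedness(forward_pairs: int, reverse_pairs: int,
                          threshold: float = 0.9) -> str:
    """Classify a library from counts of read pairs per orientation.

    forward_pairs / reverse_pairs: hypothetical counts of read pairs whose
    first mate lies on the same / opposite strand as the annotated transcript.
    threshold: hypothetical minimum fraction needed to call a library stranded.
    """
    total = forward_pairs + reverse_pairs
    if total == 0:
        raise ValueError("no informative read pairs were counted")
    # Compare each orientation's share of the informative pairs to the cutoff.
    if forward_pairs / total >= threshold:
        return "forward"
    if reverse_pairs / total >= threshold:
        return "reverse"
    # Neither orientation dominates: unstranded or mixed data.
    return "unstranded"


if __name__ == "__main__":
    print(classify_strandedness(950, 50))
    print(classify_strandedness(500, 500))
```

A result that falls between the two cutoffs is worth inspecting by hand rather than passing to downstream tools as "unstranded" without checking.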

Sequence Quality Control v1 (protocols.io.j2icqce)

protocols.io ◽

10.17504/protocols.io.j2icqce ◽

2017 ◽

Author(s):

James Thornton

Keyword(s):

Quality Control ◽

Sequence Quality

HIV Databases video tutorial on use of the HIV sequence quality control tool.

10.2172/1566101 ◽

2019 ◽

Author(s):

Brian Thomas Foley

Keyword(s):

Quality Control ◽

Video Tutorial ◽

Quality Control Tool ◽

Sequence Quality ◽

Control Tool

QuAdTrim: Overcoming computational bottlenecks in sequence quality control

10.1101/2019.12.18.870642 ◽

2019 ◽

Author(s):

Andrew J. Robinson ◽

Elizabeth M. Ross

Keyword(s):

Quality Control ◽

High Throughput Sequencing ◽

Source Code ◽

Poor Quality ◽

Data Quality Control ◽

Common Error ◽

Control Programs ◽

Sequence Quality ◽

Dot Com ◽

Adapter Trimming

Abstract: With the recent torrent of high throughput sequencing (HTS) data, the need for highly efficient algorithms for common tasks is paramount. One task that forms the basis for all further analysis of HTS data is initial data quality control, that is, the removal or trimming of poor quality reads from the dataset. Here we present QuAdTrim, a quality control and adapter trimming algorithm for HTS data that is up to 57 times faster and uses less than 0.06% of the memory of other commonly used HTS quality control programs. QuAdTrim will reduce the time and memory required for quality control of HTS data and, in doing so, will reduce the computational demands of a fundamental step in HTS data analysis. Additionally, QuAdTrim implements the removal of homopolymer Gs from the 3’ end of sequence reads, a common error generated on the NovaSeq, NextSeq and iSeq100 platforms.

Availability and Implementation: The source code is freely available on bitbucket under a BSD licence, see COPYING file for details: https://bitbucket.org/arobinson/quadtrim

Contact: Andrew Robinson andrewjrobinson at gmail dot com
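
The removal of homopolymer Gs from the 3’ end of a read can be illustrated with a short sketch. This is not QuAdTrim's algorithm or source code. The function name and the `min_run` cutoff are hypothetical, and the sketch only handles an exact, uninterrupted run of G bases at the end of the read.

```python
from typing import Tuple


def trim_poly_g_tail(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of G bases when it is at least min_run long.

    seq and qual are the base and quality strings of one FASTQ record.
    min_run is a hypothetical cutoff; shorter G runs are left untouched
    because they may be genuine sequence.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    stripped = seq.rstrip("G")          # drop every G at the 3' end
    run_length = len(seq) - len(stripped)
    if run_length < min_run:
        return seq, qual                # run too short to treat as an artifact
    # Keep the quality string in step with the trimmed sequence.
    return stripped, qual[:len(stripped)]


if __name__ == "__main__":
    read = "ACGTACGT" + "G" * 12
    print(trim_poly_g_tail(read, "I" * len(read)))
```

A production trimmer would typically also tolerate occasional non-G bases inside the tail and take base qualities into account; both are left out here to keep the example short.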
