Atropos: specific, sensitive, and speedy trimming of sequencing reads

10.7287/peerj.preprints.2452v3 ◽

2017 ◽

Author(s):

John P Didion ◽

Marcel Martin ◽

Francis S Collins

Keyword(s):

High Performance ◽

Simulated Data ◽

Leading Edge ◽

High Sensitivity ◽

Fold Increase ◽

Parallel Execution ◽

High Rate ◽

Data Sets ◽

Sequencing Data ◽

Broad Feature

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves a four-fold increase in trimming accuracy and a decrease in execution time of ~50% (using 16 parallel execution threads). Furthermore, Atropos maintains high accuracy even when trimming simulated data with a high rate of error. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of most current-generation sequencing data sets. Atropos is open source and free software written in Python and available at https://github.com/jdidion/atropos.
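To make the two trimming operations named in the abstract concrete, here is a minimal Python sketch of 3' quality trimming followed by 3' adapter removal. It is an illustration only and not Atropos's algorithm: the function names, the mismatch-only matching (no insertions or deletions), and the default thresholds (`min_q=20`, `max_error_rate=0.1`, `min_overlap=3`) are all assumptions chosen for the example.

```python
def trim_quality_3prime(seq, quals, min_q=20):
    """Drop trailing bases whose Phred quality is below min_q (hypothetical helper)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def find_adapter_3prime(seq, adapter, max_error_rate=0.1, min_overlap=3):
    """Return the leftmost read position where the adapter, or a prefix of it
    that runs off the end of the read, matches within the allowed number of
    mismatches. Returns None when no such position exists.
    Assumption: substitutions only, no indels."""
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        mismatches = sum(
            1 for a, b in zip(seq[start:start + overlap], adapter[:overlap]) if a != b
        )
        # Allowed mismatches scale with the length of the compared region.
        if mismatches <= int(max_error_rate * overlap):
            return start
    return None


def trim_read(seq, quals, adapter, min_q=20):
    """Quality-trim the 3' end, then cut the read at the adapter start if found."""
    seq, quals = trim_quality_3prime(seq, quals, min_q)
    pos = find_adapter_3prime(seq, adapter)
    if pos is not None:
        seq, quals = seq[:pos], quals[:pos]
    return seq, quals


# Example: a 10-base insert followed by a partial adapter and two low-quality bases.
read = "ACGTACGTAC" + "AGATCGGAAG"
quals = [30] * 18 + [5, 5]
trimmed_seq, trimmed_quals = trim_read(read, quals, adapter="AGATCGGAAGAGC")
```

In this example the two low-quality bases are removed first, and the remaining partial adapter is then found at position 10, leaving only the 10-base insert. A production trimmer would also need to handle indels, paired-end reads, and multiple adapters, none of which are covered here.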


Atropos: specific, sensitive, and speedy trimming of sequencing reads

10.7287/peerj.preprints.2452v1 ◽

2016 ◽

Author(s):

John P Didion ◽

Francis S Collins

Keyword(s):

High Performance ◽

Simulated Data ◽

Leading Edge ◽

High Sensitivity ◽

Fold Increase ◽

Parallel Execution ◽

High Rate ◽

Data Sets ◽

Sequencing Data ◽

Broad Feature

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves a four-fold increase in trimming accuracy and a decrease in execution time of ~50% (using 16 parallel execution threads). Furthermore, Atropos maintains high accuracy even when trimming simulated data with a high rate of error. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of most current-generation sequencing data sets. Atropos is open source and free software written in Python and available at https://github.com/jdidion/atropos.


ScaR—a tool for sensitive detection of known fusion transcripts: establishing prevalence of fusions in testicular germ cell tumors

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqz025 ◽

2020 ◽

Vol 2 (1) ◽

Author(s):

Sen Zhao ◽

Andreas M Hoff ◽

Rolf I Skotheim

Keyword(s):

Germ Cell ◽

Germ Cell Tumors ◽

Simulated Data ◽

High Sensitivity ◽

Fusion Transcript ◽

Data Sets ◽

Testicular Germ Cell Tumors ◽

Sequencing Data ◽

Testicular Germ Cell ◽

Fusion Transcripts

Bioinformatics tools for fusion transcript detection from RNA-sequencing data are in general developed for identification of novel fusions, which demands a high number of supporting reads and strict filters to avoid false discoveries. As our knowledge of bona fide fusion genes becomes more saturated, there is a need to establish their prevalence with high sensitivity. We present ScaR, a tool that uses a supervised scaffold realignment approach for sensitive fusion detection in RNA-seq data. ScaR detects a set of 130 synthetic fusion transcripts from simulated data with higher sensitivity than established fusion finders. Applied to fusion transcripts potentially involved in testicular germ cell tumors (TGCTs), ScaR detects the fusions RCC1-ABHD12B and CLEC6A-CLEC4D in 9% and 28% of 150 TGCTs, respectively. The fusions were not detected in any of 198 normal testis tissues. Thus, we demonstrate high prevalence of RCC1-ABHD12B and CLEC6A-CLEC4D in TGCTs, and their cancer-specific features. Further, we find that RCC1-ABHD12B and CLEC6A-CLEC4D are predominantly expressed in the seminoma and embryonal carcinoma histological subtypes of TGCTs, respectively. In conclusion, ScaR is useful for establishing the frequency of known and validated fusion transcripts in larger data sets and detecting clinically relevant fusion transcripts with high sensitivity.


Atropos: specific, sensitive, and speedy trimming of sequencing reads

PeerJ ◽

10.7717/peerj.3720 ◽

2017 ◽

Vol 5 ◽

pp. e3720 ◽

Cited By ~ 70

Author(s):

John P. Didion ◽

Marcel Martin ◽

Francis S. Collins

Keyword(s):

High Performance ◽

State Of The Art ◽

Leading Edge ◽

High Sensitivity ◽

High Accuracy ◽

Free Software ◽

Short Read Sequencing ◽

Sequencing Errors ◽

Quality And Reliability ◽

Broad Feature

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.


Atropos: specific, sensitive, and speedy trimming of sequencing reads

10.7287/peerj.preprints.2452v4 ◽

2017 ◽

Author(s):

John P Didion ◽

Marcel Martin ◽

Francis S Collins

Keyword(s):

High Performance ◽

State Of The Art ◽

Leading Edge ◽

High Sensitivity ◽

High Accuracy ◽

Free Software ◽

Short Read Sequencing ◽

Sequencing Errors ◽

Quality And Reliability ◽

Broad Feature

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Availability: Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.


Atropos: specific, sensitive, and speedy trimming of sequencing reads

10.7287/peerj.preprints.2452 ◽

2017 ◽

Author(s):

John P Didion ◽

Marcel Martin ◽

Francis S Collins

Keyword(s):

High Performance ◽

State Of The Art ◽

Leading Edge ◽

High Sensitivity ◽

High Accuracy ◽

Free Software ◽

Short Read Sequencing ◽

Sequencing Errors ◽

Quality And Reliability ◽

Broad Feature

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Availability: Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.


ScaR - A tool for sensitive detection of known fusion transcripts: Establishing prevalence of fusions in testicular germ cell tumors

10.1101/518316 ◽

2019 ◽

Author(s):

Sen Zhao ◽

Andreas M. Hoff ◽

Rolf I. Skotheim

Keyword(s):

Germ Cell ◽

Germ Cell Tumors ◽

Simulated Data ◽

High Sensitivity ◽

Fusion Transcript ◽

Data Sets ◽

Testicular Germ Cell Tumors ◽

Sequencing Data ◽

Testicular Germ Cell ◽

Fusion Transcripts

Bioinformatics tools for fusion transcript detection from RNA-sequencing data are in general developed for identification of novel fusions, which demands a high number of supporting reads and strict filters to avoid false discoveries. As our knowledge of bona fide fusion genes becomes more saturated, there is a need to establish their prevalence with high sensitivity. We present ScaR, a tool that uses a scaffold realignment approach for sensitive fusion detection in RNA-seq data. ScaR detects a set of 50 synthetic fusion transcripts from simulated data with higher sensitivity than established fusion finders. Applied to fusion transcripts potentially involved in testicular germ cell tumors (TGCTs), ScaR detects the fusions RCC1-ABHD12B and CLEC6A-CLEC4D in 9% and 28% of 150 TGCTs, respectively. The fusions were not detected in any of 198 normal testis tissues. Thus, we demonstrate high prevalence of RCC1-ABHD12B and CLEC6A-CLEC4D in TGCTs, and their cancer-specific features. Further, we find that RCC1-ABHD12B and CLEC6A-CLEC4D are predominantly expressed in the seminoma and embryonal carcinoma histological subtypes of TGCTs, respectively. In conclusion, ScaR is useful for establishing the frequency of known fusion transcripts in larger data sets and detecting clinically relevant fusion transcripts with high sensitivity. Availability: https://github.com/senzhaocode/ScaR


PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .


Circall: fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04418-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dat Thanh Nguyen ◽

Quang Thinh Trac ◽

Thi-Hau Nguyen ◽

Ha-Nam Nguyen ◽

Nir Ohad ◽

...

Keyword(s):

Rna Sequencing ◽

Simulated Data ◽

High Sensitivity ◽

Circular Rna ◽

Computational Time ◽

Circular Rnas ◽

Rna Seq ◽

Sequencing Data ◽

Mapping Algorithm ◽

False Discovery Rate Method

Background: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict whether a BSJ-supporting read is derived from a true circRNA, thus increasing the rate of FP circRNAs. Results: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient because it uses a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. Conclusions: With its improved performance in circRNA detection and in computational time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.


Complementary Metaresonator Sensor with Dual Notch Resonance for Evaluation of Vegetable Oils in C and X Bands

Applied Sciences ◽

10.3390/app11125734 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5734

Author(s):

Ammar Armghan

Keyword(s):

Dielectric Constant ◽

Vegetable Oils ◽

High Performance ◽

Ground Plane ◽

Simulated Data ◽

High Sensitivity ◽

Network Analyzer ◽

X Band ◽

Compact Size ◽

The Difference

This paper investigates a complementary metaresonator for the evaluation of vegetable oils in the C and X bands. Rapidly advancing technology demands the exploration of complementary metaresonators for high performance in these bands. This research probes the complementary mirror-symmetric S resonator (CMSSR), which can operate in two bands while offering compact size and high sensitivity. The prime motivation behind the proposed technique is to use the dual notch resonance to estimate the dielectric constant of the oil under test (OUT). The proposed sensor is designed on a compact 30 × 25 mm², 1.6 mm thick FR-4 substrate. A 50 Ω microstrip transmission line is printed on one side, while a unit cell of the CMSSR is etched on the other side of the substrate to achieve dual notch resonance. A Teflon container is attached to the CMSSR in the ground plane to act as a pool for the OUT. According to the simulated transmission spectrum, the proposed design exhibits dual notch resonance at 7.21 GHz (C band) and 8.97 GHz (X band). A prototype of the complementary metaresonator sensor is fabricated and tested using a CEYEAR AV3672D vector network analyzer. Comparison of measured and simulated data shows a difference of 0.01 GHz at the first resonance frequency and 0.04 GHz at the second. Furthermore, a mathematical model is developed for the complementary metaresonator sensor to evaluate the dielectric constant of the OUT in terms of the relevant resonant frequency.
