PostSV: A Post-Processing Approach for Filtering Structural Variations

Genomic structural variations are significant causes of genome diversity and complex diseases. With advances in sequencing technologies, many algorithms have been designed to identify structural differences using next-generation sequencing (NGS) data. Because of repeats in the human genome and the short reads produced by NGS, the discovery of structural variants (SVs) by state-of-the-art SV callers is not always accurate. To improve performance, multiple SV callers are often used to detect variants. However, most SV callers suffer from high false-positive rates, which diminish overall performance, especially in low-coverage genomes. In this article, we propose a post-processing, classification-based algorithm that can be used to filter structural variation predictions produced by SV callers. Novel features are defined from putative SV predictions using reads in the local regions around the breakpoints. Several classifiers are employed to classify the candidate predictions and remove false positives. We test our classifier models on simulated and real genomes and show that the proposed approach improves the performance of state-of-the-art algorithms.
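The abstract does not list the features PostSV uses, so the following is only a hedged sketch of what "features from reads in the local region around a breakpoint" could look like. The `Read` record, the `breakpoint_features` function, the 100 bp window and the three features are all hypothetical choices made for illustration; they are not taken from the paper. A downstream classifier would consume such feature vectors to separate true SV candidates from false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified view of an aligned read."""
    start: int        # 0-based leftmost aligned position
    end: int          # exclusive end of the alignment
    discordant: bool  # mate orientation or insert size disagrees with the reference
    split: bool       # read carries a split (supplementary) alignment


def breakpoint_features(reads, breakpoint, window=100):
    """Summarise the reads overlapping [breakpoint - window, breakpoint + window)."""
    lo, hi = breakpoint - window, breakpoint + window
    # Keep only reads whose aligned interval overlaps the window.
    local = [r for r in reads if r.start < hi and r.end > lo]
    n = len(local)
    return {
        "depth": n,
        "discordant_fraction": sum(r.discordant for r in local) / n if n else 0.0,
        "split_fraction": sum(r.split for r in local) / n if n else 0.0,
    }
```

For example, `breakpoint_features(reads, breakpoint=1000)` returns one small dictionary per candidate breakpoint, which could be turned into a row of a training matrix.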

RETRACTED: LFQC: a lossless compression algorithm for FASTQ files

Bioinformatics ◽

10.1093/bioinformatics/btu701 ◽

2014 ◽

Vol 35 (9) ◽

pp. e1-e7

Author(s):

Sudipta Pathak ◽

Sanguthevar Rajasekaran

Keyword(s):

State Of The Art ◽

Compression Algorithm ◽

Genomic Research ◽

Sequencing Technology ◽

Compression Algorithms ◽

Average Improvement ◽

The Cost ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Motivation: Next-generation sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole-genome sequencing. One of the biggest challenges posed by modern sequencing technology is the economical storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this article, we address the problem of storing and transmitting large FASTQ files using innovative compression techniques. Results: We introduce a new lossless, non-reference-based FASTQ compression algorithm named Lossless FastQ Compressor (LFQC). We have compared our algorithm with other state-of-the-art big data compression algorithms, namely gzip, bzip2, fastqz, fqzcomp, G-SQZ, SCALCE, Quip, DSRC and DSRC-LZ, among others. This comparison reveals that our algorithm achieves better compression ratios. The improvement obtained is up to 225%. For example, on one of the datasets (SRR065390_1), the average improvement (over all the algorithms compared) is 74.62%. Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/∼rajasek/FastqPrograms.zip.
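To make the terms "lossless" and "non-reference-based" concrete, here is a minimal sketch, not the LFQC algorithm itself. It assumes newline-terminated, four-line FASTQ records, stores each line type in its own stream, and compresses every stream with the general-purpose `bz2` module from the Python standard library. The function names and the choice of `bz2` are assumptions made for illustration.

```python
import bz2


def compress_fastq(data: bytes) -> list[bytes]:
    """Split four-line FASTQ records into four streams and compress each."""
    if not data.endswith(b"\n"):
        raise ValueError("expected newline-terminated FASTQ data")
    lines = data[:-1].split(b"\n")
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must span exactly four lines")
    # Stream k holds line k of every record: ids, bases, '+' lines, qualities.
    return [bz2.compress(b"\n".join(lines[k::4])) for k in range(4)]


def decompress_fastq(streams: list[bytes]) -> bytes:
    """Rebuild the original bytes by interleaving the four streams."""
    columns = [bz2.decompress(s).split(b"\n") for s in streams]
    return b"".join(b"\n".join(record) + b"\n" for record in zip(*columns))


def compression_ratio(original: bytes, streams: list[bytes]) -> float:
    """Original size divided by total compressed size (higher is better)."""
    return len(original) / sum(len(s) for s in streams)
```

No reference genome is consulted at any point, and `decompress_fastq(compress_fastq(data))` returns the input byte for byte, which is what "lossless" requires.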

SW#db: GPU-accelerated exact sequence similarity database search

10.1101/013805 ◽

2015 ◽

Cited By ~ 1

Author(s):

Matija Korpar ◽

Martin Sosic ◽

Dino Blazeka ◽

Mile Sikic

Keyword(s):

Similarity Search ◽

State Of The Art ◽

Sequence Similarity ◽

Database Search ◽

Multiple Queries ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Programming Algorithms ◽

New Algorithms ◽

Generation Sequencing

The deluge of next-generation sequencing (NGS) data and the expansion of sequence databases place higher demands on protein similarity search. State-of-the-art tools such as BLAST are not fast enough to cope with these demands. It is therefore necessary to create new algorithms that are faster while keeping similar sensitivity levels. The majority of protein similarity search methods are based on a seed-and-extend approach, which uses standard dynamic programming algorithms in the extend phase. In this paper we present SW#db, a tool and library for exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily designed for the extend phase, where the number of candidates in the database is reduced. It uses both GPU and CPU parallelization. When we measured multiple queries on the Swiss-Prot and UniRef90 databases, SW#db was 4 times faster than SSEARCH, 6-10 times faster than CUDASW++ and more than 20 times faster than SSW.
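As an illustration of the kind of dynamic programming used for exact similarity search, the sketch below computes a Smith-Waterman local alignment score on the CPU in plain Python. It assumes that Smith-Waterman is the algorithm meant here, and it uses a simple match/mismatch score with a linear gap penalty; the scoring values and function names are illustrative assumptions, and nothing below reflects the actual SW#db implementation or its GPU kernels.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences."""
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always zero for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def search(query, database, **scoring):
    """Score the query against every database entry, best hits first."""
    hits = [
        (name, smith_waterman_score(query, seq, **scoring))
        for name, seq in database.items()
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Every cell of the matrix is evaluated, so no candidate alignment is skipped; this exhaustiveness is what distinguishes exact search from heuristic seeding, and it is also why such searches benefit from parallel hardware.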

CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks

BioMed Research International ◽

10.1155/2017/6375059 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Jing Wang ◽

Cheng Ling ◽

Jingyang Gao

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Sequence Data ◽

High Sensitivity ◽

False Positives ◽

Detection Methods ◽

Structural Variations ◽

Next Generation Sequencing Ngs ◽

Low Coverage ◽

Generation Sequencing

Many structural variation (SV) detection methods have been proposed as next-generation sequencing (NGS) has become widespread. These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy on low-coverage sequences. The union of results from these tools achieves fairly high sensitivity but still yields low accuracy on low-coverage sequence data; that is, the combined call set contains many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of the candidates. With labeled, feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low-coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically prioritize SV features and effectively reduce false positives.

Patient Derived Xenografts for Genome-Driven Therapy of Osteosarcoma

Cells ◽

10.3390/cells10020416 ◽

2021 ◽

Vol 10 (2) ◽

pp. 416

Author(s):

Lorena Landuzzi ◽

Maria Cristina Manara ◽

Pier-Luigi Lollini ◽

Katia Scotlandi

Keyword(s):

Clinical Trials ◽

Tumor Heterogeneity ◽

Functional Studies ◽

Ngs Data Analysis ◽

The Many ◽

Orthotopic Xenografts ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing ◽

Somatic Copy Number Alterations

Osteosarcoma (OS) is a rare malignant primary tumor of mesenchymal origin affecting bone. It is characterized by a complex genotype, mainly due to the high frequency of chromothripsis, which leads to multiple somatic copy number alterations and structural rearrangements. Any effort to design genome-driven therapies must therefore consider such high inter- and intra-tumor heterogeneity. For this reason, many laboratories and international networks are developing and sharing OS patient-derived xenografts (OS PDXs) to broaden the availability of models that reproduce the complex clinical heterogeneity of OS. OS PDXs, and new cell lines derived from PDXs, faithfully preserve tumor heterogeneity as well as genetic and epigenetic features, and are thus valuable tools for predicting drug responses. Here, we review recent achievements concerning OS PDXs, summarizing the methods used to obtain ectopic and orthotopic xenografts and to fully characterize these models. The availability of OS PDXs across the many international PDX platforms and their possible use in PDX clinical trials are also described. We recommend the coupling of next-generation sequencing (NGS) data analysis with functional studies in OS PDXs, as well as the setup of OS PDX clinical trials and co-clinical trials, to enhance the predictive power of experimental evidence and to accelerate the clinical translation of effective genome-guided therapies for this aggressive disease.

Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants

Molecules ◽

10.3390/molecules23020399 ◽

2018 ◽

Vol 23 (2) ◽

pp. 399 ◽

Cited By ~ 41

Author(s):

Sima Taheri ◽

Thohirah Lee Abdullah ◽

Mohd Yusop ◽

Mohamed Hanafi ◽

Mahbod Sahebi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Ssr Markers ◽

Next Generation ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Appendix A: Common File Types Used in Next-Generation Sequencing (NGS) Data Analysis

Next-Generation Sequencing Data Analysis ◽

10.1201/b19532-20 ◽

2016 ◽

pp. 199-202

Keyword(s):

Data Analysis ◽

Next Generation Sequencing ◽

Next Generation ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data

Wellcome Open Research ◽

10.12688/wellcomeopenres.11689.1 ◽

2017 ◽

Vol 2 ◽

pp. 35 ◽

Cited By ~ 7

Author(s):

Shazia Mahamdallie ◽

Elise Ruark ◽

Shawn Yost ◽

Emma Ramsay ◽

Imran Uddin ◽

...

Keyword(s):

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Negative Results ◽

Targeted Ngs ◽

Predisposition Genes ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Validation Series ◽

Generation Sequencing ◽

Dependent Probe

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. Of these, 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single-exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European Genome-phenome Archive (EGA) under the accession number EGAS00001002428.
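A validation series of this kind is typically used to score a caller against the orthogonal results. The sketch below shows one hedged way to do that at the sample level: it assumes a mapping from sample ID to the validated status (CNV present or absent) and a second mapping with the caller's output. The function name, the dictionary layout and the sample-level granularity are illustrative assumptions, not part of the ICR96 resource.

```python
def evaluate_calls(truth, calls):
    """Compare per-sample CNV calls against validated truth.

    truth: dict mapping sample ID -> True if a validated exon CNV is present
    calls: dict mapping sample ID -> True if the caller reported an exon CNV
           (samples missing from `calls` are treated as negative calls)
    """
    tp = fp = tn = fn = 0
    for sample, has_cnv in truth.items():
        called = calls.get(sample, False)
        if has_cnv and called:
            tp += 1
        elif has_cnv:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        # Rates are undefined (NaN) when the denominator is zero.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Because the series contains both validated positive and validated negative samples, both sensitivity and specificity can be estimated from the same run.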

AbDiver - A tool to explore the natural antibody landscape to aid therapeutic design

10.1101/2021.11.03.467080 ◽

2021 ◽

Author(s):

Jakub Mlokosiewicz ◽

Piotr Deszynski ◽

Wiktoria Wilman ◽

Igor Jaszczyszyn ◽

Rajkumar Ganesan ◽

...

Keyword(s):

Rational Design ◽

Therapeutic Antibody ◽

Human Antibody ◽

Use Case ◽

Therapeutic Antibodies ◽

Road Map ◽

Antibody Sequence ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Motivation: Rational design of therapeutic antibodies can be improved by harnessing the natural sequence diversity of these molecules. Our understanding of the diversity of antibodies has recently been greatly facilitated through the deposition of hundreds of millions of human antibody sequences in next-generation sequencing (NGS) repositories. Contrasting a query therapeutic antibody sequence with the naturally observed diversity in similar antibody sequences from NGS can provide a mutational road map for antibody engineers designing biotherapeutics. Because of the sheer scale of the antibody NGS datasets, performing queries across them is computationally challenging. Results: To facilitate harnessing antibody NGS data, we developed AbDiver (http://naturalantibody.com/abdiver), a free portal allowing users to compare their query sequences to those observed in the natural repertoires. AbDiver offers three antibody-specific use cases: (1) compare a query antibody to positional variability statistics precomputed from multiple independent studies; (2) retrieve close full variable sequence matches to a query antibody; and (3) retrieve CDR3 or clonotype matches to a query antibody. We applied our system to a set of 742 therapeutic antibodies, demonstrating that for each use case our system can retrieve relevant results for most sequences. AbDiver facilitates the navigation of the vast antibody mutation space for the purpose of rational therapeutic antibody design and engineering. Availability: AbDiver is freely accessible at http://naturalantibody.com/abdiver
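The first use case, comparing a query against positional variability statistics, can be sketched as follows. This is a minimal illustration that assumes the repertoire sequences are already aligned to a common length (for example by an antibody numbering scheme); the function names and the 5% rarity threshold are hypothetical and do not describe how AbDiver computes or stores its statistics.

```python
from collections import Counter


def positional_frequencies(aligned_sequences):
    """Return, for each alignment column, a dict of residue -> frequency."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def rare_positions(query, profile, threshold=0.05):
    """List (position, residue) pairs where the query residue is rare."""
    if len(query) != len(profile):
        raise ValueError("query must match the profile length")
    return [
        (i, res)
        for i, res in enumerate(query)
        if profile[i].get(res, 0.0) < threshold
    ]
```

Positions flagged by `rare_positions` are those where the query departs from what is commonly observed in the repertoire, which is the kind of information a mutational road map would highlight.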

ViennaNGS: A toolbox for building efficient next-generation sequencing analysis pipelines

F1000Research ◽

10.12688/f1000research.6157.2 ◽

2015 ◽

Vol 4 ◽

pp. 50 ◽

Cited By ~ 7

Author(s):

Michael T. Wolfinger ◽

Jörg Fallmann ◽

Florian Eggenhofer ◽

Fabian Amman

Keyword(s):

Next Generation Sequencing ◽

Sequence Motif ◽

Software Components ◽

Sequencing Analysis ◽

Next Generation ◽

File Formats ◽

Next Generation Sequencing Analysis ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Recent achievements in next-generation sequencing (NGS) technologies have led to a high demand for reusable software components that make it easy to compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computing and evaluating read mapping statistics, and normalizing RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools.

Whole genome sequencing suggests transmission of Corynebacterium diphtheriae-caused cutaneous diphtheria in two siblings, Germany, 2018

Eurosurveillance ◽

10.2807/1560-7917.es.2019.24.2.1800683 ◽

2019 ◽

Vol 24 (2) ◽

Cited By ~ 3

Author(s):

Anja Berger ◽

Alexandra Dangel ◽

Tilmann Schober ◽

Birgit Schmidbauer ◽

Regina Konrad ◽

...

Keyword(s):

Next Generation Sequencing ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Corynebacterium Diphtheriae ◽

Whole Genome ◽

Next Generation ◽

Insect Bites ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

In September 2018, a child who had returned from Somalia to Germany presented with cutaneous diphtheria caused by toxigenic Corynebacterium diphtheriae biovar mitis. The child’s sibling had superinfected insect bites that also harboured toxigenic C. diphtheriae. Next-generation sequencing (NGS) revealed the same strain in both patients, suggesting very recent human-to-human transmission. Epidemiological and NGS data suggest that the two cutaneous diphtheria cases constitute the first outbreak of toxigenic C. diphtheriae in Germany since the 1980s.
