spliced alignment Latest Research Papers

SeekFusion - A Clinically Validated Fusion Transcript Detection Pipeline for PCR-Based Next-Generation Sequencing of RNA

Detecting gene fusions involving driver oncogenes is pivotal in the clinical diagnosis and treatment of cancer patients. Recent developments in next-generation sequencing (NGS) technologies have enabled improved assays for bioinformatics-based gene fusion detection. In clinical applications, where a small number of fusions are clinically actionable, targeted polymerase chain reaction (PCR)-based NGS chemistries, such as the QIAseq RNAscan assay, aim to improve accuracy compared to standard RNA sequencing. Existing informatics methods for gene fusion detection in NGS-based RNA sequencing assays traditionally use either a transcriptome-based spliced alignment approach or a de-novo assembly approach. Transcriptome-based spliced alignment methods struggle with short-read mapping, which yields low-quality alignments. De-novo assembly-based methods build longer contigs from short reads and can be more sensitive to genomic rearrangements, but they face performance and scalability challenges. Consequently, there is a need for a method that efficiently and accurately detects fusions in targeted PCR-based NGS chemistries. We describe SeekFusion, a highly accurate and computationally efficient pipeline for identifying gene fusions from PCR-based NGS chemistries. Using biological samples processed with the QIAseq RNAscan assay and in-silico simulated data, we demonstrate that SeekFusion outperforms popular existing methods such as STAR-Fusion, TopHat-Fusion and JAFFA-hybrid in gene fusion detection accuracy. We also present results from 4,484 patient samples tested for neurological tumors and sarcoma, including details on some novel fusions identified.

2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Genome Biology ◽

10.1186/s13059-021-02296-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew T. Parker ◽

Katarzyna Knop ◽

Geoffrey J. Barton ◽

Gordon G. Simpson

Keyword(s):

Machine Learning ◽

Transcriptome Assembly ◽

Error Rates ◽

Sequence Information ◽

Sequencing Technologies ◽

Alternative Processing ◽

Spliced Alignment ◽

Long Reads ◽

Long Read ◽

Splice Junctions

Abstract: Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
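The two-pass idea in this abstract (collect splice junctions from a first alignment pass, discard the spurious ones, and pass the rest to a second alignment pass) can be illustrated with a minimal Python sketch. This is not the 2passtools implementation: the abstract describes filtering with alignment metrics and a machine-learning model, whereas the sketch below substitutes a plain read-support threshold. The function names, the `min_support` parameter, and the toy alignments are all hypothetical. The only format detail relied on is that the SAM CIGAR operation `N` marks a skipped reference region, which spliced aligners use for introns.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(ref_start, cigar):
    """Return 0-based, half-open (start, end) intervals for each N operation."""
    introns = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return introns


def filter_junctions(alignments, min_support=2):
    """Keep junctions seen in at least min_support first-pass alignments.

    ASSUMPTION: a support count stands in for the machine-learning filter
    described in the abstract; it is only a placeholder for illustration.
    """
    counts = Counter()
    for chrom, start, cigar in alignments:
        for intron_start, intron_end in introns_from_cigar(start, cigar):
            counts[(chrom, intron_start, intron_end)] += 1
    return {junction for junction, n in counts.items() if n >= min_support}


# Hypothetical first-pass alignments: (chromosome, 0-based start, CIGAR).
first_pass = [
    ("chr1", 100, "50M200N50M"),
    ("chr1", 120, "30M200N60M"),
    ("chr1", 100, "48M203N49M"),  # slightly shifted junction, one read only
]
kept = filter_junctions(first_pass)
print(sorted(kept))
```

In a real workflow the retained junctions would be written out in whatever format the aligner accepts as guidance for the second pass; that step depends on the aligner and is left out here.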

ClipSV: improving structural variation detection by read extension, spliced alignment and tree-based decision rules

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab003 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Peng Xu ◽

Yu Chen ◽

Min Gao ◽

Zechen Chong

Keyword(s):

Complex Traits ◽

Structural Variation ◽

Decision Rules ◽

Genomic Variation ◽

Read Length ◽

Sequencing Data ◽

Sequence Alignments ◽

Spliced Alignment ◽

Genomic Studies ◽

Low Sensitivity

Abstract: Structural variation (SV), which consists of genomic variation ranging from 50 to millions of base pairs, has considerable impact on human diseases, complex traits and evolution. Accurately detecting SV is a fundamental step in characterizing the features of individual genomes. Several methods have been proposed to detect SVs using the next-generation sequencing (NGS) platform. However, due to the short length of sequencing reads and the complexity of SV content, SV-detecting tools are still limited by low sensitivity, especially for insertion detection. In this study, we developed a novel tool, ClipSV, to improve SV discovery. ClipSV uses a read extension and spliced alignment approach to overcome the limitation of read length. By reconstructing long sequences from SV-associated short reads, ClipSV discovers deletions and short insertions from the long sequence alignments. To comprehensively characterize insertions, ClipSV implements tree-based decision rules that can efficiently utilize SV-containing reads. In evaluations on both simulated and real sequencing data, ClipSV exhibited better overall performance than currently popular tools, especially for insertion detection. As the NGS platform represents the mainstream sequencing capacity for routine genomic applications, we anticipate that ClipSV will serve as an important tool for SV characterization in future genomic studies.

HISAT-3N: a rapid and accurate three-nucleotide sequence aligner

10.1101/2020.12.15.422906 ◽

2020 ◽

Author(s):

Yun Zhang ◽

Chanhee Park ◽

Christopher Bennett ◽

Micah Thornton ◽

Daehwan Kim

Keyword(s):

Nucleotide Sequence ◽

Simulated Data ◽

Alignment Accuracy ◽

Data Sets ◽

Cellular Processes ◽

Sequencing Technologies ◽

Spliced Alignment ◽

Hierarchical Index ◽

Simulated Data Sets ◽

The Ideal

Nucleotide conversion sequencing technologies such as bisulfite-seq and SLAM-seq are powerful tools to explore the intricacies of cellular processes. In this paper, we describe HISAT-3N (hierarchical indexing for spliced alignment of transcripts - 3 nucleotides), which rapidly and accurately aligns sequences containing nucleotide conversions by leveraging the powerful hierarchical index and repeat index algorithms originally developed for the HISAT software. Tests on real and simulated data sets demonstrate that HISAT-3N is over 7 times faster, has greater alignment accuracy, and has smaller memory requirements than other modern systems. Taken together, HISAT-3N is the ideal aligner for use with converted sequence technologies.
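The general three-nucleotide idea behind aligning converted reads can be shown with a small Python sketch: apply the same base conversion (for example C to T, as in bisulfite sequencing) to both the read and the reference, match in the reduced alphabet, then compare the original sequences to locate the converted positions. This illustrates the concept only and is not how HISAT-3N is implemented; the abstract attributes its speed to hierarchical and repeat indexes, whereas the sketch uses naive exact substring search. All function names and the toy sequences are hypothetical, and reverse-strand handling is omitted.

```python
def convert(seq, src="C", dst="T"):
    """Collapse the alphabet by rewriting every src base as dst."""
    return seq.upper().replace(src, dst)


def find_converted(read, reference, src="C", dst="T"):
    """Return 0-based reference offsets where the read matches exactly
    after both sequences are reduced to the three-letter alphabet."""
    converted_read = convert(read, src, dst)
    converted_ref = convert(reference, src, dst)
    hits = []
    i = converted_ref.find(converted_read)
    while i != -1:
        hits.append(i)
        i = converted_ref.find(converted_read, i + 1)
    return hits


def conversion_sites(read, reference, offset, src="C", dst="T"):
    """List reference positions where the reference has src but the read has dst."""
    window = reference[offset:offset + len(read)].upper()
    return [
        offset + k
        for k, (read_base, ref_base) in enumerate(zip(read.upper(), window))
        if ref_base == src and read_base == dst
    ]


# Toy data: the read derives from reference[2:10] with every C read as T.
reference = "GGACCGTTACGA"
read = "ATTGTTAT"

print(reference.find(read))                     # -1: no match in the 4-letter alphabet
hits = find_converted(read, reference)
print(hits)                                     # [2]
print(conversion_sites(read, reference, hits[0]))  # [3, 4, 9]
```

The read cannot be placed by direct comparison because the conversions look like mismatches; after reducing both sequences, the match at offset 2 becomes exact, and the original sequences recover which reference cytosines were read as thymine.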

LongTron: Automated Analysis of Long Read Spliced Alignment Accuracy

10.1101/2020.11.10.376871 ◽

2020 ◽

Author(s):

Christopher Wilks ◽

Michael C. Schatz

Keyword(s):

Random Forest ◽

Cancer Cell Line ◽

Automated Analysis ◽

Error Rates ◽

Supplementary Information ◽

Splice Sites ◽

Link Type ◽

Spliced Alignment ◽

Oxford Nanopore ◽

Long Read

Abstract

Motivation: Long read sequencing has increased the accuracy and completeness of assemblies of various organisms’ genomes in recent months. Similarly, spliced alignments of long read RNA sequencing hold the promise of delivering much longer transcripts of existing and novel isoforms in known genes without the need for error-prone transcript assemblies from short reads. However, low coverage and high error rates potentially hamper the widespread adoption of long-read spliced alignments in annotation updates and isoform-level expression quantifications.

Results: Addressing these issues, we first develop a simulation of error modes for both Oxford Nanopore and PacBio CCS spliced alignments. Based on this we train a Random Forest classifier to assign new long-read alignments to one of two error categories, a novel category, or label them as non-error. We use this classifier to label reads from the spliced alignments of the popular aligner minimap2, run on three long read sequencing datasets, including NA12878 from Oxford Nanopore and PacBio CCS, as well as a PacBio SKBR3 cancer cell line. Finally, we compare the intron chains of the three long read alignments against individual splice sites, short read assemblies, and the output from the FLAIR pipeline on the same samples. Our results demonstrate a substantial lack of precision in determining exact splice sites for long reads during alignment on both platforms while showing some benefit from postprocessing. This work motivates the need for both better aligners and additional post-alignment processing to adjust incorrectly called putative splice sites and clarify support for novel transcripts.

Availability and implementation: Source code for the random forest, implemented in Python, is available at https://github.com/schatzlab/LongTron under the MIT license. The modified version of GffCompare used to construct Table 3 and related is here: https://github.com/ChristopherWilks/gffcompare/releases/tag/0.11.2LT

Supplementary Information: Supplementary notes and figures are available online.

Accurate spliced alignment of long RNA sequencing reads

10.1101/2020.09.02.279208 ◽

2020 ◽

Author(s):

Kristoffer Sahlin ◽

Veli Mäkinen

Keyword(s):

Rna Sequencing ◽

State Of The Art ◽

Synthetic Data ◽

Biological Data ◽

Alignment Method ◽

Spliced Alignment ◽

Sequencing Technique ◽

Long Read ◽

New Challenges ◽

Novel Isoforms

Abstract: Long-read RNA sequencing techniques are quickly establishing themselves as the primary sequencing technique to study the transcriptome landscape. Many such analyses depend on spliced alignment of reads to the genome. However, the error rate and sequencing length of long-read technologies create new challenges for accurately aligning these reads. We present an alignment method, uLTRA, that, on simulated and synthetic data, shows higher accuracy than the state of the art, with substantially higher accuracy for small exons. We show several examples on biological data where uLTRA aligns to known and novel isoforms with exon structures that are not detected with other aligners. uLTRA is available at https://github.com/ksahlin/ultra.

Two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

10.1101/2020.05.27.118679 ◽

2020 ◽

Cited By ~ 1

Author(s):

Matthew T. Parker ◽

Katarzyna Knop ◽

Geoffrey J. Barton ◽

Gordon G. Simpson

Keyword(s):

Machine Learning ◽

Transcriptome Assembly ◽

Error Rates ◽

Sequence Information ◽

Sequencing Technologies ◽

Alternative Processing ◽

Spliced Alignment ◽

Long Reads ◽

Long Read ◽

Splice Junctions

Abstract: Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.

SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

BMC Bioinformatics ◽

10.1186/s12859-019-2647-2 ◽

2019 ◽

Vol 20 (S3) ◽

Cited By ~ 2

Author(s):

Safa Jammali ◽

Jean-David Aguilar ◽

Esaie Kuitche ◽

Aïda Ouangraoua

Keyword(s):

Spliced Alignment

SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

10.1101/420307 ◽

2018 ◽

Author(s):

Safa Jammali ◽

Jean-David Aguilar ◽

Esaie Kuitche ◽

Aïda Ouangraoua

Keyword(s):

Gene Family ◽

Dna Sequences ◽

Cdna Sequence ◽

Genomic Sequence ◽

Sequence Similarity ◽

Basic Step ◽

Alignment Algorithms ◽

Spliced Alignment ◽

Gene Structures ◽

Family Based

Abstract

Motivation: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising, yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet, this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then the inference of transcript splicing orthology groups in a gene family based on spliced alignments.

Results: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, thanks to accounting for the splicing structure of the input sequences. It is important to note that unlike all current spliced alignment methods that are meant for cDNA-to-genome alignments and can be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. We show its usefulness for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses.

Availability: SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/[email protected]

Magic-BLAST, an accurate DNA and RNA-seq aligner for long and short reads

10.1101/390013 ◽

2018 ◽

Cited By ~ 10

Author(s):

Grzegorz M Boratyn ◽

Jean Thierry-Mieg ◽

Danielle Thierry-Mieg ◽

Ben Busby ◽

Thomas L Madden

Keyword(s):

Data Sets ◽

Rna Seq ◽

Seed Selection ◽

Sequencing Technologies ◽

Dna And Rna ◽

Spliced Alignment ◽

Long Reads ◽

Wide Range ◽

Blast Database ◽

Innovative Techniques

Abstract: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. It uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST to accurately map short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.

Recently Published Documents

SeekFusion - A Clinically Validated Fusion Transcript Detection Pipeline for PCR-Based Next-Generation Sequencing of RNA

2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

ClipSV: improving structural variation detection by read extension, spliced alignment and tree-based decision rules

HISAT-3N: a rapid and accurate three-nucleotide sequence aligner

LongTron: Automated Analysis of Long Read Spliced Alignment Accuracy

Accurate spliced alignment of long RNA sequencing reads

Two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

Magic-BLAST, an accurate DNA and RNA-seq aligner for long and short reads
