ABRIDGE: An ultra-compression software for SAM alignment files

Mapping Intimacies ◽

10.1101/2022.01.04.474935 ◽

2022 ◽

Author(s):

Sagnik Banerjee ◽

Carson Andorf

Keyword(s):

Compression Ratio ◽

State Of The Art ◽

Transcriptome Assembly ◽

Genetic Data ◽

Redundant Information ◽

Rna Seq ◽

New Approach ◽

Gene Count ◽

Lossless And Lossy Compression ◽

Novel Algorithm

Advances in technology have enabled sequencing machines to produce vast amounts of genetic data, increasing storage demands. Most genomic software uses read alignments for several purposes, including transcriptome assembly and gene count estimation. Herein we present ABRIDGE, a state-of-the-art compressor for SAM alignment files that offers users both lossless and lossy compression options. This reference-based file compressor achieves the best compression ratio among all compression software, ensuring lower space demand and faster file transmission. Central to the software is a novel algorithm that retains non-redundant information. This new approach has allowed ABRIDGE to achieve a compression ratio 16% higher than the second-best compressor for RNA-Seq reads and over 35% higher for DNA-Seq reads. ABRIDGE also offers users the option to randomly access locations without having to decompress the entire file. ABRIDGE is distributed under the MIT license and can be obtained from GitHub and Docker Hub. We anticipate that the user community will adopt ABRIDGE within their existing pipelines, encouraging further research in this domain.
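
The general idea behind reference-based compression can be illustrated with a minimal Python sketch: a read is stored as its start position, its length, and only the bases that differ from the reference. This is a toy illustration that assumes ungapped alignments; the names `encode_read` and `decode_read` are hypothetical, and the sketch is not ABRIDGE's actual algorithm or file format.

```python
def encode_read(reference, start, read):
    """Keep only what the reference cannot supply: position, length, mismatches."""
    mismatches = [
        (offset, base)
        for offset, base in enumerate(read)
        if reference[start + offset] != base
    ]
    return start, len(read), mismatches


def decode_read(reference, record):
    """Rebuild the read from the reference plus the stored mismatches."""
    start, length, mismatches = record
    bases = list(reference[start:start + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


# Hypothetical toy data: the read aligns at position 3 with one mismatch.
reference = "ACGTACGTTAGC"
read = "TACCTT"
record = encode_read(reference, 3, read)
restored = decode_read(reference, record)
```

Because the decoder recovers every base either from the reference or from the stored mismatch list, the round trip in this sketch is lossless.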

Leveraging multiple transcriptome assembly methods for improved gene structure annotation

10.1101/216994 ◽

2017 ◽

Cited By ~ 5

Author(s):

Luca Venturini ◽

Shabhonam Caim ◽

Gemy G Kaithakottil ◽

Daniel L Mapleson ◽

David Swarbreck

Keyword(s):

Open Source ◽

Gene Structure ◽

Transcriptome Assembly ◽

Rna Seq ◽

Multiple Methods ◽

Link Type ◽

Transcript Reconstruction ◽

Structure Annotation ◽

Optimal Approach ◽

Novel Algorithm

The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.

Exact transcript quantification over splice graphs

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00184-7 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Cong Ma ◽

Hongyu Zheng ◽

Carl Kingsford

Keyword(s):

Transcriptome Assembly ◽

Generation Model ◽

Rna Seq ◽

New Approach ◽

Transcript Quantification ◽

Splice Graph ◽

Quantification Model ◽

Splice Junctions ◽

Model Graph ◽

Expression Quantification

Background: The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (Bioinformatics 30:2447–55, 2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed. Results: We provide an improvement of this model to handle variable-length reads or fragments and incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-the-art transcript quantification method. Conclusion: We propose a new approach for graph quantification, which is useful for modeling scenarios where the reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis.
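
The statement that every full path in the splice graph is a possible transcript can be made concrete with a small Python sketch. It assumes the graph is given as a dictionary of adjacency lists over exon labels with hypothetical source and sink nodes `"S"` and `"T"`; the helper name `full_paths` is also an assumption. The sketch only enumerates candidate transcripts and does not implement the quantification model described in the abstract.

```python
def full_paths(graph, source, sink):
    """Return every source-to-sink path in a directed acyclic graph."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        # Extend the current partial path along every outgoing edge.
        for successor in graph.get(node, []):
            stack.append((successor, path + [successor]))
    return sorted(paths)


# Hypothetical toy splice graph: exon e2 is optional (it can be skipped).
splice_graph = {
    "S": ["e1"],
    "e1": ["e2", "e3"],
    "e2": ["e3"],
    "e3": ["T"],
}
transcripts = full_paths(splice_graph, "S", "T")
```

In this toy graph the two full paths correspond to the transcript that includes exon `e2` and the one that skips it. The number of full paths can grow exponentially with the number of optional exons, which is one reason to model junction abundances rather than list transcripts explicitly.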

S-IRFindeR: stable and accurate measurement of intron retention

10.1101/2020.06.25.164699 ◽

2020 ◽

Author(s):

Lucile Broseus ◽

William Ritchie

Keyword(s):

Intron Retention ◽

Rna Seq ◽

Sequencing Data ◽

New Approach ◽

Retention Ratio ◽

Link Type ◽

Long Read ◽

Retained Introns ◽

Accurate Quantification ◽

Novel Algorithm

Accurate quantification of intron retention levels is currently the crux for detecting and interpreting the function of retained introns. Using both simulated and real RNA-seq datasets, we show that current methods suffer from several biases and artefacts, which impair the analysis of intron retention. We designed a new approach to measure intron retention levels called the Stable Intron Retention ratio that we have implemented in a novel algorithm to detect and measure intron retention called S-IRFindeR. We demonstrate that it provides a significant improvement in accuracy, higher consistency between replicates and agreement with IR-levels computed from long-read sequencing data. S-IRFindeR is freely available at: https://github.com/lbroseus/SIRFindeR/.

Faculty Opinions recommendation of Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13296969.14657090 ◽

2011 ◽

Author(s):

Steven Salzberg ◽

Michael Schatz

Keyword(s):

Reference Genome ◽

Transcriptome Assembly ◽

Full Length ◽

Rna Seq

Heuristic rank selection with progressively searching tensor ring network

Complex & Intelligent Systems ◽

10.1007/s40747-021-00308-x ◽

2021 ◽

Author(s):

Nannan Li ◽

Yu Pan ◽

Yaran Chen ◽

Zixiang Ding ◽

Dongbin Zhao ◽

...

Keyword(s):

Genetic Algorithm ◽

Compression Ratio ◽

State Of The Art ◽

Heuristic Method ◽

Ring Networks ◽

Ring Network ◽

Narrow Region ◽

Network Search ◽

Deep Networks ◽

Evolutionary Phase

Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although rank selection strongly affects the performance of TRNs, it has seldom been studied in previous work, and the ranks are usually set equal in experiments. Moreover, there is no heuristic method for choosing the rank, and enumerating candidates to find an appropriate rank is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a narrow region, namely an interest region. Based on this phenomenon, we propose a novel progressive genetic algorithm named progressively searching tensor ring network search (PSTRN), which can find the optimal rank precisely and efficiently. Through the evolutionary phase and progressive phase, PSTRN can converge to the interest region quickly and achieve good performance. Experimental results show that PSTRN can significantly reduce the complexity of seeking rank, compared with the enumerating method. Furthermore, our method is validated on public benchmarks such as MNIST, CIFAR10/100, UCF11 and HMDB51, achieving state-of-the-art performance.
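
To show why the choice of ranks matters for compression, the following Python sketch counts the parameters of a tensor ring format, in which a tensor of shape n_1 x ... x n_d is represented by d cores of shape r_k x n_k x r_(k+1), with the last rank wrapping around to the first. The function name `tensor_ring_params` and the example shapes and ranks are assumptions chosen for illustration; the sketch does not implement PSTRN or any rank search.

```python
from math import prod


def tensor_ring_params(shape, ranks):
    """Count the parameters of tensor ring cores with shapes r_k x n_k x r_(k+1)."""
    if len(shape) != len(ranks):
        raise ValueError("shape and ranks must have the same length")
    d = len(shape)
    # The modulo closes the ring: the last core connects back to the first rank.
    return sum(ranks[k] * shape[k] * ranks[(k + 1) % d] for k in range(d))


# Hypothetical example: a 4-way tensor with two different rank choices.
shape = (4, 8, 8, 4)
dense_params = prod(shape)
equal_rank_params = tensor_ring_params(shape, (3, 3, 3, 3))
mixed_rank_params = tensor_ring_params(shape, (2, 4, 3, 2))
compression_ratio = dense_params / mixed_rank_params
```

The two rank vectors give different parameter counts for the same tensor shape, so the compression ratio depends directly on which ranks are selected.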

A new approach based on graph matching and evolutionary approach for sport scheduling problem

Intelligent Decision Technologies ◽

10.3233/idt-190114 ◽

2020 ◽

pp. 1-16

Author(s):

Meriem Khelifa ◽

Dalila Boughaci ◽

Esma Aïmeur

Keyword(s):

Graph Matching ◽

State Of The Art ◽

Travel Cost ◽

Round Robin ◽

New Approach ◽

Traveling Tournament Problem ◽

Significant Interest ◽

National League ◽

Better Than

The Traveling Tournament Problem (TTP) is concerned with finding a double round-robin tournament schedule that minimizes the total distance traveled by the teams. It has attracted significant interest recently since a favorable TTP schedule can result in significant savings for the league. This paper proposes an original evolutionary algorithm for TTP. We first propose a quick and effective constructive algorithm to construct a Double Round Robin Tournament (DRRT) schedule with low travel cost. We then describe an enhanced genetic algorithm with a new crossover operator to improve the travel cost of the generated schedules. A new heuristic for efficiently ordering the scheduled rounds is also proposed. The latter leads to significant enhancement in the quality of the schedules. The overall method is evaluated on publicly available standard benchmarks and compared with other techniques for TTP and UTTP (Unconstrained Traveling Tournament Problem). The computational experiment shows that the proposed approach can build very good solutions that are comparable to other state-of-the-art approaches or better than the current best solutions on UTTP. Further, our method provides new valuable solutions to some unsolved UTTP instances and outperforms prior methods for all US National League (NL) instances.
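
The objective described above can be sketched in Python as follows. The schedule is built with the textbook circle method and then mirrored, which is an assumption made for illustration and not the constructive algorithm proposed in the paper; the sketch also ignores the additional constraints of the constrained TTP. The function names `double_round_robin` and `total_travel` are hypothetical.

```python
def double_round_robin(n):
    """Build a mirrored double round robin for an even number of teams.

    Each round is a list of (home, away) pairs; the second half repeats
    the first half with venues swapped.
    """
    if n % 2:
        raise ValueError("n must be even")
    teams = list(range(n))
    first_half = []
    for r in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = teams[i], teams[n - 1 - i]
            # Alternate venues by round so no team is always at home.
            pairs.append((a, b) if r % 2 == 0 else (b, a))
        first_half.append(pairs)
        # Circle method: keep teams[0] fixed and rotate the rest.
        teams = [teams[0]] + [teams[-1]] + teams[1:-1]
    second_half = [[(away, home) for home, away in rnd] for rnd in first_half]
    return first_half + second_half


def total_travel(schedule, dist):
    """Sum the distance traveled by all teams over the whole schedule."""
    n = len(dist)
    location = list(range(n))  # every team starts at its home venue
    total = 0
    for rnd in schedule:
        for home, away in rnd:
            # Both teams move from wherever they are to the home team's venue.
            total += dist[location[home]][home] + dist[location[away]][home]
            location[home] = home
            location[away] = home
    # After the last round every team returns home.
    total += sum(dist[location[t]][t] for t in range(n))
    return total


# Hypothetical two-team instance with venues 5 distance units apart.
schedule = double_round_robin(2)
cost = total_travel(schedule, [[0, 5], [5, 0]])
```

A search method such as a genetic algorithm would treat `total_travel` as the cost to minimize over feasible schedules.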

Bin2vec: learning representations of binary executable programs for security tasks

Cybersecurity ◽

10.1186/s42400-021-00088-4 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Shushan Arakelyan ◽

Sima Arasteh ◽

Christophe Hauser ◽

Erik Kline ◽

Aram Galstyan

Keyword(s):

Program Analysis ◽

State Of The Art ◽

Classification Error ◽

New Approach ◽

Convolutional Networks ◽

Computational Program ◽

Functional Algorithm ◽

Binary Program ◽

Vulnerability Discovery ◽

Executable Programs

Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).

The state of the art in soybean transcriptomics resources and gene coexpression networks

in silico Plants ◽

10.1093/insilicoplants/diab005 ◽

2021 ◽

Author(s):

Fabricio Almeida-Silva ◽

Kanhu C Moharana ◽

Thiago M Venancio

Keyword(s):

State Of The Art ◽

The State ◽

Gene Coexpression Network ◽

Rna Seq ◽

Transcriptomic Data ◽

The Past ◽

Gene Coexpression ◽

Genomics Research ◽

Public Repositories ◽

Coexpression Networks

In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.

Target Localization via Integrated and Segregated Ranging Based on RSS and TOA Measurements

Sensors ◽

10.3390/s19020230 ◽

2019 ◽

Vol 19 (2) ◽

pp. 230 ◽

Cited By ~ 5

Author(s):

Slavisa Tomic ◽

Marko Beko

Keyword(s):

State Of The Art ◽

Hybrid Approach ◽

Critical Distance ◽

Heuristic Approach ◽

Target Localization ◽

Line Of Sight ◽

Time Of Arrival ◽

Individual Measurement ◽

New Approach ◽

Non Line Of Sight

This work addresses the problem of target localization in adverse non-line-of-sight (NLOS) environments by using received signal strength (RSS) and time of arrival (TOA) measurements. It is inspired by a recently published work in which the authors discuss a critical distance below and above which employing combined RSS-TOA measurements is inferior to employing RSS-only and TOA-only measurements, respectively. Here, we revise state-of-the-art estimators for the considered target localization problem and study their performance against their counterparts that employ each individual measurement exclusively. It is shown that the hybrid approach is not the best one by default. Thus, we propose a simple heuristic approach to choose the best measurement for each link, and we show that it can enhance the performance of an estimator. The new approach implicitly relies on the concept of the critical distance, but does not assume certain link parameters as given. Our simulations corroborate findings available in the literature for line-of-sight (LOS) to a certain extent, but they indicate that more work is required for NLOS environments. Moreover, they show that the heuristic approach works well, matching or even improving the performance of the best fixed choice in all considered scenarios.
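
As background for the two measurement types, the following Python sketch converts a single noise-free RSS or TOA reading into a range estimate. It assumes the standard log-distance path-loss model for RSS and straight-line propagation at the speed of light for TOA; the function and parameter names are hypothetical, and the sketch implements neither the estimators nor the per-link selection heuristic studied in the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def distance_from_rss(rss_dbm, p0_dbm, path_loss_exponent, d0=1.0):
    """Invert the log-distance model P(d) = P0 - 10 * gamma * log10(d / d0)."""
    return d0 * 10 ** ((p0_dbm - rss_dbm) / (10.0 * path_loss_exponent))


def distance_from_toa(toa_seconds):
    """Convert a one-way propagation delay into a distance."""
    return SPEED_OF_LIGHT * toa_seconds


# Hypothetical link: -40 dBm at the 1 m reference distance, exponent 2.
rss_range = distance_from_rss(rss_dbm=-60.0, p0_dbm=-40.0, path_loss_exponent=2.0)
toa_range = distance_from_toa(1e-7)
```

In the RSS model the range depends exponentially on the measured power, whereas in the TOA model it depends linearly on the measured delay, so the same measurement error affects the two range estimates differently as distance changes.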

Accurate transcriptome assembly by Nanopore RNA‐seq reveals novel functional transcripts in hepatocellular carcinoma

Cancer Science ◽

10.1111/cas.15058 ◽

2021 ◽

Author(s):

Yuanchang Fang ◽

Geng Chen ◽

Feng Chen ◽

En Hu ◽

Xiuqing Dong ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Transcriptome Assembly ◽

Rna Seq
