ABRIDGE: An ultra-compression software for SAM alignment files

2022 ◽  
Author(s):  
Sagnik Banerjee ◽  
Carson Andorf

Advances in technology have enabled sequencing machines to produce vast amounts of genetic data, increasing storage demands. Most genomic software uses read alignments for several purposes, including transcriptome assembly and gene count estimation. Herein we present ABRIDGE, a state-of-the-art compressor for SAM alignment files that offers users both lossless and lossy compression options. This reference-based file compressor achieves the best compression ratio among all compression software, ensuring lower space demand and faster file transmission. Central to the software is a novel algorithm that retains only non-redundant information. This new approach has allowed ABRIDGE to achieve compression 16% higher than the second-best compressor for RNA-Seq reads and over 35% higher for DNA-Seq reads. ABRIDGE also lets users randomly access locations without having to decompress the entire file. ABRIDGE is distributed under the MIT license and can be obtained from GitHub and Docker Hub. We anticipate that the user community will adopt ABRIDGE within their existing pipelines, encouraging further research in this domain.

2017 ◽  
Author(s):  
Luca Venturini ◽  
Shabhonam Caim ◽  
Gemy G Kaithakottil ◽  
Daniel L Mapleson ◽  
David Swarbreck

Abstract The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Cong Ma ◽  
Hongyu Zheng ◽  
Carl Kingsford

Abstract Background The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (Bioinformatics 30:2447–55, 2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes that the length of single-end reads or paired-end fragments is fixed. Results We improve this model to handle variable-length reads or fragments and to incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-the-art transcript quantification method. Conclusion We propose a new approach for graph quantification, which is useful for modeling scenarios where the reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis.


2020 ◽  
Author(s):  
Lucile Broseus ◽  
William Ritchie

Abstract Accurate quantification of intron retention levels is currently the crux for detecting and interpreting the function of retained introns. Using both simulated and real RNA-seq datasets, we show that current methods suffer from several biases and artefacts, which impair the analysis of intron retention. We designed a new approach to measure intron retention levels called the Stable Intron Retention ratio that we have implemented in a novel algorithm to detect and measure intron retention called S-IRFindeR. We demonstrate that it provides a significant improvement in accuracy, higher consistency between replicates and agreement with IR-levels computed from long-read sequencing data. S-IRFindeR is freely available at: https://github.com/lbroseus/SIRFindeR/.


Author(s):  
Nannan Li ◽  
Yu Pan ◽  
Yaran Chen ◽  
Zixiang Ding ◽  
Dongbin Zhao ◽  
...  

Abstract Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although rank selection strongly affects the performance of TRNs, it is seldom studied in previous works, and the ranks are usually set to equal values in experiments. Meanwhile, there is no heuristic method to choose the rank, and enumerating candidates to find an appropriate rank is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a narrow region, namely an interest region. Therefore, based on the above phenomenon, we propose a novel progressive genetic algorithm named progressively searching tensor ring network search (PSTRN), which has the ability to find the optimal rank precisely and efficiently. Through the evolutionary phase and progressive phase, PSTRN can converge to the interest region quickly and harvest good performance. Experimental results show that PSTRN can significantly reduce the complexity of seeking rank, compared with the enumerating method. Furthermore, our method is validated on public benchmarks like MNIST, CIFAR10/100, UCF11 and HMDB51, achieving state-of-the-art performance.


2020 ◽  
pp. 1-16
Author(s):  
Meriem Khelifa ◽  
Dalila Boughaci ◽  
Esma Aïmeur

The Traveling Tournament Problem (TTP) is concerned with finding a double round-robin tournament schedule that minimizes the total distance traveled by the teams. It has attracted significant interest recently since a favorable TTP schedule can result in significant savings for the league. This paper proposes an original evolutionary algorithm for TTP. We first propose a quick and effective constructive algorithm to construct a Double Round Robin Tournament (DRRT) schedule with low travel cost. We then describe an enhanced genetic algorithm with a new crossover operator to improve the travel cost of the generated schedules. A new heuristic for efficiently ordering the scheduled rounds is also proposed. The latter leads to significant enhancement in the quality of the schedules. The overall method is evaluated on publicly available standard benchmarks and compared with other techniques for TTP and UTTP (Unconstrained Traveling Tournament Problem). The computational experiments show that the proposed approach can build very good solutions, comparable to other state-of-the-art approaches or better than the current best solutions on UTTP. Further, our method provides new valuable solutions to some unsolved UTTP instances and outperforms prior methods for all US National League (NL) instances.
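To make the objective concrete, here is a minimal sketch of a double round-robin schedule and its travel cost. It uses the classic circle method with a mirrored second half, which is a swapped-in textbook construction and not the constructive algorithm or genetic operators proposed in the paper. The function names `double_round_robin` and `total_travel` are hypothetical, and the sketch does not check TTP side constraints such as limits on consecutive home or away games.

```python
def double_round_robin(n):
    """Return 2*(n-1) rounds; each round is a list of (home, away) pairs.

    Circle method for the first half, mirrored venues for the second half.
    Assumes an even number of teams labelled 0..n-1.
    """
    if n < 2 or n % 2:
        raise ValueError("n must be even and at least 2")
    teams = list(range(n))
    first_half = []
    for r in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = teams[i], teams[n - 1 - i]
            # Flip venues on odd rounds so home games are spread out.
            pairs.append((a, b) if r % 2 == 0 else (b, a))
        first_half.append(pairs)
        # Keep the first team fixed and rotate the others by one position.
        teams = [teams[0], teams[-1]] + teams[1:-1]
    # Second half: the same pairings with home and away swapped.
    second_half = [[(away, home) for home, away in rnd] for rnd in first_half]
    return first_half + second_half


def total_travel(schedule, dist):
    """Sum of distances traveled by all teams.

    Each team starts at its own venue, moves to the venue of each game
    in round order, and returns home after the last round.
    """
    total = 0
    for team in range(len(dist)):
        location = team
        for rnd in schedule:
            for home, away in rnd:
                if team in (home, away):
                    total += dist[location][home]  # games are played at the home venue
                    location = home
        total += dist[location][team]  # trip back home at the end
    return total


if __name__ == "__main__":
    n = 4
    # Toy distance matrix (an assumption): teams placed on a line.
    dist = [[abs(i - j) for j in range(n)] for i in range(n)]
    schedule = double_round_robin(n)
    print(len(schedule), "rounds, total travel:", total_travel(schedule, dist))
```

A search method such as the genetic algorithm described above would treat `total_travel` as the fitness to minimize and the schedule as the candidate solution; reordering rounds changes the cost because each team's trip length depends on where it played the previous game.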


Cybersecurity ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Shushan Arakelyan ◽  
Sima Arasteh ◽  
Christophe Hauser ◽  
Erik Kline ◽  
Aram Galstyan

Abstract Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks – functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).


Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

Abstract In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 230 ◽  
Author(s):  
Slavisa Tomic ◽  
Marko Beko

This work addresses the problem of target localization in adverse non-line-of-sight (NLOS) environments by using received signal strength (RSS) and time of arrival (TOA) measurements. It is inspired by a recently published work in which the authors discuss a critical distance below and above which employing combined RSS-TOA measurements is inferior to employing RSS-only and TOA-only measurements, respectively. Here, we revise state-of-the-art estimators for the considered target localization problem and study their performance against their counterparts that employ each individual measurement exclusively. It is shown that the hybrid approach is not the best one by default. Thus, we propose a simple heuristic approach to choose the best measurement for each link, and we show that it can enhance the performance of an estimator. The new approach implicitly relies on the concept of the critical distance, but does not assume certain link parameters as given. Our simulations corroborate findings available in the literature for line-of-sight (LOS) to a certain extent, but they indicate that more work is required for NLOS environments. Moreover, they show that the heuristic approach works well, matching or even improving the performance of the best fixed choice in all considered scenarios.
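The idea of a critical distance can be illustrated with a small sketch. It is not the heuristic proposed in the paper, which does not assume the link parameters are known. The sketch assumes a line-of-sight log-distance path-loss model with Gaussian shadowing of standard deviation `sigma_rss_db` (in dB), a path-loss exponent `gamma`, and a TOA ranging error of standard deviation `sigma_toa_m` (in meters), all treated as known. Under these assumptions, the first-order RSS ranging error grows linearly with distance while the TOA error stays constant, so the two are equal at a single distance. All function and parameter names are hypothetical.

```python
import math


def rss_distance(p_rx_dbm, p0_dbm, gamma, d0=1.0):
    """Invert the log-distance model P = P0 - 10*gamma*log10(d/d0)."""
    return d0 * 10 ** ((p0_dbm - p_rx_dbm) / (10.0 * gamma))


def toa_distance(t_seconds, c=299_792_458.0):
    """Distance from a one-way time of arrival, with c in m/s."""
    return c * t_seconds


def critical_distance(gamma, sigma_rss_db, sigma_toa_m):
    """Distance at which the approximate RSS and TOA ranging errors match.

    First-order RSS ranging std: (ln 10 / (10 * gamma)) * sigma_rss_db * d.
    Setting this equal to sigma_toa_m and solving for d gives the result.
    """
    return 10.0 * gamma * sigma_toa_m / (math.log(10) * sigma_rss_db)


def pick_measurement(d_est, gamma, sigma_rss_db, sigma_toa_m):
    """Choose the measurement with the smaller modeled ranging error."""
    d_c = critical_distance(gamma, sigma_rss_db, sigma_toa_m)
    return "RSS" if d_est < d_c else "TOA"


if __name__ == "__main__":
    # Illustrative parameter values only (assumptions, not from the paper).
    gamma, sigma_rss_db, sigma_toa_m = 2.0, 4.0, 1.0
    d_rss = rss_distance(p_rx_dbm=-60.0, p0_dbm=-40.0, gamma=gamma)
    print("RSS distance estimate (m):", d_rss)
    print("critical distance (m):", critical_distance(gamma, sigma_rss_db, sigma_toa_m))
    print("choice for this link:", pick_measurement(d_rss, gamma, sigma_rss_db, sigma_toa_m))
```

In this simplified model, RSS is preferred for short links and TOA for long ones, which matches the direction described in the abstract. NLOS bias, which the paper targets, is not modeled here.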


2021 ◽  
Author(s):  
Yuanchang Fang ◽  
Geng Chen ◽  
Feng Chen ◽  
En Hu ◽  
Xiuqing Dong ◽  
...  
