RENANO: a REference-based compressor for NANOpore FASTQ files

2021
Author(s): Guillermo Dufort y Alvarez, Gadiel Seroussi, Pablo Smircich, Jose Roberto Sotelo, Idoia Ochoa, ...

Nanopore sequencing technologies are rapidly gaining popularity, in part due to the massive amounts of genomic data they produce in short periods of time (up to 8.5 TB of data in less than 72 h). To reduce the costs of transmission and storage, efficient compression methods for this type of data are needed. Unlike short-read technologies, nanopore sequencing generates long, noisy reads of variable length. In this note we introduce RENANO, a reference-based lossless FASTQ data compressor specifically tailored to FASTQ files generated with nanopore sequencing technologies. RENANO builds on the recent compressor ENANO, which is currently the state of the art. It focuses on improving the compression of the base call sequence portion of the FASTQ file, leaving the other parts of ENANO intact. Two novel reference-based compression algorithms are introduced, covering two scenarios: in the first, a reference genome is available at no cost to both the compressor and the decompressor; in the second, the reference genome is available only to the compressor, and a compacted version of the reference is transmitted to the decompressor as part of the compressed file. To evaluate the proposed algorithms, we compare RENANO against ENANO on several publicly available nanopore datasets. In the first scenario, RENANO improves ENANO's compression of the base call sequences by 40.8% on average over all the datasets; for total compression (including the other parts of the FASTQ file), the average improvement is 13.1%. In the second scenario, the base call compression improvements of RENANO over ENANO range from 15.2% to 49.0%, depending on the coverage of the compressed dataset, while the improvements in total size range from 5.1% to 16.5%.
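To illustrate the first scenario, where the decoder already holds the reference, the following minimal Python sketch stores a read as copy/skip/insert operations against the reference window it maps to, so that only the differences are spelled out explicitly. It assumes the mapping position is already known (for example, from an aligner) and uses `difflib.SequenceMatcher` as a stand-in for a real aligner. The function names are hypothetical, and this is a generic illustration of reference-based encoding, not RENANO's algorithm, which the abstract does not describe in detail.

```python
from difflib import SequenceMatcher


def encode_against_reference(read: str, ref_window: str) -> list:
    """Describe `read` as edit operations on `ref_window` (hypothetical helper).

    ("copy", n)   -> take the next n reference bases unchanged
    ("skip", n)   -> drop the next n reference bases (deleted or substituted)
    ("insert", s) -> emit bases s that are not taken from the reference
    """
    ops = []
    # difflib stands in for a real aligner; autojunk=False keeps repetitive DNA usable
    matcher = SequenceMatcher(a=ref_window, b=read, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i2 - i1))
        elif tag == "delete":
            ops.append(("skip", i2 - i1))
        elif tag == "insert":
            ops.append(("insert", read[j1:j2]))
        else:  # "replace": drop the reference bases, emit the read bases
            ops.append(("skip", i2 - i1))
            ops.append(("insert", read[j1:j2]))
    return ops


def decode_against_reference(ops: list, ref_window: str) -> str:
    """Rebuild the read; the decoder must hold the same reference window."""
    out, pos = [], 0
    for op, arg in ops:
        if op == "copy":
            out.append(ref_window[pos:pos + arg])
            pos += arg
        elif op == "skip":
            pos += arg
        else:  # "insert"
            out.append(arg)
    return "".join(out)
```

For a read that closely matches the reference, the operation list is dominated by a few long `copy` runs, which an entropy coder can then represent cheaply. In the second scenario, the decoder would additionally need the reference window itself, so its cost has to be paid inside the compressed file.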

2020, Vol 36 (16), pp. 4506-4507
Author(s): Guillermo Dufort y Álvarez, Gadiel Seroussi, Pablo Smircich, José Sotelo, Idoia Ochoa, ...

Abstract. Motivation: The amount of genomic data generated globally is growing explosively, increasing the need for processing, storage and transmission resources and motivating the development of efficient compression tools for these data. Work so far has focused mainly on compressing data generated by short-read technologies. However, nanopore sequencing technologies are rapidly gaining popularity thanks to the much larger average size of the reads they produce, their falling cost and the portability of the sequencing technology. We present ENANO (Encoder for NANOpore), a novel lossless compression algorithm designed especially for nanopore sequencing FASTQ files. Results: The main focus of ENANO is the compression of the quality scores, as they dominate the size of the compressed file. ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency and speed. We tested ENANO, the current state-of-the-art compressor SPRING and the general-purpose compressor pigz on several publicly available nanopore datasets. The results show that the proposed algorithm, in both modes, consistently achieves the best compression on every nanopore dataset considered, with average improvements over pigz and SPRING of >24.7% and 6.3%, respectively. In addition, ENANO encodes 2.9× faster and decodes 1.7× faster than SPRING, with memory consumption of up to 0.2 GB. Availability and implementation: ENANO is freely available for download at https://github.com/guilledufort/EnanoFASTQ. Supplementary information: Supplementary data are available at Bioinformatics online.
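Because ENANO's focus is quality-score compression, it may help to see why context modeling pays off. The minimal sketch below is a stand-in built on stated assumptions, not ENANO's model: it estimates the ideal code length, in bits, that an adaptive order-`order` context model with add-one (Laplace) counts would assign to a sequence of quality scores. An arithmetic coder driven by the same probabilities would produce output close to this length. Scores are assumed to be integers in `range(alphabet_size)`; the default of 94 covers the printable Phred+33 range, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def adaptive_code_length(qualities, alphabet_size=94, order=2):
    """Ideal code length (bits) under an adaptive order-k context model.

    Each score is predicted from the `order` scores before it; counts are
    updated after coding, so a decoder could mirror the model exactly.
    """
    counts = defaultdict(lambda: [0] * alphabet_size)
    bits = 0.0
    for i, q in enumerate(qualities):
        context = tuple(qualities[max(0, i - order):i])
        table = counts[context]
        # Add-one estimate: unseen scores keep a nonzero probability
        p = (table[q] + 1) / (sum(table) + alphabet_size)
        bits -= math.log2(p)
        table[q] += 1
    return bits
```

On a sequence in which scores repeat in predictable patterns, the total falls well below the `len(qualities) * log2(94)` bits needed without modeling. A larger `order` can capture more structure but spreads the statistics over many more contexts, so each one learns more slowly.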


2014
Author(s): Travis Gagie, Simon J Puglisi

The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.
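The idea of indexing one reference plus the places where the other genomes differ can be made concrete with a small, SNP-only sketch. Assume every genome has the reference's length and differs from it only by substitutions, stored as `{position: base}`. An occurrence of a pattern of length m then either avoids all of a genome's substitutions, in which case it is already an occurrence in the reference, or overlaps one, in which case it starts within m - 1 positions before it. So only the reference and short windows around each variant need to be searched. The function names are hypothetical, and the schemes surveyed in the paper handle far more general differences than substitutions.

```python
def find_all(text: str, pattern: str) -> list:
    """All start positions of a non-empty pattern in text (overlaps allowed)."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def search_collection(reference: str, variants: dict, pattern: str) -> dict:
    """Find pattern in every genome without materializing the genomes.

    variants maps a genome id to {position: base} substitutions relative to
    the reference (SNP-only assumption: no insertions or deletions).
    """
    m = len(pattern)
    ref_hits = find_all(reference, pattern)  # searched once, shared by all genomes
    results = {}
    for genome_id, snps in variants.items():
        hits = set()
        # Reference hits that overlap none of this genome's SNPs carry over unchanged
        for h in ref_hits:
            if not any(h <= p < h + m for p in snps):
                hits.add(h)
        # Any hit touching a SNP lies in [p - m + 1, p + m); search only that window
        for p in snps:
            lo, hi = max(0, p - m + 1), min(len(reference), p + m)
            window = "".join(snps.get(i, reference[i]) for i in range(lo, hi))
            hits.update(lo + off for off in find_all(window, pattern))
        results[genome_id] = sorted(hits)
    return results
```

The work per genome grows with its number of differences rather than with the genome length, which is the intuition the survey connects to kernelization.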


2010, Vol 6 (1), pp. 36
Author(s): Silvana Dinaintang Harikedua

The objective of this study was to investigate the effect of ginger extract addition and refrigerated storage on the sensory quality of tuna, as perceived by panelists. Panelists (n=30) evaluated samples for overall appearance and flavor using a 1–7 hedonic scale. For flavor, the most acceptable sample contained 3% ginger extract and was stored for 3 days, while the least acceptable contained 0% ginger extract and was stored for 9 days. For overall appearance, the most acceptable sample contained 0% ginger extract and received no storage treatment, while the least acceptable contained 3% ginger extract and was stored for 9 days.


Author(s): Maryam Hammami, Hatem Bellaaj

Cloud storage is a pressing issue today, driven by rapidly changing needs and a huge mass of varied and important data to back up. In this paper, we describe work in progress and propose a flexible system architecture for data storage in the Cloud. The system is centered on the Data Manager module, which provides functions such as dispersing data into fragments and encrypting and storing those fragments. The architecture is relevant to this problem because it ensures consistency between its components as well as the security and availability of the data.
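The dispersal step of a module like the Data Manager can be illustrated with a deliberately simple scheme. The sketch below is an assumption, not the authors' design: it splits data into `k` equal fragments plus one XOR parity fragment, so that any single lost fragment (for example, one held by an unavailable storage provider) can be rebuilt from the others. Encryption is omitted because the Python standard library provides no block cipher; in practice each fragment would be encrypted before storage. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int):
    """Split data into k padded fragments plus one XOR parity fragment."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity], len(data)


def recover(fragments, original_length: int) -> bytes:
    """Rebuild the data when at most one fragment (data or parity) is None."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost fragment")
    if missing:
        fragments = list(fragments)
        # The XOR of all surviving fragments equals the missing one
        fragments[missing[0]] = reduce(xor_bytes, [f for f in fragments if f is not None])
    return b"".join(fragments[:-1])[:original_length]
```

Tolerating more than one lost fragment would require a proper erasure code such as Reed-Solomon, at the cost of storing more redundancy.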


1979, Vol 44 (6), pp. 1828-1834
Author(s): Asja Šiševa, Jiřina Slaninová, Tomislav Barth, Stephan P. Ditzov, Luben M. Sirakov

Isoelectric focusing on polyacrylamide gel columns was carried out on three native crystalline commercial insulin preparations and on ¹²⁵I-labelled insulin. All the compounds studied contained three components with different isoelectric points. The largest fraction, with pI 5.60 ± 0.05, was common to all preparations. The other two fractions lay in the acidic region, between pI 4.5 and 5.2. These fractions are attributed to contamination of the crystalline insulins by proinsulin and to the formation of des-amido derivatives during dissolution and storage of the insulin samples and, in the case of labelled insulin, also to the presence of heavily iodinated insulin and contaminating components. Isoelectric focusing of the ¹²⁵I-insulin-antibody complex showed a radioactivity peak at pI 6.15 ± 0.05.


BMC Genomics, 2021, Vol 22 (1)
Author(s): Ratanond Koonchanok, Swapna Vidhur Daulatabad, Quoseena Mir, Khairi Reda, Sarath Chandra Janga

Abstract. Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies make it possible to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: We present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on the similarity of their electric-current signals, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as a first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new qualitative signal signatures, discovered through the visualization, that distinguish these modifications from otherwise normal RNA bases. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users develop rationales, hypotheses and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
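Clustering reads by the similarity of their electric-current signals can be sketched with a generic stand-in built on assumptions not taken from the paper. Each signal is z-normalized so that only its shape matters, signals are compared by Euclidean distance, and any pair closer than a threshold is merged with union-find (single-linkage clustering). The sketch assumes equal-length signals, for example after resampling; Sequoia's own pipeline, which involves dimensionality reduction, is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_normalize(signal):
    """Rescale a current trace to zero mean and unit variance (shape only)."""
    mu, sd = mean(signal), pstdev(signal)
    if sd == 0:
        return [0.0] * len(signal)
    return [(x - mu) / sd for x in signal]


def euclidean(a, b):
    """Euclidean distance between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_signals(signals, threshold):
    """Single-linkage clustering: link any two signals closer than threshold."""
    norm = [z_normalize(s) for s in signals]
    parent = list(range(len(norm)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if euclidean(norm[i], norm[j]) <= threshold:
                parent[find(i)] = find(j)
    # Relabel roots as 0, 1, 2, ... in order of first appearance
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(norm))]
```

Z-normalization makes a trace and its scaled copy identical, so clusters reflect signal shape rather than absolute current level; the threshold plays a role similar to the tunable parameters the authors adjust interactively.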


2016, Vol 819, pp. 202-206
Author(s): Reza Maziar, Kasni Sumeru, M.Y. Senawi, Farid Nasir Ani

In this study, two experiments were performed, one with a conventional compression refrigeration cycle (CRC) and the other with an ejector refrigeration cycle (ERC). A CRC system for automotive air conditioning was designed, fabricated and tested. The system was then retrofitted with an ejector as the expansion device, and the experiments were repeated for the ERC system. The entrainment ratio, compressor compression ratio and coefficient of performance (COP) were calculated for each cycle. The calculations showed that the ERC has advantages over the CRC, with an average COP improvement of 5% for the ERC compared with the CRC.
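The three quantities calculated in the study follow standard textbook definitions, sketched below in Python. COP is the refrigerating effect divided by the compressor power; the ejector's entrainment ratio is the entrained (secondary) mass flow divided by the motive (primary) mass flow; and the compression ratio is the absolute discharge pressure divided by the absolute suction pressure. The function names and any numbers used with them are hypothetical and are not measurements from the paper.

```python
def coefficient_of_performance(refrigerating_effect_kw: float, compressor_power_kw: float) -> float:
    """COP = cooling delivered / work input (both in the same units)."""
    return refrigerating_effect_kw / compressor_power_kw


def entrainment_ratio(secondary_mass_flow: float, primary_mass_flow: float) -> float:
    """Ejector entrainment ratio = entrained flow / motive flow."""
    return secondary_mass_flow / primary_mass_flow


def compression_ratio(discharge_pressure_abs: float, suction_pressure_abs: float) -> float:
    """Ratio of absolute pressures across the compressor (not gauge pressures)."""
    return discharge_pressure_abs / suction_pressure_abs


def percent_improvement(new_value: float, baseline: float) -> float:
    """Relative change of new_value over baseline, in percent."""
    return (new_value - baseline) / baseline * 100.0
```

With hypothetical values, a CRC with a COP of 2.0 and an ERC with a COP of 2.1 give `percent_improvement(2.1, 2.0)` of about 5%, the size of the average gain reported above.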


1997, Vol 60 (9), pp. 1029-1033
Author(s): Norma S. Lázaro, Anita Tibana, Ernesto Hofer

Tonsils and inguinal, mesenteric, and prescapular lymph node samples collected from 115 swine carcasses from two abattoirs and a family-run operation in the State of Rio de Janeiro, Brazil, were cultured for the presence of Salmonella species. Salmonella spp. were detected in 40 (34.8%) of the swine sampled, with the following distribution: tonsils (31/40, 77.5%), mesenteric lymph nodes (16/40, 40.0%), inguinal lymph nodes (9/40, 22.5%), and prescapular lymph nodes (7/40, 17.5%). Scalding tank water and environmental swabs collected from the abattoirs were also analyzed. Salmonella spp. were recovered from 13 of 51 (25.5%) environmental samples from one of the two abattoirs and from none of those from the other abattoir. Salmonella spp. were recovered from the evisceration tables (5/11, 45.5%), the killing room (3/10, 30.0%), the holding pen (2/10, 20.0%), the butchering saw (2/10, 20.0%), and the scalding tank (1/10, 10.0%). The most frequently detected serovar was Salmonella Muenster. The results show the need for more effective hygienic measures in the abattoirs, as well as in the areas where swine are raised, to reduce the role of abattoirs and storage facilities in the spread of Salmonella contamination.


2015, Vol 2015, pp. 1-10
Author(s): Krisztian Buza, Bartek Wilczynski, Norbert Dojer

Background. Next-generation sequencing technologies now produce, in a single experiment, total read lengths several times the size of the genome. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols this information is disregarded and the reference genome is used instead. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to the experimental reads and to so-called pseudoreads, and uses the resulting contigs to generate a modified reference sequence. In this way it can, very quickly and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and evaluate its implementation, called RECORD, on both simulated and real data. Our software is publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.
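The abstract does not define pseudoreads. One plausible reading, stated here as an assumption, is that they are overlapping fragments cut from the reference, so the assembler sees every reference region even where experimental coverage is thin. The sketch below tiles a reference into such fragments with a fixed length and step, adding one final fragment flush with the end so the tail is covered. The function name and parameters are hypothetical, and RECORD's de novo assembly and reference-updating steps are not shown.

```python
def make_pseudoreads(reference: str, read_len: int, step: int) -> list:
    """Tile `reference` into overlapping fragments of length `read_len`.

    Keeping step <= read_len ensures every base lies in at least one fragment.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if not reference:
        return []
    last_start = max(0, len(reference) - read_len)
    reads = [reference[s:s + read_len] for s in range(0, last_start + 1, step)]
    if last_start % step != 0:
        # The regular grid stopped short of the end; add one fragment flush with it
        reads.append(reference[last_start:])
    return reads
```

A smaller step gives longer overlaps between consecutive fragments, which makes them easier for an overlap-based assembler to chain, at the cost of more fragments.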


2019
Author(s): Teresa L. Street, Leanne Barker, Nicholas D. Sanderson, James Kavanagh, Sarah Hoosdally, ...

Abstract. Background: Empirical gonorrhoea treatment at initial diagnosis reduces onward transmission. However, increasing resistance to multiple antibiotics may necessitate waiting for culture-based diagnostics to select an effective treatment. There is a need for same-day, culture-free diagnostics that identify infection and detect antimicrobial resistance. Methods: We investigated whether Nanopore sequencing can detect sufficient N. gonorrhoeae DNA to reconstruct whole genomes directly from urine samples. We used urine samples spiked with N. gonorrhoeae and samples from gonorrhoea infections to determine the DNA extraction methods that maximize the amount of N. gonorrhoeae DNA sequenced whilst minimizing contaminating host DNA. Results: In simulated infections, the Qiagen UCP Pathogen Mini kit provided the highest ratio of N. gonorrhoeae to human DNA and the most consistent results. Depletion of human DNA with saponin increased N. gonorrhoeae yields in simulated infections but decreased yields in clinical samples. In ten urine samples from men with symptomatic urethral gonorrhoea, ≥87% coverage of an N. gonorrhoeae reference genome was achieved in all samples, with ≥92% coverage breadth at ≥10-fold depth in 7 (70%) samples. In simulated infections, when ≥10⁴ CFU/ml of N. gonorrhoeae was present, sequencing of the large majority of the genome was frequently achieved. N. gonorrhoeae could also be detected from urine in cobas PCR Media tubes and from urethral swabs, and in the presence of simulated Chlamydia co-infection. Conclusion: Using Nanopore sequencing of urine samples from men with urethral gonorrhoea, sufficient data can be obtained to reconstruct whole genomes in the majority of samples without the need for culture.
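Coverage figures such as breadth of coverage at ≥10-fold depth can be computed from read alignments with a difference array. The sketch below assumes alignments are given as half-open `(start, end)` intervals on the reference and returns the fraction of reference positions whose depth reaches `min_depth`. The function name is hypothetical, and real pipelines would typically obtain per-position depths from an alignment toolkit instead.

```python
def coverage_breadth(genome_length: int, intervals, min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth alignments.

    intervals are half-open (start, end) alignment spans; out-of-range parts are clipped.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1  # depth rises where an alignment begins
            diff[end] -= 1    # and falls just past where it ends
    covered, depth = 0, 0
    for pos in range(genome_length):
        depth += diff[pos]  # running sum gives the depth at pos
        if depth >= min_depth:
            covered += 1
    return covered / genome_length
```

Calling it with `min_depth=1` gives plain coverage (as in the ≥87% figure), while `min_depth=10` gives breadth at ≥10-fold depth; the running sum keeps the cost linear in genome length plus the number of alignments.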

