Fastq-pair: efficient synchronization of paired-end fastq files

2019 ◽  
Author(s):  
John A. Edwards ◽  
Robert A. Edwards

Abstract: Paired-end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analyses. Paired-end reads are usually provided as two fastq-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in the same order in each file, and that reads without a pair in the corresponding file are placed in a separate file of singletons. Although most sequencing instruments capable of generating paired-end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads. The paired-end sequence files then lose synchronicity and contain either unordered sequences or sequences in one or the other file without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory, but they quickly become limited by computational resources for the moderate to large sequence files that are now common. Here, we introduce a fast and memory-efficient solution, written in C for portability, that synchronizes paired-end fastq files for subsequent analysis and places unmatched reads into singleton files.

Fastq-pair is freely available from https://github.com/linsalrob/fastq-pair and is released under the MIT license.
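The synchronization task described in this abstract can be illustrated with a minimal Python sketch. It is not the Fastq-pair implementation (which is written in C and whose internals are not described here). It assumes well-formed four-line fastq records, read names that match once a trailing `/1` or `/2` is removed, and hypothetical output file names. Only read identifiers and byte offsets of the first file are held in memory, never the sequences themselves.

```python
def read_id(header):
    """Return the read name without the leading '@' or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def index_fastq(path):
    """Map read ID -> byte offset of its record (assumes 4-line records)."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            pos = handle.tell()
            header = handle.readline()
            if not header:
                break
            for _ in range(3):  # skip the sequence, '+', and quality lines
                handle.readline()
            offsets[read_id(header.decode())] = pos
    return offsets


def read_record(handle, pos):
    """Read one 4-line record starting at byte offset pos."""
    handle.seek(pos)
    return b"".join(handle.readline() for _ in range(4))


def pair_fastq(left, right, out_prefix):
    """Write paired reads in matching order, plus one singleton file per input.

    The output file names are hypothetical, chosen for this sketch only.
    """
    offsets = index_fastq(left)
    with open(left, "rb") as lh, open(right, "rb") as rh, \
            open(out_prefix + ".left.paired.fq", "wb") as lp, \
            open(out_prefix + ".right.paired.fq", "wb") as rp, \
            open(out_prefix + ".left.single.fq", "wb") as ls, \
            open(out_prefix + ".right.single.fq", "wb") as rs:
        while True:
            header = rh.readline()
            if not header:
                break
            record = header + b"".join(rh.readline() for _ in range(3))
            # pop() removes matched IDs, so what remains are left-only reads
            pos = offsets.pop(read_id(header.decode()), None)
            if pos is None:
                rs.write(record)  # no mate in the left file
            else:
                lp.write(read_record(lh, pos))
                rp.write(record)
        for pos in sorted(offsets.values()):
            ls.write(read_record(lh, pos))  # left reads never matched
```

Paired reads are emitted in the order of the second file, so both paired outputs stay synchronized regardless of the order in the first file. The trade-off is one random seek into the first file per matched read.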

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Heleen Plaisier ◽  
Thomas R. Meagher ◽  
Daniel Barker

Abstract

Objective: Visualisation methods, primarily color-coded representation of sequence data, have been a predominant means of representing DNA data. Algorithmic conversion of DNA sequence data to sound (sonification) is an alternative means of representation that uses a different range of human sensory perception. We propose that sonification has value for public engagement with DNA sequence information because it has the potential to be entertaining as well as informative. We conduct preliminary work to explore the potential of DNA sequence sonification in public engagement with bioinformatics. We apply a simple sonification technique for DNA, in which each DNA base is represented by a specific note. Additionally, a beat may be added to indicate codon boundaries or for musical effect. We report a brief analysis from public engagement events we conducted that featured this method of sonification.

Results: We report on the use of DNA sequence sonification at two public events. Sonification has potential in public engagement with bioinformatics, both as a means of data representation and as a means to attract an audience to a drop-in stand. We also discuss further directions for research on the integration of sonification into bioinformatics public engagement and education.
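The one-note-per-base idea, with a beat at codon boundaries, can be sketched in a few lines of Python. The abstract does not say which note each base receives, so the mapping below (MIDI note numbers for A4, C4, G4 and E4) and the function names are assumptions made purely for illustration.

```python
# Hypothetical base-to-note assignment as MIDI note numbers; the source
# does not state which note represents which base.
BASE_TO_MIDI = {"A": 69, "C": 60, "G": 67, "T": 64}  # A4, C4, G4, E4


def sonify(sequence, mark_codons=True):
    """Return one (midi_note, on_beat) event per base.

    on_beat is True for the first base of each codon when mark_codons is set.
    Unrecognised symbols (for example N) become rests, encoded as None.
    """
    events = []
    for i, base in enumerate(sequence.upper()):
        note = BASE_TO_MIDI.get(base)  # None means a rest
        on_beat = mark_codons and i % 3 == 0
        events.append((note, on_beat))
    return events


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

For example, `sonify("ATGC")` yields four events, with the beat flag set on the first and fourth bases because they open a codon. The events could then be handed to any synthesiser or MIDI library for playback.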


Zootaxa ◽  
2020 ◽  
Vol 4766 (3) ◽  
pp. 472-484
Author(s):  
HANNAH E. SOM ◽  
L. LEE GRISMER ◽  
PERRY L. JR. WOOD ◽  
EVAN S. H. QUAH ◽  
RAFE M. BROWN ◽  
...  

Liopeltis is a genus of poorly known, infrequently sampled species of colubrid snakes in tropical Asia. We collected a specimen of Liopeltis from Pulau Tioman, Peninsular Malaysia, that superficially resembled L. philippina, a rare species that is endemic to the Palawan Pleistocene Aggregate Island Complex, western Philippines. We analyzed morphological and mitochondrial DNA sequence data from the Pulau Tioman specimen and found distinct differences from L. philippina and all other congeners. On the basis of these corroborated lines of evidence, the Pulau Tioman specimen is described as a new species, L. tiomanica sp. nov. The new species occurs in sympatry with L. tricolor on Pulau Tioman, and our description of L. tiomanica sp. nov. brings the number of endemic amphibians and reptiles on Pulau Tioman to 12.


2007 ◽  
Vol 3 ◽  
pp. 193-197 ◽  
Author(s):  
Kou Amano ◽  
Hiroaki Ichikawa ◽  
Hidemitsu Nakamura ◽  
Hisataka Numa ◽  
Kaoru Fukami-Kobayashi ◽  
...  

Genome ◽  
1998 ◽  
Vol 41 (2) ◽  
pp. 148-153 ◽  
Author(s):  
Monique Abadon ◽  
Eric Grenier ◽  
Christian Laumond ◽  
Pierre Abad

An AluI satellite DNA family has been cloned from the entomopathogenic nematode Heterorhabditis indicus. This repeated sequence appears to be an unusually abundant satellite DNA, since it constitutes about 45% of the H. indicus genome. The consensus sequence is 174 nucleotides long and has an A + T content of 56%, with the presence of direct and inverted repeat clusters. DNA sequence data reveal that monomers are quite homogeneous. Such homogeneity suggests that some mechanism is acting to maintain the homogeneity of this satellite DNA, despite its abundance, or that this repeated sequence could have appeared recently in the genome of H. indicus. Hybridization analysis of genomic DNAs from different Heterorhabditis species shows that this satellite DNA sequence is specific to the H. indicus genome. Considering the species specificity and the high copy number of this AluI satellite DNA sequence, it could provide a rapid and powerful tool for identifying H. indicus strains.

Key words: AluI repeated DNA, tandem repeats, species-specific sequence, nucleotide sequence analysis.

