SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files

2021 ◽  
Vol 8 (5) ◽  
pp. 59
Author(s):  
Andrea Telatin ◽  
Piero Fariselli ◽  
Giovanni Birolo

Sequence file formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, urging the development of dedicated software to handle, parse, and manipulate such files efficiently. Several bioinformatics packages are available to filter and manipulate FASTA and FASTQ files, yet some essential tasks remain poorly supported, leaving gaps that any workflow analysis of NGS datasets must fill with custom scripts. This can introduce harmful variability and performance bottlenecks in pivotal steps. Here we present a suite of tools, called SeqFu (Sequence Fastx utilities), that provides a broad range of commands to perform both common and specialist operations with ease and is designed to be easily implemented in high-performance analytical pipelines. SeqFu includes high-performance implementations of algorithms to interleave and deinterleave FASTQ files, merge Illumina lanes, and perform various quality controls (identification of degenerate primers, analysis of length statistics, extraction of portions of the datasets). SeqFu dereplicates sequences from multiple files while keeping track of their provenance. SeqFu is developed in Nim for high-performance processing, is freely available, and can be installed with the popular package manager Miniconda.
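To illustrate what interleaving and deinterleaving paired-end FASTQ data involve, here is a minimal Python sketch. It is not SeqFu's implementation (SeqFu is written in Nim), and the function names `read_fastq`, `interleave`, and `deinterleave` are hypothetical, chosen only for this example. It assumes simple four-line FASTQ records with no line wrapping.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if (len(chunk) != 4
                or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError("malformed FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def interleave(r1: Iterable[Record], r2: Iterable[Record]) -> Iterator[Record]:
    """Alternate records from the forward (R1) and reverse (R2) streams."""
    it1, it2 = iter(r1), iter(r2)
    end = object()  # sentinel marking an exhausted stream
    while True:
        a, b = next(it1, end), next(it2, end)
        if a is end and b is end:
            return
        if a is end or b is end:
            raise ValueError("R1 and R2 contain different numbers of records")
        yield a
        yield b


def deinterleave(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Split an interleaved stream back into separate R1 and R2 lists."""
    r1: List[Record] = []
    r2: List[Record] = []
    it = iter(records)
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("odd number of records in interleaved input")
        r1.append(first)
        r2.append(second)
    return r1, r2
```

In practice the line iterables would be open file handles (for example, `open("sample_R1.fastq")`, a made-up file name), and a production tool would also verify that the read names of each pair match.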

2021 ◽  
Vol 11 (6) ◽  
pp. 454
Author(s):  
Adekunle Adekile ◽  
Nagihan Akbulut-Jeradi ◽  
Rasha Al Khaldi ◽  
Maria Jinky Fernandez ◽  
Jalaja Sukumaran

Hemoglobin genotype and HBB haplotype are established genetic factors that modify the clinical phenotype in sickle cell disease (SCD). Current methods of establishing these two factors are cumbersome and/or prone to errors. The throughput capability of next-generation sequencing (NGS) makes it ideal for simultaneous interrogation of the many genes of interest in SCD. This study was designed to confirm the diagnosis in patients with HbSS and Sβ-thalassemia, identify any β-thal mutations and simultaneously determine the βS HBB haplotype. An Illumina AmpliSeq custom DNA panel was used to genotype the DNA samples. Haplotyping was based on the alleles at five haplotype-specific SNPs. The patients studied included 159 HbSS patients and 68 Sβ-thal patients, previously diagnosed using high-performance liquid chromatography (HPLC). There was considerable discordance between HPLC and NGS results, giving a false-positive rate of 20.5% and a sensitivity of 79% for the identification of Sβ-thal. The Arab/India haplotype was found in 81.5% of βS chromosomes, while the two most common of the 13 β-thal mutations detected were IVS-1 del25 and IVS-II-1 (G>A). NGS is very versatile and can be deployed to simultaneously screen multiple gene loci for modifying polymorphisms, to afford personalized, evidence-based counselling and early intervention.


2021 ◽  
Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats is an essential but difficult task in biological data analysis. The easyfm (easy file manipulation) toolkit (https://github.com/TaekAndBrendan/easyfm) makes manipulating commonly used NGS files more accessible to biologists. It enables them to perform end-to-end reproducible data analyses using a free standalone desktop application (available on Windows, Mac and Linux). Unlike existing tools (e.g. Galaxy), the Graphical User Interface (GUI)-based easyfm is not dependent on any high-performance computing (HPC) system and can be operated without an internet connection. This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


2021 ◽  
Vol 512 ◽  
pp. 40-48
Author(s):  
Linnea Pettersson ◽  
Francesco Vezzi ◽  
Sofie Vonlanthen ◽  
Karin Alwegren ◽  
Anders Hedrum ◽  
...  

2019 ◽  
Vol 28 (2) ◽  
pp. 202-212 ◽  
Author(s):  
Maria Weronika Gutowska-Ding ◽  
Zandra C. Deans ◽  
Christophe Roos ◽  
Jukka Matilainen ◽  
Farrah Khawaja ◽  
...  

Next-generation sequencing (NGS) is replacing other molecular techniques to become the de facto gene diagnostics approach, transforming the speed of diagnosis for patients and expanding opportunities for precision medicine. Consequently, for accredited laboratories as well as those seeking accreditation, both objective measures of quality and external review of laboratory processes are required. External quality assessment (EQA), or Proficiency Testing (PT), can assess a laboratory’s service through an independent external agency, the EQA provider. The analysis of a growing number of genes and whole exomes and genomes is now routine; therefore, an EQA must be delivered to enable all testing laboratories to participate. In this paper, we describe the development of a unique platform and gene target independent EQA scheme for NGS, designed to scale from current to future requirements of clinical diagnostic laboratories testing for germline and somatic variants. The EQA results from three annual rounds indicate that clinical diagnostic laboratories are providing an increasingly high-quality NGS service and variant calling abilities are improving. From an EQA provider perspective, challenges remain regarding delivery and performance criteria, as well as in analysing similar NGS approaches between cohorts with meaningful metrics, sample sourcing and data formats.


Author(s):  
Altuğ Koç ◽  
Elçin Bora ◽  
Tayfun Cinleti ◽  
Gizem Yıldız ◽  
Meral Torun Bayram ◽  
...  
