GENERATIONS OF DNA SEQUENCING METHODS (REVIEW)

2020 ◽  
Vol 30 (4) ◽  
pp. 3-20
Author(s):  
A. G. Borodinov ◽  
V. V. Manoilov ◽  
I. V. Zarutsky ◽  
A. I. Petrov ◽  
...  

Several decades have passed since Frederick Sanger and his colleagues developed their revolutionary DNA sequencing method. After the Human Genome Project, the time interval between new sequencing technologies began to shrink, while the volume of scientific knowledge continued to grow exponentially. Following Sanger sequencing, regarded as the first generation, new generations of DNA sequencing were successively introduced into practice. Advances in next generation sequencing (NGS) technologies have contributed significantly to this trend by reducing costs and generating massive amounts of sequencing data. To date, there are three generations of sequencing technologies. Second generation sequencing, currently the most commonly used NGS technology, consists of library preparation, amplification and sequencing steps, while in third generation sequencing individual nucleic acid molecules are sequenced directly to avoid bias and achieve higher throughput. The development of new generations of sequencing has made it possible to overcome the limitations of traditional DNA sequencing methods, and these technologies have found application in a wide range of projects in molecular biology. On the other hand, the development of next generation technologies raises many technical problems that need to be analyzed in depth and solved. Each generation and sequencing platform, owing to its methodological approach, has specific advantages and disadvantages that determine its suitability for certain applications. Assessing these characteristics, limitations and potential applications therefore helps to shape the directions for further research on sequencing technologies.

PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1419 ◽  
Author(s):  
Jose E. Kroll ◽  
Jihoon Kim ◽  
Lucila Ohno-Machado ◽  
Sandro J. de Souza

Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASE analysis, however, is a challenging task, especially when many different samples need to be compared. Some popular tools for the analysis of ASEs are known to report thousands of events without annotations and/or graphical representations. A new tool for the identification and visualization of ASEs is described here, which can be used by biologists without a solid bioinformatics background. Results. A software suite named Splicing Express was created to perform ASE analysis from transcriptome sequencing data derived from next-generation DNA sequencing platforms. Its major goal is to serve the needs of biomedical researchers who do not have bioinformatics skills. Splicing Express performs automatic annotation of transcriptome data (GTF files) using gene coordinates available from the UCSC genome browser and allows the analysis of data from all available species. The identification of ASEs is done by a known algorithm previously implemented in another tool named Splooce. As a final result, Splicing Express creates a set of HTML files composed of graphics and tables designed to describe the expression profile of ASEs among all analyzed samples. By using RNA-Seq data from the Illumina Human Body Map and the Rat Body Map, we show that Splicing Express is able to perform all tasks in a straightforward way, identifying well-known specific events. Availability and Implementation. Splicing Express is written in Perl and runs only on UNIX-like systems. More details can be found at: http://www.bioinformatics-brazil.org/splicingexpress.


2018 ◽  
Vol 56 (7) ◽  
pp. 1046-1053 ◽  
Author(s):  
Anne Bergougnoux ◽  
Valeria D’Argenio ◽  
Stefanie Sollfrank ◽  
Fanny Verneau ◽  
Antonella Telese ◽  
...  

Abstract Background: Many European laboratories offer molecular genetic analysis of the CFTR gene using a wide range of methods to identify mutations causative of cystic fibrosis (CF) and CFTR-related disorders (CFTR-RDs). Next-generation sequencing (NGS) strategies are widely used in diagnostic practice, and CE marking is now required for most in vitro diagnostic (IVD) tests in Europe. The aim of this multicenter study, which involved three European laboratories specialized in CF molecular analysis, was to evaluate the performance of Multiplicom’s CFTR MASTR Dx kit to obtain CE-IVD certification. Methods: A total of 164 samples, previously analyzed with well-established “reference” methods for the molecular diagnosis of the CFTR gene, were selected and re-sequenced using the Illumina MiSeq benchtop NGS platform. Sequencing data were analyzed using two different bioinformatic pipelines. Annotated variants were then compared to the previously obtained reference data. Results and conclusions: The analytical sensitivity, specificity and accuracy rates of the Multiplicom CFTR MASTR assay exceeded 99%. Because different types of CFTR mutations can be detected in a single workflow, the CFTR MASTR assay simplifies the overall process and is consequently well suited for routine diagnostics.
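The three rates reported above are standard quantities computed from a comparison of test calls against reference calls. The following is a minimal sketch of those definitions, not the study's evaluation code; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: reference-positive variants that were detected / missed.
    tn/fp: reference-negative positions called negative / called positive.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one reference negative")
    sensitivity = tp / (tp + fn)                # share of true variants detected
    specificity = tn / (tn + fp)                # share of true negatives confirmed
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all calls that agree with the reference
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only to illustrate the arithmetic.
sens, spec, acc = diagnostic_metrics(tp=99, fp=1, tn=899, fn=1)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Which positions count as reference negatives depends on how the comparison is set up (for example, per sample or per assayed base), so the denominators should be stated alongside any reported rate.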


2017 ◽  
Author(s):  
Merly Escalona ◽  
Sara Rocha ◽  
David Posada

Abstract Motivation: Advances in sequencing technologies have made it feasible to obtain massive datasets for phylogenomic inference, often consisting of large numbers of loci from multiple species and individuals. The phylogenomic analysis of next-generation sequencing (NGS) data implies a complex computational pipeline in which multiple technical and methodological decisions are necessary that can influence the final tree obtained, such as those related to coverage, assembly, mapping, variant calling and/or phasing. Results: To assess the influence of these variables we introduce NGSphy, an open-source tool for the simulation of Illumina reads/read counts obtained from haploid/diploid individual genomes with thousands of independent gene families evolving under a common species tree. In order to resemble real NGS experiments, NGSphy includes multiple options to model sequencing coverage (depth) heterogeneity across species, individuals and loci, including off-target or uncaptured loci. For comprehensive simulations covering multiple evolutionary scenarios, parameter values for the different replicates can be sampled from user-defined statistical distributions. Availability: Source code, full documentation and tutorials including a quick start guide are available at http://github.com/merlyescalona/
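The idea of coverage heterogeneity across individuals and loci, with some loci left uncaptured, can be sketched in a few lines. This is an illustrative model only and does not reproduce NGSphy's implementation or its configuration options; the function name, the gamma-distributed multipliers, and all default values are assumptions made for the example.

```python
import random


def simulate_depths(n_individuals: int, n_loci: int, base_depth: float = 30.0,
                    shape: float = 4.0, off_target_prob: float = 0.1,
                    seed: int = 0) -> list[list[float]]:
    """Return a matrix of expected depths, one row per individual, one column per locus.

    Each individual and each locus gets a multiplier drawn from a gamma
    distribution with mean 1 (assumed here); a locus is uncaptured (depth 0)
    with probability off_target_prob.
    """
    rng = random.Random(seed)  # seeded generator so replicates are reproducible
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha gives mean 1.
    ind_mult = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_individuals)]
    loc_mult = [
        0.0 if rng.random() < off_target_prob else rng.gammavariate(shape, 1.0 / shape)
        for _ in range(n_loci)
    ]
    return [[base_depth * i * l for l in loc_mult] for i in ind_mult]


depths = simulate_depths(n_individuals=3, n_loci=5)
for row in depths:
    print([round(d, 1) for d in row])
```

Lower values of `shape` give more dispersed multipliers and therefore more uneven coverage; swapping in another distribution only requires changing the two sampling lines.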


2011 ◽  
Vol 152 (2) ◽  
pp. 55-62 ◽  
Author(s):  
Zsuzsanna Mihály ◽  
Balázs Győrffy

In the past ten years the development of next generation sequencing technologies has opened a new era of quick and efficient DNA sequencing. In our study we give an overview of the methodological achievements from Sanger’s chain-termination sequencing in 1975 to those allowing real-time DNA sequencing today. Sequencing methods that use clonal amplicons for parallel multistrand sequencing form the basis of currently available next generation sequencing techniques. Nowadays next generation sequencing is mainly used for basic research in functional genomics, providing essential information for meta-analyses of data from signal transduction pathways, ontologies, proteomics and metabolomics. Although next generation sequencing is still sparsely used in clinical practice, cardiology, oncology and epidemiology already show an immense need for the additional knowledge obtained by this new technology. The main barrier to its spread is the lack of standardized methods for evaluating analyses, which obscures objective assessment of the results. Orv. Hetil., 2011, 152, 55–62.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 505
Author(s):  
Manfred Grabherr ◽  
Bozena Kaminska ◽  
Jan Komorowski

The massive increase in computational power over recent years and wider applications of machine learning methods, coincidental or not, were paralleled by remarkable advances in high-throughput DNA sequencing technologies. [...]


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Ludwig Mann ◽  
Kathrin M. Seibt ◽  
Beatrice Weber ◽  
Tony Heitkam

Abstract Background Extrachromosomal circular DNAs (eccDNAs) are ring-like DNA structures physically separated from the chromosomes, ranging in size from 100 bp to several megabase pairs. Apart from carrying tandemly repeated DNA, eccDNAs may also harbor extra copies of genes or recently activated transposable elements. As eccDNAs occur in all eukaryotes investigated so far and likely play roles in stress, cancer, and aging, they have been prime targets in recent research, although their investigation has been limited by the scarcity of computational tools. Results Here, we present the ECCsplorer, a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing techniques. Following Illumina sequencing of amplified circular DNA (circSeq), the ECCsplorer enables an easy and automated discovery of eccDNA candidates. The data analysis encompasses two major procedures: first, read mapping to the reference genome allows the detection of informative read distributions including high coverage, discordant mapping, and split reads. Second, reference-free comparison of read clusters from amplified eccDNA against control sample data reveals specifically enriched DNA circles. Both software parts can be run separately or jointly, depending on the individual aim or data availability. To illustrate the wide applicability of our approach, we analyzed semi-artificial and published circSeq data from the model organisms Homo sapiens and Arabidopsis thaliana, and generated circSeq reads from the non-model crop plant Beta vulgaris. We clearly identified eccDNA candidates from all datasets, with and without reference genomes. The ECCsplorer pipeline specifically detected mitochondrial mini-circles and retrotransposon activation, showcasing the ECCsplorer’s sensitivity and specificity.
Conclusion The ECCsplorer (available online at https://github.com/crimBubble/ECCsplorer) is a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing data. The derived eccDNA targets are valuable for a wide range of downstream investigations—from analysis of cancer-related eccDNAs over organelle genomics to identification of active transposable elements.
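One of the signals named above, unusually high coverage in the amplified sample relative to a control, can be illustrated with a simple fold-change test over genomic windows. This is a minimal sketch of that single idea and is not the ECCsplorer's algorithm; the function name, the pseudocount, the threshold, and the toy coverage values are all assumptions.

```python
def enriched_windows(sample_cov: list[float], control_cov: list[float],
                     min_fold: float = 2.0, pseudocount: float = 1.0) -> list[int]:
    """Return indices of windows whose share of reads in the sample exceeds
    their share in the control by at least min_fold.

    Both inputs hold per-window read counts over the same windows. Counts are
    converted to library-size-normalized fractions before comparison, and a
    pseudocount avoids division by zero in empty windows.
    """
    if len(sample_cov) != len(control_cov):
        raise ValueError("sample and control must cover the same windows")
    n = len(sample_cov)
    if n == 0:
        return []
    s_total = sum(sample_cov) + pseudocount * n
    c_total = sum(control_cov) + pseudocount * n
    hits = []
    for i, (s, c) in enumerate(zip(sample_cov, control_cov)):
        fold = ((s + pseudocount) / s_total) / ((c + pseudocount) / c_total)
        if fold >= min_fold:
            hits.append(i)
    return hits


# Toy counts: the third window is strongly over-represented in the sample.
print(enriched_windows([10, 10, 200, 10], [10, 10, 10, 10]))
```

Coverage alone cannot distinguish a circle from any other amplified region, which is why the abstract also lists discordant read pairs and split reads as supporting evidence.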


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0255257
Author(s):  
Daichi Sadato ◽  
Chizuko Hirama ◽  
Ai Kaiho-Soma ◽  
Ayaka Yamaguchi ◽  
Hiroko Kogure ◽  
...  

Gene abnormalities, including mutations and fusions, are important determinants in the molecular diagnosis of myeloid neoplasms. The use of bone marrow (BM) smears as a source of DNA and RNA for next-generation sequencing (NGS) enables molecular diagnosis to be done with small amounts of bone marrow and is especially useful for patients without stocked cells, DNA or RNA. The present study aimed to analyze the quality of DNA and RNA derived from smear samples and the utility of NGS for diagnosing myeloid neoplasms. Targeted DNA sequencing using paired BM cells and smears yielded sequencing data of adequate quality for variant calling. The detected variants were analyzed using a bioinformatics approach to detect mutations reliably and increase sensitivity. Noise deriving from variants with extremely low variant allele frequency (VAF) was detected in smear sample data and removed by filtering. Consequently, various driver gene mutations were detected across a wide range of allele frequencies in patients with myeloid neoplasms. Moreover, targeted RNA sequencing successfully detected fusion genes using smear-derived, very low-quality RNA, even in a patient with a normal karyotype. These findings demonstrate that smear samples can be used for clinical molecular diagnosis with adequate noise-reduction methods even if the DNA and RNA quality is inferior.
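The filtering step described above can be illustrated with the usual definition of VAF, the number of reads supporting the variant divided by the total read depth at that position. The sketch below is not the study's pipeline; the class, the gene labels, and both thresholds are hypothetical, since the abstract does not state the cutoffs used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str       # placeholder label, not a real call from the study
    alt_reads: int  # reads supporting the variant allele
    depth: int      # total reads covering the position

    @property
    def vaf(self) -> float:
        """Variant allele frequency: alt_reads / depth (0.0 if uncovered)."""
        return self.alt_reads / self.depth if self.depth else 0.0


def filter_low_vaf(variants: list[Variant], min_vaf: float = 0.02,
                   min_alt_reads: int = 5) -> list[Variant]:
    """Keep variants at or above both the VAF and supporting-read thresholds.

    Both cutoffs are assumed values for illustration only.
    """
    return [v for v in variants if v.vaf >= min_vaf and v.alt_reads >= min_alt_reads]


calls = [
    Variant("GENE_A", alt_reads=3, depth=1000),   # VAF 0.003, treated as noise
    Variant("GENE_B", alt_reads=120, depth=800),  # VAF 0.15
    Variant("GENE_C", alt_reads=40, depth=1000),  # VAF 0.04
]
for v in filter_low_vaf(calls):
    print(v.gene, round(v.vaf, 3))
```

A fixed cutoff trades sensitivity for specificity: raising `min_vaf` removes more noise but can also discard genuine low-frequency driver mutations, so the threshold is normally tuned against paired high-quality samples.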


2015 ◽  
Vol 9 ◽  
pp. BBI.S12462 ◽  
Author(s):  
Anastasis Oulas ◽  
Christina Pavloudi ◽  
Paraskevi Polymenakou ◽  
Georgios A. Pavlopoulos ◽  
Nikolas Papanikolaou ◽  
...  

Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.


2021 ◽  
Author(s):  
Jochen Bathke ◽  
Gesine Lühken

Background Next generation sequencing technologies are opening new doors to researchers. One application is the direct discovery of sequence variants that are causative for a phenotypic trait or a disease. The detection of an organism's differences from a reference genome is known as variant calling, a computational task involving a complex chain of software applications. One key player in the field is the Genome Analysis Toolkit (GATK). The GATK Best Practices are a commonly cited recipe for variant calling on human sequencing data. Still, the fact that the Best Practices are highly specialized for human sequencing data and are constantly evolving is often ignored. Reproducibility is thereby impaired, leading to continuous reinvention of purported GATK Best Practice workflows. Results Here we present an automated variant calling workflow for the detection of SNPs and indels that is broadly applicable to model as well as non-model diploid organisms. It is derived from the GATK Best Practice workflow for "Germline short variant discovery", without being focused on human sequencing data. The workflow has been highly optimized to achieve parallelized data evaluation and to maximize the performance of individual applications in order to shorten overall analysis time. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller and GatherVcfs were determined by thorough benchmarking. In doing so, the runtime of an example data evaluation could be reduced from 67 h to less than 35 h. Conclusions The demand for standardized variant calling workflows is growing in proportion to the dropping costs of next generation sequencing methods. Our workflow fits this niche, offering automation, reproducibility and documentation of the variant calling process. Moreover, resource usage is reduced to a minimum. Variant calling projects should thereby become more standardized, further lowering the barrier for smaller institutions or groups.
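Heap size and garbage collection settings of the kind mentioned above are passed to GATK4 tools through the `gatk` wrapper's `--java-options` argument. The sketch below only assembles such a command line; it is not the authors' workflow, the helper name and file names are hypothetical, and the numeric values are placeholders rather than the benchmarked settings, which the abstract does not give.

```python
import shlex


def gatk_command(tool: str, tool_args: list[str], heap_gb: int = 4,
                 gc_threads: int = 2) -> list[str]:
    """Build an argument list for a GATK4 tool with explicit JVM settings.

    heap_gb sets the maximum Java heap (-Xmx); gc_threads limits the parallel
    garbage collector (-XX:ParallelGCThreads). Defaults are assumed values.
    """
    if heap_gb <= 0 or gc_threads <= 0:
        raise ValueError("heap_gb and gc_threads must be positive")
    java_options = (
        f"-Xmx{heap_gb}g -XX:+UseParallelGC -XX:ParallelGCThreads={gc_threads}"
    )
    return ["gatk", "--java-options", java_options, tool, *tool_args]


# Hypothetical input and output file names.
cmd = gatk_command(
    "HaplotypeCaller",
    ["-R", "reference.fasta", "-I", "sample.bam", "-O", "sample.g.vcf.gz", "-ERC", "GVCF"],
    heap_gb=8,
    gc_threads=4,
)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

Returning a list rather than a string lets the command be passed directly to `subprocess.run` without shell quoting issues, and logging the exact JVM options per tool supports the reproducibility goal the abstract emphasizes.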


2019 ◽  
Vol 46 (5) ◽  
pp. 312-325 ◽  
Author(s):  
Steffen Klasberg ◽  
Vineeth Surendranath ◽  
Vinzenz Lange ◽  
Gerhard Schöfl

The advent of next generation sequencing (NGS) has altered the face of genotyping the human leukocyte antigen (HLA) system in clinical, stem cell donor registry, and research contexts. NGS has led to dramatically increased sequencing throughput at high accuracy, while being more time and cost efficient than precursor technologies. This has led to a broader and deeper profiling of the key genes in the human immunogenetic make-up. The rapid evolution of sequencing technologies is evidenced by the progression from varied short-read sequencing platforms with differing read lengths and sequencing capacities to long-read sequencing platforms capable of profiling full genes without fragmentation. Concomitantly, a diverse set of computational analyses and software tools has been developed to deal with the various strengths and limitations of the sequencing data generated by the different sequencing platforms. This review surveys the different modalities involved in generating NGS HLA profiling sequence data. It systematically describes various computational approaches that have been developed to achieve HLA genotyping to different degrees of resolution. At each stage, this review enumerates the drawbacks and advantages of each of the platforms and analysis approaches, thus providing a comprehensive picture of the current state of HLA genotyping technologies.

