SeqDev: An Algorithm for Constructing Genetic Elements Using Comparative Assembly

With the availability and low cost of recent next-generation sequencing technologies, genomes of different organisms are being sequenced frequently. Quick assembly of genomes, transcriptomes, and target contigs from the raw data these technologies generate has therefore become necessary for a better understanding of different biological systems. This article proposes an algorithm, SeqDev (Sequence Developer), for constructing contigs from raw reads using reference sequences. For this, we used a weighted frequency-based consensus mechanism named BlastAssemb for primary construction of a sequence with gaps. We then adopted a suffix array and proposed a gap filling search (GFS) algorithm to search for the sequences missing from the primary construct. To evaluate the algorithm, we chose the raw genome of Pokkali (rice), with Japonica (rice) as the reference data. Experimental results demonstrated that the proposed algorithm accurately constructs promoter sequences of Pokkali from its raw genome data. The constructed promoter sequences were 93–100% identical to the reference and aligned with 96–100% of the corresponding reference sequences, with E-values ranging from 0.0 to 2e-14. These results indicate that the proposed method could be a potential algorithm for constructing target contigs from raw sequences with the help of reference sequences. Further wet-lab validation with specific Pokkali promoter sequences would support this method as a robust algorithm for target contig assembly.

Plant Tissue Cult. & Biotech. 26(1): 105-121, 2016 (June)
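The SeqDev abstract above mentions a suffix array combined with a gap filling search but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' GFS algorithm: it assumes a gap is bounded by two known flanking sequences, indexes the raw reads with a naively built suffix array, and looks for a single read that contains the left flank followed by the right flank. The function names, the `$` read separator, and the `max_gap` parameter are all hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (toy sizes only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern`, found by binary search on `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def fill_gap(reads, left_flank, right_flank, max_gap=50):
    """Return the bases between the two flanks in some read, or None."""
    # '$' separates reads so that a match never spans two different reads.
    text = "$".join(reads) + "$"
    sa = build_suffix_array(text)
    for pos in find_occurrences(text, sa, left_flank):
        begin = pos + len(left_flank)
        window = text[begin:begin + max_gap + len(right_flank)]
        window = window.split("$", 1)[0]  # stay inside the current read
        end = window.find(right_flank)
        if end != -1:
            return window[:end]
    return None


reads = ["ACGTTGCA", "TTGCAGGATCC", "GGATCCAAT"]
print(fill_gap(reads, "TGCA", "TCC"))
```

In this toy input the first read ends right after the left flank, so the search moves on to the second read, which supplies the bridging bases. A production tool would use a linear-time suffix array construction and tolerate mismatches in the flanks.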

Education in the genomics era: Generating high-quality genome assemblies in university courses

GigaScience ◽

10.1093/gigascience/giaa058 ◽

2020 ◽

Vol 9 (6) ◽

Cited By ~ 3

Author(s):

Stefan Prost ◽

Sven Winter ◽

Jordi De Raad ◽

Raphael T F Coimbra ◽

Magnus Wolf ◽

...

Keyword(s):

Low Cost ◽

Genomic Data ◽

Master's Level ◽

Genome Data ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

University Courses ◽

Hands On ◽

Genome Assemblies ◽

High Quality Genome

Abstract: Recent advances in genome sequencing technologies have simplified the generation of genome data and reduced the costs for genome assemblies, even for complex genomes like those of vertebrates. More practically oriented genomic courses can prepare university students for the increasing importance of genomic data used in biological and medical research. Low-cost third-generation sequencing technology, along with publicly available data, can be used to teach students how to process genomic data, assemble full chromosome-level genomes, and publish the results in peer-reviewed journals or on preprint servers. Here we outline experiences gained from 2 master's-level courses and discuss practical considerations for teaching hands-on genome assembly courses.

A strategy for complete telomere-to-telomere assembly of ciliate macronuclear genome using ultra-high coverage Nanopore data

10.1101/2020.01.08.898502 ◽

2020 ◽

Cited By ~ 1

Author(s):

Guangying Wang ◽

Xiaocui Chai ◽

Jing Zhang ◽

Wentao Yang ◽

Chuanqi Jiang ◽

...

Keyword(s):

High Coverage ◽

Genome Data ◽

Sequencing Technologies ◽

Simple Strategy ◽

Third Generation Sequencing ◽

Population Genomic ◽

Ciliate Species ◽

Genomic Studies ◽

High Quality Genome ◽

Generation Sequencing

Abstract: Ciliates contain two kinds of nuclei in a single cell: the germline micronucleus (MIC) and the somatic macronucleus (MAC). The MAC usually has fragmented chromosomes. These fragmented chromosomes, capped with telomeres at both ends, range from gene size to several megabases in length among different ciliate species. So far, no telomere-to-telomere assembly of an entire MAC genome has been completed in any ciliate species. The development of third-generation sequencing technologies makes it possible to generate sequencing reads up to megabases in length that could span an entire MAC chromosome. Taking advantage of ultra-long Nanopore reads, we established a simple strategy for the complete assembly of ciliate MAC genomes. Using this strategy, we assembled the complete MAC genomes of two ciliate species, Tetrahymena thermophila and Tetrahymena shanghaiensis, composed of 181 and 214 telomere-to-telomere chromosomes, respectively. The established strategy and the high-quality genome data will provide a useful approach for ciliate genome assembly and a valuable community resource for further biological, evolutionary and population genomic studies.

A low-cost platform suitable for sequencing-based recovery of natural variation in understudied plants

BioTechniques ◽

10.2144/btn-2020-0132 ◽

2020 ◽

Author(s):

Rachel Howard-Till ◽

Claudia E Osorio ◽

Bradley J Till

Keyword(s):

Natural Variation ◽

Genetic Characterization ◽

Low Cost ◽

Cultivated Plants ◽

Southern Chile ◽

Documentation System ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Proof Of Principle

Genetic characterization of wild and cultivated plants provides valuable knowledge for conservation and agriculture. DNA sequencing technologies are improving, and costs are dropping. Yet analysis of many species is hindered because they grow in regions that lack infrastructure for advanced molecular biology. The authors developed and adapted low-cost methods that address these issues. Tissue was collected and stored in silica gel, avoiding the need for liquid nitrogen and freezers. The authors optimized low-cost, homemade DNA extraction to increase yields, reduce costs and produce DNA suitable for next-generation sequencing. The authors describe how to build a gel documentation system for DNA quantification. As a proof of principle, the authors used these methods to evaluate wild Berberis darwinii, native to Southern Chile.

A low-cost platform suitable for sequencing-based recovery of natural variation in understudied plants

10.1101/2020.06.24.169276 ◽

2020 ◽

Author(s):

Rachel Howard-Till ◽

Claudia E. Osorio ◽

Bradley J. Till

Keyword(s):

Next Generation Sequencing ◽

Low Cost ◽

Genomic Analysis ◽

Dna Quantification ◽

Cultivated Plants ◽

Next Generation ◽

Documentation System ◽

Sequencing Technologies ◽

Do It Yourself ◽

Generation Sequencing

Abstract: Genetic characterization of wild and cultivated plants provides valuable knowledge for conservation and agriculture. DNA sequencing technologies are improving and costs are dropping. Yet, analysis of many species is hindered because they grow in regions that lack infrastructure for advanced molecular biology. We developed and adapted low-cost methods that address these issues. Tissue is collected and stored in silica gel, avoiding the need for liquid nitrogen and freezers. We have optimized low-cost home-made DNA extraction to increase yields, reduce costs, and produce DNA suitable for next-generation sequencing. We also describe how to build a gel documentation system for DNA quantification. As a proof of principle, we use these methods to evaluate wild Berberis darwinii, native to Southern Chile.

Method summary: We describe a suite of low-cost do-it-yourself methods for field collection of plant tissues, extraction of genomic DNA suitable for next-generation sequencing, and home-made agarose gel documentation suitable for DNA quantification. These methods enable the collection and preparation of samples for genomic analysis in regions with limited infrastructure.

The Influence of Memory-Aware Computation on Distributed BLAST

Current Bioinformatics ◽

10.2174/1574893613666180601080811 ◽

2019 ◽

Vol 14 (2) ◽

pp. 157-163

Author(s):

Majid Hajibaba ◽

Mohsen Sharifi ◽

Saeid Gorgin

Keyword(s):

Search Time ◽

Genomic Research ◽

Local Alignment ◽

Negative Effects ◽

Sequencing Technologies ◽

Percent Improvement ◽

Fast Processing ◽

Search Tool ◽

Memory Awareness ◽

Generation Sequencing

Background: One of the pivotal challenges in today's genomic research is the fast processing of voluminous data, such as the data generated by high-throughput Next-Generation Sequencing technologies. BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches every split one by one through a list of queries. Results: We chose a list of queries with different lengths to run insensitive searches in a huge database, UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used the proposed memory-aware technique compared to when they were not memory-aware. Experiments show an even higher performance improvement, approximately 50 percent, when we applied the memory-aware technique to mpiBLAST. Conclusion: We have shown that memory awareness in formatting a bulky database when running BLAST can improve performance significantly while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique alleviates the negative effects of page faults on performance.
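The two-level partitioning described in the Method part of this abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: "equal chunks" is taken to mean equal record counts, a record's memory cost is approximated by its sequence length, and the names `partition_equal`, `split_by_memory`, and `memory_budget` are hypothetical.

```python
def partition_equal(records, n_workers):
    """Master step: contiguous, near-equal chunks, one per worker."""
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


def split_by_memory(chunk, memory_budget):
    """Worker step: greedily pack records into splits that fit the budget.

    Memory cost is approximated by sequence length (an assumption); a record
    larger than the budget still gets a split of its own.
    """
    splits, current, used = [], [], 0
    for record in chunk:
        cost = len(record)
        if current and used + cost > memory_budget:
            splits.append(current)
            current, used = [], 0
        current.append(record)
        used += cost
    if current:
        splits.append(current)
    return splits


database = ["A" * 4, "C" * 6, "G" * 5, "T" * 3, "A" * 7]
for worker_id, chunk in enumerate(partition_equal(database, n_workers=2)):
    splits = split_by_memory(chunk, memory_budget=10)
    print(worker_id, [[len(r) for r in s] for s in splits])
```

Each worker would then format and search its splits one at a time, so that the working set stays within its own memory; the actual formatting and search steps are outside the scope of this sketch.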

Clinical Implications of Polymicrobial Synergism Effects on Antimicrobial Susceptibility

Pathogens ◽

10.3390/pathogens10020144 ◽

2021 ◽

Vol 10 (2) ◽

pp. 144

Author(s):

William Little ◽

Caroline Black ◽

Allie Clinton Smith

Keyword(s):

Antimicrobial Susceptibility ◽

Chronic Wounds ◽

Clinical Laboratory ◽

Patient Treatment ◽

Clinical Implications ◽

Clinical Environment ◽

Tolerance Mechanisms ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Polymicrobial Infections

With the development of next generation sequencing technologies in recent years, it has been demonstrated that many human infectious processes, including chronic wounds, cystic fibrosis, and otitis media, are associated with a polymicrobial burden. Research has also demonstrated that polymicrobial infections tend to be associated with treatment failure and worse patient prognoses. Despite the importance of the polymicrobial nature of many infection states, the current clinical standard for determining antimicrobial susceptibility in the clinical laboratory is exclusively performed on unimicrobial suspensions. There is a growing body of research demonstrating that microorganisms in a polymicrobial environment can synergize their activities associated with a variety of outcomes, including changes to their antimicrobial susceptibility through both resistance and tolerance mechanisms. This review highlights the current body of work describing polymicrobial synergism, both inter- and intra-kingdom, impacting antimicrobial susceptibility. Given the importance of polymicrobial synergism in the clinical environment, a new system of determining antimicrobial susceptibility from polymicrobial infections may significantly impact patient treatment and outcomes.

Pancreatic cancer prognosis is predicted by an ATAC-array technology for assessing chromatin accessibility

Nature Communications ◽

10.1038/s41467-021-23237-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

S. Dhara ◽

S. Chhangawala ◽

H. Chintalapudi ◽

G. Askan ◽

V. Aveson ◽

...

Keyword(s):

Low Cost ◽

Disease Free Survival ◽

Chromatin Accessibility ◽

Cancer Prognosis ◽

Ductal Adenocarcinoma ◽

Binding Motifs ◽

Free Survival ◽

Array Technology ◽

Treatment Naïve ◽

Generation Sequencing

Abstract: Unlike other malignancies, therapeutic options in pancreatic ductal adenocarcinoma (PDAC) are largely limited to cytotoxic chemotherapy without the benefit of molecular markers predicting response. Here we report tumor-cell-intrinsic chromatin accessibility patterns of treatment-naïve surgically resected PDAC tumors that were subsequently treated with (Gem)/Abraxane adjuvant chemotherapy. By ATAC-seq analyses of EpCAM+ PDAC malignant epithelial cells sorted from 54 freshly resected human tumors, we show here the discovery of a signature of 1092 chromatin loci displaying differential accessibility between patients with disease free survival (DFS) < 1 year and patients with DFS > 1 year. Analyzing transcription factor (TF) binding motifs within these loci, we identify two TFs (ZKSCAN1 and HNF1b) displaying differential nuclear localization between patients with short vs. long DFS. We further develop a chromatin accessibility microarray methodology termed “ATAC-array”, an easy-to-use platform obviating the time and cost of next generation sequencing. Applying this methodology to the original ATAC-seq libraries as well as independent libraries generated from patient-derived organoids, we validate ATAC-array technology in both the original ATAC-seq cohort as well as in an independent validation cohort. We conclude that PDAC prognosis can be predicted by ATAC-array, which represents a low-cost, clinically feasible technology for assessing chromatin accessibility profiles.

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa179 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3669-3679 ◽

Cited By ~ 3

Author(s):

Can Firtina ◽

Jeremie S Kim ◽

Mohammed Alser ◽

Damla Senol Cali ◽

A Ercument Cicek ◽

...

Keyword(s):

Genome Analysis ◽

Supplementary Information ◽

Third Generation ◽

Sequencing Technology ◽

Base Pairs ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing ◽

Large Genomes

Abstract

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively.

Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.

Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo.

Supplementary information: Supplementary data are available at Bioinformatics online.
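Step (iii) of the Apollo abstract, decoding a trained hidden Markov model with the Viterbi algorithm, can be illustrated on a toy model. The sketch below is not Apollo's profile HMM and does not reproduce its training step: it is a generic two-state HMM over nucleotides, with made-up state names (`GC_rich`, `AT_rich`) and made-up probabilities, meant only to show how Viterbi decoding recovers the most likely hidden state path.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for `obs`, computed in log space.

    All probabilities used here must be non-zero, since math.log(0) is undefined.
    """
    # scores[t][s]: best log-probability of any path ending in state s at time t
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((p, scores[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy parameters, chosen only for illustration.
states = ["GC_rich", "AT_rich"]
start_p = {"GC_rich": 0.5, "AT_rich": 0.5}
trans_p = {
    "GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
    "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9},
}
emit_p = {
    "GC_rich": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
    "AT_rich": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},
}
print(viterbi("GCAT", states, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow on long sequences, which matters once the observation sequence is a full read or contig rather than four bases.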

Assessing the Value of Next-Generation Sequencing Technologies: An Introduction

Value in Health ◽

10.1016/j.jval.2018.06.012 ◽

2018 ◽

Vol 21 (9) ◽

pp. 1031-1032 ◽

Cited By ~ 3

Author(s):

Kathryn A. Phillips

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Sequencing Technologies ◽

Generation Sequencing

An Optimized Method for the Preparation of Monascus purpureus DNA for Genome Sequencing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.563.379 ◽

2014 ◽

Vol 563 ◽

pp. 379-383 ◽

Cited By ~ 1

Author(s):

Yue Yang ◽

Xin Jun Du ◽

Ping Li ◽

Bin Liang ◽

Shuo Wang

Keyword(s):

Genome Sequencing ◽

Genomic Dna ◽

Benzyl Chloride ◽

Monascus Purpureus ◽

Sequencing Technologies ◽

Fungal Evolution ◽

Ctab Method ◽

Fungal Dna ◽

Gene Functional Analysis ◽

Generation Sequencing

Increasing attention has been paid to filamentous fungal evolution, metabolic pathways and gene functional analysis via genome sequencing. However, the published methods for the extraction of fungal genomic DNA are usually costly or inefficient. In the present study, we compared five different DNA extraction protocols: a CTAB protocol with some modifications, a benzyl chloride protocol with some modifications, a snailase protocol, an SDS protocol and extraction with the E.Z.N.A. Fungal DNA Maxi Kit (Omega Bio-Tek, USA). The CTAB method, which we established by modifying several steps, is economical and convenient, and it can be used reliably to obtain large amounts of highly pure genomic DNA from Monascus purpureus for sequencing with next-generation sequencing technologies (Illumina and 454).
