PhaseME: Automatic rapid assessment of phasing quality and phasing improvement

Sina Majidian; Fritz J Sedlazeck

doi:10.1093/gigascience/giaa078

GigaScience ◽

10.1093/gigascience/giaa078 ◽

2020 ◽

Vol 9 (7) ◽

Cited By ~ 1

Author(s):

Sina Majidian ◽

Fritz J Sedlazeck

Keyword(s):

Rapid Assessment ◽

Linkage Data ◽

Universal Method ◽

High Quality ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Linkage Information ◽

Quality Assessments

Abstract Background: The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess. Findings: Here we present PhaseME, a versatile method to provide insights into and improvement of sample phasing results based on linkage data. We showcase the performance and the importance of PhaseME by comparing phasing information obtained from Pacific Biosciences including both continuous long reads and high-quality consensus reads, Oxford Nanopore Technologies, 10x Genomics, and Illumina sequencing technologies. We found that 10x Genomics and Oxford Nanopore phasing can be significantly improved while retaining a high N50 and completeness of phase blocks. PhaseME generates reports and summary plots to provide insights into phasing performance and correctness. We observed unique phasing issues for each of the sequencing technologies, highlighting the necessity of quality assessments. PhaseME is able to decrease the Hamming error rate significantly by 22.4% on average across all 5 technologies. Additionally, a significant improvement is obtained in the reduction of long switch errors. Especially for high-quality consensus reads, the improvement is 54.6% in return for only a 5% decrease in phase block N50 length. Conclusions: PhaseME is a universal method to assess the phasing quality and accuracy and improves the quality of phasing using linkage information. The package is freely available at https://github.com/smajidian/phaseme.
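
The Hamming error rate and switch errors named in this abstract can be illustrated with a minimal sketch. It assumes a single phase block in which one predicted haplotype and one true haplotype are encoded as 0/1 alleles at the same heterozygous sites. The function names and the encoding are illustrative assumptions; this is a commonly used definition of the two metrics and is not taken from the PhaseME code.

```python
from typing import Sequence


def hamming_error_rate(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of sites phased wrongly, up to swapping the two haplotypes.

    Assumes pred and truth are 0/1 alleles of one haplotype at the same
    heterozygous sites of a single phase block (hypothetical encoding).
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(p != t for p, t in zip(pred, truth))
    # The haplotype labels are arbitrary, so also compare against the
    # complement of the truth and keep the smaller distance.
    return min(mismatches, len(pred) - mismatches) / len(pred)


def count_switches(pred: Sequence[int], truth: Sequence[int]) -> int:
    """Number of adjacent site pairs where agreement with the truth flips."""
    agree = [p == t for p, t in zip(pred, truth)]
    return sum(a != b for a, b in zip(agree, agree[1:]))


truth = [0, 1, 0, 1, 1, 0, 0, 1]
long_switch = [0, 1, 0, 1, 0, 1, 1, 0]  # second half phased the wrong way
point_flip = [0, 1, 1, 1, 1, 0, 0, 1]   # one isolated wrong site

print(hamming_error_rate(long_switch, truth), count_switches(long_switch, truth))
print(hamming_error_rate(point_flip, truth), count_switches(point_flip, truth))
```

In this toy example a single long switch produces a much larger Hamming error than an isolated flipped site, which is one way to see why reducing long switch errors matters for the Hamming error rate.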

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (kilobases to megabases order) and then, using efficient algorithms, provide high quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads and generally, they inspect the nucleotide one by one, and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we proposed Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long-reads) to polish genome assemblies and in particular assemblies of diploid and heterozygous genomes.

Long-read and chromosome-scale assembly of the hexaploid wheat genome achieves higher resolution for research and breeding

10.1101/2021.08.24.457458 ◽

2021 ◽

Author(s):

Jean-Marc Aury ◽

Stefan Engelen ◽

Benjamin Istace ◽

Cécile Monat ◽

Pauline Lasserre-Zuber ◽

...

Keyword(s):

Bread Wheat ◽

Rapid Evolution ◽

Wheat Genome ◽

High Quality ◽

Sequencing Technologies ◽

Repeat Content ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Reference Quality

Abstract The sequencing of the wheat (Triticum aestivum) genome has been a methodological challenge for many years due to its large size (15.5 Gb), repeat content, and hexaploidy. Many initiatives aiming at obtaining a reference genome of cultivar Chinese Spring have been launched in the past years and it was achieved in 2018 as the result of a huge effort to combine short-read whole genome sequencing with many other resources. Reference-quality genome assemblies were then produced for other accessions but the rapid evolution of sequencing technologies offers opportunities to reach high-quality standards at lower cost. Here, we report on an optimized procedure based on long-reads produced on the ONT (Oxford Nanopore Technology) PromethION device to assemble the genome of the French bread wheat cultivar Renan. We provide the most contiguous and complete chromosome-scale assembly of a bread wheat genome to date, a resource that will be valuable for the crop community and will facilitate the rapid selection of agronomically important traits. We also provide the methodological standards to generate high-quality assemblies of complex genomes.

Improving the Chromosome-Level Genome Assembly of the Siamese Fighting Fish (Betta splendens) in a University Master’s Course

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401205 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2179-2183 ◽

Cited By ~ 1

Author(s):

Stefan Prost ◽

Malte Petersen ◽

Martin Grethlein ◽

Sarah Joy Hahn ◽

Nina Kuschik-Maczollek ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Siamese Fighting Fish ◽

Betta Splendens ◽

High Quality ◽

Sequencing Platform ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Chromosome Level

Ever-decreasing costs along with advances in sequencing and library preparation technologies enable even small research groups to generate chromosome-level assemblies today. Here we report the generation of an improved chromosome-level assembly for the Siamese fighting fish (Betta splendens) that was carried out during a practical university master’s course. The Siamese fighting fish is a popular aquarium fish and an emerging model species for research on aggressive behavior. We updated the current genome assembly by generating a new long-read nanopore-based assembly with subsequent scaffolding to chromosome-level using previously published Hi-C data. The use of ∼35× nanopore-based long-read data sequenced on a MinION platform (Oxford Nanopore Technologies) allowed us to generate a baseline assembly of only 1,276 contigs with a contig N50 of 2.1 Mbp, and a total length of 441 Mbp. Scaffolding using the Hi-C data resulted in 109 scaffolds with a scaffold N50 of 20.7 Mbp. More than 99% of the assembly is contained in 21 scaffolds. The assembly showed the presence of 96.1% complete BUSCO genes from the Actinopterygii dataset, indicating a high quality of the assembly. We present an improved full chromosome-level assembly of the Siamese fighting fish generated during a university master’s course. The use of ∼35× long-read nanopore data drastically improved the baseline assembly in terms of continuity. We show that relatively inexpensive high-throughput sequencing technologies such as the long-read MinION sequencing platform can be used in educational settings, allowing the students to gain practical skills in modern genomics and generate high-quality results that benefit downstream research projects.
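
The contig and scaffold N50 values quoted in this abstract follow a standard definition: the length L such that sequences of length L or longer cover at least half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions, and assembly-statistics tools may handle edge cases differently.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to
        # at least half of the total assembly length.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty positive lengths")


# Toy example with made-up contig lengths (not data from the paper).
print(n50([2, 3, 4, 5, 6]))
```

Fewer, longer sequences raise the N50, which is why the abstract reports it alongside the contig and scaffold counts as a measure of continuity.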

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads

Genes ◽

10.3390/genes10010044 ◽

2019 ◽

Vol 10 (1) ◽

pp. 44 ◽

Cited By ~ 1

Author(s):

Wenjing Zhang ◽

Neng Huang ◽

Jiantao Zheng ◽

Xingyu Liao ◽

Jianxin Wang ◽

...

Keyword(s):

Quality Evaluation ◽

Training Data ◽

Third Generation ◽

Contig Assembly ◽

High Quality ◽

Promising Alternative ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing

The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to its long reads. However, the high error rate and poor quality of TGS reads provide new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are in need to prioritize high-quality reads for improving the results of error correction and assembly. In this study, we proposed a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets of different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative sequence-quality evaluation method to alignment-based algorithms.

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.

BleTIES: Annotation of natural genome editing in ciliates using long read sequencing

10.1101/2021.05.18.444610 ◽

2021 ◽

Author(s):

Brandon K. B. Seah ◽

Estienne C. Swart

Keyword(s):

Dna Sequences ◽

Sequence Data ◽

Low Complexity ◽

Supplementary Information ◽

Neighboring Element ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Element Elimination

Ciliates are single-celled eukaryotes that eliminate specific, interspersed DNA sequences (internally eliminated sequences, IESs) from their genomes during development. These are challenging to annotate and assemble because IES-containing sequences are much less abundant in the cell than those without, and IES sequences themselves often contain repetitive and low-complexity sequences. Long read sequencing technologies from Pacific Biosciences and Oxford Nanopore have the potential to reconstruct longer IESs than has been possible with short reads, and also the ability to detect correlations of neighboring element elimination. Here we present BleTIES, a software toolkit for detecting, assembling, and analyzing IESs using mapped long reads. Availability and implementation: BleTIES is implemented in Python 3. Source code is available at https://github.com/Swart-lab/bleties (MIT license), and also distributed via Bioconda. Contact: [email protected] Supplementary information: Benchmarking of BleTIES with published sequence data.

A universal method for high-quality RNA extraction from plant tissues rich in starch, proteins and fiber

Scientific Reports ◽

10.1038/s41598-020-73958-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Amaranatha R. Vennapusa ◽

Impa M. Somayanda ◽

Colleen J. Doherty ◽

S. V. Krishna Jagadish

Keyword(s):

Rna Extraction ◽

Poor Quality ◽

High Yield ◽

Plant Tissues ◽

Universal Method ◽

High Quality ◽

Yield And Quality ◽

Extraction Buffer ◽

Co Precipitation

Abstract Using existing protocols, RNA extracted from seeds rich in starch often results in poor quality RNA, making it inappropriate for downstream applications. Though some methods are proposed for extracting RNA from plant tissue rich in starch and other polysaccharides, they invariably yield less and poor quality RNA. In order to obtain high yield and quality RNA from seeds and other plant tissues including roots a modified SDS-LiCl method was compared with existing methods, including TRIZOL kit (Invitrogen), Plant RNeasy mini kit (Qiagen), Furtado (2014) method, and CTAB-LiCl method. Modifications in the extraction buffer and solutions used for RNA precipitation resulted in a robust method for extracting RNA in seeds and roots, where extracting quality RNA is challenging. The modified SDS-LiCl method revealed intense RNA bands through gel electrophoresis and a nanodrop spectrophotometer detected ratios of ≥ 2 and 1.8 for A260/A230 and A260/A280, respectively. The absence of starch co-precipitation during RNA extraction resulted in enhanced yield and quality of RNA with RIN values of 7–9, quantified using a bioanalyzer. The high-quality RNA obtained was demonstrated to be suitable for downstream applications, such as cDNA synthesis, gene amplification, and RT-qPCR. The method was also effective in extracting RNA from seeds of other cereals including field-grown sorghum and corn. The modified SDS-LiCl method is a robust and highly reproducible RNA extraction method for plant tissues rich in starch and other secondary metabolites. The modified SDS-LiCl method successfully extracted high yield and quality RNA from mature, developing, and germinated seeds, leaves, and roots exposed to different abiotic stresses.

Construction of a chromosome-scale long-read reference genome assembly for potato

GigaScience ◽

10.1093/gigascience/giaa100 ◽

2020 ◽

Vol 9 (9) ◽

Cited By ~ 3

Author(s):

Gina M Pham ◽

John P Hamilton ◽

Joshua C Wood ◽

Joseph T Burke ◽

Hainan Zhao ◽

...

Keyword(s):

Genome Sequence ◽

Reference Genome ◽

Agronomic Traits ◽

Solanum Tuberosum L ◽

Fold Increase ◽

High Quality ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

Abstract Background: Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1–3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality chromosome-scale genome assemblies at minimal cost. Findings: Here, we present an updated version of the DM1–3 516 R44 genome sequence (v6.1) using Oxford Nanopore Technologies long reads coupled with proximity-by-ligation scaffolding (Hi-C), yielding a chromosome-scale assembly. The new (v6.1) assembly represents 741.6 Mb of sequence (87.8%) of the estimated 844 Mb genome, of which 741.5 Mb is non-gapped with 731.2 Mb anchored to the 12 chromosomes. Use of Oxford Nanopore Technologies full-length complementary DNA sequencing enabled annotation of 32,917 high-confidence protein-coding genes encoding 44,851 gene models that had a significantly improved representation of conserved orthologs compared with the previous annotation. The new assembly has improved contiguity with a 595-fold increase in N50 contig size, 99% reduction in the number of contigs, a 44-fold increase in N50 scaffold size, and an LTR Assembly Index score of 13.56, placing it in the category of reference genome quality. The improved assembly also permitted annotation of the centromeres via alignment to sequencing reads derived from CENH3 nucleosomes. Conclusions: Access to advanced sequencing technologies and improved software permitted generation of a high-quality, long-read, chromosome-scale assembly and improved annotation dataset for the reference genotype of potato that will facilitate research aimed at improving agronomic traits and understanding genome evolution.

Using of Ohlson Model for Express Valuation of the Company

Auditor ◽

10.12737/18995 ◽

2016 ◽

Vol 2 (4) ◽

pp. 34-37 ◽

Cited By ~ 1

Author(s):

Ордов ◽

K. Ordov

Keyword(s):

High Reliability ◽

Rapid Assessment ◽

Market Value ◽

Added Value ◽

High Quality ◽

Ohlson Model

The article discusses the prospect of using the Ohlson model to conduct a rapid assessment of the market value of a company. The added-value model gives highly reliable final calculations when the quality of accounting in the company is high.

Fast and accurate de novo genome assembly from long uncorrected reads

10.1101/068122 ◽

2016 ◽

Cited By ~ 8

Author(s):

Robert Vaser ◽

Ivan Sović ◽

Niranjan Nagarajan ◽

Mile Šikić

Keyword(s):

Error Correction ◽

De Novo ◽

High Quality ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Long Reads ◽

Oxford Nanopore ◽

Order Of Magnitude ◽

Correction Step ◽

Consensus Module

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error correction and consensus generation steps to obtain high-quality assemblies. We show that the error correction step can be omitted and high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial order alignment-based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. Racon is available open source under the MIT license at https://github.com/isovic/racon.git.
