diploid genome
Recently Published Documents


TOTAL DOCUMENTS: 86 (FIVE YEARS: 27)

H-INDEX: 22 (FIVE YEARS: 3)

2021 ◽  
pp. gr.275579.121
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method, Octopus, that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.
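The idea of building a polyploid truth set from diploid samples can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the truth genotype of an in silico mixture at a site is simply the combined alleles of the contributing diploid samples, and the function names, site labels, and genotypes are all hypothetical.

```python
from collections import Counter


def merge_genotypes(*genotypes):
    """Combine per-sample genotypes (tuples of alleles) into one polyploid genotype."""
    merged = []
    for genotype in genotypes:
        merged.extend(genotype)
    return tuple(sorted(merged))


def genotype_error_rate(truth, calls):
    """Fraction of truth sites whose called genotype differs from the truth.

    Genotypes are compared as unordered allele multisets; a missing call
    counts as an error.
    """
    errors = sum(
        1
        for site, genotype in truth.items()
        if Counter(calls.get(site, ())) != Counter(genotype)
    )
    return errors / len(truth)


# Hypothetical diploid genotypes (0 = reference allele, 1 = alternate allele).
sample_a = {"chr1:100": (0, 1), "chr1:200": (1, 1)}
sample_b = {"chr1:100": (0, 0), "chr1:200": (0, 1)}

# Tetraploid truth set for the mixture of the two samples.
truth = {site: merge_genotypes(sample_a[site], sample_b[site]) for site in sample_a}

# Hypothetical caller output: the allele dosage at the second site is wrong.
calls = {"chr1:100": (0, 0, 0, 1), "chr1:200": (0, 0, 1, 1)}
rate = genotype_error_rate(truth, calls)
```

Comparing allele multisets rather than only the set of alleles present means a dosage mistake (for example, two alternate copies called where the truth has three) counts as a genotyping error.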


Author(s):  
Carolina Gómez-Márquez ◽  
Dania Sandoval-Nuñez ◽  
Anne Gschaedler ◽  
Teresa Romero-Gutiérrez ◽  
Lorena Amaya-Delgado ◽  
...  

The yeast Kluyveromyces marxianus SLP1 has the potential for application in biotechnological processes because it can metabolize several sugars and produce high-value metabolites. K. marxianus SLP1 is a thermotolerant yeast isolated from the mezcal process, and it is tolerant to several cell growth inhibitors such as saponins, furan aldehydes, weak acids, and phenolic compounds. The genomic differences between dairy and non-dairy strains related to K. marxianus variability are a focus of research attention, particularly the pathways leading this species toward polyploidy. We report the diploid genome assembly of K. marxianus SLP1 non-lactide strain into 32 contigs to reach a size of ∼12 Mb (N50 = 1.3 Mb) and a ∼39% GC content. Genome size is consistent with the k-mer frequency results. Genome annotation by Funannotate estimated 5000 genes in haplotype A and 4910 in haplotype B. Ontology enrichment of the annotated genes shows differences between alleles in the biological process and cellular component categories. The analysis of variants relative to DMKU3 and between haplotypes shows changes in LAC12 and INU1, which we hypothesize can impact carbon source performance. This report presents the first polyploid K. marxianus strain recovered from non-lactic fermenting medium.
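For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the standard definition follows: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name and the contig lengths are illustrative and are not taken from the SLP1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("at least one contig with positive length is required")


# Hypothetical contig lengths in megabases.
example_n50 = n50([5, 4, 3, 2, 1])
```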


2021 ◽  
Author(s):  
Yunfei Hu ◽  
Sanidhya V Mangal ◽  
Lu Zhang ◽  
Xin Zhou

The detection of structural variants (SVs) remains challenging due to inconsistencies in detected breakpoints and the biological complexity of some rearrangements. Linked-reads have demonstrated their superiority in diploid genome assembly and SV detection. The recently developed tools Aquila and Aquila_stLFR use a reference sequence and linked-reads to generate a high-quality diploid genome assembly, from which they then detect and phase personal genetic variations. However, both produce a substantial proportion of false positive deletion SV calls. To take full advantage of linked-reads, an effective downstream filtering and refinement framework is urgently needed. In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach that integrates six state-of-the-art CNN backbones. The filtering of deletion SVs is formulated as a binary classification task on image data generated by extracting multiple alignment signals, including read depth, split reads and discordant read pairs. Three linked-reads libraries sequenced from the well-studied sample NA24385 and the gold standard of the GIAB benchmark were used to perform thorough experiments on our proposed method. The results demonstrated that AquilaDeepFilter could increase the precision rate of Aquila while the recall rate of Aquila decreased only slightly, and the overall F1 improved by 20%. Furthermore, AquilaDeepFilter outperformed another deep learning based method for SV filtering, DeepSVFilter. Even though we designed AquilaDeepFilter for linked-reads, the framework could also be used to improve SV detection on short reads.
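The trade-off described above (higher precision, slightly lower recall, better F1) follows from the standard definitions of these metrics, sketched below. The counts are hypothetical and chosen only to illustrate how removing false positives affects the scores; they are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical deletion calls before filtering: many false positives.
before = precision_recall_f1(tp=90, fp=60, fn=10)

# Hypothetical calls after filtering: most false positives are removed,
# at the cost of discarding a few true deletions.
after = precision_recall_f1(tp=88, fp=12, fn=12)
```

Because F1 is the harmonic mean of precision and recall, a large gain in precision can outweigh a small loss in recall.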


Plant Omics ◽  
2021 ◽  
pp. 50-56
Author(s):  
Dessireé Patricia Zerpa-Catanho ◽  
Tahira Jatt ◽  
Ray Ming

Jarilla chocola is an herbaceous plant species that belongs to the Jarilla genus and the Caricaceae family. No information on chromosome number or genome size has been reported for J. chocola that could confirm the occurrence of dysploidy events or be used to explore the existence of heteromorphic sex chromosomes. Therefore, the total number of chromosomes of this species was determined by karyotyping and counting the number of chromosomes observed, and the genome size of female and male plants was estimated separately by flow cytometry. Results showed that J. chocola has eight pairs of chromosomes (2n = 2x = 16), and its chromosomes are classified as metacentric for five pairs, submetacentric for two pairs and telocentric for one pair. The nuclear DNA content (1C-value) in picograms and the diploid genome size were estimated separately from female and male plants using two species as the standards, Phaseolus vulgaris (1C = 0.60 pg) and Carica papaya (1C = 0.325 pg), to look for the possible existence of heteromorphic sex chromosomes. C. papaya proved to be a better standard for the determination of J. chocola DNA content and diploid genome size. No significant difference in DNA content was observed between female (1C = 1.02 ± 0.003 pg) and male (1C = 1.02 ± 0.008 pg) plants. The estimated genome size of J. chocola per haploid genome in base pairs was calculated from the obtained C-values. Results showed an estimated genome size per haploid genome of 1018.44 ± 3.07 Mb and 1022.08 ± 7.76 Mb for female and male plants, respectively. Given the observed chromosome number and genome size, only the occurrence of one of two previously reported dysploidy events in Jarilla could be confirmed for J. chocola, and no evidence of heteromorphic sex chromosomes was found. These results provide fundamental information on the J. chocola genome and will expedite investigation of sex chromosomes and genome evolution in this species, the Jarilla genus and the Caricaceae family.
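The two calculations mentioned above can be sketched as follows, under stated assumptions. The abstract does not say which conversion factor or formula was used; this sketch assumes the widely used factor of 1 pg = 978 Mb and the usual flow cytometry ratio of sample to standard fluorescence peaks. The function names and the peak values are hypothetical.

```python
MB_PER_PG = 978.0  # assumed conversion factor: 1 pg of DNA is about 978 Mb


def c_value_from_peaks(sample_peak, standard_peak, standard_c_pg):
    """Estimate a sample C-value (pg) from G1 fluorescence peak means.

    Assumes fluorescence is proportional to DNA content and that the sample
    and the internal standard have the same ploidy level.
    """
    return sample_peak / standard_peak * standard_c_pg


def pg_to_mb(c_value_pg):
    """Convert a DNA content in picograms to megabase pairs."""
    return c_value_pg * MB_PER_PG


# The standards' C-values come from the abstract; the conversion is illustrative.
papaya_mb = pg_to_mb(0.325)
bean_mb = pg_to_mb(0.60)

# Hypothetical peak means for a sample measured against the papaya standard.
sample_c_pg = c_value_from_peaks(sample_peak=300.0, standard_peak=100.0, standard_c_pg=0.325)
```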


Author(s):  
Lining Wang ◽  
Baosheng Liao ◽  
Lu Gong ◽  
Shuiming Xiao ◽  
Zhihai Huang

Heat stress is one of the most frequently encountered environmental stresses for most mushroom-forming fungi. Currently available fungal genomes are mostly haploid because high heterozygosity hinders diploid genome assembly.


2021 ◽  
Author(s):  
Xiao Luo ◽  
Xiongbin Kang ◽  
Alexander Schoenhuth

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly thanks to advantages of read length. However, current long-read assemblers usually introduce disturbing biases or fail to capture the haplotype diversity of the diploid genome. Here, we present phasebook, a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo. Benchmarking experiments demonstrate that our method outperforms other approaches in terms of haplotype coverage by large margins, while preserving competitive performance or even achieving advantages in terms of all other aspects relevant for genome assembly.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nadège Guiglielmoni ◽  
Antoine Houtain ◽  
Alessandro Derzelle ◽  
Karine Van Doninck ◽  
Jean-François Flot

Background: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible, both technically and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, yet this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking.
Results: We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups.
Conclusions: We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre- and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users to improve haploid assemblies of non-model organisms and to evaluate the quality of their own assemblies.


2021 ◽  
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method – Octopus – that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.


Author(s):  
David Heller ◽  
Martin Vingron

Motivation: With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes.
Results: We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reached higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual.
Availability and Implementation: SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/svim-asm.
Supplementary information: Supplementary data are available at Bioinformatics online.

