Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C

2021, Vol 12 (1)
Author(s): Zev N. Kronenberg, Arang Rhie, Sergey Koren, Gregory T. Concepcion, Paul Peluso, ...

Abstract: Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as using cultured cells that contain a single-haplotype (haploid) genome, single cells in which haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the phasing information inherent in Hi-C reads, skipping variant calling and reducing the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), namely human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available from the trio-based approach. FALCON-Phase is accurate without parental data and performs better in samples with higher heterozygosity: accuracy is 97% for cow and zebra finch, compared to 80–91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
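The central idea above, assigning phase from Hi-C contacts instead of from called variants, can be illustrated with a toy example. This is a simplified greedy sketch, not FALCON-Phase's actual algorithm (which the abstract does not describe). It assumes each primary/associate contig pair forms a two-allele "bubble" (allele 0 or 1), and that Hi-C read pairs joining haplotigs in different bubbles have already been counted. The function name `phase_bubbles` and the input layout are hypothetical. The sketch rests on one premise: two haplotigs from the same parental haplotype (cis) should share more Hi-C contacts than haplotigs from opposite haplotypes (trans).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: Hi-C link counts keyed by
# ((bubble_i, allele_i), (bubble_j, allele_j)), with allele in {0, 1}.
Links = Dict[Tuple[Tuple[int, int], Tuple[int, int]], int]


def phase_bubbles(n_bubbles: int, links: Links) -> List[int]:
    """Greedy phasing sketch: for each bubble, return which allele (0 or 1)
    is placed on haplotype A. Bubble 0 anchors the phase (allele 0 on A)."""
    # Symmetric lookup so the order of the two haplotigs in a key is irrelevant.
    count: Dict[Tuple[Tuple[int, int], Tuple[int, int]], int] = defaultdict(int)
    for (u, v), c in links.items():
        count[(u, v)] += c
        count[(v, u)] += c

    assignment = [0] * n_bubbles  # allele of each bubble that sits on haplotype A
    for b in range(1, n_bubbles):
        keep = flip = 0
        for prev in range(b):
            a_prev = assignment[prev]  # allele of `prev` on haplotype A
            b_prev = 1 - a_prev        # allele of `prev` on haplotype B
            # "keep": allele 0 of b goes with haplotype A (cis contacts summed).
            keep += count[((b, 0), (prev, a_prev))] + count[((b, 1), (prev, b_prev))]
            # "flip": allele 1 of b goes with haplotype A.
            flip += count[((b, 1), (prev, a_prev))] + count[((b, 0), (prev, b_prev))]
        assignment[b] = 0 if keep >= flip else 1
    return assignment


# Allele 1 of bubble 1 shares most contacts with allele 0 of bubble 0,
# so bubble 1 is flipped relative to the anchor.
links = {((0, 0), (1, 1)): 40, ((0, 1), (1, 0)): 35,
         ((0, 0), (1, 0)): 3, ((0, 1), (1, 1)): 2}
print(phase_bubbles(2, links))  # [0, 1]
```

A greedy pass keeps the sketch short. A production phaser would also normalize contact counts for haplotig length and mappability, and would search the assignment space more globally, so that one early mistake does not propagate down the chromosome.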

2018
Author(s): Zev N. Kronenberg, Arang Rhie, Sergey Koren, Gregory T. Concepcion, Paul Peluso, ...

Abstract: Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. These assemblies can be created in various ways, such as by using tissues that contain single-haplotype (haploid) genomes or by co-sequencing parental genomes, but these approaches can be impractical in many situations. We present FALCON-Phase, which integrates long-read sequencing data and ultra-long-range Hi-C chromatin interaction data from a diploid individual to create high-quality, phased diploid genome assemblies. The method was evaluated on three datasets, human, cattle, and zebra finch, for which high-quality, fully haplotype-resolved assemblies were available for benchmarking. Phasing accuracy depended on the heterozygosity of the sequenced individual, with higher accuracy for cattle and zebra finch (>97%) than for human (82%). In addition, scaffolding with the same Hi-C chromatin contact data produced phased chromosome-scale scaffolds.


2016
Author(s): Derek M. Bickhart, Benjamin D. Rosen, Sergey Koren, Brian L. Sayre, Alex R. Hastie, ...

Abstract: The decrease in sequencing cost and the increased sophistication of assembly algorithms for short-read platforms have sharply increased the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate the current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most contiguous de novo mammalian assembly to date, with chromosome-length scaffolds and only 663 gaps. Our assembly represents a >250-fold improvement in contiguity over the previously published C. hircus assembly and better resolves repetitive structures longer than 1 kb, supporting the most complete representation of repeat families and immune gene complexes ever produced for a ruminant species.


2017
Author(s): Heng Li, Jonathan M Bloom, Yossi Farjoun, Mark Fleharty, Laura Gauthier, ...

Constructed from the consensus of multiple variant callers based on short-read data, existing benchmark datasets for evaluating variant calling accuracy are biased toward easy regions accessible by known algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two human cell lines that are homozygous across the whole genome. This benchmark provides a more accurate and less biased estimate of the error rate of small variant calls in a realistic context.
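An error-rate estimate from a truth set like this comes down to comparing a call set against benchmark variants inside confident regions. The following is a minimal sketch under stated assumptions, not the authors' evaluation pipeline. It assumes variants are already normalized (left-aligned, split into single alt alleles) and represented as hypothetical `(chrom, pos, ref, alt)` tuples, and that confident regions are half-open `[start, end)` intervals per chromosome. Exact matching ignores cases where the same variant is written in two different ways, which dedicated haplotype-aware comparison tools resolve.

```python
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]          # (chrom, pos, ref, alt), assumed normalized
Regions = Dict[str, List[Tuple[int, int]]]   # chrom -> half-open [start, end) intervals


def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant's position falls in a confident interval on its chromosome."""
    return any(start <= v[1] < end for start, end in regions.get(v[0], []))


def evaluate(calls: Iterable[Variant], truth: Iterable[Variant],
             regions: Regions) -> Dict[str, float]:
    # Score only variants inside the benchmark's confident regions.
    c: Set[Variant] = {v for v in calls if in_regions(v, regions)}
    t: Set[Variant] = {v for v in truth if in_regions(v, regions)}
    tp = len(c & t)   # called and in truth set
    fp = len(c - t)   # called but absent from truth set
    fn = len(t - c)   # in truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 5000, "G", "A")]
calls = [("chr1", 100, "A", "G"), ("chr1", 300, "T", "C"), ("chr1", 5000, "G", "A")]
print(evaluate(calls, truth, {"chr1": [(0, 1000)]}))
# {'tp': 1, 'fp': 1, 'fn': 1, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

In this example, the variant at position 5000 is excluded from scoring because it lies outside the confident interval, even though both sets contain it.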


2016
Author(s): Jay Ghurye, Mihai Pop, Sergey Koren, Chen-Shan Chin

Abstract

Motivation: Long-read technologies have revolutionized de novo genome assembly, producing contigs orders of magnitude longer than those of short-read assemblies. Although assembly contiguity has increased, contigs still do not span a whole chromosome or chromosome arm, leaving the assembly short of chromosome level. To address this problem, we developed a scalable and computationally efficient scaffolding method that substantially boosts assembly contiguity using genome-wide chromatin interaction data such as Hi-C. In particular, we demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies.

Results: We tested our method on two long-read assemblies of different organisms. Compared with a previously developed method, our approach scaffolds more accurately.

Availability: The software is available for free use and can be downloaded from here: https://github.com/machinegun/[email protected]
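Hi-C scaffolding works because contigs that are adjacent on a chromosome share many more chromatin contacts than distant ones. The toy sketch below shows that principle with a simple greedy path-building scheme. It is not the published algorithm. It ignores contig orientation and misassembly correction, and it assumes a hypothetical normalization (raw link count divided by the product of the two contig lengths). The function name and input layout are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_scaffold(lengths: Dict[str, int],
                    links: Dict[Tuple[str, str], int]) -> List[List[str]]:
    """Join contigs into linear scaffolds, strongest normalized Hi-C link first."""
    parent = {c: c for c in lengths}  # union-find: rejects joins that would form a cycle

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = {c: 0 for c in lengths}  # a contig can join at most two neighbours
    adj: Dict[str, List[str]] = {c: [] for c in lengths}
    # Assumed normalization: raw link count / product of contig lengths.
    scored = sorted(links.items(),
                    key=lambda kv: kv[1] / (lengths[kv[0][0]] * lengths[kv[0][1]]),
                    reverse=True)
    for (a, b), _ in scored:
        if a == b or degree[a] == 2 or degree[b] == 2:
            continue
        ra, rb = find(a), find(b)
        if ra == rb:  # already in the same scaffold
            continue
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        adj[a].append(b)
        adj[b].append(a)

    # Walk each path from one of its ends to emit contigs in scaffold order.
    seen: Set[str] = set()
    scaffolds: List[List[str]] = []
    for start in sorted(lengths):
        if start in seen or degree[start] == 2:
            continue
        path: List[str] = []
        prev: Optional[str] = None
        cur: Optional[str] = start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [n for n in adj[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


lengths = {"c1": 100, "c2": 100, "c3": 100, "c4": 100}
links = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 5}
print(greedy_scaffold(lengths, links))  # [['c1', 'c2', 'c3'], ['c4']]
```

Here the weak c1–c3 link is rejected because it would close a cycle, and c4 has no links, so it stays as its own scaffold.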


2018
Author(s): Shivani Mahajan, Kevin Wei, Matthew Nalley, Lauren Giblisco, Doris Bachtrog

While short-read sequencing technology has resulted in a sharp increase in the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation, chromatin interaction maps for scaffolding, and short reads, optical mapping, and BAC clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including ~41.5 Mb of pericentromeric and telomeric regions and >100 Mb of the recently formed, highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y increased in size almost 3-fold due to the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that led to the unusual sex chromosome karyotype in D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points and the reversion of a former Y chromosome to an autosome.


2021
Author(s): Miguel A Naranjo-Ortiz, Manu Molina, Veronica Mixao, Toni Gabaldon

Recent technological developments have made genome sequencing and assembly accessible to many groups. However, genomic features of the sequenced organism such as high heterozygosity, polyploidy, aneuploidy, or heterokaryosis can challenge current standard assembly procedures and result in highly fragmented assemblies. Hence, we hypothesized that genome databases must contain a non-negligible fraction of low-quality assemblies resulting from such intrinsic genomic factors. Here we present Karyon, a Python-based toolkit that uses raw sequencing data and a de novo genome assembly to assess several parameters and generate informative plots that help identify non-canonical genomic traits. Karyon includes automated de novo genome assembly and variant calling pipelines. We tested Karyon by diagnosing 35 highly fragmented, publicly available assemblies from 19 different Mucorales (Fungi) species. Our results show that 6 (17%) of the assemblies presented signs of unusual genomic configurations, suggesting that such configurations are common, at least within the Fungi.


Animals, 2021, Vol 11 (3), pp. 904
Author(s): Saif ur Rehman, Faiz-ul Hassan, Xier Luo, Zhipeng Li, Qingyou Liu

The buffalo was domesticated around 3000–6000 years ago and has substantial economic significance as a meat, dairy, and draught animal. Yet the buffalo has lagged behind in the development of a well-annotated de novo reference genome assembly. Exploring the genetic architecture of a species is essential for understanding the biology needed to manage its genetic variability, which ultimately underpins selective breeding and genomic selection. Morphological and molecular data have revealed that the swamp buffalo population has strong geographical genomic diversity with low gene flow but strong phenotypic consistency, while the river buffalo population has higher phenotypic diversity with a weak phylogeographic structure. The recent availability of a high-quality reference genome and genotyping marker panels has invigorated many genome-based studies on evolutionary history, genetic diversity, functional elements, and performance traits. Combining this growing molecular knowledge with selective breeding should pave the way for genetic improvement in the climatic resilience, disease resistance, and production performance of water buffalo populations globally.


2021
Author(s): Hans-Georg Sprenger, Thomas MacVicar, Amir Bahat, Kai Uwe Fiedler, Steffen Hermans, ...

Abstract: Cytosolic mitochondrial DNA (mtDNA) elicits a type I interferon response, but signals triggering the release of mtDNA from mitochondria remain enigmatic. Here, we show that mtDNA-dependent immune signalling via the cyclic GMP–AMP synthase‒stimulator of interferon genes‒TANK-binding kinase 1 (cGAS–STING–TBK1) pathway is under metabolic control and is induced by cellular pyrimidine deficiency. The mitochondrial protease YME1L preserves pyrimidine pools by supporting de novo nucleotide synthesis and by proteolysis of the pyrimidine nucleotide carrier SLC25A33. Deficiency of YME1L causes inflammation in mouse retinas and in cultured cells. It drives the release of mtDNA and a cGAS–STING–TBK1-dependent inflammatory response, which requires SLC25A33 and is suppressed upon replenishment of cellular pyrimidine pools. Overexpression of SLC25A33 is sufficient to induce immune signalling by mtDNA. Similarly, depletion of cytosolic nucleotides upon inhibition of de novo pyrimidine synthesis triggers mtDNA-dependent immune responses in wild-type cells. Our results thus identify mtDNA release and innate immune signalling as a metabolic response to cellular pyrimidine deficiencies.


Author(s): Quan-Kuan Shen, Min-Sheng Peng, Adeniyi C Adeola, Ling Kui, Shengchang Duan, ...

Abstract: The domestication history of the helmeted guinea fowl (HGF; Numida meleagris) in Africa remains elusive. Here we report a high-quality de novo genome assembly for domestic HGF generated by long- and short-read sequencing together with optical and chromatin interaction mapping. Using this assembly as the reference, we performed population genomic analyses of newly sequenced whole genomes from 129 birds from Africa, Asia, and Europe, including domestic animals (n = 89), wild progenitors (n = 34), and their closely related wild species (n = 6). Our results reveal that HGF was domesticated in West Africa around 1,300–5,500 years ago. Scans for signals of selection identified functional genes underlying the behavioral and locomotor changes involved in HGF domestication. We also found pleiotropy and linkage among genes affecting plumage color and fertility in the recent breeding of Italian domestic HGF. In addition to supplying a missing piece of the poultry domestication puzzle, our study provides valuable genetic resources for researchers and breeders to improve production in this species.


Author(s): Christine Tschoe, Teddy E. Kim, Kyle M. Fargen, Stacey Q. Wolfe

Until recently, cerebral arteriopathy due to heterozygous mutations of the ACTA2 gene was considered a variant of moyamoya disease. However, radiographic analysis of patients with these mutations reveals an angiographic appearance distinct from that seen in moyamoya disease. Several heterozygous missense ACTA2 mutations have been implicated in the development of this distinct cerebrovascular entity; however, the penetrance and systemic manifestations of these mutations vary with the location of the amino acid replacement within the α–smooth muscle actin protein. The severity of the phenotype may also differ among patients with the same mutation. There is limited literature on the safety and efficacy of revascularization procedures for ACTA2 arteriopathy, and existing reports have been limited to patients with known Arg179His mutations. The authors review the breadth of mutations within the ACTA2 literature and report the cases of two siblings with de novo ACTA2 Arg258Cys mutations and differing clinical courses, highlighting the utility of indirect revascularization with 8-year follow-up data. This report underscores the importance of early recognition of the angiographic appearance of ACTA2 cerebral arteriopathy and of genetic testing, as the location of the mutation affects clinical presentation and outcomes.

