PARC: A Processing-in-CAM Architecture for Genomic Long Read Pairwise Alignment using ReRAM

Author(s):  
Fan Chen ◽  
Linghao Song ◽  
Hai Li ◽  
Yiran Chen
2019 ◽  
Author(s):  
Lu Zhang ◽  
Xin Zhou ◽  
Ziming Weng ◽  
Arend Sidow

Abstract Structural variants (SVs) in a personal genome are important but, for all practical purposes, impossible to detect comprehensively by standard short-fragment sequencing. De novo assembly, traditionally used to generate reference genomes, offers an alternative means for variant detection and phasing but has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10x linked-read sequencing, which has been applied to assemble human diploid genomes into high quality contigs, supports accurate SV detection. We examined variants in six de novo 10x assemblies with diverse experimental parameters from two commonly used human cell lines, NA12878 and NA24385. The assemblies are effective in detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the accuracy of SV breakpoints at the base-pair level is high, with a majority (80% for deletions and 70% for insertions) of SVs having precisely correct sizes and breakpoints (<2 bp difference). Finally, setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation, which in about half of cases is opposite to that of the reference-based call. Interestingly, we uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10x linked-read data can achieve cost-effective SV detection for personal genomes.


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Lu Zhang ◽  
Xin Zhou ◽  
Ziming Weng ◽  
Arend Sidow

Abstract Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ayako Nishizawa ◽  
Kazuki Kumada ◽  
Keiko Tateno ◽  
Maiko Wagata ◽  
Sakae Saito ◽  
...  

Abstract Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.


2021 ◽  
Vol 2 (2) ◽  
pp. 100023
Author(s):  
Susan M. Hiatt ◽  
James M.J. Lawlor ◽  
Lori H. Handley ◽  
Ryne C. Ramaker ◽  
Brianne B. Rogers ◽  
...  

Methods ◽  
2021 ◽  
Author(s):  
Blondal Thorarinn ◽  
Gamba Cristina ◽  
Jagd Lea Møller ◽  
Su Ling ◽  
Demirov Dimiter ◽  
...  

Author(s):  
Shannon J Sibbald ◽  
Maggie Lawton ◽  
John M Archibald

Abstract The Pelagophyceae are marine stramenopile algae that include Aureoumbra lagunensis and Aureococcus anophagefferens, two microbial species notorious for causing harmful algal blooms. Despite their ecological significance, relatively few genomic studies of pelagophytes have been carried out. To improve understanding of the biology and evolution of pelagophyte algae, we sequenced complete mitochondrial genomes for A. lagunensis (CCMP1510), Pelagomonas calceolata (CCMP1756) and five strains of A. anophagefferens (CCMP1707, CCMP1708, CCMP1850, CCMP1984 and CCMP3368) using Nanopore long-read sequencing. All pelagophyte mitochondrial genomes assembled into single, circular-mapping contigs between 39,376 base pairs (bp) (P. calceolata) and 55,968 bp (A. lagunensis) in size. Mitochondrial genomes for the five A. anophagefferens strains varied slightly in length (42,401–42,621 bp) and were 99.4%–100.0% identical. Gene content and order were highly conserved between the A. anophagefferens and P. calceolata genomes, with the only major difference being a unique region in A. anophagefferens containing DNA adenine and cytosine methyltransferase (dam/dcm) genes that appear to be the product of lateral gene transfer from a prokaryotic or viral donor. While the A. lagunensis mitochondrial genome shares seven distinct syntenic blocks with the other pelagophyte genomes, it has a tandem repeat expansion comprising ∼40% of its length and lacks identifiable rps19 and glycine tRNA genes. Laterally acquired self-splicing introns were also found in the 23S rRNA (rnl) gene of P. calceolata and the coxI gene of the five A. anophagefferens genomes. Overall, these data provide baseline knowledge about the genetic diversity of bloom-forming pelagophytes relative to non-bloom-forming species.

