Targeted Long-Read Sequencing Reveals Comprehensive Architecture, Burden and Transcriptional Signatures from HBV-Associated Integrations and Translocations in HCC Cell Lines

2021 ◽  
Author(s):  
Ricardo Ramirez ◽  
Nicholas van Buuren ◽  
Lindsay Gamelin ◽  
Cameron Soulette ◽  
Lindsey May ◽  
...  

Hepatitis B virus (HBV) can integrate into the chromosomes of infected hepatocytes, creating potentially oncogenic lesions that can lead to hepatocellular carcinoma (HCC). However, our current understanding of integrated HBV DNA architecture, burden and transcriptional activity is incomplete due to technical limitations. A combination of genomics approaches was used to describe HBV integrations and corresponding transcriptional signatures in three HCC cell lines: huH-1, PLC/PRF/5 and Hep3B. To generate high coverage long-read sequencing data, a custom panel of HBV-targeting biotinylated oligonucleotide probes was designed. Targeted long-read DNA sequencing captured entire HBV integration events within individual reads, revealing that integrations may include deletions and inversions of viral sequences. Surprisingly, all three HCC cell lines contain integrations that are associated with host chromosomal translocations. In addition, targeted long-read RNA sequencing allowed for the assignment of transcriptional activity to specific integrations and resolved the contribution of overlapping HBV transcripts. HBV transcripts chimeric with host sequences were resolved in their entirety and often included >1000 bp of host sequence. This study provides the first comprehensive description of HBV integrations and associated transcriptional activity in three commonly utilized HCC-derived cell lines. The application of novel methods sheds new light on the complexity of these integrations, including HBV bidirectional transcription, nested transcripts, silent integrations and host genomic rearrangements. The observation of multiple HBV-associated chromosomal translocations gives rise to the hypothesis that HBV may be a driver of genetic instability and provides a potential new mechanism for HCC development.
Importance: HCC-derived cell lines have served as practical models to study HBV biology for decades. These cell lines harbor multiple HBV integrations and express only HBV surface antigen (HBsAg). To date, an accurate description of the integration burden, architecture and transcriptional profile of these cell lines has been limited due to technical constraints. We have developed a targeted long-read sequencing assay which reveals the entire architecture of integrations in these cell lines. In addition, we identified five chromosomal translocations with integrated HBV DNA at the inter-chromosomal junctions. Incorporation of long-read RNA-Seq data indicated that many integrations and translocations were transcriptionally silent. The observation of multiple HBV-associated translocations has strong implications regarding the potential mechanisms for the development of HBV-associated HCC.

2018 ◽  
Author(s):  
Li Fang ◽  
Charlly Kao ◽  
Michael V Gonzalez ◽  
Fernanda A Mafra ◽  
Renata Pellegrino da Silva ◽  
...  

Abstract: Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve the detection and breakpoint identification for structural variants (SVs). We present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease-causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.


2021 ◽  
Author(s):  
Raga Krishnakumar ◽  
Anne M Ruffing

Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to set the stringency with which genes are evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system to allow users to choose the stringency of operon calls depending on whether their priority is high coverage of operons or high accuracy of the calls. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing for this method to be expanded on as additional data is generated and incorporated. We show that our approach detects operon pairs that are missed by current methods by comparing our predictions to publicly available long-read sequencing data. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility and adaptability.
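
The two ideas named in this abstract, a Kruskal-Wallis statistic and p-value used as features and a vote threshold that sets stringency, can be sketched in a few lines of pure Python. This is not OperonSEQer's code. The three-group comparison (coverage over gene A, the intergenic region and gene B), the function names and the example votes are all assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_3(gene_a, intergenic, gene_b):
    """Kruskal-Wallis H and p-value for exactly three groups (2 degrees of freedom)."""
    groups = [list(gene_a), list(intergenic), list(gene_b)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Standard correction for ties.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if tie == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / tie, 0.0)
    # Chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    return h, math.exp(-h / 2)

def call_operon(model_votes, min_votes):
    """Call an operon pair when at least min_votes models agree (hypothetical rule)."""
    return sum(bool(v) for v in model_votes) >= min_votes

h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
lenient = call_operon([True, True, False, True], min_votes=3)
strict = call_operon([True, True, False, True], min_votes=4)
```

Raising `min_votes` trades coverage of operons for accuracy of the calls, which is the trade-off the abstract describes. In this sketch the models themselves are replaced by a plain list of votes.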


2021 ◽  
Author(s):  
Stephanie M. Yan ◽  
Rachel M. Sherman ◽  
Dylan J. Taylor ◽  
Divya R. Nair ◽  
Andrew N. Bortvin ◽  
...  

Abstract: Large genomic insertions, deletions, and inversions are a potent source of functional and fitness-altering variation, but are challenging to resolve with short-read DNA sequencing alone. While recent long-read sequencing technologies have greatly expanded the catalog of structural variants (SVs), their costs have so far precluded their application at population scales. Given these limitations, the role of SVs in human adaptation remains poorly characterized. Here, we used a graph-based approach to genotype 107,866 long-read-discovered SVs in short-read sequencing data from diverse human populations. We then applied an admixture-aware method to scan these SVs for patterns of population-specific frequency differentiation, a signature of local adaptation. We identified 220 SVs exhibiting extreme frequency differentiation, including several SVs that were among the lead variants at their corresponding loci. The top two signatures traced to separate insertion and deletion polymorphisms at the immunoglobulin heavy chain locus, together tagging a 325 Kbp haplotype that swept to high frequency and was subsequently fragmented by recombination. Alleles defining this haplotype are nearly fixed (60-95%) in certain Southeast Asian populations, but are rare or absent from other global populations composing the 1000 Genomes Project. Further investigation revealed that the haplotype closely matches with sequences observed in two of three high-coverage Neanderthal genomes, providing strong evidence of a Neanderthal-introgressed origin. This extraordinary episode of positive selection, which we infer to have occurred between 1700 and 8400 years ago, corroborates the role of immune-related genes as prominent targets of adaptive archaic introgression. Our study demonstrates how combining recent advances in genome sequencing, genotyping algorithms, and population genetic methods can reveal signatures of key evolutionary events that remained hidden within poorly resolved regions of the genome.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Li Fang ◽  
Charlly Kao ◽  
Michael V. Gonzalez ◽  
Fernanda A. Mafra ◽  
Renata Pellegrino da Silva ◽  
...  

Abstract: Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve detection and breakpoint identification for structural variants (SVs). Here we present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease-causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.
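
To make the "barcode overlapping" signal concrete, the following minimal Python sketch measures how many barcodes two genomic windows share. Reads from one DNA molecule carry the same barcode, so two distant windows that share many barcodes may have been brought together by a large SV. This is only an illustration of the idea, not LinkedSV's statistical model. The function name, the normalisation by the smaller window and the example barcodes are assumptions.

```python
def shared_barcode_fraction(window_a, window_b):
    """Fraction of barcodes shared by two windows, relative to the smaller window.

    Each argument is an iterable of barcode strings observed on reads
    mapped inside that genomic window (hypothetical input format).
    """
    a, b = set(window_a), set(window_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two distant windows that share two of three barcodes.
near_breakpoint = shared_barcode_fraction(
    ["AAC", "GTT", "CCA"],
    ["GTT", "CCA", "TTG", "ACG"],
)

# Two distant windows with no barcode in common.
background = shared_barcode_fraction(["AAC", "GTT"], ["TTG", "ACG"])
```

A real caller would compare such a score against the sharing expected by chance at that genomic distance. That step is omitted here.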


Viruses ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 651 ◽  
Author(s):  
Maritza Puray-Chavez ◽  
Mahmoud Farghali ◽  
Vincent Yapo ◽  
Andrew Huber ◽  
Dandan Liu ◽  
...  

Moloney leukemia virus 10 (MOV10) is an RNA helicase that has been shown to affect the replication of several viruses. The effect of MOV10 on Hepatitis B virus (HBV) infection is not known and its role in the replication of this virus is poorly understood. We investigated the effect of MOV10 down-regulation and MOV10 over-expression on HBV in a variety of cell lines, as well as in an infection system using a replication-competent virus. We report that MOV10 down-regulation, using siRNA, shRNA, and CRISPR/Cas9 gene editing technology, resulted in increased levels of HBV DNA, HBV pre-genomic RNA, and HBV core protein. In contrast, MOV10 over-expression reduced HBV DNA, HBV pre-genomic RNA, and HBV core protein. These effects were consistent in all tested cell lines, providing strong evidence for the involvement of MOV10 in the HBV life cycle. We demonstrated that MOV10 does not interact with HBV-core. However, MOV10 binds HBV pgRNA and this interaction does not affect HBV pgRNA decay rate. We conclude that the restriction of HBV by MOV10 is mediated through effects at the level of viral RNA.


2019 ◽  
Author(s):  
Ahmed Ibrahim Samir Khalil ◽  
Anupam Chattopadhyay ◽  
Amartya Sanyal

Abstract:
Motivation: Hyperploidy and segmental aneuploidy are hallmarks of cancer cells due to chromosome segregation errors and genomic instability. In such situations, accurate aneuploidy profiling of cancer data is critical for calibration of copy number (CN)-detection tools. Additionally, cancer cell populations suffer from different levels of clonal heterogeneity and aneuploidy alterations over time. The degree of heterogeneity adversely affects the segregation of the depth of coverage (DOC) signal into integral CN states. This, in turn, strongly influences the reliability of this data for ploidy profiling and copy number variation (CNV) analysis.
Results: We developed the AStra framework for aneuploidy profiling of cancer data and assessing their suitability for copy number analysis without any prior knowledge of the input sequencing data. AStra estimates the best-fit aneuploidy profile as the spectrum with most genomic segments around integral CN states. We employ this spectrum to extract the CN-associated features such as the homogeneity score (HS), whole-genome ploidy level, and CN correction factor. The HS measures the percentage of genomic regions around CN states. It is used as a reliability assessment of sequencing data for downstream aneuploidy profiling and CNV analysis. We evaluated the accuracy of AStra using 31 low-coverage datasets from 20 cancer cell lines. AStra successfully identified the aneuploidy spectrum of complex cell lines with HS greater than 75%. Benchmarking against the nQuire tool showed that AStra is superior in detecting the ploidy level using both low- and high-coverage data. Furthermore, AStra accurately estimated the ploidy of 26/27 strains of the hyperploid MCF7 cell line, which exhibit varied levels of aneuploidy spectrum and heterogeneity. Remarkably, we found that HS is strongly correlated with the doubling time of these strains.
Availability and implementation: AStra is open-source software implemented in Python and is available at https://github.com/AISKhalil/AStra
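
The homogeneity score is described above only as the percentage of genomic regions around CN states. One possible reading is sketched below in Python: the share of the segmented genome, weighted by segment length, whose estimated copy number lies within a tolerance of the nearest integer. The tolerance of 0.25, the length weighting, the input format and the function name are assumptions for illustration. AStra's own definition may differ.

```python
def homogeneity_score(segments, tolerance=0.25):
    """Percentage of segment length whose copy number is near an integer.

    segments: list of (length_bp, copy_number_estimate) tuples
              (hypothetical input format).
    tolerance: maximum distance from the nearest integer CN state
               (assumed value, not taken from the paper).
    """
    total = sum(length for length, _ in segments)
    if total == 0:
        raise ValueError("segments must cover at least one base")
    near_integer = sum(
        length
        for length, cn in segments
        if abs(cn - round(cn)) <= tolerance
    )
    return 100.0 * near_integer / total

# Three segments: two sit close to CN states 2 and 4, one lies between states.
example_hs = homogeneity_score([(100, 2.05), (50, 2.5), (50, 3.9)])
```

Under this reading, a sample in which many segments fall between integer states, as expected for a heterogeneous cell population, receives a low score.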


2021 ◽  
Author(s):  
Ning Wang ◽  
Vladislav Lysenkov ◽  
Katri Orte ◽  
Veli Kairisto ◽  
Juhani Aakko ◽  
...  

Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools on indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage, coupled with specific variant calling tools.


2019 ◽  
Author(s):  
Tom Hill ◽  
Robert L. Unckless

Abstract: Copy number variants (CNVs) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods or coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data. Available at: https://github.com/tomh1lll/dudeml


2010 ◽  
Vol 84 (12) ◽  
pp. 5860-5867 ◽  
Author(s):  
David M. Iser ◽  
Nadia Warner ◽  
Peter A. Revill ◽  
Ajantha Solomon ◽  
Fiona Wightman ◽  
...  

ABSTRACT Liver-related mortality is increased in the setting of HIV-hepatitis B virus (HBV) coinfection. However, interactions between HIV and HBV to explain this observation have not been described. We hypothesized that HIV infection of hepatocytes directly affects the life cycle of HBV. We infected human hepatic cell lines expressing HBV (Hep3B and AD38 cells) or not expressing HBV (Huh7, HepG2, and AD43 cells) with laboratory strains of HIV (NL4-3 and AD8), as well as a vesicular stomatitis virus (VSV)-pseudotyped HIV expressing enhanced green fluorescent protein (EGFP). Following HIV infection with NL4-3 or AD8 in hepatic cell lines, we observed a significant increase in HIV reverse transcriptase activity which was infectious. Despite no detection of surface CD4, CCR5, and CXCR4 by flow cytometry, AD8 infection of AD38 cells was inhibited by maraviroc and NL4-3 was inhibited by AMD3100, demonstrating that HIV enters AD38 hepatic cell lines via CCR5 or CXCR4. High-level infection of AD38 cells (50%) was achieved using VSV-pseudotyped HIV. Coinfection of the AD38 cell line with HIV did not alter the HBV DNA amount or species as determined by Southern blotting or nucleic acid signal amplification. However, coinfection with HIV was associated with a significant increase in intracellular HBsAg when measured by Western blotting, quantitative HBsAg, and fluorescence microscopy. We conclude that HIV infection of HBV-infected hepatic cell lines significantly increased intracellular HBsAg but not HBV DNA synthesis and that increased intrahepatic HBsAg secondary to direct infection by HIV may contribute to accelerated liver disease in HIV-HBV-coinfected individuals.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chloe Goldsmith ◽  
Jesús Rafael Rodríguez-Aguilera ◽  
Ines El-Rifai ◽  
Adrien Jarretier-Yuste ◽  
Valérie Hervieu ◽  
...  

Abstract: Mammalian cytosine DNA methylation (5mC) is associated with the integrity of the genome and the transcriptional status of nuclear DNA. Due to technical limitations, it has been less clear if mitochondrial DNA (mtDNA) is methylated and whether 5mC has a regulatory role in this context. Here, we used bisulfite-independent single-molecule sequencing of native human and mouse DNA to study mitochondrial 5mC across different biological conditions. We first validated the ability of long-read nanopore sequencing to detect 5mC in CpG (5mCpG) and non-CpG (5mCpH) context in nuclear DNA at expected genomic locations (i.e. promoters, gene bodies, enhancers, and cell type-specific transcription factor binding sites). Next, using high coverage nanopore sequencing we found low levels of mtDNA CpG and CpH methylation (with several exceptions) and little variation across biological processes: differentiation, oxidative stress, and cancer. 5mCpG and 5mCpH were overall higher in tissues compared to cell lines, with small additional variation between cell lines of different origin. Despite generally low levels, global and single-base differences were found in cancer tissues compared to their adjacent counterparts, in particular for 5mCpG. In conclusion, nanopore sequencing is a useful tool for the detection of modified DNA bases in mitochondria that avoids the biases introduced by bisulfite and PCR amplification. Enhanced nanopore basecalling models will provide further resolution on the small effect sizes detected here, as well as rule out the presence of other DNA modifications such as oxidized forms of 5mC.

