Mining Thousands of Genomes to Classify Somatic and Pathogenic Structural Variants

Mapping Intimacies ◽

10.1101/2021.04.21.440844 ◽

2021 ◽

Author(s):

Ryan M. Layer ◽

Fritz J. Sedlazeck ◽

Brent S. Pedersen ◽

Aaron R. Quinlan

Keyword(s):

Cancer Progression ◽

De Novo ◽

Variant Calling ◽

Structural Variants ◽

Mendelian Disorders

Abstract: Structural variants (SVs) are associated with cancer progression and Mendelian disorders, but challenges with estimating SV frequency remain a barrier to somatic and de novo classification. In particular, variability in filtering and variant-calling heuristics limits our ability to use SV catalogs from large cohorts. We present a method to index and search the raw alignments from thousands of samples that overcomes these limitations and supports robust SV analysis.

Mining Thousands of Genomes to Classify Somatic and Pathogenic Structural Variants

10.21203/rs.3.rs-456227/v1 ◽

2021 ◽

Author(s):

Ryan Layer ◽

Fritz Sedlazeck ◽

Brent Pedersen ◽

Aaron Quinlan

Keyword(s):

Cancer Progression ◽

De Novo ◽

Variant Calling ◽

Structural Variants ◽

Mendelian Disorders

Abstract: Structural variants (SVs) are associated with cancer progression and Mendelian disorders, but challenges with estimating SV frequency remain a barrier to somatic and de novo classification. In particular, variability in filtering and variant-calling heuristics limits our ability to use SV catalogs from large cohorts. We present a method to index and search the raw alignments from thousands of samples that overcomes these limitations and supports robust SV analysis.

Aquila_stLFR: assembly based variant calling package for stLFR and hybrid assembly for linked-reads

10.1101/742239 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xin Zhou ◽

Lu Zhang ◽

Xiaodong Fang ◽

Yichen Liu ◽

David L. Dill ◽

...

Keyword(s):

De Novo ◽

Low Cost ◽

Variant Calling ◽

Hybrid Assembly ◽

Structural Variants ◽

Sequencing Data ◽

Single Tube ◽

Large Numbers ◽

Key Characteristics ◽

Hybrid Assemblies

Abstract: Human diploid genome assembly enables identifying maternal and paternal genetic variations. Algorithms based on 10x linked-read sequencing have been developed for de novo assembly, variant calling and haplotyping. Another linked-read technology, single tube long fragment read (stLFR), has recently provided a low-cost single tube solution that can enable long fragment data. However, no existing software is available for human diploid assembly and variant calls. We develop Aquila_stLFR to adapt to the key characteristics of stLFR. Aquila_stLFR assembles near perfect diploid assembled contigs, and the assembly-based variant calling shows that Aquila_stLFR detects large numbers of structural variants which were not easily spanned by Illumina short-reads. Furthermore, the hybrid assembly mode Aquila hybrid allows a hybrid assembly based on both stLFR and 10x linked-reads libraries, demonstrating that these two technologies can always be complementary to each other for assembly to improve contiguity and variant detection, regardless of the assembly quality of the library itself from a single sequencing technology. The overlapped structural variants (SVs) from two independent sequencing data sets of the same individual, and the SVs from hybrid assemblies, provide us with a high-confidence profile to study them. Availability: Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR.

The correlation between lipid metabolism disorders and the prostate cancer

Current Medicinal Chemistry ◽

10.2174/0929867327666200806103744 ◽

2020 ◽

Vol 27 ◽

Author(s):

Justyna Dłubek ◽

Jacek Rysz ◽

Zbigniew Jabłonowski ◽

Anna Gluba-Brzózka ◽

Beata Franczyk

Keyword(s):

Prostate Cancer ◽

Fatty Acids ◽

Lipid Metabolism ◽

Cancer Progression ◽

De Novo ◽

Prostate Gland ◽

Hdl Cholesterol ◽

Atp Citrate Lyase ◽

Male Population ◽

Lipid Metabolism Disorders

Prostate cancer is the second most common cancer affecting the male population all over the world. The existence of a correlation between lipid metabolism disorders and cancer of the prostate gland has been widely known for a long time. According to hypotheses, cholesterol may contribute to prostate cancer progression as a result of its participation as a signalling molecule in prostate growth and differentiation via numerous biologic mechanisms, including Akt signalling and de novo steroidogenesis. The results of some studies suggest that increased cholesterol levels may be associated with a higher risk of a more aggressive course of disease. The aforementioned alterations in the synthesis of fatty acids are a unique feature of cancer and, therefore, they constitute an attractive target for therapeutic intervention in the treatment of prostate cancer. Pharmacological or gene therapy aimed at reducing the activity of enzymes involved in de novo synthesis of fatty acids, FASN, ACLY (ATP citrate lyase) or SCD-1 (stearoyl-CoA desaturase) in particular, may result in cell growth arrest. Nevertheless, not all cancers are unequivocally associated with hypocholesterolaemia. It cannot be ruled out that the relationship between prostate cancer and lipid disorders is not a direct quantitative correlation between carcinogenesis and the amount of circulating cholesterol. Perhaps the correspondence is more sophisticated and connected to the distribution of cholesterol fractions, or even sub-fractions of, e.g., HDL cholesterol.

nanotatoR: a tool for enhanced annotation of genomic structural variants

BMC Genomics ◽

10.1186/s12864-020-07182-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Surajit Bhattacharya ◽

Hayk Barseghyan ◽

Emmanuèle C. Délot ◽

Eric Vilain

Keyword(s):

De Novo ◽

Genome Mapping ◽

Gene List ◽

Sufficient Information ◽

Rna Seq ◽

Structural Variants ◽

De Novo Genome Assembly ◽

Pathogenic Variants ◽

Increased Sensitivity

Abstract

Background: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity.

Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM.

Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.

Interplay between Histone and DNA Methylation Seen through Comparative Methylomes in Rare Mendelian Disorders

International Journal of Molecular Sciences ◽

10.3390/ijms22073735 ◽

2021 ◽

Vol 22 (7) ◽

pp. 3735

Author(s):

Guillaume Velasco ◽

Damien Ulveling ◽

Sophie Rondeau ◽

Pauline Marzin ◽

Motoko Unoki ◽

...

Keyword(s):

Dna Methylation ◽

Catalytic Domain ◽

De Novo ◽

Mutation Screening ◽

Histone H3 ◽

Regulatory Elements ◽

Germline Mutations ◽

Dna Methyltransferases ◽

Mendelian Disorders ◽

Epigenetic Machinery

DNA methylation (DNAme) profiling is used to establish specific biomarkers to improve the diagnosis of patients with inherited neurodevelopmental disorders and to guide mutation screening. In the specific case of Mendelian disorders of the epigenetic machinery, it also provides the basis to infer mechanistic aspects with regard to DNAme determinants and the interplay between histone and DNAme that apply to humans. Here, we present comparative methylomes from patients with mutations in the de novo DNA methyltransferases DNMT3A and DNMT3B, in their catalytic domain or their N-terminal parts involved in reading histone methylation, or in histone H3 lysine (K) methylases NSD1 or SETD2 (H3 K36) or KMT2D/MLL2 (H3 K4). We provide disease-specific DNAme signatures and document the distinct consequences of mutations in enzymes with very similar or intertwined functions, including at repeated sequences and imprinted loci. We found that KMT2D and SETD2 germline mutations have little impact on DNAme profiles. In contrast, the overlapping DNAme alterations downstream of NSD1 or DNMT3 mutations underline functional links, more specifically between NSD1 and DNMT3B at heterochromatin regions or DNMT3A at regulatory elements. Together, these data indicate some discrepancy with the mechanisms described in animal models, or the existence of redundant or complementary functions unforeseen in humans.

Computational study for suppression of CD25/IL-2 interaction

Biological Chemistry ◽

10.1515/hsz-2020-0326 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Moein Dehbashi ◽

Zohreh Hojati ◽

Majid Motovali-bashi ◽

Mazdak Ganjalikhani-Hakemi ◽

Akihiro Shimosaka ◽

...

Keyword(s):

Small Molecules ◽

Cancer Progression ◽

Immune Escape ◽

De Novo ◽

Computational Study ◽

Treg Cells ◽

Homology Search ◽

Small Interfering Rnas

Abstract: Cancer recurrence presents a huge challenge in cancer patient management. Immune escape is a key mechanism of cancer progression and metastatic dissemination. CD25 is expressed in regulatory T (Treg) cells, including tumor-infiltrating Treg cells (TI-Tregs). These cells specially activate and reinforce the immune escape mechanisms of cancers. Suppressing the CD25/IL-2 interaction would be useful against Treg cell activation and, ultimately, the immune escape of cancer. Here, software, web servers and databases were used to introduce in silico designed small interfering RNAs (siRNAs), de novo designed peptides and virtually screened small molecules against CD25, with the prospect of eliminating cancer immune escape and obtaining successful treatment. We obtained siRNAs with low off-target effects. Further, small molecules based on a binding homology search in ligand and receptor similarity were introduced. Finally, the critical amino acids on CD25 were targeted by a de novo designed peptide with a disulfide bond. Hence, we introduced computational-based antagonists to lay a foundation for further in vitro and in vivo studies.

Genome-wide methylation sequencing identifies progression-related epigenetic drivers in myelodysplastic syndromes

Cell Death and Disease ◽

10.1038/s41419-020-03213-2 ◽

2020 ◽

Vol 11 (11) ◽

Author(s):

Jing-dong Zhou ◽

Ting-juan Zhang ◽

Zi-jun Xu ◽

Zhao-qun Deng ◽

Yu Gu ◽

...

Keyword(s):

Cancer Progression ◽

Myelodysplastic Syndromes ◽

Bisulfite Sequencing ◽

De Novo ◽

Dna Hypermethylation ◽

Reduced Representation ◽

Targeted Bisulfite Sequencing ◽

Specific Pcr ◽

Genome Wide ◽

Potential Biomarker

Abstract: The potential mechanism of myelodysplastic syndromes (MDS) progressing to acute myeloid leukemia (AML) remains poorly elucidated. It has been proved that epigenetic alterations play crucial roles in the pathogenesis of cancer progression, including MDS. However, few studies have explored the whole-genome methylation alterations during MDS progression. Reduced representation bisulfite sequencing was conducted in four paired MDS/secondary AML (MDS/sAML) patients to explore the underlying methylation-associated epigenetic drivers in MDS progression. In the four paired MDS/sAML patients, cases at the sAML stage exhibited significantly increased methylation levels as compared with the matched MDS stage. A total of 1090 differentially methylated fragments (DMFs) (441 hypermethylated and 649 hypomethylated) were identified as involved in MDS pathogenesis, whereas 103 DMFs (96 hypermethylated and 7 hypomethylated) were involved in MDS progression. Targeted bisulfite sequencing further identified that aberrant GFRA1, IRX1, NPY, and ZNF300 methylation were frequent events in an additional group of de novo MDS and AML patients, of which only ZNF300 methylation was associated with ZNF300 expression. Subsequently, ZNF300 hypermethylation in larger cohorts of de novo MDS and AML patients was confirmed by real-time quantitative methylation-specific PCR. It was illustrated that ZNF300 methylation could act as a potential biomarker for diagnosis and prognosis in MDS and AML patients. Functional experiments demonstrated the anti-proliferative and pro-apoptotic role of ZNF300 overexpression in the MDS-derived AML cell line SKM-1. Collectively, genome-wide DNA hypermethylation was a frequent event during MDS progression. Among these changes, ZNF300 methylation, a regulator of ZNF300 expression, acted as an epigenetic driver in MDS progression. These findings provide a theoretical basis for the use of demethylation drugs in MDS patients against disease progression.

Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C

Nature Communications ◽

10.1038/s41467-020-20536-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zev N. Kronenberg ◽

Arang Rhie ◽

Sergey Koren ◽

Gregory T. Concepcion ◽

Paul Peluso ◽

...

Keyword(s):

Zebra Finch ◽

Cultured Cells ◽

De Novo ◽

Single Cells ◽

Variant Calling ◽

Chromatin Interaction ◽

Extended Haplotype ◽

Benchmark Datasets ◽

And Performance ◽

Genome Assemblies

Abstract: Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data, and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97%, compared to 80–91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.

Genotyping structural variants in pangenome graphs using the vg toolkit

10.1101/654566 ◽

2019 ◽

Cited By ~ 7

Author(s):

Glenn Hickey ◽

David Heller ◽

Jean Monlong ◽

Jonas A. Sibbesen ◽

Jouni Sirén ◽

...

Keyword(s):

De Novo ◽

State Of The Art ◽

Effective Means ◽

Point Mutations ◽

Structural Variants ◽

Short Read ◽

Yeast Strains ◽

Sequencing Studies ◽

Long Read

Abstract: Structural variants (SVs) remain challenging to represent and study relative to point mutations despite their demonstrated importance. We show that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments. We benchmarked vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies. In addition, we use assemblies from 12 yeast strains to show that graphs constructed directly from aligned de novo assemblies improve genotyping compared to graphs built from intermediate SV catalogs in the VCF format.

dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

10.7287/peerj.preprints.314 ◽

2014 ◽

Author(s):

Jonathan Puritz ◽

Christopher M. Hollenbeck ◽

John R. Gold

Keyword(s):

Population Genomics ◽

De Novo ◽

Variant Calling ◽

Population Level ◽

Model Organisms ◽

Effective Population ◽

Reduction Techniques ◽

Indel Polymorphisms ◽

Indel Calling ◽

Population Sizes

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).
