sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies

Abstract Motivation Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or polymorphic in the genome. However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need. Results Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity and genomic context and classifying them as somatic or polymorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them, in addition to finding polymorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of bona fide retroCNVs. Availability and implementation sideRETRO is available at https://github.com/galantelab/sideRETRO Supplementary information Supplementary data are available at Bioinformatics online.

sideRETRO: a pipeline for identifying somatic and dimorphic insertions of processed pseudogenes or retrocopies

10.1101/2020.03.09.983858 ◽

2020 ◽

Author(s):

Thiago L A Miller ◽

Fernanda Orpinelli ◽

José Leonel L Buzzo ◽

Pedro A F Galante

Keyword(s):

Whole Genome ◽

Sequencing Data ◽

Genomic Context ◽

Computational Tools ◽

Exome Sequencing Data ◽

Gene Copies ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Processed Pseudogenes ◽

Insertion Sites

Abstract Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or dimorphic in the genome. However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need. Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity, and genomic context and classifying them as somatic or dimorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them (93.2% accuracy), in addition to identifying dimorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of retroCNVs. Availability sideRETRO is available at https://github.com/galantelab/sideRETRO

SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences from Reference Genomes

10.1101/824128 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yue Xing ◽

Alan R. Dabney ◽

Xiao Li ◽

Guosong Wang ◽

Clare A. Gill ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Whole Genome ◽

Sequencing Data ◽

Software Applications ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Abstract Copy number variants are insertions and deletions of 1 kb or larger in a genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested SECNVs (Simulator of Exome Copy Number Variants), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.

TPES: tumor purity estimation from SNVs

Bioinformatics ◽

10.1093/bioinformatics/btz406 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4433-4435 ◽

Cited By ~ 4

Author(s):

Alessio Locallo ◽

Davide Prandi ◽

Tarcisio Fedrizzi ◽

Francesca Demichelis

Keyword(s):

R Package ◽

Computational Method ◽

Supplementary Information ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Tumor Purity ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Fraction Distribution ◽

Tumor Genome

Abstract Motivation Tumor purity (TP) is the proportion of cancer cells in a tumor sample. TP affects the accurate assessment of molecular and genomic features as assayed with NGS approaches. State-of-the-art tools mainly rely on somatic copy-number alterations (SCNA) to quantify TP and therefore fail when a tumor genome is nearly euploid, i.e. ‘non-aberrant’ in terms of identifiable SCNAs. Results We introduce a computational method, tumor purity estimation from single-nucleotide variants (SNVs), which derives TP from the allelic fraction distribution of SNVs. On more than 7800 whole-exome sequencing datasets of TCGA tumor samples, it showed high concordance with a range of TP tools (Spearman’s correlation between 0.68 and 0.82; >9 SNVs) and rescued TP estimates of 1,194 samples (15%) pan-cancer. Availability and implementation TPES is available as an R package on CRAN and at https://bitbucket.org/l0ka/tpes.git. Supplementary information Supplementary data are available at Bioinformatics online.
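
The link between SNV allelic fractions and purity can be illustrated with a small sketch. It assumes that every SNV is clonal, heterozygous and located in a copy-neutral diploid region, in which case the expected allelic fraction is purity / 2. This is not the TPES algorithm; the function name `estimate_purity` and the input values are hypothetical, and the median is only a crude stand-in for locating the clonal peak of the distribution.

```python
from statistics import median


def estimate_purity(allelic_fractions):
    """Rough tumor purity estimate from somatic SNV allelic fractions.

    Assumption (not from the paper): all SNVs are clonal, heterozygous
    and in copy-neutral diploid regions, so expected AF = purity / 2.
    """
    if not allelic_fractions:
        raise ValueError("need at least one SNV allelic fraction")
    # The median approximates the position of the clonal AF peak.
    peak = median(allelic_fractions)
    # Purity is a proportion, so cap the estimate at 1.0.
    return min(1.0, 2.0 * peak)


# Made-up allelic fractions for illustration only.
example_afs = [0.28, 0.31, 0.30, 0.33, 0.29]
print(round(estimate_purity(example_afs), 2))
```

Subclonal SNVs and copy-number changes shift allelic fractions away from purity / 2, which is why a real method has to model the distribution rather than take a single summary value.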

Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btu651 ◽

2014 ◽

Vol 30 (24) ◽

pp. 3532-3540 ◽

Cited By ~ 73

Author(s):

Layla Oesper ◽

Gryte Satas ◽

Benjamin J. Raphael

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Tumor Heterogeneity ◽

Whole Genome ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Global copy number profiling of cancer genomes

Bioinformatics ◽

10.1093/bioinformatics/btv676 ◽

2015 ◽

Vol 32 (6) ◽

pp. 926-928 ◽

Cited By ~ 4

Author(s):

Xuefeng Wang ◽

Mengjie Chen ◽

Xiaoqing Yu ◽

Natapol Pornputtapong ◽

Hao Chen ◽

...

Keyword(s):

Copy Number ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Tumor Purity ◽

Whole Exome ◽

Cancer Genomes ◽

Whole Exome Sequencing Data ◽

Allele Specific

Abstract Summary: In this article, we introduce a robust and efficient strategy for deriving global and allele-specific copy number alterations (CNA) from cancer whole exome sequencing data based on Log R ratios and B-allele frequencies. Applying the approach to the analysis of over 200 skin cancer samples, we demonstrate its utility for discovering distinct CNA events and for deriving ancillary information such as tumor purity. Availability and implementation: https://github.com/xfwang/CLOSE Supplementary information: Supplementary data are available at Bioinformatics online.
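
The two quantities named in this abstract can be sketched as follows. The abstract does not give the exact definitions the authors use, so this assumes the common sequencing-based forms: the Log R ratio as the log2 ratio of library-size-normalized tumor depth to normal depth, and the B-allele frequency as the fraction of reads supporting the alternate allele. The function names and numbers are hypothetical.

```python
import math


def log_r_ratio(tumor_depth, normal_depth, tumor_total, normal_total):
    """log2 ratio of tumor to normal depth, normalized by library size.

    Assumed definition for illustration; not taken from the paper.
    """
    if min(tumor_depth, normal_depth, tumor_total, normal_total) <= 0:
        raise ValueError("depths and library sizes must be positive")
    tumor_fraction = tumor_depth / tumor_total
    normal_fraction = normal_depth / normal_total
    return math.log2(tumor_fraction / normal_fraction)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the B allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


# Made-up counts: the region has twice the normalized depth in the tumor,
# and the B allele is seen in a quarter of the reads.
print(log_r_ratio(200, 100, 1000, 1000))
print(b_allele_frequency(30, 10))
```

A Log R ratio near 0 with a B-allele frequency near 0.5 is what a balanced diploid region would look like under these definitions; joint shifts in both values are what allele-specific copy number methods interpret.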

Biallelic novel mutations of the COL27A1 gene in a patient with Steel syndrome

Human Genome Variation ◽

10.1038/s41439-021-00149-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Jong Seop Kim ◽

Hyoungseok Jeon ◽

Hyeran Lee ◽

Jung Min Ko ◽

Yonghwan Kim ◽

...

Keyword(s):

Hip Dysplasia ◽

Large Deletion ◽

Compound Heterozygous ◽

Radial Head Dislocation ◽

Sequencing Data ◽

Novel Mutations ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Carpal Coalition

Abstract An 11-year-old Korean boy presented with short stature, hip dysplasia, radial head dislocation, carpal coalition, genu valgum, and fixed patellar dislocation and was clinically diagnosed with Steel syndrome. Scrutinizing the trio whole-exome sequencing data revealed novel compound heterozygous mutations of COL27A1 (c.[4229_4233dup]; [3718_5436del], p.[Gly1412Argfs*157];[Gly1240_Lys1812del]) in the proband, which were inherited from heterozygous parents. The maternal mutation was a large deletion encompassing exons 38–60, which was challenging to detect.

ETumorMetastasis: A Network-based Algorithm Predicts Clinical Outcomes Using Whole-exome Sequencing Data of Cancer Patients

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2020.06.009 ◽

2021 ◽

Cited By ~ 1

Author(s):

Jean-Sébastien Milanese ◽

Chabane Tibiche ◽

Naif Zaman ◽

Jinfeng Zou ◽

Pengyong Han ◽

...

Keyword(s):

Exome Sequencing ◽

Cancer Patients ◽

Clinical Outcomes ◽

Whole Exome Sequencing ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
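
The notion of a Mendelian error that this abstract builds on can be sketched as a simple trio check. The sketch assumes biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1 or 2); the function names and example trios are hypothetical. It only counts the errors that are directly observable in a trio and does not reproduce the paper's estimation of per-sample precision and recall.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can arise from the parents' genotypes.

    Genotypes are alt-allele counts (0, 1, 2) at a biallelic autosomal site.
    """
    # Alleles (0 = ref, 1 = alt) each parental genotype can transmit.
    transmissible = {0: {0}, 1: {0, 1}, 2: {1}}
    return any(
        a + b == child
        for a in transmissible[father]
        for b in transmissible[mother]
    )


def mendelian_error_rate(trios):
    """Fraction of (child, father, mother) sites that violate inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("need at least one trio genotype")
    errors = sum(not is_mendelian_consistent(c, f, m) for c, f, m in trios)
    return errors / len(trios)


# Made-up genotypes: the second and fourth sites are Mendelian errors.
sites = [(0, 0, 0), (1, 0, 0), (2, 1, 1), (0, 2, 2)]
print(mendelian_error_rate(sites))
```

Many genotyping errors remain consistent with Mendelian inheritance and so go uncounted here, which is why the abstract describes observing only a fraction of sequencing errors this way.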

Long runs of homozygosity are associated with Alzheimer’s disease

Translational Psychiatry ◽

10.1038/s41398-020-01145-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sonia Moreno-Grau ◽

Maria Victoria Fernández ◽

Itziar de Rojas ◽

Pablo Garcia-González ◽

...

Keyword(s):

Alzheimer's Disease ◽

European Ancestry ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Outbred Population ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Outbred Populations ◽

Recessive Effects

Abstract Long runs of homozygosity (ROH) are contiguous stretches of homozygous genotypes, which are a footprint of inbreeding and recessive inheritance. The presence of recessive loci is suggested for Alzheimer’s disease (AD); however, their search has been poorly assessed to date. To investigate homozygosity in AD, here we performed a fine-scale ROH analysis using 10 independent cohorts of European ancestry (11,919 AD cases and 9181 controls). We detected an increase of homozygosity in AD cases compared to controls [β_AVROH (CI 95%) = 0.070 (0.037–0.104); P = 3.91 × 10^-5; β_FROH (CI 95%) = 0.043 (0.009–0.076); P = 0.013]. ROHs increasing the risk of AD (OR > 1) were significantly overrepresented compared to ROHs increasing protection (p < 2.20 × 10^-16). A significant ROH association with AD risk was detected upstream of the HS3ST1 locus (chr4:11,189,482‒11,305,456), (β (CI 95%) = 1.09 (0.48 ‒ 1.48), p value = 9.03 × 10^-4), previously related to AD. Next, to search for recessive candidate variants in ROHs, we constructed a homozygosity map of inbred AD cases extracted from an outbred population and explored ROH regions in whole-exome sequencing data (N = 1449). We detected a candidate marker, rs117458494, mapped in the SPON1 locus, which has been previously associated with amyloid metabolism. Here, we provide a research framework to look for recessive variants in AD using outbred populations. Our results showed that AD cases have enriched homozygosity, suggesting that recessive effects may explain a proportion of AD heritability.
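
The definition of an ROH as a contiguous stretch of homozygous genotypes can be sketched as a simple scan along one chromosome. This is a minimal illustration, not the procedure used in the study: the function name `find_roh`, the thresholds `min_snps` and `min_length`, and the example data are hypothetical, and unlike established ROH callers the scan tolerates no heterozygous or missing calls inside a run.

```python
def find_roh(positions, genotypes, min_snps=3, min_length=1000):
    """Return (start, end, n_snps) for runs of consecutive homozygous calls.

    positions: sorted base-pair coordinates on a single chromosome.
    genotypes: alt-allele counts; 0 and 2 are homozygous, 1 is heterozygous.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have equal length")

    runs = []
    run_start = None  # index of the first SNP in the current run

    def close_run(end_idx):
        # Keep the run only if it passes both thresholds.
        n_snps = end_idx - run_start + 1
        length = positions[end_idx] - positions[run_start]
        if n_snps >= min_snps and length >= min_length:
            runs.append((positions[run_start], positions[end_idx], n_snps))

    for i, genotype in enumerate(genotypes):
        if genotype in (0, 2):
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A heterozygous call ends the current run.
            close_run(i - 1)
            run_start = None

    if run_start is not None:
        close_run(len(genotypes) - 1)
    return runs


# Made-up data: one heterozygous call splits two homozygous stretches.
pos = [100, 600, 1200, 1800, 2500, 3000, 3600]
gts = [0, 2, 0, 1, 2, 2, 0]
print(find_roh(pos, gts))
```

Summing the lengths of the returned runs and dividing by the genome length covered would give a genome-wide homozygosity proportion of the kind such analyses compare between cases and controls.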

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

International Journal of Genomics ◽

10.1155/2016/7983236 ◽

2016 ◽

Vol 2016 ◽

pp. 1-16 ◽

Cited By ~ 16

Author(s):

Jennifer D. Hintzsche ◽

William A. Robinson ◽

Aik Choon Tan

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Sequencing Data ◽

Disease Treatment ◽

Computational Tools ◽

Whole Exome ◽

Data Production ◽

Whole Exome Sequencing Data ◽

Computationally Intensive ◽

Generation Technology

Whole Exome Sequencing (WES) is the application of next-generation sequencing technology to determine the variations in the exome and is becoming a standard approach in studying genetic variants in diseases. Understanding the exomes of individuals at single base resolution allows the identification of actionable mutations for disease treatment and management. WES technologies have shifted the bottleneck from experimental data production to computationally intensive informatics-based data analysis. Novel computational tools and methods have been developed to analyze and interpret WES data. Here, we review some of the current tools that are being used to analyze WES data. These tools range from the alignment of raw sequencing reads all the way to linking variants to actionable therapeutics. Strengths and weaknesses of each tool are discussed to help researchers make more informed decisions when selecting the best tools to analyze their WES data.