eTumorMetastasis, a network-based algorithm predicts clinical outcomes using whole-exome sequencing data of cancer patients

Abstract: Continual reduction in sequencing cost is expanding the accessibility of genome sequencing data for routine clinical applications. However, the lack of methods to construct machine learning-based predictive models using these datasets has become a crucial bottleneck for the application of sequencing technology in clinics. Here we developed a new algorithm, eTumorMetastasis, which transforms tumor functional mutations into network-based profiles and identifies network operational gene signatures (NOG signatures), which model the tipping point at which a tumor cell shifts from a state that doesn’t favor recurrence to one that does. We showed that NOG signatures derived from genomic mutations of tumor founding clones (i.e., the ‘most recent common ancestor’ of the cells within a tumor) significantly distinguished recurred and non-recurred breast tumors. These results imply that somatic mutations of tumor founders are associated with tumor recurrence and can be used to predict clinical outcomes. Finally, the concepts underlying eTumorMetastasis pave the way for the application of genome sequencing in predictions for other complex genetic diseases.


eTumorMetastasis: A Network-based Algorithm Predicts Clinical Outcomes Using Whole-exome Sequencing Data of Cancer Patients

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2020.06.009 ◽

2021 ◽

Cited By ~ 1

Author(s):

Jean-Sébastien Milanese ◽

Chabane Tibiche ◽

Naif Zaman ◽

Jinfeng Zou ◽

Pengyong Han ◽

...

Keyword(s):

Exome Sequencing ◽

Cancer Patients ◽

Clinical Outcomes ◽

Whole Exome Sequencing ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data


SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences from Reference Genomes

10.1101/824128 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yue Xing ◽

Alan R. Dabney ◽

Xiao Li ◽

Guosong Wang ◽

Clare A. Gill ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Whole Genome ◽

Sequencing Data ◽

Software Applications ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Abstract: Copy number variants are insertions and deletions of 1 kb or larger in a genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested SECNVs (Simulator of Exome Copy Number Variants), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
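
To make the idea of a "rearranged genome" concrete, the following is a minimal toy sketch of placing non-overlapping duplications and deletions into a reference string. It is not the SECNVs algorithm or its command-line interface: the function name `simulate_cnvs`, its parameters, and the fixed-grid placement of events are all assumptions made for illustration, and the toy event length is far below the 1 kb scale of real copy number variants.

```python
import random


def simulate_cnvs(sequence, n_cnvs, cnv_length, seed=0):
    """Toy simulator (hypothetical, not SECNVs): apply n_cnvs non-overlapping
    duplications or deletions of fixed length to a reference string.

    Returns the rearranged sequence and a sorted list of
    (start, end, kind) events in reference coordinates (end exclusive).
    """
    rng = random.Random(seed)
    # Restrict events to a fixed grid so they can never overlap.
    n_slots = len(sequence) // cnv_length
    if n_cnvs > n_slots:
        raise ValueError("too many CNVs for this sequence length")

    # Apply events from right to left so that earlier (leftward)
    # reference coordinates are unaffected by later edits.
    slots = sorted(rng.sample(range(n_slots), n_cnvs), reverse=True)
    rearranged = sequence
    events = []
    for slot in slots:
        start = slot * cnv_length
        end = start + cnv_length
        kind = rng.choice(["duplication", "deletion"])
        if kind == "duplication":
            # Tandem duplication: repeat the segment right after itself.
            rearranged = rearranged[:end] + rearranged[start:end] + rearranged[end:]
        else:
            rearranged = rearranged[:start] + rearranged[end:]
        events.append((start, end, kind))
    return rearranged, sorted(events)


if __name__ == "__main__":
    reference = "ACGT" * 25  # 100 bp toy reference
    genome, events = simulate_cnvs(reference, n_cnvs=3, cnv_length=10, seed=1)
    for start, end, kind in events:
        print(start, end, kind)
    print(len(reference), "->", len(genome))
```

Recording the events in reference coordinates gives a ground truth against which the calls of a detection tool could be scored, which is the comparison the abstract describes.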


Biallelic novel mutations of the COL27A1 gene in a patient with Steel syndrome

Human Genome Variation ◽

10.1038/s41439-021-00149-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Jong Seop Kim ◽

Hyoungseok Jeon ◽

Hyeran Lee ◽

Jung Min Ko ◽

Yonghwan Kim ◽

...

Keyword(s):

Hip Dysplasia ◽

Large Deletion ◽

Compound Heterozygous ◽

Radial Head Dislocation ◽

Sequencing Data ◽

Novel Mutations ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Carpal Coalition

Abstract: An 11-year-old Korean boy presented with short stature, hip dysplasia, radial head dislocation, carpal coalition, genu valgum, and fixed patellar dislocation and was clinically diagnosed with Steel syndrome. Scrutinizing the trio whole-exome sequencing data revealed novel compound heterozygous mutations of COL27A1 (c.[4229_4233dup]; [3718_5436del], p.[Gly1412Argfs*157];[Gly1240_Lys1812del]) in the proband, which were inherited from heterozygous parents. The maternal mutation was a large deletion encompassing exons 38–60, which was challenging to detect.


Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract

Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates, since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample.

Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that (1) sequencing error rates between samples in the same dataset can vary by over an order of magnitude; (2) variant calling performance decreases substantially in low-complexity regions of the genome; (3) variant calling performance in whole exome sequencing data decreases with distance from the nearest target region; (4) variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood; and (5) whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.

Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
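
The notion of a Mendelian error used above can be illustrated with a small sketch. It assumes biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1 or 2), and the function names are hypothetical. It only counts the violations that are directly observable in a trio; the authors' procedure for turning such counts into per-sample precision and recall estimates is not reproduced here.

```python
from itertools import product


def transmittable_alleles(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal
    and one paternal allele (biallelic autosomal site assumed)."""
    return any(
        m + f == child
        for m, f in product(transmittable_alleles(mother),
                            transmittable_alleles(father))
    )


def mendelian_error_rate(trio_genotypes):
    """Fraction of sites whose (child, mother, father) genotypes violate
    Mendelian inheritance. Returns 0.0 for an empty input."""
    sites = list(trio_genotypes)
    if not sites:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return errors / len(sites)


if __name__ == "__main__":
    # (child, mother, father) alt-allele counts at four sites
    sites = [(1, 0, 0), (0, 1, 1), (2, 0, 1), (1, 2, 0)]
    print(mendelian_error_rate(sites))
```

Note that many genotyping errors remain Mendelian-consistent (for example, a true 0/1 child miscalled as 0/0 when both parents are heterozygous), which is why the abstract speaks of observing only a fraction of sequencing errors this way.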


Long runs of homozygosity are associated with Alzheimer’s disease

Translational Psychiatry ◽

10.1038/s41398-020-01145-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sonia Moreno-Grau ◽

Maria Victoria Fernández ◽

Itziar de Rojas ◽

Pablo Garcia-González ◽

...

Keyword(s):

Alzheimer’s Disease ◽

European Ancestry ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Outbred Population ◽

Whole Exome ◽

Whole Exome Sequencing Data ◽

Outbred Populations ◽

Recessive Effects

Abstract: Long runs of homozygosity (ROH) are contiguous stretches of homozygous genotypes, which are a footprint of inbreeding and recessive inheritance. The presence of recessive loci is suggested for Alzheimer’s disease (AD); however, such loci have been poorly investigated to date. To investigate homozygosity in AD, here we performed a fine-scale ROH analysis using 10 independent cohorts of European ancestry (11,919 AD cases and 9181 controls). We detected an increase of homozygosity in AD cases compared to controls [β_AVROH (CI 95%) = 0.070 (0.037–0.104); P = 3.91 × 10⁻⁵; β_FROH (CI 95%) = 0.043 (0.009–0.076); P = 0.013]. ROHs increasing the risk of AD (OR > 1) were significantly overrepresented compared to ROHs increasing protection (p < 2.20 × 10⁻¹⁶). A significant ROH association with AD risk was detected upstream of the HS3ST1 locus (chr4:11,189,482‒11,305,456; β (CI 95%) = 1.09 (0.48‒1.48), p value = 9.03 × 10⁻⁴), previously related to AD. Next, to search for recessive candidate variants in ROHs, we constructed a homozygosity map of inbred AD cases extracted from an outbred population and explored ROH regions in whole-exome sequencing data (N = 1449). We detected a candidate marker, rs117458494, mapped in the SPON1 locus, which has been previously associated with amyloid metabolism. Here, we provide a research framework to look for recessive variants in AD using outbred populations. Our results showed that AD cases have enriched homozygosity, suggesting that recessive effects may explain a proportion of AD heritability.
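
As a rough illustration of what an ROH is, the sketch below scans an ordered list of genotypes for contiguous homozygous stretches and computes a simple homozygosity fraction from them. It is a simplified sketch, not the pipeline used in the study: the function names, the `min_snps` and `min_length_bp` thresholds, and the absence of any tolerance for heterozygous or missing calls are assumptions, and the fraction computed by `f_roh` is the commonly used definition (total ROH length divided by the genome length analysed), which may differ in detail from the study's F_ROH.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp) for each run of consecutive homozygous calls.

    genotypes: alt-allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous)
    positions: base-pair coordinates of the SNPs, sorted ascending
    A run is kept only if it spans at least min_snps SNPs and min_length_bp bases.
    """
    runs = []
    run_start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if run_start is None:
                run_start = i          # open a new run
        elif run_start is not None:
            runs.append((run_start, i - 1))  # heterozygous call closes the run
            run_start = None
    if run_start is not None:          # run extends to the last SNP
        runs.append((run_start, len(genotypes) - 1))

    segments = []
    for first, last in runs:
        n_snps = last - first + 1
        length_bp = positions[last] - positions[first]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            segments.append((positions[first], positions[last]))
    return segments


def f_roh(segments, genome_length_bp):
    """Fraction of the analysed genome covered by ROH segments."""
    return sum(end - start for start, end in segments) / genome_length_bp


if __name__ == "__main__":
    genotypes = [0, 2, 0, 1, 2, 2, 1, 0, 0, 0, 2]
    positions = [100 * (i + 1) for i in range(len(genotypes))]
    segments = find_roh(genotypes, positions, min_snps=3)
    print(segments)
    print(f_roh(segments, genome_length_bp=10_000))
```

Real ROH callers additionally allow a small number of heterozygous or missing genotypes per window and apply SNP-density constraints, which this sketch omits.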


A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

International Journal of Genomics ◽

10.1155/2016/7983236 ◽

2016 ◽

Vol 2016 ◽

pp. 1-16 ◽

Cited By ~ 16

Author(s):

Jennifer D. Hintzsche ◽

William A. Robinson ◽

Aik Choon Tan

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Sequencing Data ◽

Disease Treatment ◽

Computational Tools ◽

Whole Exome ◽

Data Production ◽

Whole Exome Sequencing Data ◽

Computationally Intensive ◽

Generation Technology

Whole Exome Sequencing (WES) is the application of next-generation sequencing technology to determine the variations in the exome and is becoming a standard approach in studying genetic variants in diseases. Understanding the exomes of individuals at single-base resolution allows the identification of actionable mutations for disease treatment and management. WES technologies have shifted the bottleneck from experimental data production to computationally intensive informatics-based data analysis. Novel computational tools and methods have been developed to analyze and interpret WES data. Here, we review some of the current tools that are being used to analyze WES data. These tools range from the alignment of raw sequencing reads all the way to linking variants to actionable therapeutics. Strengths and weaknesses of each tool are discussed to help researchers make more informed decisions when selecting the best tools to analyze their WES data.
