Harvestman: A framework for hierarchical feature learning and selection from whole genome sequencing data

2020 ◽  
Author(s):  
Trevor S. Frisby ◽  
Shawn James Baker ◽  
Guillaume Marçais ◽  
Quang Minh Hoang ◽  
Carl Kingsford ◽  
...  

Abstract: We present Harvestman, a method that takes advantage of hierarchical relationships among the possible biological interpretations and representations of genomic variants to perform automatic feature learning, feature selection, and model building. We demonstrate that Harvestman scales to thousands of genomes comprising more than 84 million variants by processing phase 3 data from the 1000 Genomes Project, the largest publicly available collection of whole genome sequences. Next, using breast cancer data from The Cancer Genome Atlas (TCGA), we show that Harvestman selects a rich combination of representations that are adapted to the learning task, and performs better than a binary representation of SNPs alone. Finally, we compare Harvestman to existing feature selection methods and demonstrate that our method selects smaller and less redundant feature subsets while maintaining the accuracy of the resulting classifier. The data used are available through either the 1000 Genomes Project or The Cancer Genome Atlas. Access to TCGA data requires the completion of a Data Access Request through the Database of Genotypes and Phenotypes (dbGaP). Binary releases of Harvestman compatible with Linux, Windows, and Mac are available for download at https://github.com/cmlh-gp/Harvestman-public/releases
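The idea of choosing among hierarchical representations of variants can be illustrated with a toy sketch. This is not Harvestman's algorithm, which the abstract does not specify. It assumes a two-level hierarchy (SNP, then gene), a hypothetical SNP-to-gene annotation, and a simple illustrative score (the gap in carrier frequency between the two classes). For each gene, it keeps either the gene-level feature or the best single SNP, whichever separates the classes better.

```python
from collections import defaultdict


def carrier_gap(carriers, labels):
    """Absolute difference in carrier frequency between class 1 and class 0."""
    pos = [s for s, y in labels.items() if y == 1]
    neg = [s for s, y in labels.items() if y == 0]
    pos_rate = sum(s in carriers for s in pos) / len(pos)
    neg_rate = sum(s in carriers for s in neg) / len(neg)
    return abs(pos_rate - neg_rate)


def select_level(sample_snps, snp_to_gene, labels):
    """For each gene, pick the gene-level feature or its best-scoring SNP.

    Illustrative only: the scoring rule and the two-level hierarchy are
    assumptions made for this sketch, not part of the published method.
    """
    snp_carriers = defaultdict(set)   # SNP id -> samples carrying it
    gene_carriers = defaultdict(set)  # gene -> samples with any SNP in it
    for sample, snps in sample_snps.items():
        for snp in snps:
            snp_carriers[snp].add(sample)
            gene_carriers[snp_to_gene[snp]].add(sample)

    selected = {}
    for gene, carriers in gene_carriers.items():
        children = [s for s in snp_carriers if snp_to_gene[s] == gene]
        best_snp = max(children, key=lambda s: carrier_gap(snp_carriers[s], labels))
        gene_score = carrier_gap(carriers, labels)
        snp_score = carrier_gap(snp_carriers[best_snp], labels)
        # Prefer the coarser (gene-level) feature when it scores at least as well.
        selected[gene] = gene if gene_score >= snp_score else best_snp
    return selected


# Hypothetical toy data: four samples, four SNPs, two genes.
sample_snps = {
    "s1": {"rs1", "rs4"},
    "s2": {"rs2", "rs4"},
    "s3": set(),
    "s4": {"rs3"},
}
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B", "rs4": "GENE_B"}
labels = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}

selected = select_level(sample_snps, snp_to_gene, labels)
print(selected)
```

In this toy data, the gene-level feature wins for GENE_A because two different SNPs in the same gene each mark one case, while a single SNP wins for GENE_B because the gene-level feature also picks up a control. The result is a mix of representations chosen per task, which is the behaviour the abstract describes in general terms.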

2013 ◽  
Vol 88 (1) ◽  
pp. 774-774 ◽  
Author(s):  
E. S. Amirian ◽  
M. L. Bondy ◽  
Q. Mo ◽  
M. N. Bainbridge ◽  
M. E. Scheurer

2017 ◽  
Vol 3 (6) ◽  
pp. 584-589 ◽  
Author(s):  
Mark W. Ball ◽  
Michael A. Gorin ◽  
Charles G. Drake ◽  
Hans J. Hammers ◽  
Mohamad E. Allaf

2017 ◽  
pp. 1-12
Author(s):  
Manish R. Sharma ◽  
James T. Auman ◽  
Nirali M. Patel ◽  
Juneko E. Grilley-Olson ◽  
Xiaobei Zhao ◽  
...  

Purpose: A 73-year-old woman with metastatic colon cancer experienced a complete response to chemotherapy with dose-intensified irinotecan that has been durable for 5 years. We sequenced her tumor and germline DNA and looked for similar patterns in publicly available genomic data from patients with colorectal cancer. Patients and Methods: Tumor DNA was obtained from a biopsy before therapy, and germline DNA was obtained from blood. Tumor and germline DNA were sequenced using a commercial panel with approximately 250 genes. Whole-genome amplification and exome sequencing were performed for POLE and POLD1. A POLD1 mutation was confirmed by Sanger sequencing. The somatic mutation and clinical annotation data files from the colon (n = 461) and rectal (n = 171) adenocarcinoma data sets were downloaded from The Cancer Genome Atlas data portal and analyzed for patterns of mutations and clinical outcomes in patients with POLE- and/or POLD1-mutated tumors. Results: The pattern of alterations included APC biallelic inactivation and microsatellite instability-high (MSI-H) phenotype, with somatic inactivation of MLH1 and hypermutation (estimated mutation rate > 200 per megabase). The extremely high mutation rate led us to investigate additional mechanisms for hypermutation, including loss of function of POLE. POLE was unaltered, but a related gene not typically associated with somatic mutation in colon cancer, POLD1, had a somatic mutation c.2171G>A [p.Gly724Glu]. Additionally, we noted that the high mutation rate was largely composed of dinucleotide deletions. A similar pattern of hypermutation (dinucleotide deletions, POLD1 mutations, MSI-H) was found in tumors from The Cancer Genome Atlas. Conclusion: POLD1 mutation with associated MSI-H and hyper-indel–hypermutated cancer genome characterizes a previously unrecognized variant of colon cancer that was found in this patient with an exceptional response to chemotherapy.


2018 ◽  
Vol 11 ◽  
pp. 1-11 ◽  
Author(s):  
Chundi Gao ◽  
Huayao Li ◽  
Jing Zhuang ◽  
HongXiu Zhang ◽  
Kejia Wang ◽  
...  

2018 ◽  
Vol 17 (2) ◽  
pp. 476-487 ◽  
Author(s):  
Fengju Chen ◽  
Yiqun Zhang ◽  
Sooryanarayana Varambally ◽  
Chad J. Creighton
