Variant calling in low-coverage whole genome sequencing of a Native American population sample

Classical molecular analyses of Mycobacterium bovis based on spoligotyping and Variable Number Tandem Repeat (MIRU-VNTR) brought the first insights into the epidemiology of animal tuberculosis (TB) in Portugal, showing high genotypic diversity of circulating strains that mostly cluster within the European 2 clonal complex. Previous surveillance provided valuable information on the prevalence and spatial occurrence of TB and highlighted prevalent genotypes in areas where livestock and wild ungulates are sympatric. However, links at the wildlife–livestock interfaces were established mainly via classical genotype associations. Here, we apply whole genome sequencing (WGS) to cattle, red deer and wild boar isolates to reconstruct the M. bovis population structure in a multi-host, multi-region disease system and to explore links at a fine genomic scale between M. bovis from wildlife hosts and cattle. Whole genome sequences of 44 representative M. bovis isolates, obtained between 2003 and 2015 from three TB hotspots, were compared through single nucleotide polymorphism (SNP) variant calling analyses. Consistent with previous results combining classical genotyping with Bayesian population admixture modelling, SNP-based phylogenies support the branching of this M. bovis population into five genetic clades, three with apparent geographic specificities, as well as the establishment of an SNP catalogue specific to each clade, which may be explored in the future as phylogenetic markers. The core genome alignment of SNPs was integrated within a spatiotemporal metadata framework to further structure this M. bovis population by host species and TB hotspots, providing a baseline for network analyses in different epidemiological and disease control contexts. WGS of M. bovis isolates from Portugal is reported for the first time in this pilot study, refining the spatiotemporal context of TB at the wildlife–livestock interface and providing further support for the key role of red deer and wild boar in disease maintenance. The SNP diversity observed within this dataset supports the natural circulation of M. bovis over a long time period, as well as multiple introduction events of the pathogen in this Iberian multi-host system.


Clinical-grade whole-genome sequencing and 3′ transcriptome analysis of colorectal cancer patients

Genome Medicine; DOI: 10.1186/s13073-021-00852-8; 2021; Vol 13 (1)

Author(s): Agata Stodolna, Miao He, Mahesh Vasipalli, Zoya Kingsbury, Jennifer Becq, ...

Keyword(s): Colorectal Cancer, Whole Genome Sequencing, Genome Sequencing, Transcriptome Analysis, Variant Calling, Standard Of Care, Genomic Variation, Whole Genome, Clinical Grade, Pathway Gene

Abstract. Background: Clinical-grade whole-genome sequencing (cWGS) has the potential to become the standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3′ transcriptome analysis would give new insights into colorectal cancer. Methods: Patients underwent PCR-free whole-genome sequencing, with alignment and variant calling performed using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into the mutational signatures and tumour biology were gained by the use of 3′ RNA-seq. Results: Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen, as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 30% of patients had a tumour mutational burden of >10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions: Clinical whole-genome sequencing offers a potential avenue for the identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.
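The tumour mutational burden threshold quoted above is a simple rate: somatic mutations divided by the number of megabases considered. The abstract does not say which mutation classes or which callable region the authors used, so the following is only a minimal sketch with hypothetical function names and an assumed caller-supplied region size.

```python
def tumour_mutational_burden(n_somatic_mutations: int, callable_mb: float) -> float:
    """Return mutations per megabase.

    callable_mb is an assumption supplied by the caller: the size, in Mb,
    of the region in which mutations could have been called.
    """
    if callable_mb <= 0:
        raise ValueError("callable_mb must be positive")
    return n_somatic_mutations / callable_mb


def exceeds_tmb_threshold(tmb: float, threshold: float = 10.0) -> bool:
    """True when the burden is strictly above the threshold (>10 mutations/Mb in the abstract)."""
    return tmb > threshold


if __name__ == "__main__":
    # Hypothetical tumour: 36,000 somatic mutations over an assumed 3,000 Mb callable genome.
    tmb = tumour_mutational_burden(36_000, 3_000.0)
    print(f"TMB = {tmb:.1f} mutations/Mb, above threshold: {exceeds_tmb_threshold(tmb)}")
```

The input numbers in the usage lines are invented for illustration and are not taken from the study.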


353 ASAS-EAAP Talk: Low-coverage whole-genome sequencing in local livestock breeds

Journal of Animal Science; DOI: 10.1093/jas/skaa278.149; 2020; Vol 98 (Supplement_4); pp. 81-82

Author(s): Joaquim Casellas, Melani Martín de Hijas-Villalba, Marta Vázquez-Gómez, Samir Id Lahoucine

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Error Rate, Allele Frequencies, Paternity Testing, Sequencing Error, Whole Genome, Genomic Evaluation, Sequencing Error Rate, Low Coverage

Abstract. Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNPs per chromosome. Each population evolved for 1,000 non-overlapping generations with an effective size of 100, a mutation rate of 10⁻⁴, and recombination according to Kosambi’s function. Only those populations with 1,000,000 ± 10% polymorphic SNPs per chromosome in generation 1,000 were retained for further analyses and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with reads of 100, 500, 1,000 or 10,000 base pairs, assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known population allele frequencies and sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased to 99.6% and 99.9%, respectively, with 0.1X depth (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNPs). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies in the sequencing error rate. Low-coverage whole-genome sequencing data could subsequently be integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth with 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
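To give a feel for how few sites are observed at the depths listed above, here is a rough sketch under an idealized Poisson coverage model, in which the chance that a site is hit by at least one read at mean depth c is 1 - exp(-c). This is a textbook approximation and not the authors' simulation: it ignores read length, sequencing error and allele sampling, and the function names are hypothetical.

```python
import math


def fraction_sites_covered(depth: float) -> float:
    """Expected fraction of sites covered by >= 1 read, assuming Poisson-distributed depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


def expected_snps_covered(depth: float, n_snps: int) -> float:
    """Expected number of SNP sites with at least one read under the same assumption."""
    return n_snps * fraction_sites_covered(depth)


if __name__ == "__main__":
    # 30 chromosomes x roughly 1,000,000 polymorphic SNPs each, as described in the abstract.
    n_snps = 30 * 1_000_000
    for depth in (0.01, 0.05, 0.1, 0.5, 1.0):
        covered = expected_snps_covered(depth, n_snps)
        print(f"{depth:>5}X: about {covered:,.0f} SNPs with at least one read")
```

Even at 0.05X this idealized model leaves well over a million SNP sites with at least one read, which helps explain why very shallow sequencing can still carry useful information for parentage assignment.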


Estimating sequencing error rates using families

BioData Mining; DOI: 10.1186/s13040-021-00259-6; 2021; Vol 14 (1)

Author(s): Kelley Paskov, Jae-Yoon Jung, Brianna Chrisman, Nate T. Stockham, Peter Washington, ...

Keyword(s): Whole Genome Sequencing, Exome Sequencing, Genome Sequencing, Variant Calling, Error Rates, Sequencing Error, Whole Genome, Sequencing Data, Sequencing Platform, Whole Exome

Abstract. Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates, since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
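As an illustration of what a Mendelian error is, the sketch below flags trio genotypes at a biallelic autosomal site that cannot be explained by inheritance. It is not the authors' estimator, which extrapolates from the observable fraction of errors to per-sample precision and recall; it only counts visible inconsistencies. The genotype coding (0, 1 or 2 copies of the alternate allele) and all names are assumptions made for this example.

```python
from itertools import product

# Alternate-allele counts a parent can transmit, keyed by the parent's genotype
# (0 = hom-ref, 1 = het, 2 = hom-alt). Assumes a biallelic autosomal site.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype cannot arise from one allele of each parent."""
    possible_children = {
        m + f for m, f in product(TRANSMITTABLE[mother], TRANSMITTABLE[father])
    }
    return child not in possible_children


def mendelian_error_rate(trios) -> float:
    """Fraction of (mother, father, child) genotype triples that are Mendelian errors."""
    trios = list(trios)
    if not trios:
        return 0.0
    return sum(is_mendelian_error(*trio) for trio in trios) / len(trios)


if __name__ == "__main__":
    calls = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1), (0, 2, 1)]
    print(f"Mendelian error rate: {mendelian_error_rate(calls):.2f}")
```

Many genotyping errors stay invisible to a check like this (for example, a het child miscalled as hom-ref when both parents are het), which is why the abstract speaks of observing only a fraction of sequencing errors as Mendelian errors.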


Batch effects in population genomic studies with low‐coverage whole genome sequencing data: causes, detection, and mitigation

Molecular Ecology Resources; DOI: 10.1111/1755-0998.13559; 2021

Author(s): Runyang Nicolas Lou, Nina Overgaard Therkildsen

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome Sequencing Data, Whole Genome, Batch Effects, Sequencing Data, Population Genomic, Genomic Studies, Low Coverage


Copy number analysis by low coverage whole genome sequencing using ultra low-input DNA from formalin-fixed paraffin embedded tumor tissue

Genome Medicine; DOI: 10.1186/s13073-016-0375-z; 2016; Vol 8 (1); Cited By ~ 18

Author(s): Tanjina Kader, David L. Goode, Stephen Q. Wong, Jacquie Connaughton, Simone M. Rowley, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Copy Number, Tumor Tissue, Whole Genome, Copy Number Analysis, Formalin Fixed Paraffin, Formalin Fixed Paraffin Embedded, Low Coverage, Formalin Fixed


Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost-efficient approach

Genetics Selection Evolution; DOI: 10.1186/s12711-019-0487-1; 2019; Vol 51 (1); Cited By ~ 4

Author(s): Yanjun Zan, Thibaut Payen, Mette Lillie, Christa F. Honaker, Paul B. Siegel, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome, Efficient Approach, Cost Efficient, Low Coverage


Phylogenomics from low‐coverage whole‐genome sequencing

Methods in Ecology and Evolution; DOI: 10.1111/2041-210x.13145; 2019; Vol 10 (4); pp. 507-517; Cited By ~ 9

Author(s): Feng Zhang, Yinhuan Ding, Chao‐Dong Zhu, Xin Zhou, Michael C. Orr, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome, Low Coverage


Application of risk score analysis to low-coverage whole genome sequencing data for the noninvasive detection of trisomy 21, trisomy 18, and trisomy 13

Prenatal Diagnosis; DOI: 10.1002/pd.4712; 2015; Vol 36 (1); pp. 56-62; Cited By ~ 8

Author(s): J. A. Tynan, S. K. Kim, A. R. Mazloom, C. Zhao, G. McLennan, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Trisomy 21, Trisomy 13, Noninvasive Detection, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Score Analysis, Low Coverage


iStopCancer: A database of 6,016 low pass whole genome sequencing of minimal invasive samples from 21 cancer types of Chinese population.

Journal of Clinical Oncology; DOI: 10.1200/jco.2020.38.15_suppl.7062; 2020; Vol 38 (15_suppl); pp. 7062-7062

Author(s): Min Yuan, Qian Ziliang, Juemin Fang, Zhongzheng Zhu, Jianguo Wu, ...

Keyword(s): Whole Genome Sequencing, Cancer Patients, Genome Sequencing, Minimal Invasive, The Body, Whole Genome Sequencing Data, Low Grade, Whole Genome, Low Pass, Low Coverage

7062 Background: Cancer is a group of genetic diseases that result from changes in the genome of cells in the body, leading them to grow uncontrollably. Recent research suggests that chromosome instability (CIN), defined as an increased rate of chromosome gains and losses, manifests as cell-to-cell karyotypic heterogeneity and drives cancer initiation and evolution. Methods: Over the past two years, we initiated the iStopCancer project and characterized 4515 ‘best available’ minimally invasive samples from cancer patients and 1501 plasma samples from patients with non-tumor diseases using low-pass whole genome sequencing. DNA from the ‘best available’ minimally invasive samples, including peripheral plasma, urine, pancreatic juice, bile and effusions, was analyzed by low-coverage whole genome sequencing followed by the UCAD bioinformatics workflow to characterize CIN. In total, 32 Tbp of sequence (coverage = 1.7X per sample) were collected. All the data can be visualized at http://www.istopcancer.net/pgweb/cn/istopcancer.jsp. Results: Of the tumors, 3748 (83%) presented detectable CIN (CIN score > 1000) in minimally invasive samples. The missed cancers were mainly in patients with either a tumor size of less than 2 cm or less aggressive cancers, including thyroid cancer, low-grade urothelial carcinoma and lung cancer in situ, among others. Of the 1501 non-tumor individuals, 30 (2.0%) presented detectable CIN (|Z| ≥ 3) at the time of sample collection, and 24 of these (80.0%) were diagnosed with a tumor within 3-6 months of follow-up. Nine (0.59%) non-cancer individuals without detectable CIN were also reported as tumor patients during the 6-month follow-up. In summary, the positive and negative predictive values were 80.0% and 99.4%, respectively. False alarms came mainly from patients with EBV activation, which suggests that viruses may interfere with chromosome stability and drive virus-associated carcinogenesis. For patients with repeated measurements, plasma cfDNA CIN dynamics predicted clinical responses and disease recurrences. Quick clearance of plasma cfDNA CIN within 2-3 weeks was found in 153 (83.6%) patients. Meanwhile, no quick clearance was found in the majority of SDs/PDs (73/88 = 83.0%). Furthermore, cfDNA CIN predicted clinical response 2-8 weeks ahead of traditional biomarkers (CEA, CA15-3, CA199, AFP, among others). Conclusions: Large-scale low-coverage whole genome sequencing data provide useful information for cancer detection and management.
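The predictive values reported above can be reproduced from the counts given for the 1501 non-tumor individuals. The short sketch below does this arithmetic; treating a tumor diagnosis during follow-up as the reference outcome is an assumption inferred from the abstract's wording, and the variable and function names are hypothetical.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Counts taken from the abstract's non-tumor cohort.
N_INDIVIDUALS = 1501
CIN_POSITIVE = 30             # detectable CIN at sample collection
TUMOR_AMONG_POSITIVE = 24     # diagnosed with a tumor during follow-up
TUMOR_AMONG_NEGATIVE = 9      # no detectable CIN, but later reported as tumor patients

tp = TUMOR_AMONG_POSITIVE
fp = CIN_POSITIVE - TUMOR_AMONG_POSITIVE
fn = TUMOR_AMONG_NEGATIVE
tn = N_INDIVIDUALS - CIN_POSITIVE - TUMOR_AMONG_NEGATIVE

ppv, npv = predictive_values(tp, fp, tn, fn)

if __name__ == "__main__":
    print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these counts the positive predictive value is 24/30 and the negative predictive value is 1462/1471, which round to the 80.0% and 99.4% stated in the abstract.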
