An Approximate Bayesian Computation Approach to Examining the Phylogenetic Relationships among the Four Gibbon Genera using Whole Genome Sequence Data

bioRxiv ◽

10.1101/009498 ◽

2014 ◽

Author(s):

Krishna Veeramah ◽

August E Woerner ◽

Laurel Johnstone ◽

Ivo Gut ◽

Marta Gut ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Phylogenetic Relationships ◽

Approximate Bayesian Computation ◽

Whole Genome Sequence ◽

Sequencing Error ◽

Computation Method ◽

Bayesian Computation ◽

Whole Genome ◽

Approximate Bayesian

Gibbons are believed to have diverged from the larger great apes ~16.8 Mya and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera: Nomascus, Symphalangus, Hoolock, and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mtDNA, the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing to a mean of ~15X coverage in two individuals from each genus. We developed a coalescent-based Approximate Bayesian Computation method incorporating a model of sequencing error generated by high coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Our combined results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1×10^-9/site/year, this speciation process occurred ~5 Mya during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera.
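The rejection flavor of Approximate Bayesian Computation named in this abstract can be sketched in a few lines. The example below is a toy, not the authors' coalescent model: it assumes a single divergence time, a flat prior, a hypothetical summary statistic (the count of differing sites between two lineages), and a normal approximation to the binomial count. Only the mutation rate of 1×10^-9/site/year comes from the abstract; the observed count, number of sites, prior bounds, and tolerance are invented for illustration.

```python
import math
import random


def abc_rejection(observed_diffs, n_sites, mu, t_min, t_max,
                  n_draws=20000, tolerance=200.0, seed=1):
    """Toy ABC rejection sampler for one divergence time (in years)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        t = rng.uniform(t_min, t_max)       # draw a candidate time from a flat prior
        p = 2.0 * mu * t                    # per-site divergence between two lineages
        mean = n_sites * p
        sd = math.sqrt(n_sites * p * (1.0 - p))
        simulated = rng.gauss(mean, sd)     # normal approximation to a binomial count
        # Keep the draw only if the simulated summary is close to the observed one.
        if abs(simulated - observed_diffs) <= tolerance:
            accepted.append(t)
    return accepted


# Hypothetical data: 10,000 differences in 1,000,000 aligned sites.
posterior = abc_rejection(observed_diffs=10_000, n_sites=1_000_000,
                          mu=1e-9, t_min=1e6, t_max=1e7)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean ~ {posterior_mean / 1e6:.2f} My")
```

The accepted draws approximate the posterior for the divergence time. A real analysis would replace the one-line simulator with a coalescent simulation and use several summary statistics, as the abstract describes.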

Leveraging whole genome sequencing data for demographic inference with approximate Bayesian computation

Molecular Ecology Resources ◽

10.1111/1755-0998.13092 ◽

2019 ◽

Vol 20 (1) ◽

pp. 125-139 ◽

Cited By ~ 4

Author(s):

Chris C. R. Smith ◽

Samuel M. Flaxman

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Approximate Bayesian Computation ◽

Whole Genome Sequencing Data ◽

Bayesian Computation ◽

Whole Genome ◽

Sequencing Data ◽

Demographic Inference ◽

Approximate Bayesian

Examining Phylogenetic Relationships Among Gibbon Genera Using Whole Genome Sequence Data Using an Approximate Bayesian Computation Approach

Genetics ◽

10.1534/genetics.115.174425 ◽

2015 ◽

Vol 200 (1) ◽

pp. 295-308 ◽

Cited By ~ 26

Author(s):

K. R. Veeramah ◽

A. E. Woerner ◽

L. Johnstone ◽

I. Gut ◽

M. Gut ◽

...

Keyword(s):

Genome Sequence ◽

Phylogenetic Relationships ◽

Approximate Bayesian Computation ◽

Sequence Data ◽

Whole Genome Sequence ◽

Bayesian Computation ◽

Whole Genome ◽

Genome Sequence Data ◽

Approximate Bayesian

353 ASAS-EAAP Talk: Low-coverage whole-genome sequencing in local livestock breeds

Journal of Animal Science ◽

10.1093/jas/skaa278.149 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 81-82

Author(s):

Joaquim Casellas ◽

Melani Martín de Hijas-Villalba ◽

Marta Vázquez-Gómez ◽

Samir Id Lahoucine

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Error Rate ◽

Allele Frequencies ◽

Paternity Testing ◽

Sequencing Error ◽

Whole Genome ◽

Genomic Evaluation ◽

Sequencing Error Rate ◽

Low Coverage

Abstract Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNP per chromosome. Each population evolved during 1,000 non-overlapping generations with effective size 100, mutation rate 10^-4, and recombination by Kosambi's function. Only those populations with 1,000,000 ± 10% polymorphic SNP per chromosome in generation 1,000 were retained for further analyses, and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads and by assuming a random sequencing error rate per SNP between 10^-2 and 10^-5. Assuming known allele frequencies in the population and sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased up to 99.6% and 99.9% with 0.1X depth, respectively (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNP). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies regarding sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h^2 = 0.25; 100 parents and 900 offspring; 0.1X depth by 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
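To give a feel for why very low depth can still carry enough information, the sketch below estimates how many SNP would be covered in both a candidate father and an offspring. It is a back-of-the-envelope calculation, not the authors' simulation: it assumes per-site read depth is Poisson distributed and independent between animals, and it ignores read length, SNP clustering, and sequencing error. The depths and the SNP total (30 chromosomes with about 1,000,000 polymorphic SNP each) are taken from the abstract; the function names are hypothetical.

```python
import math


def covered_fraction(depth):
    """Probability that a site is covered by at least one read (Poisson depth model)."""
    return 1.0 - math.exp(-depth)


def expected_shared_snps(n_snps, depth):
    """Expected number of SNP covered in both of two independently sequenced animals."""
    return n_snps * covered_fraction(depth) ** 2


N_SNPS = 30 * 1_000_000  # 30 chromosomes x ~1,000,000 polymorphic SNP (from the abstract)

for depth in (0.01, 0.05, 0.1, 0.5, 1.0):
    print(f"{depth:>5}X  one animal: {covered_fraction(depth):.4f}  "
          f"father and offspring: {expected_shared_snps(N_SNPS, depth):,.0f} SNP")
```

Under these assumptions, tens of thousands of SNP remain jointly observed even at 0.05X, which is consistent in spirit with the abstract's finding that such depths can discriminate between candidate fathers.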

Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method's versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
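The observable signal this abstract builds on, a Mendelian error in a family, can be illustrated with a small check on parent-offspring trios. This sketch covers only the detection step for autosomal biallelic sites; it is not the authors' estimator of per-sample precision and recall. Genotypes are coded as alternate-allele counts (0, 1, 2), and the function names and example calls are hypothetical.

```python
def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can be formed from one allele of each parent."""
    return any(a + b == child
               for a in transmissible(father)
               for b in transmissible(mother))


def mendelian_error_rate(trios):
    """Fraction of (father, mother, child) genotype calls that violate Mendelian inheritance."""
    trios = list(trios)
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)


# Hypothetical calls at six sites; the last two are impossible without an error.
calls = [(0, 0, 0), (0, 1, 1), (1, 1, 2), (0, 2, 1), (0, 0, 1), (2, 2, 1)]
rate = mendelian_error_rate(calls)
print(f"Mendelian error rate: {rate:.3f}")
```

Only a fraction of genotyping errors surface this way (for example, a miscalled child at a site where both parents are heterozygous is always consistent), which is why the abstract speaks of extrapolating from observed Mendelian errors to genome-wide estimates.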

SureSelect targeted enrichment, a new cost effective method for the whole genome sequencing of Candidatus Liberibacter asiaticus

Scientific Reports ◽

10.1038/s41598-019-55144-4 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Weili Cai ◽

Schyler Nunziata ◽

John Rascoe ◽

Michael J. Stulberg

Keyword(s):

Whole Genome Sequencing ◽

Molecular Characterization ◽

Genome Sequencing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Coverage ◽

Metagenomic Sample ◽

Liberibacter Asiaticus

Abstract Huanglongbing (HLB) is a worldwide deadly citrus disease caused by the phloem-limited bacterium 'Candidatus Liberibacter asiaticus' (CLas) vectored by Asian citrus psyllids. In order to effectively manage this disease, it is crucial to understand the relationship among the bacterial isolates from different geographical locations. Whole genome sequencing approaches will provide more precise molecular characterization of the diversity among populations. Due to the lack of in vitro culture, obtaining the whole genome sequence of CLas is still a challenge, especially for medium to low titer samples. Hundreds of millions of sequencing reads are needed to get good coverage of CLas from an HLB positive citrus sample. In order to overcome this limitation, we present here a new method, Agilent SureSelect XT HS target enrichment, which can specifically enrich CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, which provides precise molecular characterization of different strains.

Parameter Estimation of Accelerated Lifetime Testing Models Using an Efficient Approximate Bayesian Computation Method

10.3850/978-981-18-2016-8_138-cd ◽

2021 ◽

Author(s):

Mohamed Rabhi ◽

Anis Ben Abdessalem ◽

Laurent Saintis ◽

Bruno Castanier ◽

Rodrigue Sohoin Koffisse

Keyword(s):

Parameter Estimation ◽

Approximate Bayesian Computation ◽

Computation Method ◽

Bayesian Computation ◽

Accelerated Lifetime ◽

Lifetime Testing ◽

Approximate Bayesian

Abstract 343: Bayesian Selection of Modifier Genes in Hypertrophic Cardiomyopathy Through Whole Genome Sequencing

Circulation Research ◽

10.1161/res.117.suppl_1.343 ◽

2015 ◽

Vol 117 (suppl_1) ◽

Author(s):

Matthew Wheeler ◽

Daryl Waggott ◽

Megan Grove ◽

Frederick Dewey ◽

Cuiping Pan ◽

...

Keyword(s):

Hypertrophic Cardiomyopathy ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

A Priori ◽

Copy Number Variants ◽

Whole Genome Sequence ◽

Monogenic Disease ◽

Whole Genome ◽

Genetic Modifiers ◽

Structural Variants

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals, clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests, we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9), while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM. Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.

Whole-Genome Sequencing Allows for Improved Identification of Persistent Listeria monocytogenes in Food-Associated Environments

Applied and Environmental Microbiology ◽

10.1128/aem.01049-15 ◽

2015 ◽

Vol 81 (17) ◽

pp. 6024-6037 ◽

Cited By ~ 76

Author(s):

Matthew J. Stasiewicz ◽

Haley F. Oliver ◽

Martin Wiedmann ◽

Henk C. den Bakker

Keyword(s):

Listeria Monocytogenes ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genetic Determinants ◽

Single Nucleotide ◽

Content Type ◽

Food Borne ◽

Clonal Spread

ABSTRACT While the food-borne pathogen Listeria monocytogenes can persist in food-associated environments, there are no whole-genome sequence (WGS) based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced strains, and (iii) identify genetic determinants of L. monocytogenes persistence. WGS analysis revealed three prophage regions that explained differences between three pairs of phylogenetically similar populations with pulsed-field gel electrophoresis types that differed by ≤3 bands. WGS-SNP-based phylogenetics found that putatively persistent L. monocytogenes represent SNP patterns (i) unique to a single retail deli, supporting persistence within the deli (11 clades), (ii) unique to a single state, supporting clonal spread within a state (7 clades), or (iii) spanning multiple states (5 clades). Isolates that formed one of 11 deli-specific clades differed by a median of 10 SNPs or fewer. Isolates from 12 putative persistence events had significantly fewer SNPs (median, 2 to 22 SNPs) than between isolates of the same subtype from other delis (median up to 77 SNPs), supporting persistence of the strain. In 13 events, nearly indistinguishable isolates (0 to 1 SNP) were found across multiple delis. No individual genes were enriched among persistent isolates compared to sporadic isolates. Our data show that WGS analysis improves food-borne pathogen subtyping and identification of persistent bacterial pathogens in food-associated environments.

Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation

PLoS ONE ◽

10.1371/journal.pone.0130855 ◽

2015 ◽

Vol 10 (6) ◽

pp. e0130855 ◽

Cited By ~ 94

Author(s):

Frank Technow ◽

Carlos D. Messina ◽

L. Radu Totir ◽

Mark Cooper

Keyword(s):

Approximate Bayesian Computation ◽

Growth Models ◽

Crop Growth ◽

Bayesian Computation ◽

Whole Genome ◽

Genome Prediction ◽

Crop Growth Models ◽

Approximate Bayesian

Five-year Microevolution of a Multidrug-Resistant Mycobacterium Tuberculosis Strain within a Patient with Inadequate Compliance to Treatment.

10.21203/rs.3.rs-251738/v1 ◽

2021 ◽

Author(s):

Dario Fernández Do Porto ◽

Johana Monteserin ◽

Josefina Campos ◽

Ezequiel J Sosa ◽

Mario Matteo ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Sequence ◽

Treatment Compliance ◽

Whole Genome Sequence ◽

Intermittent Therapy ◽

Whole Genome ◽

Short Term ◽

Depth Analysis

Abstract Background: Whole-genome sequencing has shown that the Mycobacterium tuberculosis infection process can be more heterogeneous than previously thought. Compartmentalized infections, exogenous reinfections, and microevolution are manifestations of this clonal complexity. The mechanisms causing the microevolution (the genetic variability of M. tuberculosis at short time scales) of a parental strain into clonal variants within a patient are a relevant issue that has not yet been completely addressed. To our knowledge, a whole genome sequence microevolution analysis in a single patient with inadequate adherence to treatment has not been previously reported. Case Presentations: In this work, we applied whole genome sequencing for a more in-depth analysis of the microevolution of a parental Mycobacterium tuberculosis strain into clonal variants within a patient with poor treatment compliance in Argentina. We analyzed the whole-genome sequences of 8 consecutive Mycobacterium tuberculosis isolates obtained from a patient within 57 months of intermittent therapy. Nineteen mutations (9 short-term, 10 fixed variants) emerged, most of them associated with drug resistance. The first isolate was already resistant to isoniazid, rifampicin, and streptomycin; thereafter the strain developed resistance to fluoroquinolones and pyrazinamide. Surprisingly, isolates remained susceptible to the pro-drug ethionamide after acquiring a frameshift mutation in ethA, a gene required for its activation. We also found a novel variant (T-54G) in the 5' untranslated region of whiB7, a region allegedly related to kanamycin resistance. Notably, discrepancies between canonical and phage-based susceptibility testing to kanamycin were previously found for the isolate harboring this mutation. In our patient, microevolution was mainly driven by drug selective pressure. Rare short-term mutations fixed together with resistance-conferring mutations during therapy. Conclusions: This report highlights the relevance of whole-genome sequencing in the clinic for characterization of pre-XDR and MDR resistance profiles, particularly in patients with incomplete and/or intermittent treatment.