statistical mixture model

2017
Author(s): Péter Kómár, Deniz Kural

Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes were previously determined with high confidence. An alternative way of benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information and enable benchmarking of less-studied variants and diverse populations.

Results: We introduce a statistical mixture model for comparing two variant calling pipelines from the genotype data they produce after running on the individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty.

Availability: The Python library geck and usage examples are available at the following URL: https://github.com/sbg/[email protected] information: Supplementary materials are available at bioRxiv.
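The Mendelian constraint that the abstract relies on can be illustrated with a minimal sketch. This is not the geck API and not the mixture model itself; it only shows how a trio's genotype calls can be checked for consistency at a biallelic site. The function names and the encoding of genotypes as alternate-allele counts (0, 1, 2) are assumptions made for this illustration.

```python
from itertools import product

# Hypothetical encoding (an assumption for this sketch): a genotype at a
# biallelic site is the number of alternate alleles, i.e. 0, 1 or 2.
_TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can arise from one allele of each parent."""
    return any(
        a + b == child
        for a, b in product(_TRANSMITTABLE[father], _TRANSMITTABLE[mother])
    )


def violation_rate(trios):
    """Fraction of (father, mother, child) genotype triples that violate Mendelian inheritance."""
    trios = list(trios)
    violations = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return violations / len(trios)


# Toy calls from one hypothetical pipeline at four sites.
calls = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 0)]
print(violation_rate(calls))
```

A raw violation rate like this does not by itself separate calling errors in the parents from errors in the child; that attribution is what a statistical model over the trio genotypes is meant to address.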


2014, Vol 53 (3), pp. 652-659
Author(s): David Plavcan, Georg J. Mayr, Achim Zeileis

Abstract: Diagnosing foehn winds from weather station data downwind of topographic obstacles requires distinguishing them from other downslope winds, particularly nocturnal ones driven by radiative cooling. An automatic classification scheme is presented that yields reproducible results and includes information about the (un)certainty of the diagnosis. A statistical mixture model separates foehn and no-foehn winds in a measured time series of wind. In addition to wind speed and direction, it accommodates other physically meaningful classifiers such as the (potential) temperature difference to an upwind station (e.g., near the crest) or relative humidity. The algorithm was tested for Wipp Valley in the central Alps against human expert classification and a previous objective method (Drechsel and Mayr 2008), which the new method outperforms. Climatologically, using only wind information yields nearly the same foehn frequencies as using additional covariables. A data record length of at least one year is required for satisfactory results. The suitability of mixture models for objective classification of foehn at other locations will have to be tested in further studies.
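The idea of a mixture model that separates two regimes and reports a probability for each observation can be sketched as follows. This is a simplified stand-in, not the authors' scheme: it assumes a single one-dimensional measurement, two Gaussian components, and synthetic data, and all function names are hypothetical. The model in the abstract additionally handles wind direction and covariables.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high(x, params):
    """Posterior probability that x belongs to the component with the higher mean."""
    w = params["weight"]
    p_low = (1.0 - w) * normal_pdf(x, params["means"][0], params["sds"][0])
    p_high = w * normal_pdf(x, params["means"][1], params["sds"][1])
    total = p_low + p_high
    return p_high / total if total > 0.0 else 0.5


def fit_two_component_mixture(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with the EM algorithm."""
    ordered = sorted(values)
    half = len(ordered) // 2
    grand_mean = sum(values) / len(values)
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / len(values))
    # Initialise the two means from the lower and upper halves of the data.
    params = {
        "weight": 0.5,
        "means": [sum(ordered[:half]) / half,
                  sum(ordered[half:]) / (len(ordered) - half)],
        "sds": [spread, spread],
    }
    for _ in range(n_iter):
        # E-step: posterior membership of each value in the high-mean component.
        post = [posterior_high(v, params) for v in values]
        n_high = sum(post)
        n_low = len(values) - n_high
        # M-step: weighted means, standard deviations and mixing weight.
        m_low = sum((1.0 - p) * v for p, v in zip(post, values)) / n_low
        m_high = sum(p * v for p, v in zip(post, values)) / n_high
        s_low = math.sqrt(sum((1.0 - p) * (v - m_low) ** 2 for p, v in zip(post, values)) / n_low)
        s_high = math.sqrt(sum(p * (v - m_high) ** 2 for p, v in zip(post, values)) / n_high)
        params = {
            "weight": n_high / len(values),
            "means": [m_low, m_high],
            "sds": [max(s_low, 1e-6), max(s_high, 1e-6)],
        }
    return params


# Synthetic example: 300 "no-foehn" and 100 "foehn" values of one measurement.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(300)]
data += [random.gauss(8.0, 2.0) for _ in range(100)]
fitted = fit_two_component_mixture(data)
print(fitted)
print(posterior_high(6.0, fitted))
```

The posterior returned by `posterior_high` plays the role of the diagnosis (un)certainty mentioned in the abstract: values near 0 or 1 indicate a confident assignment, values near 0.5 an ambiguous one.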

