A Survey of Bayesian Data Mining

Data Mining ◽  
2011 ◽  
pp. 1-26 ◽  
Author(s):  
Stefan Arnborg

This chapter reviews the fundamentals of inference and gives a motivation for Bayesian analysis. The method is illustrated with dependency tests in data sets with categorical variables and Dirichlet prior distributions. Principles and problems for deriving causality conclusions are reviewed and illustrated with Simpson’s paradox. The selection of decomposable and directed graphical models illustrates the Bayesian approach. Bayesian and EM classification are briefly described. The material is illustrated on two cases, one in personalization of media distribution and one in schizophrenia research. These cases illustrate how to approach problem types that exist in many other application areas.
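
A dependency test of the kind mentioned above can be sketched as a Bayes factor between a "dependent" model (one Dirichlet prior over all cells of a contingency table) and an "independent" model (separate Dirichlet priors over the row and column margins). The symmetric Dirichlet with alpha = 1, the function names, and the example tables are assumptions made for illustration and are not taken from the chapter.

```python
from math import lgamma

def log_dirichlet_marginal(counts, alpha=1.0):
    """Log marginal likelihood of an observed sequence with these category
    counts, under a symmetric Dirichlet(alpha) prior on the probabilities."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out

def log_bayes_factor_dependence(table, alpha=1.0):
    """log P(data | dependent) - log P(data | independent) for a 2-D table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = [c for r in table for c in r]
    log_dep = log_dirichlet_marginal(cells, alpha)
    log_ind = (log_dirichlet_marginal(rows, alpha)
               + log_dirichlet_marginal(cols, alpha))
    return log_dep - log_ind

# Hypothetical tables: one with a strong association, one perfectly balanced.
lbf_associated = log_bayes_factor_dependence([[30, 2], [3, 25]])
lbf_balanced = log_bayes_factor_dependence([[15, 15], [15, 15]])
print(lbf_associated, lbf_balanced)
```

A positive value favours dependence and a negative value favours independence; the strength of the conclusion depends on the chosen alpha.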

Author(s):  
Daiane Aparecida Zuanetti ◽  
Luis Aparecido Milan

In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs’ effects. The model accounts for the detailed familial dependence and does not rely on random effects. It combines the probability of Mendelian inheritance of the parents’ genotypes with the correlation between flanking markers and QTLs. This is an advance over models that use only Mendelian segregation or only the correlation between markers and QTLs to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their locations, and their additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under the tested conditions, the proposed method outperforms the other methods in estimating the number of QTLs, the accuracy of the QTLs’ positions, and the estimates of their effects. The results of applying the proposed method to these data sets exceeded all of our expectations.


Author(s):  
M. Azarkhail ◽  
M. Modarres

The physics-of-failure (POF) modeling approach is a proven and powerful method for predicting the reliability of mechanical components and systems. Most POF models were originally developed from empirical data drawn from a wide range of applications (e.g., the fracture mechanics approach to fatigue life). Available curve-fitting methods, such as least squares, calculate the best estimate of the parameters by minimizing a distance function. Such point-estimate approaches overlook the other possible parameter values and fail to incorporate the real uncertainty of the empirical data into the process. Another important issue with traditional methods arises when new data points become available: the best estimates must be recalculated using the new and old data sets together, but the original data sets used to develop the POF models may no longer be available to be combined with the new data in a point-estimate framework. In this research, a Bayesian framework is proposed for efficient uncertainty management in POF models. The Bayesian approach provides many practical features, such as fair coverage of uncertainty and an updating concept that offers a powerful means of knowledge management: Bayesian models allow the available information to be stored as probability densities over the model parameters. These distributions can be treated as priors to be updated in the light of new data when they become available. The first part of this article briefly reviews the classical and probabilistic approaches to regression. In this part, the accuracy of the traditional normal-distribution assumption for the error is examined and a new, flexible likelihood function is proposed. The Bayesian approach to regression and its links to the classical and probabilistic methods are explained next.
The Bayesian section discusses how the likelihood functions introduced in the probabilistic approach can be combined with prior information using the concept of conditional probability. To highlight its advantages, the Bayesian approach is further clarified with case studies in which the results are compared with those of traditional methods such as least squares and maximum likelihood estimation (MLE). In this research, the mathematical complexity of the Bayesian inference equations was overcome using Markov chain Monte Carlo simulation.
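
The updating idea described above (a posterior stored as a distribution and reused as the prior when new data arrive) can be sketched with a deliberately simple conjugate model: an unknown mean with known noise variance. This is an assumption chosen so the update has a closed form; the article itself uses a more flexible likelihood and Markov chain Monte Carlo. All names and numbers below are hypothetical.

```python
def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Conjugate update for an unknown mean with known noise variance.
    Returns the posterior (mean, variance)."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Hypothetical measurements; the old batch may later become unavailable.
old_data = [10.2, 9.7, 10.5, 9.9]
new_data = [10.8, 10.4]
prior = (0.0, 100.0)   # diffuse prior: mean 0, variance 100
noise_var = 0.25

# Sequential route: keep only the posterior from the old data, then update.
stage1 = update_normal_mean(*prior, old_data, noise_var)
sequential = update_normal_mean(*stage1, new_data, noise_var)

# Batch route: requires both data sets at once.
batch = update_normal_mean(*prior, old_data + new_data, noise_var)
print(sequential, batch)
```

In this model the sequential and batch routes agree up to floating-point error, which is why the stored posterior can stand in for the original data set.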


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1933-1942
Author(s):  
Matthew S Olson

Discrimination between disomic and tetrasomic inheritance aids in determining whether tetraploids originated by allotetraploidy or autotetraploidy, respectively. Past assessments of inheritance in tetraploids have used analyses whereby each inheritance hypothesis is tested independently. I present a Bayesian analysis that is appropriate for discriminating among several inheritance hypotheses and can be used in any case where hypotheses are defined by discrete distributions. The Bayesian approach incorporates prior knowledge of the probability of occurrence of disomic and tetrasomic hypotheses so that the results of the analysis are not biased by the fact that there is a single tetrasomic hypothesis and multiple disomic hypotheses. This analysis is used to interpret data from crosses in the tetraploid Astilbe biternata, a herbaceous plant native to the southern Appalachians. The progeny ratios from all crosses favored the hypothesis of disomic inheritance at both the PGM and slow-PGI loci. These results support earlier cytogenetic evidence for the allotetraploid origin of Astilbe biternata.
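
The comparison of several hypotheses defined by discrete distributions can be sketched as follows. The three expected ratios, the progeny counts, and the prior split (half the prior mass on the single tetrasomic hypothesis, the other half shared by the disomic hypotheses) are illustrative assumptions, not values from the paper.

```python
from math import exp, log

def log_likelihood(counts, probs):
    """Multinomial log-likelihood up to the coefficient shared by all hypotheses."""
    total = 0.0
    for c, p in zip(counts, probs):
        if c == 0:
            continue
        if p == 0.0:
            return float("-inf")  # observed a class the hypothesis forbids
        total += c * log(p)
    return total

def posterior(counts, hypotheses, priors):
    """Posterior probability of each hypothesis given the class counts."""
    log_post = {name: log(priors[name]) + log_likelihood(counts, probs)
                for name, probs in hypotheses.items()}
    top = max(log_post.values())
    if top == float("-inf"):
        raise ValueError("data are impossible under every hypothesis")
    weights = {name: exp(v - top) for name, v in log_post.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

# Hypothetical expected progeny-class ratios for each hypothesis.
hypotheses = {
    "tetrasomic": (1 / 6, 4 / 6, 1 / 6),
    "disomic_a": (1 / 4, 2 / 4, 1 / 4),
    "disomic_b": (0.0, 1.0, 0.0),
}
# Equal prior mass per inheritance mode, not per hypothesis.
priors = {"tetrasomic": 0.5, "disomic_a": 0.25, "disomic_b": 0.25}

counts = [12, 26, 10]  # hypothetical progeny counts per class
post = posterior(counts, hypotheses, priors)
print(post)
```

Splitting the prior by inheritance mode rather than by hypothesis is what keeps the larger number of disomic hypotheses from biasing the result.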


2021 ◽  
pp. 109634802199084
Author(s):  
A. George Assaf ◽  
Mike Tsionas

Testing for collinearity continues to be a controversial issue in the literature. Multicollinearity detection criteria, such as the variance inflation factor, often fail to detect the true extent of multicollinearity. In this article, we propose utilizing the Bayesian approach as an attractive alternative. Under the Bayesian approach, we recommend comparing the marginal posterior of regression parameters under two different priors. If the difference in the posterior under these two priors is pronounced, one can surmise that collinearity is harmful. The Kolmogorov–Smirnov test can also be used as further evidence to confirm whether the posterior difference is significant.
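
The prior-sensitivity check described above can be sketched for a two-predictor regression with known noise variance and zero-mean normal priors, where the marginal posterior of one coefficient is available in closed form. The model, the two prior variances (100 and 1), the simulated data, and the helper names are all assumptions for illustration; only the two-sample Kolmogorov–Smirnov statistic is computed here, not its p-value.

```python
import random

def posterior_beta1(x1, x2, y, noise_var, prior_var):
    """Marginal posterior (mean, variance) of beta1 in
    y = beta1*x1 + beta2*x2 + noise, with known noise variance and
    independent N(0, prior_var) priors on both coefficients."""
    a11 = sum(u * u for u in x1) / noise_var + 1.0 / prior_var
    a22 = sum(v * v for v in x2) / noise_var + 1.0 / prior_var
    a12 = sum(u * v for u, v in zip(x1, x2)) / noise_var
    b1 = sum(u * t for u, t in zip(x1, y)) / noise_var
    b2 = sum(v * t for v, t in zip(x2, y)) / noise_var
    det = a11 * a22 - a12 * a12          # determinant of the 2x2 precision
    mean1 = (a22 * b1 - a12 * b2) / det  # first entry of precision^-1 @ b
    var1 = a22 / det                     # [0][0] entry of precision^-1
    return mean1, var1

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max gap between ECDFs)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def prior_sensitivity(x1, x2, y, rng, draws=2000):
    """KS distance between beta1 posteriors under a diffuse and a tighter prior."""
    samples = []
    for prior_var in (100.0, 1.0):
        mean1, var1 = posterior_beta1(x1, x2, y, 1.0, prior_var)
        samples.append([rng.gauss(mean1, var1 ** 0.5) for _ in range(draws)])
    return ks_statistic(samples[0], samples[1])

rng = random.Random(0)
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2_collinear = [u + 0.01 * rng.gauss(0, 1) for u in x1]  # nearly a copy of x1
x2_independent = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y_collinear = [u + v + e for u, v, e in zip(x1, x2_collinear, noise)]
y_independent = [u + v + e for u, v, e in zip(x1, x2_independent, noise)]

ks_collinear = prior_sensitivity(x1, x2_collinear, y_collinear, rng)
ks_independent = prior_sensitivity(x1, x2_independent, y_independent, rng)
print(ks_collinear, ks_independent)
```

When the predictors are nearly collinear the data say little about each coefficient on its own, so the choice of prior moves the posterior much more than it does with unrelated predictors.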


2001 ◽  
Vol 58 (8) ◽  
pp. 1663-1671 ◽  
Author(s):  
Milo D Adkison ◽  
Zhenming Su

In this simulation study, we compared the performance of a hierarchical Bayesian approach for estimating salmon escapement from count data with that of separate maximum likelihood estimation of each year's escapement. We simulated several contrasting counting schedules resulting in data sets that differed in information content. In particular, we were interested in the ability of the Bayesian approach to estimate escapement and timing in years where few or no counts are made after the peak of escapement. We found that the Bayesian hierarchical approach was much better able to estimate escapement and escapement timing in these situations. Separate estimates for such years could be wildly inaccurate. However, even a single postpeak count could dramatically improve the estimability of escapement parameters.


Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how the most widely used clustering algorithm, k-means, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks when attackers have arbitrary background knowledge. This results in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initialize the selection of clustering centroids for k-means on Spark. In the second stage, k-means is executed with a privacy budget set as ε/2t and Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, while guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art variants of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
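
The second stage described above can be sketched in one dimension, reading the budget ε/2t as ε/(2t): with t rounds and two noisy queries per round (a count and a sum for each cluster), each query receives ε/(2t). This reading, the restriction to points in [0, 1] (so both queries have sensitivity 1), the fixed starting centroids in place of the particle swarm stage, and the omission of Spark are all assumptions of this sketch, not details confirmed by the article.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) draw as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_kmeans_1d(points, centroids, epsilon, rounds, rng):
    """Noisy 1-D k-means on points in [0, 1]; each count/sum query gets epsilon / (2 * rounds)."""
    per_query_eps = epsilon / (2 * rounds)
    scale = 1.0 / per_query_eps  # sensitivity 1 divided by the per-query budget
    centroids = list(centroids)
    for _ in range(rounds):
        sums = [0.0] * len(centroids)
        counts = [0] * len(centroids)
        for p in points:
            k = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            sums[k] += p
            counts[k] += 1
        for k in range(len(centroids)):
            noisy_count = counts[k] + laplace_noise(scale, rng)
            noisy_sum = sums[k] + laplace_noise(scale, rng)
            if noisy_count >= 1.0:  # skip the update if the noisy count is unusable
                centroids[k] = min(1.0, max(0.0, noisy_sum / noisy_count))
    return centroids

rng = random.Random(42)
# Hypothetical data: two groups near 0.2 and 0.8, clipped to [0, 1].
points = [min(1.0, max(0.0, rng.gauss(0.2, 0.05))) for _ in range(500)]
points += [min(1.0, max(0.0, rng.gauss(0.8, 0.05))) for _ in range(500)]

result = sorted(dp_kmeans_1d(points, [0.1, 0.9], epsilon=5.0, rounds=5, rng=rng))
print(result)
```

A smaller ε or more rounds raises the noise scale, so the trade-off between privacy and centroid accuracy is controlled by those two parameters.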


2013 ◽  
Vol 8 (S300) ◽  
pp. 393-394 ◽  
Author(s):  
Iñigo Arregui ◽  
Andrés Asensio Ramos ◽  
Antonio J. Díaz

We propose and use Bayesian techniques for the determination of physical parameters in solar prominence plasmas, combining observational and theoretical properties of waves and oscillations. The Bayesian approach also enables model comparison to assess how plausible alternative physical models or mechanisms are in view of the data.


Author(s):  
Kazimierz Garbulewski ◽  
Stanisław Jabłonowski ◽  
Simon Rabarijoely

Advantage of Bayesian approach to geotechnical designing

The paper addresses the applicability of the Bayesian approach to geotechnical engineering. First, the principal elements of Bayesian analysis are presented, followed by their application to estimating soil parameters from CPT/DMT tests at the SGGW Campus in Warsaw. The CPT/DMT tests were carried out to characterize the geotechnical conditions in the foundations of the campus buildings under design. Data from two layers of glacial boulder clay were analysed. The results demonstrate that the Bayesian approach is a useful tool for evaluating ground properties and estimating geotechnical parameters in specified circumstances.


ACTA IMEKO ◽  
2016 ◽  
Vol 5 (2) ◽  
pp. 14 ◽  
Author(s):  
Francesco Maspero ◽  
Emanuela Sibilia ◽  
Marco Martini

In this work, the application of Bayesian statistics to archaeological problems is discussed. In particular, three case studies are analyzed, each presenting a complex interpretative scenario, together with the most suitable way to solve it. It is shown that the Bayesian approach makes it possible to refine a dating when multiple data are available, even from different dating techniques. The Bayesian approach is presented as the common language among physicists, archaeologists and statisticians for performing more accurate evaluations of stratigraphies and chronologies.

