A Survey of Bayesian Data Mining

Data Mining ◽

10.4018/978-1-59140-051-6.ch001 ◽

2011 ◽

pp. 1-26 ◽

Cited By ~ 1

Author(s):

Stefan Arnborg

Keyword(s):

Data Mining ◽

Bayesian Analysis ◽

Graphical Models ◽

Bayesian Approach ◽

Data Sets ◽

Media Distribution ◽

Dirichlet Prior ◽

Approach Problem ◽

The Bayesian Approach ◽

Selection Of

This chapter reviews the fundamentals of inference and gives a motivation for Bayesian analysis. The method is illustrated with dependency tests on data sets with categorical variables, and with Dirichlet prior distributions. Principles and problems for deriving causal conclusions are reviewed and illustrated with Simpson’s paradox. The selection of decomposable and directed graphical models illustrates the Bayesian approach. Bayesian and EM classification are briefly described. The material is illustrated with two cases, one on personalization of media distribution and one on schizophrenia research. These cases illustrate how to approach problem types that arise in many other application areas.
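A Bayesian dependency test for two categorical variables can be sketched as a Bayes factor between a "dependent" model (one Dirichlet prior over all cells of the contingency table) and an "independent" model (separate Dirichlet priors on the row and column margins). This is a minimal sketch assuming symmetric Dirichlet priors with a hypothetical concentration parameter `alpha`; it is not code from the chapter.

```python
from math import lgamma


def log_dirichlet_marginal(counts, alpha=1.0):
    """Log probability of an observed sequence of categorical outcomes,
    summarised by its category counts, under a symmetric Dirichlet(alpha) prior."""
    k = len(counts)
    n = sum(counts)
    total = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        total += lgamma(alpha + c) - lgamma(alpha)
    return total


def log_bayes_factor_dependence(table, alpha=1.0):
    """log p(data | dependent) - log p(data | independent) for an r x s table.

    Positive values favour dependence, negative values favour independence.
    """
    # Dependent model: a single Dirichlet over all r*s joint cells.
    cells = [c for row in table for c in row]
    log_dep = log_dirichlet_marginal(cells, alpha)
    # Independent model: p(x, y) = p(x) p(y), each margin with its own Dirichlet.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    log_indep = (log_dirichlet_marginal(row_totals, alpha)
                 + log_dirichlet_marginal(col_totals, alpha))
    return log_dep - log_indep


if __name__ == "__main__":
    print(log_bayes_factor_dependence([[50, 0], [0, 50]]))    # strongly positive
    print(log_bayes_factor_dependence([[25, 25], [25, 25]]))  # slightly negative
```

With uniform priors, the perfectly diagonal table gives a large positive log Bayes factor, while the balanced table gives a small negative one, i.e. mild evidence for independence.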

A new Bayesian approach for QTL mapping of family data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972002150030x ◽

2021 ◽

Author(s):

Daiane Aparecida Zuanetti ◽

Luis Aparecido Milan

Keyword(s):

Qtl Mapping ◽

Random Effects ◽

Bayesian Approach ◽

Variance Component ◽

Mendelian Inheritance ◽

Family Data ◽

Data Sets ◽

Mendelian Segregation ◽

Gaw17 Data ◽

The Bayesian Approach

In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs’ effects. The model accounts for detailed familial dependence and does not rely on random effects. It combines the probability of Mendelian inheritance of the parents’ genotypes with the correlation between flanking markers and QTLs. This is an advance over models that use only Mendelian segregation, or only the correlation between markers and QTLs, to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their locations, and their additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under the tested conditions, the proposed method outperforms the other methods in estimating the number of QTLs, the accuracy of the QTLs’ positions and the estimates of their effects. The results of applying the proposed method to data sets exceeded all of our expectations.

Automatic and semantic pre-selection of features using ontology for data mining on data sets related to cancer

International Conference on Information Society (i-Society 2014) ◽

10.1109/i-society.2014.7009060 ◽

2014 ◽

Author(s):

Adriana da Silva Jacinto ◽

Ricardo da Silva Santos ◽

Jose Maria Parente de Oliveira

Keyword(s):

Data Mining ◽

Data Sets ◽

Selection Of

A Novel Bayesian Framework for Uncertainty Management in Physics-Based Reliability Models

Volume 14: Safety Engineering, Risk Analysis and Reliability Methods ◽

10.1115/imece2007-41333 ◽

2007 ◽

Cited By ~ 11

Author(s):

M. Azarkhail ◽

M. Modarres

Keyword(s):

Bayesian Approach ◽

Empirical Data ◽

Probabilistic Approach ◽

Bayesian Framework ◽

Uncertainty Management ◽

Least Square ◽

The Other ◽

Point Estimate ◽

Data Sets ◽

The Bayesian Approach

The physics-of-failure (POF) modeling approach is a proven and powerful method for predicting the reliability of mechanical components and systems. Most POF models were originally developed from empirical data across a wide range of applications (e.g., the fracture mechanics approach to fatigue life). Available curve-fitting methods, such as least squares, calculate the best estimate of the parameters by minimizing a distance function. Such point-estimate approaches essentially overlook other possible parameter values and fail to incorporate the real uncertainty of the empirical data into the process. Another important issue with traditional methods arises when new data points become available. In such conditions, best-estimate methods must be recalculated using the new and old data sets together, but the original data sets used to develop POF models may no longer be available to combine with new data in a point-estimate framework. In this research, a powerful Bayesian framework is proposed for efficient uncertainty management in POF models. The Bayesian approach provides many practical features, such as fair coverage of uncertainty and the updating concept, which together offer a powerful means of knowledge management: Bayesian models allow the available information to be stored as a probability density over the model parameters. These distributions may then serve as priors to be updated in light of new data as they become available. The first part of this article briefly reviews classical and probabilistic approaches to regression. In this part, the accuracy of the traditional assumption of normally distributed errors is examined, and a new flexible likelihood function is proposed. The Bayesian approach to regression and its links to classical and probabilistic methods are explained next.
The Bayesian section discusses how the likelihood functions introduced in the probabilistic approach can be combined with prior information using conditional probability. To highlight its advantages, the Bayesian approach is further illustrated with case studies in which the results are compared with those of traditional methods such as least squares and maximum likelihood estimation (MLE). In this research, the mathematical complexity of the Bayesian inference equations was overcome using Markov chain Monte Carlo (MCMC) simulation.
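The updating concept (the posterior from old data becomes the prior for new data) can be shown without MCMC in a conjugate special case. The sketch below assumes a hypothetical one-parameter model y = b·x + e with Gaussian noise of known standard deviation and a normal prior on b; it is not the paper's POF model or its flexible likelihood, which is why the paper needs MCMC. Updating on old data and then on new data gives the same posterior as one batch update, so only the stored distribution, not the original data, has to be kept.

```python
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """Normal distribution over the slope b, stored as mean and variance."""
    mean: float
    var: float


def update_slope(prior, xs, ys, noise_sd):
    """Conjugate normal update for b in y = b * x + e, with e ~ N(0, noise_sd**2)."""
    precision = 1.0 / prior.var + sum(x * x for x in xs) / noise_sd ** 2
    weighted = (prior.mean / prior.var
                + sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2)
    return NormalBelief(weighted / precision, 1.0 / precision)


if __name__ == "__main__":
    prior = NormalBelief(mean=0.0, var=100.0)
    old_x, old_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]  # hypothetical original data
    new_x, new_y = [4.0, 5.0], [8.1, 9.8]            # hypothetical new data
    stored = update_slope(prior, old_x, old_y, noise_sd=0.5)  # only this is kept
    sequential = update_slope(stored, new_x, new_y, noise_sd=0.5)
    batch = update_slope(prior, old_x + new_x, old_y + new_y, noise_sd=0.5)
    print(sequential)
    print(batch)  # agrees with `sequential` up to floating-point rounding
```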

Bayesian Procedures for Discriminating Among Hypotheses With Discrete Distributions: Inheritance in the Tetraploid Astilbe biternata

Genetics ◽

10.1093/genetics/147.4.1933 ◽

1997 ◽

Vol 147 (4) ◽

pp. 1933-1942

Author(s):

Matthew S Olson

Keyword(s):

Bayesian Analysis ◽

Prior Knowledge ◽

Bayesian Approach ◽

Discrete Distributions ◽

Southern Appalachians ◽

Herbaceous Plant ◽

Bayesian Procedures ◽

Disomic Inheritance ◽

Cytogenetic Evidence ◽

The Bayesian Approach

Discrimination between disomic and tetrasomic inheritance aids in determining whether tetraploids originated by allotetraploidy or autotetraploidy, respectively. Past assessments of inheritance in tetraploids have used analyses whereby each inheritance hypothesis is tested independently. I present a Bayesian analysis that is appropriate for discriminating among several inheritance hypotheses and can be used in any case where hypotheses are defined by discrete distributions. The Bayesian approach incorporates prior knowledge of the probability of occurrence of disomic and tetrasomic hypotheses so that the results of the analysis are not biased by the fact that there is a single tetrasomic hypothesis and multiple disomic hypotheses. This analysis is used to interpret data from crosses in the tetraploid Astilbe biternata, a herbaceous plant native to the southern Appalachians. The progeny ratios from all crosses favored the hypothesis of disomic inheritance at both the PGM and slow-PGI loci. These results support earlier cytogenetic evidence for the allotetraploid origin of Astilbe biternata.
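Discriminating among hypotheses defined by discrete distributions amounts to weighting each hypothesis's multinomial likelihood by its prior probability and normalising. The sketch below follows the idea in the abstract of giving the single tetrasomic hypothesis the same total prior mass as the group of disomic hypotheses; the hypothesis names, segregation ratios, and counts are hypothetical placeholders, not values from the paper.

```python
from math import exp, inf, log


def posterior_over_hypotheses(counts, hypotheses):
    """Posterior probabilities for hypotheses that each fix the category
    probabilities of a multinomial.

    hypotheses maps name -> (prior probability, list of category probabilities).
    The multinomial coefficient is identical under every hypothesis and cancels.
    """
    log_post = {}
    for name, (prior, probs) in hypotheses.items():
        loglik = 0.0
        for c, p in zip(counts, probs):
            if c == 0:
                continue
            if p == 0:
                loglik = -inf  # observed a class the hypothesis forbids
                break
            loglik += c * log(p)
        log_post[name] = log(prior) + loglik
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post.values())
    weights = {name: exp(v - top) for name, v in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical expected ratios for a two-class progeny phenotype.
    hyps = {
        "tetrasomic": (0.50, [5 / 6, 1 / 6]),
        "disomic_1_1": (0.25, [1 / 2, 1 / 2]),
        "disomic_3_1": (0.25, [3 / 4, 1 / 4]),
    }
    print(posterior_over_hypotheses([60, 40], hyps))
```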

Testing for Collinearity using Bayesian Analysis

Journal of Hospitality & Tourism Research ◽

10.1177/1096348021990841 ◽

2021 ◽

pp. 109634802199084

Author(s):

A. George Assaf ◽

Mike Tsionas

Keyword(s):

Bayesian Analysis ◽

Bayesian Approach ◽

Attractive Alternative ◽

Variance Inflation Factor ◽

Regression Parameters ◽

Kolmogorov Smirnov ◽

Inflation Factor ◽

The Difference ◽

The Bayesian Approach ◽

Smirnov Test

Testing for collinearity continues to be a controversial issue in the literature. Multicollinearity detection criteria, such as the variance inflation factor, often fail to detect the true extent of multicollinearity. In this article, we propose utilizing the Bayesian approach as an attractive alternative. Under the Bayesian approach, we recommend comparing the marginal posterior of regression parameters under two different priors. If the difference in the posterior under these two priors is pronounced, one can surmise that collinearity is harmful. The Kolmogorov–Smirnov test can also be used as further evidence to confirm whether the posterior difference is significant.
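The two-prior check can be sketched for a toy regression with two predictors, known noise variance, and zero-mean normal priors of two different widths. The marginal posterior of the first coefficient is then normal, so it can be sampled directly and the two sets of draws compared with the two-sample Kolmogorov-Smirnov statistic D. The prior variances, draw count, and data are hypothetical assumptions; the paper's own priors and models may differ, and converting D into a p-value is omitted here.

```python
import random
from math import sqrt


def beta1_posterior(X, y, prior_var, noise_var=1.0):
    """Marginal posterior (mean, variance) of the first coefficient in a
    two-predictor regression with prior N(0, prior_var * I), known noise variance."""
    s11 = sum(r[0] * r[0] for r in X) / noise_var + 1.0 / prior_var
    s12 = sum(r[0] * r[1] for r in X) / noise_var
    s22 = sum(r[1] * r[1] for r in X) / noise_var + 1.0 / prior_var
    b1 = sum(r[0] * t for r, t in zip(X, y)) / noise_var
    b2 = sum(r[1] * t for r, t in zip(X, y)) / noise_var
    det = s11 * s22 - s12 * s12
    # First row of the inverse of the 2x2 posterior precision matrix.
    c11, c12 = s22 / det, -s12 / det
    return c11 * b1 + c12 * b2, c11


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every copy of x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def prior_sensitivity(X, y, vague_var=100.0, tight_var=1.0, draws=2000, seed=0):
    """D between posterior draws of beta_1 under a vague and a tight prior."""
    rng = random.Random(seed)
    m1, v1 = beta1_posterior(X, y, vague_var)
    m2, v2 = beta1_posterior(X, y, tight_var)
    a = [rng.gauss(m1, sqrt(v1)) for _ in range(draws)]
    b = [rng.gauss(m2, sqrt(v2)) for _ in range(draws)]
    return ks_statistic(a, b)


if __name__ == "__main__":
    orthogonal = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
    collinear = [(1, 1.01), (-1, -1.01), (1, 0.99), (-1, -0.99)]
    y_orth = [a + b for a, b in orthogonal]
    y_coll = [a + b for a, b in collinear]
    print(prior_sensitivity(orthogonal, y_orth))  # smaller D
    print(prior_sensitivity(collinear, y_coll))   # larger D
```

With nearly collinear predictors the data barely constrain the individual coefficients, so the posterior inherits most of its spread from the prior and changes markedly between the two priors, which is the signal the abstract proposes to look for.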

A comparison of salmon escapement estimates using a hierarchical Bayesian approach versus separate maximum likelihood estimation of each year's return

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f01-100 ◽

2001 ◽

Vol 58 (8) ◽

pp. 1663-1671 ◽

Cited By ~ 11

Author(s):

Milo D Adkison ◽

Zhenming Su

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimation ◽

Bayesian Approach ◽

Count Data ◽

Likelihood Estimation ◽

Data Sets ◽

Hierarchical Bayesian ◽

Hierarchical Approach ◽

Bayesian Hierarchical ◽

The Bayesian Approach

In this simulation study, we compared the performance of a hierarchical Bayesian approach for estimating salmon escapement from count data with that of separate maximum likelihood estimation of each year's escapement. We simulated several contrasting counting schedules resulting in data sets that differed in information content. In particular, we were interested in the ability of the Bayesian approach to estimate escapement and timing in years where few or no counts are made after the peak of escapement. We found that the Bayesian hierarchical approach was much better able to estimate escapement and escapement timing in these situations. Separate estimates for such years could be wildly inaccurate. However, even a single postpeak count could dramatically improve the estimability of escapement parameters.
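The advantage of the hierarchical approach in poorly sampled years comes from partial pooling: each year's estimate is shrunk toward the across-year mean in proportion to how uncertain it is. The sketch below shows that mechanism in its simplest normal-normal form, assuming per-year estimates with known standard errors and a fixed, hypothetical between-year standard deviation `tau`; it is not the escapement model used in the study, which fits run-timing curves to count data.

```python
def partial_pooling(estimates, std_errors, tau):
    """Normal-normal shrinkage of per-year estimates toward a common mean.

    Model (assumed): estimate_j ~ N(theta_j, se_j^2), theta_j ~ N(mu, tau^2).
    mu is replaced by its precision-weighted estimate (an empirical-Bayes plug-in).
    """
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    mu = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled = []
    for e, se in zip(estimates, std_errors):
        # Share of the posterior mean taken from the year's own data.
        b = tau ** 2 / (tau ** 2 + se ** 2)
        pooled.append(mu + b * (e - mu))
    return mu, pooled


if __name__ == "__main__":
    # Hypothetical: three well-counted years and one year with almost no information.
    mu, pooled = partial_pooling([100, 110, 95, 400], [5, 5, 5, 200], tau=10)
    print(round(mu, 1), [round(p, 1) for p in pooled])
```

The poorly informed fourth year is pulled almost entirely to the common mean instead of keeping its wild separate estimate, while the well-counted years move only slightly.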

Privacy-Preserving Hybrid K-Means

Censorship, Surveillance, and Privacy ◽

10.4018/978-1-5225-7113-1.ch049 ◽

2019 ◽

pp. 1009-1026

Author(s):

Zhiqiang Gao ◽

Yixiao Sun ◽

Xiaolong Cui ◽

Yutao Wang ◽

Yanyu Duan ◽

...

Keyword(s):

Data Mining ◽

Differential Privacy ◽

Privacy Preserving ◽

Local Optimum ◽

Data Sets ◽

Swarm Optimization ◽

Second Stage ◽

Private Data ◽

Privacy Budget ◽

Selection Of

This article describes how k-means, the most widely used clustering algorithm, is prone to falling into a local optimum. Moreover, traditional clustering approaches are performed directly on private data and cannot withstand malicious attacks by adversaries with arbitrary background knowledge in large-scale data mining tasks. This can violate individuals' privacy and leak information through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means on Spark. In the first stage, particle swarm optimization is executed on resilient distributed datasets to select the initial cluster centroids for k-means on Spark. In the second stage, k-means is executed with the privacy budget set to ε/2t and Laplace noise added in each iteration. Extensive experiments on public UCI data sets show that, while preserving the utility of the private data and scalability, the approach outperforms state-of-the-art k-means variants by combining swarm intelligence with the rigorous paradigm of differential privacy.
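The second stage can be sketched as differentially private Lloyd iterations: in each round, noisy cluster counts and noisy coordinate sums are released with Laplace noise, and each centroid becomes their ratio. This minimal sketch assumes that the total budget ε is split evenly over t rounds and then evenly between the count and sum queries (ε/2t each), that all points lie in [0, 1]^d so the sum query has L1 sensitivity d, and that initial centroids are supplied directly; the particle swarm initialisation and the Spark execution described in the abstract are not shown.

```python
import random


def laplace_noise(rng, scale):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centroids):
    """Index of the centroid closest to point in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def dp_kmeans(points, init_centroids, epsilon, iterations, seed=0):
    """Differentially private k-means (Lloyd) iterations for points in [0, 1]^d.

    Assumption: each round spends epsilon / (2 * iterations) on the noisy counts
    and the same on the noisy sums, so the run spends epsilon in total.
    """
    rng = random.Random(seed)
    d = len(points[0])
    eps_query = epsilon / (2 * iterations)
    centroids = [list(c) for c in init_centroids]
    for _ in range(iterations):
        counts = [0] * len(centroids)
        sums = [[0.0] * d for _ in centroids]
        for p in points:
            k = nearest(p, centroids)
            counts[k] += 1
            for i in range(d):
                sums[k][i] += p[i]
        for k in range(len(centroids)):
            # Clusters partition the data, so the counts together have sensitivity 1.
            noisy_count = counts[k] + laplace_noise(rng, 1.0 / eps_query)
            # A point in [0, 1]^d changes a cluster's sum by at most d in L1 norm.
            noisy_sums = [s + laplace_noise(rng, d / eps_query) for s in sums[k]]
            if noisy_count >= 1.0:
                # Clamp to the assumed data domain; otherwise keep the old centroid.
                centroids[k] = [min(1.0, max(0.0, s / noisy_count)) for s in noisy_sums]
    return centroids


if __name__ == "__main__":
    pts = ([(0.1 + 0.001 * (i % 10), 0.1) for i in range(500)]
           + [(0.9, 0.9 - 0.001 * (i % 10)) for i in range(500)])
    print(dp_kmeans(pts, [(0.3, 0.3), (0.7, 0.7)], epsilon=10.0, iterations=5, seed=1))
```

Smaller ε or more iterations means a smaller per-round budget and therefore noisier centroids, which is the utility/privacy trade-off the abstract refers to.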

The promise of Bayesian analysis for prominence seismology

Proceedings of the International Astronomical Union ◽

10.1017/s1743921313011241 ◽

2013 ◽

Vol 8 (S300) ◽

pp. 393-394 ◽

Cited By ~ 2

Author(s):

Iñigo Arregui ◽

Andrés Asensio Ramos ◽

Antonio J. Díaz

Keyword(s):

Bayesian Analysis ◽

Bayesian Approach ◽

Model Comparison ◽

Physical Models ◽

Physical Parameters ◽

Solar Prominence ◽

Bayesian Techniques ◽

The Bayesian Approach

We propose and use Bayesian techniques to determine physical parameters in solar prominence plasmas, combining observational and theoretical properties of waves and oscillations. The Bayesian approach also enables model comparison, to assess how plausible alternative physical models or mechanisms are in view of the data.
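Bayesian model comparison rests on each model's evidence (marginal likelihood): the likelihood averaged over that model's prior. A minimal one-parameter sketch, assuming a Gaussian observation error and uniform priors evaluated with the midpoint rule, shows how two models that can fit a datum equally well still differ in evidence when one spreads its prior over a much wider range. The forward models and numbers are hypothetical, not prominence-seismology models.

```python
from math import exp, pi, sqrt


def gaussian_likelihood(obs, pred, sigma):
    """Density of obs under N(pred, sigma^2)."""
    return exp(-0.5 * ((obs - pred) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def evidence(obs, sigma, forward, lo, hi, n=10_000):
    """Marginal likelihood with a uniform prior on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        total += gaussian_likelihood(obs, forward(theta), sigma)
    # Prior density is 1 / (hi - lo) on the interval.
    return total * h / (hi - lo)


if __name__ == "__main__":
    z_a = evidence(0.5, 0.1, lambda t: t, 0.0, 1.0)   # narrow prior range
    z_b = evidence(0.5, 0.1, lambda t: t, 0.0, 10.0)  # ten times wider range
    print("Bayes factor A:B =", z_a / z_b)
    print("P(A | data) with equal model priors =", z_a / (z_a + z_b))
```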

Advantage of Bayesian approach to geotechnical designing

Annals of Warsaw University of Life Sciences – SGGW Land Reclamation ◽

10.2478/v10060-008-0052-z ◽

2009 ◽

Vol 41 (2) ◽

pp. 83-93

Author(s):

Kazimierz Garbulewski ◽

Stanisław Jabłonowski ◽

Simon Rabarijoely

Keyword(s):

Bayesian Analysis ◽

Bayesian Approach ◽

Geotechnical Engineering ◽

Soil Parameters ◽

Geotechnical Parameters ◽

The Bayesian Approach

The paper addresses the possibility of applying the Bayesian approach to geotechnical engineering. First, the principal information on Bayesian analysis is presented, followed by its application to estimating soil parameters from CPT/DMT tests at the SGGW Campus in Warsaw. The CPT/DMT tests were carried out to identify the geotechnical conditions in the foundations of the campus buildings being designed. Data from two layers of glacial boulder clay were analysed. The results demonstrate that the Bayesian approach is a useful tool for evaluating ground properties and estimating geotechnical parameters in specified circumstances.
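For a soil parameter whose mean and scatter are both uncertain, a standard conjugate choice is the normal-inverse-gamma prior, which combines prior (for example, regional) knowledge with site test results in closed form. This is a minimal sketch under that assumption; the prior hyperparameters and measurements below are hypothetical, in arbitrary units, and are not the SGGW CPT/DMT data, and the paper's own formulation may differ.

```python
from dataclasses import dataclass


@dataclass
class NormalInverseGamma:
    """Belief about (mu, sigma^2): mu | sigma^2 ~ N(mu0, sigma^2 / kappa),
    sigma^2 ~ Inverse-Gamma(alpha, beta)."""
    mu0: float
    kappa: float
    alpha: float
    beta: float

    def mean_of_variance(self):
        """Posterior mean of sigma^2 (defined only for alpha > 1)."""
        return self.beta / (self.alpha - 1.0)


def update(prior, data):
    """Conjugate update of a normal-inverse-gamma belief with i.i.d. normal data."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu0 + n * xbar) / kappa_n
    alpha_n = prior.alpha + n / 2.0
    beta_n = (prior.beta + 0.5 * ss
              + prior.kappa * n * (xbar - prior.mu0) ** 2 / (2.0 * kappa_n))
    return NormalInverseGamma(mu_n, kappa_n, alpha_n, beta_n)


if __name__ == "__main__":
    prior = NormalInverseGamma(mu0=50.0, kappa=1.0, alpha=2.0, beta=100.0)
    site_tests = [60.0, 62.0, 58.0, 61.0, 59.0]  # hypothetical measurements
    post = update(prior, site_tests)
    print(post.mu0, post.mean_of_variance())
```

The posterior mean lies between the prior guess and the site average, weighted by `kappa` versus the number of tests, and the variance estimate also absorbs the disagreement between prior and data.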

Constraining absolute chronologies with the application of Bayesian analysis

ACTA IMEKO ◽

10.21014/acta_imeko.v5i2.322 ◽

2016 ◽

Vol 5 (2) ◽

pp. 14 ◽

Cited By ~ 1

Author(s):

Francesco Maspero ◽

Emanuela Sibilia ◽

Marco Martini

Keyword(s):

Bayesian Analysis ◽

Bayesian Statistics ◽

Case Studies ◽

Bayesian Approach ◽

Common Language ◽

Multiple Data ◽

The Common ◽

The Bayesian Approach

In this work, the application of Bayesian statistics to archaeological problems is discussed. In particular, three case studies are analyzed, each presenting a complex interpretative scenario, together with the most suitable way to resolve it. It is shown that the Bayesian approach makes it possible to refine a dating when multiple data are available, even from different dating techniques. The Bayesian approach is presented as a common language among physicists, archaeologists and statisticians for performing more accurate evaluations of stratigraphies and chronologies.
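A core idea behind Bayesian refinement of dates is that stratigraphic order acts as a prior constraint on otherwise independent date estimates. The sketch below assumes two events with Gaussian date likelihoods (calendar years, larger meaning later), a flat prior restricted to the order t_a < t_b (event a lies below, and is older than, event b), and evaluation on a regular grid; the dates and grid are hypothetical, and real applications typically involve non-Gaussian likelihoods.

```python
from math import exp


def constrained_means(mu_a, sd_a, mu_b, sd_b, lo, hi, step=1.0):
    """Posterior means of t_a and t_b given Gaussian date likelihoods,
    a flat prior on [lo, hi]^2, and the stratigraphic constraint t_a < t_b."""
    n = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    like_a = [exp(-0.5 * ((t - mu_a) / sd_a) ** 2) for t in grid]
    like_b = [exp(-0.5 * ((t - mu_b) / sd_b) ** 2) for t in grid]
    z = mean_a = mean_b = 0.0
    for i, ta in enumerate(grid):
        for j in range(i + 1, n):  # only grid pairs with t_a < t_b
            w = like_a[i] * like_b[j]
            z += w
            mean_a += w * ta
            mean_b += w * grid[j]
    return mean_a / z, mean_b / z


if __name__ == "__main__":
    # Hypothetical: the lower layer dates slightly later than the upper one.
    print(constrained_means(1000, 50, 980, 50, lo=700, hi=1300, step=2.0))
```

Because the two measurements conflict with the known order, the constraint pulls event a earlier and event b later than their separate estimates, which is the kind of refinement the abstract describes.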