Bayesian Data Mining and Knowledge Discovery

Data Mining ◽  
2011 ◽  
pp. 260-277
Author(s):  
Eitel J.M. Lauria ◽  
Giri Kumar Tayi

One of the major problems faced by data-mining technologies is how to deal with uncertainty. The prime characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty. Bayesian methods offer a practical way to make inferences from data, using probability models for the values we observe and about which we want to draw some hypotheses. Bayes’ Theorem provides the means of calculating the probability of a hypothesis (posterior probability) based on its prior probability, the probability of the observations, and the likelihood that the observational data fit the hypothesis. The purpose of this chapter is twofold: to provide an overview of the theoretical framework of Bayesian methods and their application to data mining, with special emphasis on statistical modeling and machine-learning techniques; and to illustrate each theoretical concept covered with practical examples. We will cover basic probability concepts, Bayes’ Theorem and its implications, Bayesian classification, Bayesian belief networks, and an introduction to simulation techniques.
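As a minimal sketch of the calculation this abstract describes, the posterior probability of a binary hypothesis can be computed from its prior and the two likelihoods. The function name `posterior` and the numbers used below are illustrative assumptions, not taken from the chapter.

```python
def posterior(prior: float, likelihood: float, likelihood_given_not: float) -> float:
    """Return P(H | D) for a binary hypothesis H using Bayes' theorem.

    prior                -- P(H)
    likelihood           -- P(D | H)
    likelihood_given_not -- P(D | not H)
    """
    # Probability of the observations, P(D), by the law of total probability.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the observation has zero probability under both hypotheses")
    return likelihood * prior / evidence


# Hypothetical numbers: a rare hypothesis (1% prior) and fairly reliable evidence.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 4))
```

Even with a likelihood of 0.9, the low prior keeps the posterior well below one half, which is the kind of implication of Bayes' theorem the chapter sets out to illustrate.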

Author(s):  
Eitel J.M. Lauria

Bayesian methods provide a probabilistic approach to machine learning. The Bayesian framework allows us to make inferences from data using probability models for values we observe and about which we want to draw some hypotheses. Bayes' theorem provides the means of calculating the probability of a hypothesis (posterior probability) based on its prior probability, the probability of the observations, and the likelihood that the observational data fit the hypothesis.


2019 ◽  
Vol 62 (3) ◽  
pp. 577-586 ◽  
Author(s):  
Garnett P. McMillan ◽  
John B. Cannon

Purpose: This article presents a basic exploration of Bayesian inference to inform researchers unfamiliar with this type of analysis of the many advantages this readily available approach provides. Method: First, we demonstrate the development of Bayes' theorem, the cornerstone of Bayesian statistics, into an iterative process of updating priors. Working with a few assumptions, including normality and conjugacy of the prior distribution, we show how one would calculate the posterior distribution using the prior distribution and the likelihood of the parameter. Next, we move to an example in auditory research by considering the effect of sound therapy for reducing the perceived loudness of tinnitus. In this case, as in most real-world settings, we turn to Markov chain simulations because the assumptions allowing for easy calculations no longer hold. Using Markov chain Monte Carlo methods, we illustrate several analysis solutions given by a straightforward Bayesian approach. Conclusion: Bayesian methods are widely applicable and can help scientists overcome analysis problems, including how to include existing information, run interim analyses, achieve consensus through measurement, and, most importantly, interpret results correctly. Supplemental Material: https://doi.org/10.23641/asha.7822592
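The iterative updating of priors under normality and conjugacy mentioned in this abstract can be sketched as follows. This assumes a normal prior on an unknown mean and normally distributed observations with known variance; the function name `update_normal` and all numbers are hypothetical and do not come from the article.

```python
def update_normal(prior_mean, prior_var, observations, obs_var):
    """Conjugate update for the mean of a normal model with known variance.

    Returns the (mean, variance) of the posterior distribution of the mean.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    # Precisions (inverse variances) add; the posterior mean is the
    # precision-weighted combination of the prior mean and the data.
    post_precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / obs_var)
    return post_mean, post_var


data = [1.0, 2.0, 3.0]  # hypothetical measurements

# Batch update: use all observations at once.
batch = update_normal(0.0, 4.0, data, obs_var=1.0)

# Iterative update: each posterior becomes the prior for the next observation.
mean, var = 0.0, 4.0
for x in data:
    mean, var = update_normal(mean, var, [x], obs_var=1.0)
sequential = (mean, var)

print(batch, sequential)
```

Both routes give the same posterior (up to floating-point rounding), which is why the update can be run one observation at a time. When these assumptions do not hold, there is generally no such closed form, and simulation methods such as Markov chain Monte Carlo are used instead.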


2019 ◽  
Vol 12 (3) ◽  
pp. 171-179 ◽  
Author(s):  
Sachin Gupta ◽  
Anurag Saxena

Background: The bullwhip effect is an increase in the variability of production or procurement that is larger than the corresponding increase in the variability of demand or sales. It is considered an encumbrance to supply chain optimization because it causes inadequacies in the supply chain. Operations and supply chain management consultants, managers and researchers have studied the causes behind the dynamic nature of supply chain management and have listed shorter product life cycles, changes in technology, changes in consumer preference and globalization, to name a few. Most of the literature exploring the bullwhip effect is based on simulations and mathematical models. Exploring the bullwhip effect using machine learning is the novel approach of the present study. Methods: The present study explores the operational and financial variables affecting the bullwhip effect on the basis of secondary data. Data mining and machine learning techniques are used to explore the variables affecting the bullwhip effect in Indian sectors. The RapidMiner tool was used for data mining, and 10-fold cross-validation was performed. After the classification, a Weka Alternating Decision Tree (w-ADT) was built to help decision makers mitigate the bullwhip effect. Results: Of the 19 selected variables affecting the bullwhip effect, 7 were retained as giving the highest accuracy with the minimum deviation. Conclusion: Classification using machine learning provides an effective tool for exploring the bullwhip effect in supply chain management.
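The 10-fold cross-validation mentioned in the Methods can be sketched in plain Python as below. This is only an illustration of the general technique, not the RapidMiner procedure used in the study; the generator name `k_fold_indices` is hypothetical, and the sketch assumes contiguous folds with no shuffling or stratification.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so that every sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical data set of 25 records split into 10 folds.
folds = list(k_fold_indices(25, k=10))
for train, test in folds:
    print(len(train), len(test))
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the ten scores would then be averaged.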


Author(s):  
Andrew Gelman ◽  
Deborah Nolan

This chapter contains many classroom activities and demonstrations to help students understand basic probability calculations, including conditional probability and Bayes' rule. Many of the activities alert students to misconceptions about randomness. They create dramatic settings where the instructor discerns real coin flips from fake ones, students modify dice and coins in order to load them, and students “accused” of lying based on the outcome of an inaccurate simulated lie detector face their classmates. Additionally, probability models of real outcomes offer good value: first we can do the probability calculations, and then we can go back and discuss the potential flaws of the model.


Author(s):  
M Pourmahdian ◽  
R Zoghifard

Abstract: This paper provides some model-theoretic analysis for probability (modal) logic ($PL$). It is known that this logic does not enjoy the compactness property. However, by passing to a sublogic of $PL$, namely basic probability logic ($BPL$), it is shown that this sublogic satisfies the compactness property. Furthermore, by drawing special attention to some essential model-theoretic properties of $PL$, a version of the Lindström characterization theorem is investigated. In fact, it is verified that probability logic has the maximal expressive power among those abstract logics extending $PL$ and satisfying both the filtration and disjoint unions properties. Finally, by altering the semantics to the finitely additive probability models ($\mathcal{F}\mathcal{P}\mathcal{M}$) and introducing a positive sublogic of $PL$ that includes $BPL$, it is proved that this sublogic possesses the compactness property with respect to $\mathcal{F}\mathcal{P}\mathcal{M}$.


2018 ◽  
Vol 7 (4.5) ◽  
pp. 159
Author(s):  
Vaibhav A. Hiwase ◽  
Dr. Avinash J Agrawa

The growth of life insurance depends mainly on the risk of the insured people. These risks are unevenly distributed among people and can be captured from different characteristics and lifestyles. This unknown distribution needs to be analyzed from historical data and used for underwriting and policy-making in the life insurance industry. Traditionally, risk is calculated from selected features known as risk factors, but today it has become important to identify these risk factors in a high-dimensional feature space. Clustering in a high-dimensional feature space is a challenging task, mainly because of the curse of dimensionality and noisy features. Hence, data mining and machine learning techniques should be tried experimentally to reveal interesting patterns and behaviour. This will help life insurance companies protect both the insured person and the company from financial loss. This paper focuses on analyzing hidden correlations among features and using them to calculate the risk of an individual customer.


Predictive modelling is a mathematical technique that uses statistics for prediction. Owing to the rapid growth of data in cloud systems, data mining plays a significant role. Here, data mining means extracting knowledge from huge data sources, and it is attracting increasing attention in medical applications, specifically for analysing and extracting knowledge from both known and unknown patterns for effective medical diagnosis, treatment, management, prognosis, monitoring and screening. However, historical medical data may be noisy, missing, inconsistent, imbalanced and high dimensional. Such data problems lead to severe bias in predictive modelling and degrade the performance of data mining approaches. Various pre-processing and machine learning methods and models, such as supervised learning, unsupervised learning and reinforcement learning, have been proposed in recent literature. Hence, the present research reviews and analyses the various models, algorithms and machine learning techniques for clinical predictive modelling, with the aim of obtaining high-performance results from large volumes of medical data relating to patients with multiple diseases.


Student performance management is one of the key pillars of higher education institutions, since it directly affects students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data to identify students who are more likely to fail university examinations, so that the needed interventions can be provided to improve student performance. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors that are demographic and social characteristics of the students. It compares five popular machine learning algorithms (REPTree, JRip, Random Forest, Random Tree, and Naive Bayes) on overall classifier accuracy as well as class-specific indicators, i.e., precision, recall, and F-measure. The results showed that the REPTree algorithm outperformed the other machine learning algorithms in classifying students who are more likely to fail the examinations.
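The class-specific indicators named in this abstract (precision, recall, F-measure) can be computed for one class as in the sketch below. The function name `class_metrics`, the "fail"/"pass" labels, and the sample predictions are hypothetical and are not data from the paper; the F-measure here is assumed to be the balanced F1 score.

```python
def class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f_measure) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical labels for five students; "fail" is the class of interest.
actual = ["fail", "pass", "fail", "pass", "fail"]
predicted = ["fail", "fail", "pass", "pass", "fail"]
precision, recall, f_measure = class_metrics(actual, predicted, positive="fail")
print(precision, recall, f_measure)
```

Reporting these for the "fail" class alongside overall accuracy matters when failing students are a minority, because a classifier can reach high accuracy while missing most of them.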

