conditional independence assumption
Recently Published Documents


Total documents: 34 (five years: 10)
H-index: 8 (five years: 1)

Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2982
Author(s):  
Liangjun Yu ◽  
Shengfeng Gan ◽  
Yu Chen ◽  
Dechun Luo

Naive Bayes (NB) is easy to construct but surprisingly effective, and it is one of the top ten classification algorithms in data mining. The conditional independence assumption of NB ignores the dependencies between attributes, so its probability estimates are often suboptimal. Hidden naive Bayes (HNB) adds a hidden parent to each attribute, which can reflect dependencies from all the other attributes. Compared with other Bayesian network algorithms, it offers significant improvements in classification performance and avoids structure learning. However, HNB treats every instance as equally important for probability estimation, an assumption that does not always hold in real-world applications. To reflect the different influences of different instances, the HNB model is modified into an improved HNB model, and a novel hybrid approach called instance weighted hidden naive Bayes (IWHNB) is proposed in this paper. IWHNB combines instance weighting with the improved HNB model in one uniform framework: instance weights are incorporated into the improved HNB model to calculate probability estimates. Extensive experimental results show that IWHNB obtains significant improvements in classification performance compared with NB, HNB and other state-of-the-art competitors. Meanwhile, IWHNB maintains the low time complexity that characterizes HNB.
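As a rough illustration of how instance weights can enter naive Bayes probability estimates, the sketch below replaces raw counts with sums of weights. It is plain weighted NB with Laplace smoothing, not the IWHNB model from the paper (no hidden parents are modelled); the function names, the smoothing constant `alpha`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
import math

def fit_weighted_nb(rows, labels, weights, alpha=1.0):
    """Collect weighted counts for class priors and per-attribute conditionals."""
    class_w = defaultdict(float)   # total instance weight per class
    cond_w = defaultdict(float)    # weight per (attribute index, value, class)
    values = defaultdict(set)      # distinct values seen for each attribute
    for row, c, w in zip(rows, labels, weights):
        class_w[c] += w
        for i, v in enumerate(row):
            cond_w[(i, v, c)] += w
            values[i].add(v)
    return dict(class_w), dict(cond_w), dict(values), alpha

def predict_proba(model, row):
    """Posterior P(c | row) under the conditional independence assumption."""
    class_w, cond_w, values, alpha = model
    total = sum(class_w.values())
    log_post = {}
    for c, wc in class_w.items():
        # Smoothed weighted prior
        lp = math.log((wc + alpha) / (total + alpha * len(class_w)))
        for i, v in enumerate(row):
            # Smoothed weighted conditional P(x_i = v | c)
            num = cond_w.get((i, v, c), 0.0) + alpha
            lp += math.log(num / (wc + alpha * len(values[i])))
        log_post[c] = lp
    # Normalise in a numerically stable way
    m = max(log_post.values())
    unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

# Hypothetical toy data: two categorical attributes, two classes
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]

uniform = predict_proba(fit_weighted_nb(rows, labels, [1, 1, 1, 1]), ("sunny", "mild"))
reweighted = predict_proba(fit_weighted_nb(rows, labels, [1, 1, 3, 3]), ("sunny", "mild"))
print(uniform, reweighted)
```

With uniform weights this reduces to ordinary NB; giving the "yes" instances more weight shifts the posterior towards "yes" for the same query.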


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0258586
Author(s):  
Luca Corazzini ◽  
Silvia D’Arrigo ◽  
Emanuele Millemaci ◽  
Pietro Navarra

Despite several attempts to provide a definite pattern regarding the effects of personality traits on performance in higher education, the debate over the nature of the relationship is far from being conclusive. The use of different subject pools and sample sizes, as well as the use of identification strategies that either do not adequately account for selection bias or are unable to establish causality between measures of academic performance and noncognitive skills, are possible sources of heterogeneity. This paper investigates the impact of the Big Five traits, as measured before the beginning of the academic year, on the grade point average achieved in the first year after the enrolment, taking advantage of a unique and large dataset from a cohort of Italian students in all undergraduate programs containing detailed information on student and parental characteristics. Relying on a robust strategy to credibly satisfy the conditional independence assumption, we find that higher levels of conscientiousness and openness to experience positively affect student score.


2021 ◽  
Vol 25 (1) ◽  
pp. 35-55
Author(s):  
Limin Wang ◽  
Peng Chen ◽  
Shenglei Chen ◽  
Minghui Sun

Bayesian network classifiers (BNCs) have proved their effectiveness and efficiency in the supervised learning framework. Numerous variations of the conditional independence assumption have been proposed to address the NP-hard problem of learning BNC structure. However, researchers focus on identifying conditional dependence rather than conditional independence, and information-theoretic criteria cannot identify the diversity in conditional (in)dependencies for different instances. In this paper, the maximum correlation criterion and minimum dependence criterion are introduced to sort attributes and identify conditional independencies, respectively. A heuristic search strategy is applied to find a possible global solution that achieves a trade-off between significant dependency relationships and the independence assumption. Our extensive experimental evaluation on widely used benchmark data sets reveals that the proposed algorithm achieves competitive classification performance compared to state-of-the-art single model learners (e.g., TAN, KDB, KNN and SVM) and ensemble learners (e.g., ATAN and AODE).
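For context, a common information-theoretic criterion for attribute dependence in classifiers such as TAN and KDB is the conditional mutual information I(X; Y | C) between two attributes given the class. The sketch below estimates it from empirical frequencies. It is not the maximum correlation or minimum dependence criterion proposed in the paper, whose definitions are not given in the abstract; the function name and toy data are assumptions.

```python
from collections import Counter
import math

def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three equal-length discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ], expressed with raw counts
        cmi += (k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi

# X and Y independent within the single class: all four combinations occur once
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])

# X and Y identical within each class: maximal dependence for binary attributes
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1])
print(independent, dependent)
```

A value near zero is consistent with conditional independence given the class, while larger values indicate a dependence that a structure-learning step might choose to model.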


2020 ◽  
pp. 1471082X2092711
Author(s):  
Grigorios Papageorgiou ◽  
Dimitris Rizopoulos

Dropout is a common complication in longitudinal studies, especially since the distinction between missing not at random (MNAR) and missing at random (MAR) dropout is intractable. Consequently, one starts with an analysis that is valid under MAR and then performs a sensitivity analysis by considering MNAR departures from it. To this end, specific classes of joint models, such as pattern-mixture models (PMMs) and selection models (SeMs), have been proposed. On the contrary, shared-parameter models (SPMs) have received less attention, possibly because they do not embody a characterization of MAR. A few approaches to achieve MAR in SPMs exist, but are difficult to implement in existing software. In this article, we focus on SPMs for incomplete longitudinal and time-to-dropout data and propose an alternative characterization of MAR by exploiting the conditional independence assumption, under which outcome and missingness are independent given a set of random effects. By doing so, the censoring distribution can be utilized to cover a wide range of assumptions for the missing data mechanism on the subject-specific level. This approach offers substantial advantages over its counterparts and can be easily implemented in existing software. More specifically, it offers flexibility over the assumption for the missing data generating mechanism that governs dropout by allowing subject-specific perturbations of the censoring distribution, whereas in PMMs and SeMs dropout is considered MNAR strictly.
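The conditional independence assumption in a shared-parameter model can be written out as follows. This is the standard textbook form rather than the paper's specific formulation; the symbols (outcome vector $y_i$, dropout time $T_i$, random effects $b_i$, parameter blocks $\theta_y, \theta_t, \theta_b$) are notation assumed for this sketch.

```latex
% Outcome and dropout are independent given the random effects b_i:
f(y_i, T_i \mid b_i; \theta) = f(y_i \mid b_i; \theta_y)\, f(T_i \mid b_i; \theta_t)

% The joint marginal density follows by integrating out the random effects:
f(y_i, T_i; \theta) = \int f(y_i \mid b_i; \theta_y)\, f(T_i \mid b_i; \theta_t)\, f(b_i; \theta_b)\, \mathrm{d}b_i
```

All association between the longitudinal outcome and the dropout process is carried by the shared $b_i$, which is why assumptions about the missing data mechanism can be expressed at the subject-specific level.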


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Eleanor M. Pullenayegum

Clinic-based cohort studies enroll patients on first being admitted to the clinic, and follow them as part of usual care, with interest being in the marginal mean of the outcome process. As the required frequency of follow-up varies among patients, these studies often feature irregular visit times, with no two patients sharing a visit time. Inverse-intensity weighting has been developed to handle this; however, it requires that the visit process be conditionally independent of the outcome given the observed history. When patients schedule visits in response to changes in their health (for example, a disease flare), the conditional independence assumption is no longer plausible, leading to biased results. We suggest additional information that can be collected to ensure that conditional independence holds, and examine how this might be used in the analysis. This allows clinic-based cohort studies to be used to determine longitudinal outcomes without incurring bias due to irregular follow-up.


2019 ◽  
Vol 8 (4) ◽  
pp. 10385-10389

Chronic Kidney Disease (CKD) denotes a condition of kidney risk that may worsen over time and with contributing factors. If it continues to worsen, dialysis is performed, and in the worst-case scenario it may lead to kidney failure (End-Stage Renal Disease). Detecting CKD at an early stage could help in managing complications and damage. Previous work used SVM and Naïve Bayes for classification and found that Naïve Bayes had a shorter execution time than SVM, while SVM misclassified fewer instances, so Naïve Bayes showed slightly lower classification accuracy. This can be rectified by using a smaller number of attributes. Naïve Bayes is a simple probabilistic classifier that applies Bayes' theorem with a conditional independence assumption. The main aim of this work is to increase diagnostic accuracy and decrease diagnosis time. An attempt is made to develop a model evaluating CKD data gathered from a particular set of people; identification can then be done from the model data. This work focuses on developing a system based on classification methods: SVM, Naïve Bayes, KNN.


2019 ◽  
Vol 8 (5) ◽  
pp. 990-1017 ◽  
Author(s):  
Arnout Van Delden ◽  
Bart J Du Chatinier ◽  
Sander Scholtus

Statistical matching is a technique to combine variables in two or more nonoverlapping samples that are drawn from the same population. In the current study, the unobserved joint distribution between two target variables in nonoverlapping samples is estimated using a parametric model. A classical assumption to estimate this joint distribution is that the target variables are independent given the background variables observed in both samples. A problem with the use of this conditional independence assumption is that the estimated joint distribution may be severely biased when the assumption does not hold, which in general will be unacceptable for official statistics. Here, we explored to what extent the accuracy can be improved by the use of two types of auxiliary information: the use of a common administrative variable and the use of a small additional sample from a similar population. This additional sample is included by using the partial correlation of the target variables given the background variables or by using an EM algorithm. In total, four different approaches were compared to estimate the joint distribution of the target variables. Starting with empirical data, we show how the accuracy of the joint distribution is affected by the use of administrative data and by the size of the additional sample included via a partial correlation and through an EM algorithm. The study further shows how this accuracy depends on the strength of the relations among the target and auxiliary variables. We found that including a common administrative variable does not always improve the accuracy of the results. We further found that the EM algorithm nearly always yielded the most accurate results; this effect is largest when the explained variance of the separate target variables by the common background variables is not large.
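The conditional independence assumption used in statistical matching can be stated compactly. The notation is assumed for this sketch: $Y$ is the target variable observed only in the first sample, $Z$ the target variable observed only in the second, and $X$ the background variables observed in both. The second display is the well-known multivariate normal special case, not a result taken from the study.

```latex
% Conditional independence of the target variables given the background variables:
f(x, y, z) = f(y \mid x)\, f(z \mid x)\, f(x)

% Under joint normality this is equivalent to a zero partial correlation,
% so the unobserved covariance is fixed by the observable ones:
\rho_{YZ \mid X} = 0
\quad\Longleftrightarrow\quad
\sigma_{YZ} = \Sigma_{YX}\, \Sigma_{XX}^{-1}\, \Sigma_{XZ}
```

A small additional sample in which $Y$ and $Z$ are observed together makes it possible to estimate $\rho_{YZ \mid X}$ rather than fix it at zero, which is where the partial-correlation approach mentioned above comes in.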


2019 ◽  
Vol 66 (2) ◽  
pp. 203-217 ◽  
Author(s):  
Laura Juznik-Rotar

This paper evaluates the effects of the employment programme on young unemployed people in the Netherlands. The effectiveness of the programme is measured by the probability of re-employment and the probability of participation in the regular educational system. Participants are compared with individuals who would continue seeking employment as openly unemployed persons. The effects of the programme are evaluated one year and two years after the start of the programme. We apply a propensity score matching method. The identification of an average treatment effect is based on the conditional independence assumption. The effects on re-employment probability and the probability of participation in the regular educational system are statistically negative, applicable to both long and short-term scenarios.
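A minimal sketch of propensity score matching follows, under strong simplifying assumptions: a single categorical covariate, so the propensity score is just the treated share within each covariate stratum, and one-nearest-neighbour matching with replacement. The data, variable names, and functions are hypothetical and are not taken from the paper, which does not describe its estimator in this abstract.

```python
from collections import defaultdict

def stratum_propensity(covariates, treated):
    """Propensity score per unit: share of treated units with the same covariate value."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for x, t in zip(covariates, treated):
        n[x] += 1
        n_treated[x] += t
    return [n_treated[x] / n[x] for x in covariates]

def att_nearest_neighbour(scores, treated, outcomes):
    """Average treatment effect on the treated, matching each treated unit
    to the control unit with the closest propensity score (with replacement)."""
    controls = [(s, y) for s, t, y in zip(scores, treated, outcomes) if t == 0]
    diffs = []
    for s, t, y in zip(scores, treated, outcomes):
        if t == 1:
            # Ties are broken by the order of the control list
            _, y0 = min(controls, key=lambda c: abs(c[0] - s))
            diffs.append(y - y0)
    return sum(diffs) / len(diffs)

# Hypothetical data: education level, programme participation, re-employment (1 = yes)
covariates = ["low", "low", "low", "low", "high", "high", "high", "high"]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
outcomes = [0, 1, 0, 1, 1, 1, 1, 1]

scores = stratum_propensity(covariates, treated)
att = att_nearest_neighbour(scores, treated, outcomes)
print(scores, att)
```

The estimate has a causal reading only if the conditional independence assumption holds, that is, if participation is as good as random once the covariates behind the score are accounted for.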

