MINING SIMILAR PATTERN WITH ATTRIBUTE ORIENTED INDUCTION HIGH LEVEL EMERGING PATTERN (AOI-HEP) DATA MINING TECHNIQUE

2017 ◽  
Vol 79 (7-2) ◽  
Author(s):  
Harco Leslie Hendric Spits Warnars ◽  
Nizirwan Anwar ◽  
Richard Randriatoamanana ◽  
Horacio Emilio Perez Sanchez

AOI-HEP (Attribute Oriented Induction High Level Emerging Pattern), a new data mining technique, has successfully mined frequent patterns and is extended here to mine similar patterns. AOI-HEP mined 3 similar patterns from the IPUMS dataset and 1 from the breast cancer dataset, both UCI machine learning datasets. No similar patterns were found in the adult and census UCI machine learning datasets. The experiments showed that whether AOI-HEP finds similar patterns in a dataset depends on which attribute's high level concept in the concept hierarchy is chosen for learning, as was also the case for AOI-HEP frequent patterns in previous research. The experiments learned on the high level concepts of the attributes workclass, clump thickness, means and marts for the adult, breast cancer, census and IPUMS datasets respectively. To confirm that the chosen high level concept attribute influences the AOI-HEP similar patterns found, extended experiments were carried out. The census dataset, which had shown no AOI-HEP similar pattern, showed AOI-HEP similar patterns when learned on the high level concept of the marital attribute. Conversely, the breast cancer dataset, which had shown 1 AOI-HEP similar pattern, showed none when learned on the high level concepts of the attributes cell size, cell shape and bare nuclei. Two of the 3 similar patterns found in the IPUMS dataset are strong discriminant rules, since they have large growth rates (1.53% and 3.47%) and large supports in the target dataset (4.54% and 5.45% respectively), together with small supports in the contrasting dataset (2.96% and 1.57% respectively).
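The growth rates quoted above can be checked against the quoted supports. The sketch below assumes the standard emerging-pattern definition, in which a pattern's growth rate is its support in the target dataset divided by its support in the contrasting dataset; the abstract does not state the formula, so this is an assumption, as is reading the figure 5.45 as a percentage. The function name `growth_rate` is hypothetical.

```python
def growth_rate(target_support: float, contrasting_support: float) -> float:
    """Ratio of target support to contrasting support (assumed definition)."""
    if contrasting_support == 0:
        # A pattern absent from the contrasting dataset has unbounded growth.
        return float("inf")
    return target_support / contrasting_support


# (target support %, contrasting support %) for the two IPUMS similar
# patterns, as quoted in the abstract.
reported_supports = [(4.54, 2.96), (5.45, 1.57)]

rates = [round(growth_rate(t, c), 2) for t, c in reported_supports]
print(rates)  # [1.53, 3.47]
```

Under this assumption the computed ratios match the two quoted growth rates, which suggests they are ratios of the two supports.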

Author(s):  
Harco Leslie Hendric Spits Warnars

Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) are recognized when there is maximum subsumption of the target (superset) dataset over the contrasting (subset) dataset (contrasting ⊂ target), together with a large High Emerging Pattern (HEP) growth rate and a large support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP from 4 UCI machine learning datasets: adult, breast cancer, census and IPUMS, with 48842, 569, 2458285 and 256932 instances respectively. Each dataset has concept hierarchies built from five chosen attributes. Two frequent patterns were found in the adult dataset and 1 in the breast cancer dataset, while none were found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults with a government workclass who have an intermediate education (80.53%) and America as native country (33%). The single HEP frequent pattern from the breast cancer dataset is breast cancer with a clump thickness of type AboutAverClump and a cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by which chosen attribute's high level concept is learned on: an extended experiment on the adult dataset that learned on the marital-status attribute found no frequent pattern.


Facilities ◽  
2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Cheng Zhang ◽  
Zehao Ye

Purpose: Developing physical pipe prediction models consumes considerable resources, and statistical models cannot fit failure records perfectly. The purpose of this paper is therefore to use a data mining method to analyze and predict the risk of water pipe failure by considering the attributes and locations of pipes in historical failure records. An automated machine learning (AutoML) method, the tree-based pipeline optimization technique (TPOT), was used as the key data mining technique in this research.

Design/methodology/approach: A water pipeline failure prediction method is proposed that considers pipeline attributes, environmental factors and historical pipeline break records. Regression analysis, genetic algorithms, machine learning and data mining approaches are used to analyze and predict the probability of pipeline failure, with TPOT as the key data mining technique. A case study was carried out in a specific area in China to investigate the relationships between pipeline breaks and relevant parameters such as pipeline age, material, diameter and pipeline density.

Findings: By integrating the prediction models for individual pipelines and for small research regions, a prediction model is developed to describe the probability of water pipe failures and is validated with real data. A high degree of fit is achieved, which indicates good potential for using the proposed method in practice as a guideline for identifying high-risk areas, taking proactive measures and optimizing resource allocation for water supply companies.

Originality/value: Different models are developed to improve prediction at the regional and individual pipeline level. A comparison between the predicted values and real records shows that a preliminary model has good potential for predicting future failure risks.


2014 ◽  
Vol 14 (2) ◽  
pp. 5419-5431 ◽  
Author(s):  
Maha Fouad ◽  
Dr. Mahmoud M. Abd ellatif ◽  
Prof. Mohamed Hagag ◽  
Dr. Ahmed Akl

Predicting the outcome of a graft transplant with a high level of accuracy is a challenging task in medicine, and data mining has an important role in meeting that challenge. The goal of this study is to compare the performance and features of two data mining techniques, decision trees and rule-based classifiers, against logistic regression as a standard statistical method for predicting the outcome of kidney transplants over a 5-year horizon. The dataset was compiled from the Urology and Nephrology Center (UNC), Mansoura, Egypt. Classifiers were developed in the Weka machine learning workbench by applying rule-based classifiers (RIPPER, DTNB), decision tree classifiers (BF, J48) and logistic regression. The experimental results show that the decision tree and rule-based classifiers provide improved accuracy and more interpretable models compared with the other classifier.


2020 ◽  
Vol 9 (01) ◽  
Author(s):  
Erika Yunuen Morales Mateos ◽  
María Arely López Garrido ◽  
Laura López Díaz

The purpose of this research was to identify groups of students characterized by their student commitment. The participants were 31 students enrolled in information technology degree programs at a university in southern Mexico. The authors administered the UWES-S together with a series of questions about academic matters. The data mining technique known as clustering was then applied with the WEKA tool to identify the groups. A notable result is that the group of women shows high levels of student commitment, vigor and absorption, whereas the men show a high level of dedication.

