Data Mining Approach in Preterm Birth Prediction

Purpose: Developing physical models of pipe failure consumes considerable resources, and statistical models cannot fit failure records perfectly. This paper therefore uses a data mining method to analyze and predict the risk of water pipe failure from the attributes and locations of pipes in historical failure records. The tree-based pipeline optimization technique (TPOT), an automated machine learning (AutoML) method, was used as the key data mining technique. Design/methodology/approach: A water pipeline failure prediction method is proposed that considers pipeline attributes, environmental factors and historical pipeline break records. Regression analysis, genetic algorithms, machine learning and data mining approaches are used to analyze and predict the probability of pipeline failure. A case study was carried out in a specific area in China to investigate the relationships between pipeline breaks and relevant parameters such as pipeline age, material, diameter and pipeline density. Findings: By integrating the prediction models for individual pipelines and small research regions, a prediction model is developed to describe the probability of water pipe failures and is validated against real data. A high degree of fit is achieved, which suggests the proposed method has good potential as a practical guideline for water supply companies to identify high-risk areas, take proactive measures and optimize resource allocation. Originality/value: Different models are developed to improve prediction for regions and for individual pipelines. A comparison of predicted values with real records shows that a preliminary model has good potential for predicting future failure risks.

Knowledge Structure and Data Mining Techniques

Knowledge Management ◽

10.4018/978-1-59904-933-5.ch072 ◽

2011 ◽

pp. 874-882

Author(s):

Rick L. Wilson ◽

Peter A. Rosen ◽

Mohammad Saad Al-Ahmadi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Inductive Learning ◽

Knowledge Structure ◽

Statistical Techniques ◽

Data Sets ◽

Data Mining Technique ◽

Data Mining Techniques ◽

Mining Technique ◽

Learning Techniques

Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g., Lim, Low, & Shih, 2000). The goal of these studies is to determine which data mining technique performs best under what circumstances. Results are often conflicting: for instance, some articles find that neural networks (NN) outperform both traditional statistical techniques and inductive learning techniques, but the opposite is found with other datasets (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler, May, & Vargas, 1999). Most of these studies use publicly available datasets in their analysis, and because these datasets are not artificially created, it is difficult to control for possible data characteristics in the analysis. Another drawback of these datasets is that they are usually very small.

MINING SIMILAR PATTERN WITH ATTRIBUTE ORIENTED INDUCTION HIGH LEVEL EMERGING PATTERN (AOI-HEP) DATA MINING TECHNIQUE

Jurnal Teknologi ◽

10.11113/jt.v79.11876 ◽

2017 ◽

Vol 79 (7-2) ◽

Author(s):

Harco Leslie Hendric Spits Warnars ◽

Nizirwan Anwar ◽

Richard Randriatoamanana ◽

Horacio Emilio Perez Sanchez

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Similar Pattern ◽

Frequent Pattern ◽

Data Mining Technique ◽

Mining Technique ◽

High Level ◽

Concept Attribute ◽

Emerging Pattern

AOI-HEP (Attribute Oriented Induction High Level Emerging Pattern), a new data mining technique, has successfully mined frequent patterns and is extended here to mine similar patterns. AOI-HEP mined 3 and 1 similar patterns from the IPUMS and breast cancer UCI machine learning datasets respectively. No similar patterns were found in the adult and census UCI machine learning datasets. The experiments showed that finding AOI-HEP similar patterns in a dataset is influenced by the high level concept attribute chosen for learning in the concept hierarchy, as was also the case for AOI-HEP frequent patterns in previous research. The experiments chose the high level concept attributes workclass, clump thickness, means and marts for the adult, breast cancer, census and IPUMS datasets respectively. To show that the chosen high level concept attribute influences the AOI-HEP similar patterns found in a dataset, extended experiments were carried out. The census dataset, which had shown no AOI-HEP similar pattern, did show AOI-HEP similar patterns when learned on the high level concept in the marital attribute. Conversely, the breast cancer dataset, which had shown 1 AOI-HEP similar pattern, showed none when learned on the high level concept in attributes such as cell size, cell shape and bare nuclei. Two of the 3 similar patterns found in the IPUMS dataset are strong discriminant rules: they have large growth rates (1.53% and 3.47%), large supports in the target dataset (4.54% and 5.45% respectively) and small supports in the contrasting dataset (2.96% and 1.57% respectively).
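
The figures quoted in this abstract can be checked with a little arithmetic. The sketch below assumes the usual emerging-pattern definition, in which growth rate is the ratio of a pattern's support in the target dataset to its support in the contrasting dataset; the abstract does not spell this definition out, and the function name `growth_rate` is purely illustrative. It also assumes the support written as "5.45" is a percentage like the others.

```python
def growth_rate(target_support: float, contrast_support: float) -> float:
    """Ratio of support in the target dataset to support in the contrasting dataset."""
    if contrast_support == 0:
        # A pattern absent from the contrasting dataset has unbounded growth.
        return float("inf") if target_support > 0 else 0.0
    return target_support / contrast_support

# (target support %, contrasting support %) as quoted for two IPUMS similar patterns.
reported_supports = [(4.54, 2.96), (5.45, 1.57)]
rates = [round(growth_rate(t, c), 2) for t, c in reported_supports]
print(rates)
```

Under that assumption the two ratios round to the reported growth-rate figures of 1.53 and 3.47, which suggests those figures are ratios of the supports.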

Knowledge Structure and Data Mining Techniques

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch002 ◽

2008 ◽

pp. 9-17

Author(s):

Rick L. Wilson ◽

Peter A. Rosen ◽

Mohammad Saad Al-Ahmadi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Inductive Learning ◽

Knowledge Structure ◽

Statistical Techniques ◽

Data Sets ◽

Data Mining Technique ◽

Data Mining Techniques ◽

Mining Technique ◽

Learning Techniques

Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g., Lim, Low, & Shih, 2000). The goal of these studies is to determine which data mining technique performs best under what circumstances. Results are often conflicting: for instance, some articles find that neural networks (NN) outperform both traditional statistical techniques and inductive learning techniques, but the opposite is found with other datasets (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler, May, & Vargas, 1999). Most of these studies use publicly available datasets in their analysis, and because these datasets are not artificially created, it is difficult to control for possible data characteristics in the analysis. Another drawback of these datasets is that they are usually very small.

Data Mining Technique for Temporal Association Mining using SPN-Sigmoid Neural Networks

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.14591465 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1459-1465

Author(s):

Vaishali Sahu ◽

Anubhav Sharma ◽

Anshul Sarawagi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Temporal Association ◽

Association Mining ◽

Data Mining Technique ◽

Mining Technique

Data Mining Approach for Educational Decision Support

EKSAKTA: Journal of Sciences and Data Analysis ◽

10.20885/eksakta.vol2.iss1.art5 ◽

2021 ◽

Vol 2 (1) ◽

pp. 33-44

Author(s):

Sinta Septi Pangastuti ◽

Kartika Fithriasari ◽

Nur Iriawan ◽

Wahyuni Suryaningtyas

Keyword(s):

Data Mining ◽

Random Forest ◽

Classification Accuracy ◽

Storage System ◽

Performance Criteria ◽

Data Mining Technique ◽

Mining Technique ◽

Data Mining Approach ◽

Educational Decision ◽

Boosting Algorithm

Data mining techniques in the education sector have begun to evolve along with the development of technology and the growing amount of data that can be stored in education database systems. One such database holds records of the Bidikmisi scholarship in Indonesia. The Bidikmisi data used in this study are classified using a classification data mining technique: random forest in combination with boosting and bagging algorithms. These algorithms are also combined with the SMOTE algorithm to handle class imbalance in the dataset. Based on the performance criteria G-mean and AUC, the algorithms combined with SMOTE tended to perform better. The classification accuracy of each method was more than 90%.
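
To illustrate why G-mean and AUC are used alongside accuracy on imbalanced data, here is a minimal sketch in plain Python. It is not the authors' code: the labels, predictions and scores are hypothetical, and the helper names are chosen for illustration. G-mean is taken as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
from math import sqrt

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)

def auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced labels: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_negative = [0] * 10                        # ignores the minority class
finds_minority = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # catches both positives
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.6, 0.9, 0.8, 0.7]

print(accuracy(y_true, always_negative), g_mean(y_true, always_negative))
print(accuracy(y_true, finds_minority), g_mean(y_true, finds_minority))
print(auc(y_true, scores))
```

Both hypothetical classifiers reach the same accuracy, yet the one that never predicts the minority class has a G-mean of zero. This is the kind of difference that accuracy alone hides on imbalanced data.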

Knowledge Structure and Data Mining Techniques

Encyclopedia of Knowledge Management, Second Edition ◽

10.4018/978-1-59904-931-1.ch090 ◽

2011 ◽

pp. 946-954 ◽

Cited By ~ 1

Author(s):

Rick L. Wilson ◽

Peter A. Rosen ◽

Mohammad Saad Al-Ahmadi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Inductive Learning ◽

Knowledge Structure ◽

Statistical Techniques ◽

Data Sets ◽

Data Mining Technique ◽

Data Mining Techniques ◽

Mining Technique ◽

Learning Techniques

Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g., Lim, Low, & Shih, 2000). The goal of these studies is to determine which data mining technique performs best under what circumstances. Results are often conflicting: for instance, some articles find that neural networks (NN) outperform both traditional statistical techniques and inductive learning techniques, but the opposite is found with other datasets (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler, May, & Vargas, 1999). Most of these studies use publicly available datasets in their analysis, and because these datasets are not artificially created, it is difficult to control for possible data characteristics in the analysis. Another drawback of these datasets is that they are usually very small.

Knowledge Structure and Data Mining Techniques

Encyclopedia of Knowledge Management ◽

10.4018/978-1-59140-573-3.ch068 ◽

2011 ◽

pp. 523-529

Author(s):

Rick L. Wilson ◽

Peter A. Rosen ◽

Mohammad Saad Al-Ahmadi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Inductive Learning ◽

Knowledge Structure ◽

Statistical Techniques ◽

Data Sets ◽

Data Mining Technique ◽

Data Mining Techniques ◽

Mining Technique ◽

Learning Techniques

Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g., Lim, Low, & Shih, 2000). The goal of these studies is to determine which data mining technique performs best under what circumstances. Results are often conflicting: for instance, some articles find that neural networks (NN) outperform both traditional statistical techniques and inductive learning techniques, but the opposite is found with other datasets (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler, May, & Vargas, 1999). Most of these studies use publicly available datasets in their analysis, and because these datasets are not artificially created, it is difficult to control for possible data characteristics in the analysis. Another drawback of these datasets is that they are usually very small.

Detection of Boiler Tube Leakage Fault in a Thermal Power Plant Using Machine Learning Based Data Mining Technique

2019 IEEE International Conference on Industrial Technology (ICIT) ◽

10.1109/icit.2019.8755058 ◽

2019 ◽

Author(s):

Kyu han Kim ◽

Heung seok Lee ◽

Jung hwan Kim ◽

June Ho Park

Keyword(s):

Machine Learning ◽

Data Mining ◽

Power Plant ◽

Thermal Power Plant ◽

Thermal Power ◽

Boiler Tube ◽

Data Mining Technique ◽

Mining Technique

Generic Disease Prediction using Symptoms with Supervised Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1952297 ◽

2019 ◽

pp. 1082-1086 ◽

Cited By ~ 1

Author(s):

Ashish Kailash Pal ◽

Pritam Rawal ◽

Rahil Ruwala ◽

Vaibhavi Patel

Keyword(s):

Machine Learning ◽

Data Mining ◽

Supervised Machine Learning ◽

Diagnostic Model ◽

Disease Prediction ◽

Data Mining Technique ◽

Mining Technique ◽

The Common ◽

Using Data ◽

Health Organization

Data mining and machine learning are among the most active areas of research and have become popular in health organizations. They also play a vital part in uncovering new patterns in medical science and health services, which in turn benefits all parties associated with the field. This project intends to build a diagnostic model of common diseases based on symptoms, using data mining techniques such as classification in the health domain. The project uses algorithms such as random forest and Naive Bayes, which can be applied to health care diagnosis. The performances of the classifiers are compared with each other to find the highest accuracy. This also helps to identify persons who are affected by the infection. The test is based on the outcomes of the diseases.
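
As a rough illustration of symptom-based classification with Naive Bayes, one of the algorithms this abstract names, the following sketch implements a small Bernoulli Naive Bayes with Laplace smoothing in plain Python. It is an assumption-laden toy and not the authors' model: the records, symptom names and disease labels are hypothetical and have no clinical meaning.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(records):
    """Count classes and per-class symptom occurrences from (symptom_set, disease) pairs."""
    class_counts = Counter(disease for _, disease in records)
    symptom_counts = defaultdict(Counter)
    vocabulary = set()
    for symptoms, disease in records:
        vocabulary |= symptoms
        for symptom in symptoms:
            symptom_counts[disease][symptom] += 1
    return class_counts, symptom_counts, sorted(vocabulary)

def predict(model, symptoms):
    """Return the disease with the highest log-posterior for the observed symptom set."""
    class_counts, symptom_counts, vocabulary = model
    total = sum(class_counts.values())
    best_disease, best_score = None, float("-inf")
    for disease, n in class_counts.items():
        score = log(n / total)  # log prior
        for symptom in vocabulary:
            # Laplace-smoothed probability that this symptom is present given the disease.
            p_present = (symptom_counts[disease][symptom] + 1) / (n + 2)
            score += log(p_present if symptom in symptoms else 1 - p_present)
        if score > best_score:
            best_disease, best_score = disease, score
    return best_disease

# Hypothetical training records, for illustration only.
records = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"sneezing", "runny_nose"}, "cold"),
    ({"sneezing", "cough"}, "cold"),
]
model = train_naive_bayes(records)
print(predict(model, {"fever", "fatigue"}))
print(predict(model, {"sneezing", "runny_nose"}))
```

Comparing classifiers as the abstract describes would mean training each one on the same data and measuring accuracy on held-out records. The toy data here is too small for that to be meaningful.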
