Data Mining Approach in Preterm Birth Prediction

2010 ◽  
Vol 9 (1) ◽  
pp. 18-30 ◽  
Author(s):  
Jyothi Thomas ◽  
G. Kulanthaivel

Data mining refers to the process of discovering patterns in data, typically with the aid of powerful algorithms that automate part of the search. These methods come from disciplines such as statistics, machine learning, pattern recognition, neural networks and databases. In particular, this paper shows how the problem of preterm birth prediction is approached by a data mining analyst with a background in machine learning. In the health field, data mining applications have been growing considerably because they can be used to directly derive patterns that are relevant to forecasting different risk groups among patients. Data mining techniques such as clustering have not been used to predict preterm birth. Hence, this paper attempts to identify patterns in a database of preterm birth patients using clustering.
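As an illustration of what clustering patient records can look like, here is a minimal sketch using plain k-means. The abstract does not say which clustering algorithm the authors used, so k-means is an assumption, and the records and feature names below are made up for the example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical records: (maternal age in years, gestational age at delivery in weeks).
records = [(22, 39), (24, 40), (23, 38), (36, 33), (38, 32), (37, 34)]
centroids, labels = kmeans(records, k=2)
print(labels)
```

On real data the features would usually be scaled first, since k-means relies on Euclidean distance and is dominated by whichever feature has the largest range.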

Facilities ◽  
2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Cheng Zhang ◽  
Zehao Ye

Purpose: Developing physical pipe prediction models consumes considerable resources, and statistical models cannot fit failure records perfectly. The purpose of this paper is therefore to use a data mining method to analyze and predict the risk of water pipe failure by considering the attributes and locations of pipes in historical failure records. One automated machine learning (AutoML) method, the tree-based pipeline optimization technique (TPOT), was used as the key data mining technique in this research.

Design/methodology/approach: A water pipeline failure prediction method is proposed that considers pipeline attributes, environmental factors and historical pipeline break records. Regression analysis, genetic algorithms, machine learning and data mining approaches are used to analyze and predict the probability of pipeline failure, with TPOT as the key data mining technique. A case study was carried out in a specific area in China to investigate the relationships between pipeline breaks and relevant parameters such as pipeline age, material, diameter and pipeline density.

Findings: By integrating the prediction models for individual pipelines and small research regions, a prediction model is developed to describe the probability of water pipe failures and is validated with real data. A high degree of fit is achieved, which indicates good potential for using the proposed method in practice as a guideline for identifying high-risk areas, taking proactive measures and optimizing resource allocation for water supply companies.

Originality/value: Different models are developed to improve prediction for regions or individual pipelines. A comparison between the predicted values and real records shows that the preliminary model has good potential for predicting future failure risks.


2011 ◽  
pp. 874-882
Author(s):  
Rick L. Wilson ◽  
Peter A. Rosen ◽  
Mohammad Saad Al-Ahmadi

Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g., Lim, Low, & Shih, 2000). The goal of these studies is to determine which data mining technique performs best under what circumstances. Results are often conflicting: for instance, some articles find that neural networks (NN) outperform both traditional statistical techniques and inductive learning techniques, but the opposite is found with other datasets (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler, May, & Vargas, 1999). Most of these studies use publicly available datasets in their analysis, and because these datasets are not artificially created, it is difficult to control for possible data characteristics in the analysis. Another drawback of these datasets is that they are usually very small.


2017 ◽  
Vol 79 (7-2) ◽  
Author(s):  
Harco Leslie Hendric Spits Warnars ◽  
Nizirwan Anwar ◽  
Richard Randriatoamanana ◽  
Horacio Emilio Perez Sanchez

AOI-HEP (Attribute Oriented Induction High Emerging Pattern), a new data mining technique, has succeeded in mining frequent patterns and is extended here to mine similar patterns. AOI-HEP successfully mined 3 and 1 similar patterns from the IPUMS and breast cancer UCI machine learning datasets respectively. Meanwhile, the experiments found no similar patterns in the adult and census UCI machine learning datasets. The experiments showed that finding AOI-HEP similar patterns in a dataset is influenced by the high level concept attribute chosen for learning in the concept hierarchy, as was also the case for AOI-HEP frequent patterns in previous research. The experiments chose the high level concept attributes workclass, clump thickness, means and marts for the adult, breast cancer, census and IPUMS datasets respectively. To prove that the chosen high level concept attribute influences the AOI-HEP similar patterns found in a dataset, extended experiments were carried out. They found that the census dataset, which previously had no AOI-HEP similar patterns, had them when learned on the high level concept in the marital attribute. Meanwhile, the breast cancer dataset, which previously had 1 AOI-HEP similar pattern, had none when learned on high level concepts in attributes such as cell size, cell shape and bare nuclei. Two of the 3 similar patterns found in the IPUMS dataset are strong discriminant rules, since they have large growth rates (1.53% and 3.47%) and large supports in the target dataset (4.54% and 5.45% respectively). Moreover, they have small supports in the contrasting dataset (2.96% and 1.57% respectively).
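The relationship between the supports and growth rates quoted above can be checked with a short calculation. This sketch assumes the usual emerging-pattern definition of growth rate, the ratio of a pattern's support in the target dataset to its support in the contrasting dataset; the abstract does not spell out the formula, and the function name is illustrative.

```python
def growth_rate(support_target, support_contrast):
    """Growth rate of a pattern: support in the target dataset over support in the contrasting one."""
    if support_contrast == 0:
        return float("inf")  # pattern occurs only in the target dataset
    return support_target / support_contrast

# Supports (in percent) quoted in the abstract for two IPUMS similar patterns:
# (support in target dataset, support in contrasting dataset).
patterns = [(4.54, 2.96), (5.45, 1.57)]
rates = [round(growth_rate(t, c), 2) for t, c in patterns]
print(rates)  # [1.53, 3.47]
```

Under this assumption the computed ratios reproduce the reported growth-rate figures, which suggests those figures are plain ratios rather than percentages.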


2021 ◽  
Vol 2 (1) ◽  
pp. 33-44
Author(s):  
Sinta Septi Pangastuti ◽  
Kartika Fithriasari ◽  
Nur Iriawan ◽  
Wahyuni Suryaningtyas

Data mining techniques in the education sector have begun to evolve, along with the development of technology and the growing amount of data that can be stored in education database systems. One such database holds the Bidikmisi scholarships in Indonesia. The Bidikmisi data used in this study are classified using a classification data mining technique. The technique used in this study is random forest in combination with boosting and bagging algorithms. These algorithms are also combined with the SMOTE algorithm to handle class imbalance in the dataset. Based on the performance criteria G-mean and AUC, the algorithms combined with SMOTE tended to perform better. The classification accuracy of each method was more than 90%.
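To see why a criterion such as G-mean is reported alongside accuracy on imbalanced data, consider the minimal sketch below. G-mean is taken here as the geometric mean of sensitivity and specificity, which is its common definition; the labels and predictions are hypothetical and are not the Bidikmisi data.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)

# Hypothetical imbalanced labels: 95 majority-class cases, 5 minority-class cases.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of the 5 minority cases at the cost of 5 false alarms.
minority_aware = [0] * 90 + [1] * 5 + [1] * 4 + [0] * 1

for name, y_pred in [("always majority", always_majority), ("minority aware", minority_aware)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(g_mean(y_true, y_pred), 3))
```

The always-majority classifier scores higher on accuracy yet has a G-mean of zero, which is the kind of gap that oversampling methods such as SMOTE are meant to close.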


Author(s):  
Ashish Kailash Pal ◽  
Pritam Rawal ◽  
Rahil Ruwala ◽  
Vaibhavi Patel

Data mining and machine learning are among the most inspiring areas of research and have become popular in health organizations. They also play a vital part in uncovering new patterns in medical science and health services, which in turn helps all the parties associated with this field. This project intends to form a diagnostic model of common diseases based on symptoms by using a data mining technique, classification, in the health domain. In this project, we use algorithms such as random forest and naive Bayes, which can be utilized for health care diagnosis. The performances of the classifiers are compared with each other to find the highest accuracy. This also helps us to identify persons who are affected by the infection. The test is based on the outcomes of the diseases.
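A symptom-based classifier of the kind described can be sketched with a small Bernoulli naive Bayes model. This is only an illustration of one of the two algorithms named in the abstract, not the authors' implementation; the symptoms, diseases and function names are all made up for the example.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(rows):
    """rows: list of (set_of_symptoms, disease). Returns the counts needed for prediction."""
    class_counts = Counter(label for _, label in rows)
    symptom_counts = defaultdict(Counter)  # disease -> symptom -> number of cases
    vocabulary = set()
    for symptoms, label in rows:
        vocabulary |= symptoms
        symptom_counts[label].update(symptoms)
    return class_counts, symptom_counts, vocabulary

def predict(model, symptoms):
    """Return the disease with the highest posterior log-probability."""
    class_counts, symptom_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)  # log prior
        for s in vocabulary:
            # Bernoulli likelihood with Laplace (add-one) smoothing.
            p = (symptom_counts[label][s] + 1) / (n + 2)
            score += log(p if s in symptoms else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training cases.
rows = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"sneezing", "runny nose"}, "cold"),
    ({"sneezing", "cough"}, "cold"),
]
model = train_naive_bayes(rows)
print(predict(model, {"fever", "fatigue"}))  # flu
print(predict(model, {"sneezing"}))          # cold
```

Comparing classifiers as the abstract describes would mean evaluating each one on held-out cases and comparing their accuracy, which this sketch leaves out.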

