PREDICTION OF DIABETES SCREENING BY USING DATA MINING ALGORITHMS

Diabetes is one of the most common non-communicable diseases in the world. It affects the body's ability to produce the hormone insulin, and complications may occur if it remains unidentified and untreated. It contributes significantly to increased morbidity, mortality, and admission rates of patients in both developed and developing countries. When the disease is not detected early, it leads to complications. Medical records of the cases were reviewed retrospectively, and anthropometric and biochemical information was collected. From this data, four ML classification algorithms, namely Decision Tree (J48), Naive Bayes, PART rule induction, and JRip, were used to predict diabetes. Precision, recall, F-measure, Receiver Operating Characteristic (ROC) scores, and the confusion matrix were calculated to determine the performance of the algorithms. Performance was also measured by sensitivity and specificity. The algorithms have high classification accuracy and are generally comparable in distinguishing diabetic from diabetes-free patients. Among the algorithms tested, the Decision Tree (J48) classifier achieved the highest accuracy and was the best predictor, with a classification accuracy of 92.74%.
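
The evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in Python, not the authors' code: the labels, the helper names (`confusion`, `metrics`), and the sample predictions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic


def confusion(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryConfusion(tp, fp, fn, tn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of failing when a class is absent.
    return numerator / denominator if denominator else 0.0


def metrics(c):
    """Derive the standard scores from the confusion matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # recall is the same as sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = metrics(confusion(y_true, y_pred))
```

Sensitivity and recall are the same quantity, so reporting both, as the abstract does, gives one number under two names; specificity is the matching rate for the negative class.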

Student Performance Predictions Using Knowledge Discovery Database and Data Mining, DPU Students Records as Sample

Academic Journal of Nawroz University, 2021, Vol 10 (3), pp. 121-127, DOI: 10.25007/ajnu.v10n3a875

Author(s): Bareen Haval, Karwan Jameel Abdulrahman, Araz Rajab

Keyword(s): Data Mining, Decision Tree, Student Performance, Educational Data Mining, Data Sets, Decision Tree Classifier, Data Mining Techniques, Academic History, Tree Classifier, Using Data

This article presents the results of applying educational data mining techniques to the academic performance of students. Three classification models (Decision Tree, Random Forest, and Deep Learning) were developed to analyze data sets and predict student performance. The predictive performance of the three classifiers was calculated and compared. The academic history and data of the students from the Office of the Registrar were used to train the models. The analysis aims to evaluate student results using variables such as the student's grade. Data from 221 students with 9 different attributes were used. The results provide a better understanding of student success assessments and stress the importance of data mining in education. The main purpose of the study is to show that student success can be forecast using data mining techniques in order to improve academic programs. The results indicate that the Decision Tree classifier outperforms the other two classifiers, achieving a total prediction accuracy of 97%.
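
To make the Decision Tree idea concrete, the sketch below shows how a tree could pick a single split on one numeric attribute by minimizing weighted Gini impurity. It is an illustration under stated assumptions, not the study's model: the attribute (a grade), the pass/fail labels, the data values, and the function names `gini` and `best_threshold` are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    attribute values; samples with value <= threshold go left.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_thr, best_score = None, float("inf")
    for low, high in zip(distinct, distinct[1:]):
        thr = (low + high) / 2
        left = [label for value, label in pairs if value <= thr]
        right = [label for value, label in pairs if value > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


# Hypothetical student records: one grade attribute and an outcome label.
grades = [45, 52, 58, 61, 67, 73, 80, 88]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
threshold, impurity = best_threshold(grades, outcome)
```

A full tree learner repeats this search over every attribute and recurses into each side of the chosen split until a stopping rule is met.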

Detecting Sugarcane Crop Yield using Decision Tree Classifier in the District of Muzaffarnagar

International Journal of Engineering and Management Research, 2021, Vol 11 (2), pp. 75-82, DOI: 10.31033/ijemr.11.2.10

Author(s): Ankit Kumar, Anil Kumar Kapil

Keyword(s): Decision Tree, Sugar Industry, Uttar Pradesh, Large Degree, Decision Tree Classifier, Sugarcane Crop, Productivity Data, Tree Classifier, Using Data, Crop Forecasting

The district of Muzaffarnagar is the highest sugarcane-producing district in Uttar Pradesh and is therefore an important industrial district as well. The district is part of Western UP and shares the problems of the sugar industry elsewhere in the state: unpredictable demand and crop failures. In this context, predicting sugarcane demand and informing its production can turn out to be the key to solving some of the problems the industry faces. The existing crop forecasting method for sugarcane cultivation in UP relies, to a large degree, on subjective details, centred on the expertise of engineers in the sugar and alcohol field and on information on input demand in the supply chain. The utility of sample detection was measured using NDVI images from the SPOT sensor and the ECMWF model, from which it was possible to infer the official productivity data reported for the previously selected municipalities and harvest. Significant features of the municipal productivity of a given village are listed in a decision tree, and from the combinations of attributes the corresponding municipal productivity is rated as "Normal" on the average urban productivity scale. Using data from the NDVI time series from 2013 to 2020, three classes of productivity can be discerned. Findings indicate that productivity in January was ranked as less than mean, mean, or more than mean. The findings were more successful for the class Vegetation, the participants of which were permitted to conclude about the pattern of the average federal productivity prior to.
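
The three productivity classes mentioned above (less than mean, mean, more than mean) could be assigned along the following lines. This is only a sketch: the abstract does not say how close to the mean a value must be to count as "mean", so the 10% tolerance band, the function name `productivity_class`, and the village yields are assumptions made for illustration.

```python
from statistics import mean


def productivity_class(value, reference_mean, tolerance=0.10):
    """Label a productivity value relative to a reference mean.

    Values within +/- tolerance (a fraction of the mean) are labelled
    "mean". The tolerance is an assumed parameter, not from the source.
    """
    lower = reference_mean * (1 - tolerance)
    upper = reference_mean * (1 + tolerance)
    if value < lower:
        return "less than mean"
    if value > upper:
        return "more than mean"
    return "mean"


# Hypothetical yields (tonnes per hectare) for three villages.
yields = {"village_a": 60.0, "village_b": 70.0, "village_c": 80.0}
average = mean(list(yields.values()))
classes = {name: productivity_class(y, average) for name, y in yields.items()}
```

Labels produced this way could serve as the target variable for a decision tree trained on NDVI-derived features.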

Prediction of Black Sigatoka Disease in Banana Plants By Data Mining Classification Techniques using Scikit for Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2020, Vol 9 (3), pp. 1273-1278, DOI: 10.35940/ijitee.c8714.019320

Keyword(s): Data Mining, Decision Tree, Gaussian Process, Linear Models, Control Measures, Decision Tree Classifier, Black Sigatoka, Tree Classifier, Using Data, Black Sigatoka Disease

Agriculture has been evolving since humans started cultivating plants for food. As agriculture has evolved, so have disease control measures. In the modern era, plant diseases can be easily identified using computers. Data mining is the process of extracting useful information from data. Before the electronic era, plant diseases were identified simply by observing the symptoms on the plants. Similarly, plant diseases can be identified with data mining by supplying disease symptom data and classifying it accordingly. This paper focuses on predicting black sigatoka disease from images and uses the following methods: multilayer perceptrons, SVM, KNeighborsClassifier, KNeighborsRegressor, Gaussian Process Regressor, Gaussian Process Classifier, GaussianNB, Decision Tree Classifier, Decision Tree Regressor, linear models such as Linear Regression, RidgeCV, Lasso, ElasticNet, LogisticRegressionCV, SGD Classifier, Perceptron, and Passive Aggressive Classifier, and ensemble models of the above classifiers. The results are compared: the multilayer perceptron model gives the best results among individual classifiers, and weak classifiers give better results when ensembled. In future work, a new hybrid algorithm built from the above algorithms will be used to attain better accuracy. scikit-learn is a library for classification, clustering, regression, dimensionality reduction, model selection, and preprocessing. The paper discusses various classifiers available in the scikit-learn library for Python and how they are ensembled. The approach can be applied to any classification task. Here, classification separates banana leaves affected by black sigatoka from healthy leaves. Banana plants are highly vulnerable to this disease.
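
The observation that weak classifiers improve when ensembled can be illustrated with hard (majority) voting. The sketch below is plain Python operating on hypothetical per-model predictions, not the paper's pipeline; scikit-learn offers `VotingClassifier` for the same idea with fitted estimators. The names `majority_vote` and `accuracy` and all data values are assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard voting, one label per sample.

    Ties resolve to the label that appears first among the votes.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(preds) != n_samples for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Hypothetical leaf labels and predictions from three imperfect models.
y_true = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
model_a = ["sick", "healthy", "healthy", "healthy", "sick", "sick"]
model_b = ["sick", "sick", "sick", "healthy", "sick", "healthy"]
model_c = ["healthy", "sick", "healthy", "healthy", "sick", "healthy"]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_accuracy = accuracy(y_true, ensemble)
```

In this toy data each model errs on different samples, so the vote corrects every individual mistake; when models make the same errors, voting brings little or no gain.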
