Machine Learning by Data Mining REPTree and M5P for Predicating Novel Information for PM10

2020 ◽  
pp. 40-48
Author(s):  
Yas Alsultanny

We examined data mining as a technique to extract knowledge from a database in order to predict PM10 concentration from meteorological parameters. The purpose of this paper is to compare two machine-learning decision tree algorithms, Reduced Error Pruning Tree (REPTree) and the divide-and-conquer M5P, for predicting Particulate Matter 10 (PM10) concentration from meteorological parameters. The analysis showed that the M5P tree gave a higher correlation than REPTree, with lower errors and a larger number of rules, although the elapsed processing time of REPTree was less than that of M5P. Both trees indicated that humidity absorbs PM10. The paper recommends REPTree and M5P for predicting PM10 and other pollutant gases.
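The comparison criteria the abstract relies on, correlation coefficient and prediction error, can be computed directly. A minimal sketch, with hypothetical PM10 observations standing in for the paper's data (the values below are invented for illustration):

```python
import math

def pearson_r(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# hypothetical PM10 readings vs. predictions from two tree models
actual   = [42.0, 55.0, 38.0, 61.0, 47.0]
m5p_pred = [44.0, 53.0, 40.0, 59.0, 48.0]
rep_pred = [48.0, 50.0, 45.0, 55.0, 50.0]
```

With these made-up numbers the M5P-style predictions track the observations more closely, i.e. higher `pearson_r` and lower `mae`, mirroring the abstract's reported ranking.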

2013 ◽  
Vol 380-384 ◽  
pp. 1469-1472
Author(s):  
Gui Jun Shan

Partition methods for real data play an extremely important role in decision tree algorithms in data mining and machine learning, because decision tree algorithms require that attribute values be discrete. In this paper, we propose a novel partition method for real data in decision trees using a statistical criterion. The method constructs a statistical criterion to find accurate merging intervals. In addition, we present a heuristic partition algorithm to achieve the desired partition, with the aim of improving the performance of decision tree algorithms. Empirical experiments on UCI real data show that the new algorithm generates a better partition scheme, improving the classification accuracy of the C4.5 decision tree over existing algorithms.
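The abstract does not spell out the statistical criterion; a sketch in the spirit of chi-square-based interval merging (a ChiMerge-style criterion is my assumption, not necessarily the paper's):

```python
def chi2_adjacent(a, b):
    """Chi-square statistic for two adjacent intervals,
    each given as a list of per-class counts."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for k in range(len(a)):
        col = a[k] + b[k]
        for row in (a, b):
            expected = sum(row) * col / total
            if expected > 0:
                chi2 += (row[k] - expected) ** 2 / expected
    return chi2

def merge_intervals(counts, threshold):
    """Greedily merge the adjacent interval pair with the lowest
    chi-square until every remaining pair exceeds the threshold."""
    counts = [list(c) for c in counts]
    while len(counts) > 1:
        stats = [chi2_adjacent(counts[i], counts[i + 1])
                 for i in range(len(counts) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break
        counts[i] = [x + y for x, y in zip(counts[i], counts[i + 1])]
        del counts[i + 1]
    return counts
```

For example, with per-class counts `[[10, 0], [9, 1], [0, 10]]` and the conventional 95% threshold 3.84 for one degree of freedom, the first two (statistically similar) intervals merge while the class-pure third interval stays separate.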


Decision tree algorithms, being accurate and comprehensible classifiers, have been among the most widely used classifiers in data mining and machine learning. However, like many other classification algorithms, decision tree algorithms focus on extracting patterns with high generality and, in the process, ignore some rare but useful and interesting patterns that may exist in small disjuncts of the data. Such extraordinary patterns, with low support and high confidence, capture very specific but exceptional behavior present in the data. This paper proposes a novel Enhanced Decision Tree Algorithm for Discovering Intra- and Inter-class Exceptions (EDTADE). Intra-class exceptions cover objects of unique interest within a class, whereas inter-class exceptions capture rare conditions that force us to shift the class of a few unusual objects. For instance, whales and bats are intra-class exceptions, since they have unique characteristics within the class of mammals. Further, most birds are flying creatures, but rare birds like the penguin and the ostrich fall into the category of non-flying birds; here, the penguin and the ostrich are inter-class exceptions. In fact, without knowing about such exceptional patterns, our knowledge of a domain is incomplete. We have enhanced the decision tree algorithm by defining a framework for capturing intra- and inter-class exceptions at the leaf nodes of a decision tree. The proposed algorithm (EDTADE) is applied to many datasets from the UCI Machine Learning Repository. The results show that EDTADE has been successful in discovering many intra- and inter-class exceptions. Decision trees augmented with intra- and inter-class exceptions are more accurate, comprehensible, and interesting, since they provide additional knowledge in the form of exceptional patterns that deviate from the general rules discovered for classification.
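The leaf-level idea of low-support, high-confidence exceptions can be sketched as follows; the thresholds and the single-attribute grouping are my simplifying assumptions, not EDTADE's actual procedure:

```python
from collections import Counter, defaultdict

def find_exceptions(records, max_support=0.2):
    """records: (attribute_value, class_label) pairs at one leaf.
    Returns attribute values that are rare (support <= max_support)
    but perfectly predictive (100% confidence) of a class; the kind is
    'intra' if the class matches the leaf majority, 'inter' otherwise."""
    n = len(records)
    by_value = defaultdict(list)
    for value, label in records:
        by_value[value].append(label)
    majority = Counter(label for _, label in records).most_common(1)[0][0]
    exceptions = []
    for value, labels in by_value.items():
        support = len(labels) / n
        if support <= max_support and len(set(labels)) == 1:
            kind = "intra" if labels[0] == majority else "inter"
            exceptions.append((value, labels[0], kind))
    return exceptions
```

On the abstract's own examples: at a "birds" leaf where most records fly, a rare walking penguin record surfaces as an inter-class exception; at a "mammals" leaf, a rare echolocating bat record surfaces as an intra-class exception.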


Author(s):  
Tanujit Chakraborty

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. Deep learning methods, on the other hand, have boosted the capacity of machine learning algorithms and are now used for non-trivial applications in various applied domains. But training a fully connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, intended to be much faster than deep feed-forward networks and to remove the need to specify the number of hidden units in the hidden layers in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and to gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT), which are shown to be asymptotically consistent and robust in nature. Additionally, the proposed NSNT model obtains a fast rate of convergence, near-optimal up to a logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification and 40 regression datasets) from the UCI machine learning repository, and establish that it is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification trees, and near-optimal nonlinear trees) on the majority of the datasets.
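The core construction, simulating a fitted decision tree with a neural network, can be illustrated for a toy depth-2 regression tree: each hidden unit realizes one split and the output combines leaf indicators. The thresholds and leaf values below are invented, and the hard step would in practice be a steep sigmoid so the network stays trainable:

```python
def step(z):
    """Hard-threshold activation; a steep sigmoid approximates
    this while keeping the network differentiable."""
    return 1.0 if z > 0 else 0.0

def tree_as_network(x):
    """A depth-2 regression tree on one feature, written as a
    two-layer network: hidden units are split indicators, and the
    output unit selects the active leaf's value."""
    h1 = step(x - 2.0)              # root split: x > 2 ?
    h2 = step(x - 5.0)              # second split: x > 5 ?
    leaf_values = [1.0, 3.0, 9.0]   # leaves for x<=2, 2<x<=5, x>5
    # mutually exclusive path indicators built from the hidden units
    indicators = [(1 - h1), h1 * (1 - h2), h1 * h2]
    return sum(v * i for v, i in zip(leaf_values, indicators))
```

Because the path indicators are products of split indicators, exactly one leaf is active for any input, so the network reproduces the tree's piecewise-constant prediction exactly.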


Author(s):  
Arundhati Navada ◽  
Aamir Nizam Ansari ◽  
Siddharth Patil ◽  
Balwant A. Sonkamble

Author(s):  
Yingjun Shen ◽  
Zhe Song ◽  
Andrew Kusiak

Wind farms need prediction models for predictive maintenance, and there is a need to predict values of non-observable parameters beyond the ranges reflected in the available data. A prediction model developed for one machine may not perform well on another, similar machine, usually because data-driven models lack generalizability. To increase the generalizability of predictive models, this research integrates data mining with first-principle knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules, and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind-energy predictive-maintenance practitioners because of its simplicity and efficiency. Furthermore, the testing scores of the KNN, CART, and DNN algorithms increased by 44.78%, 32.72%, and 9.13%, respectively, with the proposed process. We demonstrate the importance of embedding physical principles within the machine learning process, and highlight that the need for complex machine learning algorithms in industrial big data mining is often much less than in other applications, making it essential to incorporate physics and to follow a "less is more" philosophy.
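The synergy of strong rules and physics-guided feature engineering might look like the following sketch. The freezing-point rule and the simplified cubic power-curve form are my assumptions for illustration, not the paper's actual formulas:

```python
def engineer_features(temp_c, humidity, wind_speed, power_kw):
    """Hypothetical physics-guided features for blade-icing prediction."""
    # Strong rule from physics: icing requires sub-freezing temperature.
    icing_possible = 1 if temp_c <= 0.0 else 0
    # Physics-guided feature: shortfall relative to an expected power
    # curve (icing degrades aerodynamics, pulling actual power below
    # expected). Assumed simplified curve ~ 0.5 * rho * A * v^3.
    expected_power_kw = 0.5 * 1.2 * 1000 * wind_speed ** 3 / 1000
    power_deficit = max(0.0, expected_power_kw - power_kw)
    return {
        "icing_possible": icing_possible,
        "power_deficit": power_deficit,
        # interaction feature: humidity only matters when icing is possible
        "humidity_x_cold": humidity * icing_possible,
    }
```

A downstream learner (KNN, CART, or a DNN, as in the paper) would then be trained on these engineered features instead of, or alongside, the raw SCADA signals; the strong rule alone already zeroes out a large fraction of trivially negative cases.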


2020 ◽  
Vol 11 (1) ◽  
pp. 1-8
Author(s):  
Aqib Ali ◽  
Jamal Abdul Nasir ◽  
Muhammad Munawar Ahmed ◽  
Samreen Naeem ◽  
Sania Anam ◽  
...  

Background: Humans can convey many emotions during a conversation, and facial expressions carry information about those emotions. Objectives: This study proposes a Machine Learning (ML) approach based on statistical analysis for emotion recognition from facial expressions in digital images. Methodology: A dataset of 600 digital images divided into 6 classes (Anger, Happy, Fear, Surprise, Sad, and Normal) was collected from the publicly available Taiwan Facial Expression Image Database. In the first step, all images are converted to gray-level format and 4 Regions of Interest (ROIs) are created on each image, so the image dataset is divided into 2,400 (600 x 4) sub-images. In the second step, 3 types of statistical features, namely texture, histogram, and binary features, are extracted from each ROI. The third step is statistical feature optimization using the best-first search algorithm. Lastly, the optimized statistical feature dataset is fed to various ML classifiers. Results: The analysis was divided into two phases: first, boosting-based ML classifiers (LogitBoost, AdaBoostM1, and Stacking) obtained 94.11%, 92.15%, and 89.21% accuracy, respectively; second, the decision tree algorithms J48, Random Forest, and Random Committee obtained 97.05%, 93.14%, and 92.15% accuracy, respectively. Conclusion: The decision-tree-based J48 classifier gave the best classification accuracy, at 97.05%.
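The ROI step and the histogram-style statistics can be sketched without any imaging library; the quadrant layout of the four ROIs and the particular statistics are my assumptions about details the abstract leaves open:

```python
def split_rois(image):
    """Split a grayscale image (2-D list of pixel values) into
    4 equal quadrant ROIs: TL, TR, BL, BR."""
    h, w = len(image), len(image[0])
    hh, hw = h // 2, w // 2
    return [
        [row[:hw] for row in image[:hh]],   # top-left
        [row[hw:] for row in image[:hh]],   # top-right
        [row[:hw] for row in image[hh:]],   # bottom-left
        [row[hw:] for row in image[hh:]],   # bottom-right
    ]

def histogram_features(roi):
    """First-order (histogram-based) statistics of one ROI."""
    pixels = [p for row in roi for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return {"mean": mean, "variance": variance}
```

Applying `split_rois` to each of the 600 images yields the 2,400 sub-images the abstract describes, and `histogram_features` (plus texture and binary features) would populate the per-ROI feature vectors before feature selection.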


2017 ◽  
Vol 2 (2) ◽  
pp. 220-233
Author(s):  
Luluk Elvitaria

Extracurricular activities are additional activities in schools through which students can add to or explore their skills as part of self-development. One such activity is the foreign-language extracurricular, covering five languages: Arabic, English, German, Mandarin, and Japanese. To gauge students' interest in this activity, a study was conducted on the level of interest in the foreign-language extracurricular among students at the Abdurrab Vocational School of Health Analysis. The level of interest in foreign languages was predicted through a data mining process using the C4.5 algorithm, which belongs to the family of decision tree algorithms. From this research, the school can find out the extent of students' interest in foreign languages, the school can improve its extracurricular activities, and students can develop their interest in foreign languages as they wish.
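C4.5 chooses splits by gain ratio: information gain normalized by the split's own entropy. A minimal sketch of that criterion (the toy survey rows below are invented):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def gain_ratio(rows, attr_index, label_index=-1):
    """C4.5 split criterion: information gain / split information."""
    labels = [r[label_index] for r in rows]
    base = entropy(labels)
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[label_index])
    n = len(rows)
    gain = base - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([r[attr_index] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# hypothetical survey rows: (attends_language_club, interest_level)
rows = [("yes", "interested"), ("yes", "interested"),
        ("no", "not_interested"), ("no", "not_interested")]
```

Here the attribute separates the classes perfectly, so both the gain and the split information equal 1 bit and the gain ratio is 1.0; C4.5 would pick the attribute with the highest such ratio at each node.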


2017 ◽  
Vol 163 (8) ◽  
pp. 15-19 ◽  
Author(s):  
Bhumika Gupta ◽  
Aditya Rawat ◽  
Akshay Jain ◽  
Arpit Arora ◽  
Naresh Dhami

SISFOTENIKA ◽  
2018 ◽  
Vol 8 (1) ◽  
pp. 35
Author(s):  
Nadia Zulfa Rahma ◽  
Andik Setyono

Students' success in the national examination can serve as a benchmark of how well they have understood the material taught at school. To achieve passing grades on the national examination, schools hold try-out exercises intended to better prepare students for the exam. In practice, not all students can answer the try-out questions correctly, which results in poor try-out scores. The school therefore needs a data mining solution with a classification method that can help predict students' readiness for the national examination. The method processes data with differing attributes and arranges them into appropriate categories. The data were predicted using a Decision Tree algorithm, a machine-learning method based on probability calculations. The use of this algorithm was supported by a simulation performed with the RapidMiner application, which achieved an accuracy of 99.48%. The simulation results were then developed into an application intended to assist the school. To test the usefulness of the application, a questionnaire of 10 questions was distributed to 20 teachers, yielding an index score of 83.3%, which is rated as satisfactory. The application is thus useful for the school. Keywords: National Examination, Try-out, Data Mining, Classification, Decision Tree.
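Both figures reported above follow standard ratios: classification accuracy is the fraction of correct predictions, and a questionnaire index is the total score as a share of the maximum attainable score. A sketch (the Likert-style scoring of the questionnaire is my assumption):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def questionnaire_index(scores, max_score=5):
    """Index as total score over maximum attainable score,
    assuming Likert-style items scored 1..max_score."""
    return sum(scores) / (len(scores) * max_score)
```

With 20 teachers answering 10 items each, `questionnaire_index` over all 200 responses would reproduce the 83.3% figure if the summed scores reach 83.3% of the maximum.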


Author(s):  
Dimitris Kalles ◽  
Athanasios Papagelis

Decision trees are one of the most successful machine learning paradigms. This paper presents a library of decision tree algorithms in Java that was eventually used as a programming-laboratory workbench. The initial design focus was twofold: to let the non-expert user conduct experiments with decision trees using components and visual tools that facilitate tree construction and manipulation, and to let the expert user focus on algorithm design and comparison with few implementation details. The system has been built over a number of years and across various development contexts, and has been successfully used as a workbench in a programming laboratory for junior computer science students. The underlying philosophy was to achieve a solid introduction to object-oriented concepts and practices based on a fundamental machine learning paradigm.

