Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

Author(s):  
Nidia Rodríguez-Mazahua ◽  
Lisbeth Rodríguez-Mazahua ◽  
Asdrúbal López-Chau ◽  
Giner Alor-Hernández ◽  
S. Gustavo Peláez-Camarena
2020 ◽  
Vol 7 (2-1) ◽  
pp. 31-43

One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods. However, no horizontal fragmentation technique exists that uses a decision tree. This paper presents an analysis of different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed with Weka version 3.9.4, considering four evaluation metrics (Precision, ROC Area, Recall, and F-measure) for different data sets selected using the Star Schema Benchmark. The results showed that J48 and Random Forest were the two best algorithms in most cases; nevertheless, J48 was selected because it builds the model more efficiently.
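Three of the metrics named in the abstract (Precision, Recall, and F-measure) follow directly from confusion-matrix counts. The sketch below is illustrative only: it uses the standard textbook definitions, the `Counts` container and the example numbers are hypothetical, and it is not the Weka implementation used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one class (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    # Share of predicted positives that are actually positive.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    # Share of actual positives that were predicted positive.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_measure(c: Counts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration; these are not results from the paper.
example = Counts(tp=6, fp=2, fn=6)
print(precision(example), recall(example), f_measure(example))
```

ROC Area is left out because it needs ranked scores rather than counts alone.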


Today, over 2.5 quintillion bytes of data are created every day, with 7.53 billion people each generating about 1.7 MB of data per second. More often than not, researchers only scratch the surface when analyzing which algorithm is best suited to their dataset and which will give the highest efficiency. Sometimes this analysis takes more computational time than the actual execution itself. The aim of this paper is to understand and resolve this dilemma by applying different prediction models, such as Neural Networks, Regression, and Decision Tree algorithms, to different datasets and measuring their performance using ROC Index, Average Square Error, and Misclassification Rate. A comparative analysis shows where each model performs best under different scopes and conditions. All data sets and results were compared and analyzed using the SAS tool.
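For a binary target, the three measures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the 0.5 threshold, and the sample data are hypothetical, the average squared error is computed as the mean squared gap between the 0/1 outcome and the predicted probability, and the SAS tool may define or normalize these quantities differently.

```python
def misclassification_rate(y_true, scores, threshold=0.5):
    # Fraction of cases whose thresholded prediction differs from the label.
    preds = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(1 for t, p in zip(y_true, preds) if t != p)
    return wrong / len(y_true)


def average_squared_error(y_true, scores):
    # Mean squared difference between the 0/1 outcome and the predicted probability.
    return sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


def roc_index(y_true, scores):
    # Area under the ROC curve via pairwise comparison: the share of
    # (positive, negative) pairs ranked correctly, with ties counted as half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.4, 0.6, 0.7]
print(misclassification_rate(labels, probs))
print(average_squared_error(labels, probs))
print(roc_index(labels, probs))
```

The pairwise form of the ROC index is quadratic in the number of cases, which is fine for a sketch but not for large datasets.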


2021 ◽  
Author(s):  
İsmail Can Dikmen ◽  
Teoman Karadağ

Today, the storage of electrical energy is one of the most important technical challenges. The increasing number of high-capacity, high-power applications, especially electric vehicles and grid energy storage, means that a large number of batteries will need to be separated and recycled in the near future. This study discusses an alternative to the methods currently used for separating these batteries according to their chemistry. Because it is easy to implement and has a low operational cost, the method can be applied even on integrated circuits. It can therefore also be used in multi-chemistry battery management systems to detect the chemistry of the connected battery. To implement the method, the batteries are connected to two different loads alternately. In this way, current and voltage values are measured for two different loads without allowing the battery to relax. The obtained data is pre-processed with a separation function developed on the basis of statistical significance. As machine learning algorithms, an artificial neural network and a decision tree are trained with the processed data and used to determine battery chemistry with 100% accuracy. The efficiency and ease of implementation of the decision tree algorithm in such a categorization method are presented comparatively.

