Comparison Analysis: Large Data Classification Using PLS-DA and Decision Trees

2020, Vol 8 (2), pp. 100-105
Author(s): Nurazlina Abdul Rashid, Norashikin Nasaruddin, Kartini Kassim, Amirah Hazwani Abdul Rahim

2019, pp. 147-169
Author(s): Michael Paluszek, Stephanie Thomas

2017, Vol 23 (6), pp. 5501-5505
Author(s): Mumtazimah Mohamad, Mokhairi Makhtar, Mohd Nordin Abd Rahman, Roslinda Muda

Author(s): A. Sheik Abdullah, R. Suganya, S. Selvakumar, S. Rajaram

Classification is a data analysis technique used across many applications. A classification model predicts categorical (discrete) class labels, whereas clustering mainly deals with grouping variables that share similar characteristics. Classification models are tested by comparing their predicted values with the known target values in a set of test data. Data classification has many applications, including business modeling, marketing analysis, credit risk analysis, biomedical engineering, and drug response modeling. Extending data analysis and classification to big data calls for exploring how large data sets are processed and managed. This chapter covers the techniques and methodologies that address the classification problem in the data analysis process, and their methodological impact on big data.
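The testing step mentioned in this abstract, comparing predicted values with known target values on test data, can be illustrated with a minimal sketch. It is not taken from the chapter: the function name `evaluate` and the "good"/"bad" labels are hypothetical, chosen only to echo the credit risk application.

```python
from collections import Counter

def evaluate(y_true, y_pred):
    """Compare predicted labels with known target labels from a test set.

    Returns the accuracy and a confusion table keyed by (true, predicted).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    confusion = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(y_true), confusion

# Hypothetical test data: known targets and a classifier's predictions.
y_true = ["good", "good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad", "bad", "bad", "good", "good"]

accuracy, confusion = evaluate(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for (true_label, predicted_label), count in sorted(confusion.items()):
    print(f"true={true_label:<4} predicted={predicted_label:<4} count={count}")
```

The confusion table separates the two kinds of error (a "good" case predicted as "bad" and the reverse), which accuracy alone hides.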


2014, pp. 215-223
Author(s): Dipak V. Patil, Rajankumar S. Bichkar

Advances in technology and its use in all walks of life have produced tremendous growth in the data available for data mining. The large amount of knowledge available can be used to improve the decision-making process. The data contain some noise and outliers, which hamper the classification performance of a classifier built on that training data. Learning on a large data set also becomes very slow, because it has to be done serially over the available data. Random data reduction techniques have been shown to be usable for building optimal decision trees. Data cleaning and data sampling can therefore be integrated to overcome the problems of handling large data sets. In the proposed technique, outlier data is first filtered out to obtain clean data of improved quality, and random sampling is then applied to this clean data set to obtain a reduced data set. The reduced data set is used to construct an optimal decision tree. Experiments on several data sets showed that the proposed technique builds decision trees with higher classification accuracy than trees built on the complete data set. The classification filter improves the quality of the data, and sampling reduces the size of the data set. The proposed method thus constructs more accurate, optimally sized decision trees and avoids problems such as overloading memory and the processor with large data sets. In addition, the time required to build a model on clean data is significantly reduced, yielding a significant speedup.
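The filter-then-sample pipeline described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which classification filter is used, so a k-nearest-neighbour label check stands in for it; the tree is a small Gini-based learner written for this example; and the data, the function names, and the 50% sampling rate are all hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def knn_filter(rows, k=3):
    """Stand-in classification filter (assumption): drop rows whose label
    disagrees with the majority label of their k nearest neighbours."""
    kept = []
    for i, (x, y) in enumerate(rows):
        others = [r for j, r in enumerate(rows) if j != i]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))
        if majority([label for _, label in others[:k]]) == y:
            kept.append((x, y))
    return kept

def gini(rows):
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in Counter(y for _, y in rows).values())

def build_tree(rows, depth=0, max_depth=3):
    """Tiny decision tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    labels = [y for _, y in rows]
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(labels)
    _, f, t, left, right = best
    return (f, t, build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

# Hypothetical data: two well-separated classes plus two mislabelled points.
rng = random.Random(0)
rows = [((rng.uniform(0, 1), rng.uniform(0, 1)), 0) for _ in range(40)]
rows += [((rng.uniform(5, 6), rng.uniform(0, 1)), 1) for _ in range(40)]
rows += [((0.5, 0.5), 1), ((5.5, 0.5), 0)]

clean = knn_filter(rows, k=3)                # step 1: filter out noisy rows
sample = rng.sample(clean, len(clean) // 2)  # step 2: random sampling
tree = build_tree(sample)                    # step 3: tree on the reduced set

print(len(rows), len(clean), len(sample))
print(predict(tree, (0.2, 0.9)), predict(tree, (5.8, 0.1)))
```

Filtering runs before sampling so that the random sample is drawn only from rows that passed the noise check, which matches the order given in the abstract.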

