Performance Analysis of Predictive Models using Generic Datasets

Today, over 2.5 quintillion bytes of data are created every single day, as 753 crore people on this planet create 1.7 MB of data each second. More often than not, researchers only scratch the surface when analyzing which algorithm is best suited to their dataset and which one will give the highest efficiency. Sometimes, this analysis takes more computational time than the actual execution itself. The aim of this paper is to understand and solve this dilemma by applying different prediction models, such as Neural Networks, Regression and Decision Tree algorithms, to different datasets, with their performance measured using ROC Index, Average Square Error and Misclassification Rate. A comparative analysis is done to show their best performance in different scopes and conditions. All data sets and results were compared and analyzed using the SAS tool.
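The three measures named in the abstract can be illustrated with a minimal Python sketch for a binary target. This is not the SAS computation used in the paper: the function names, the 0.5 decision threshold, and the sample labels and probabilities are all assumptions made for illustration, and the ROC Index is taken here to mean the area under the ROC curve.

```python
def misclassification_rate(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction differs from the 0/1 label."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


def average_squared_error(y_true, y_prob):
    """Mean squared difference between the 0/1 label and the predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_index(y_true, y_prob):
    """Area under the ROC curve: share of (positive, negative) pairs ranked
    correctly, with tied scores counted as half a correct pair."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(misclassification_rate(y_true, y_prob))
print(average_squared_error(y_true, y_prob))
print(roc_index(y_true, y_prob))
```

Lower values are better for the misclassification rate and the average squared error, while a higher ROC Index indicates better ranking of positive over negative cases.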

2020 ◽  
Vol 7 (2-1) ◽  
pp. 31-43
Author(s):  
Nidia Rodríguez Mazahua ◽  
Lisbeth Rodríguez Mazahua ◽  
Asdrúbal López Chau ◽  
Giner Alor Hernández

One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods. However, no horizontal fragmentation technique exists that uses a decision tree. This paper presents an analysis of different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed with version 3.9.4 of Weka, considering four evaluation metrics (Precision, ROC Area, Recall and F-measure) for different selected data sets using the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
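Three of the four metrics listed above (Precision, Recall and F-measure) follow directly from confusion counts. The sketch below is a simplified single-class illustration in Python, not Weka's implementation; the function name and the sample labels are hypothetical.

```python
def precision_recall_f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical true and predicted labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

print(precision_recall_f_measure(y_true, y_pred))
```

For a multi-class problem, the same function can be called once per class by changing `positive`, and the per-class values can then be averaged.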


Author(s):  
Serkan Eti

Quantitative methods are mainly preferred in the literature. The main purpose of this chapter is to evaluate the usage of quantitative methods in the subject of the investment decision. Within this framework, the studies related to the investment decision in which quantitative methods are taken into consideration are examined. As for the quantitative methods, probit, logit, decision tree algorithms, artificial neural network methods, Monte Carlo simulation, and MARS approaches are taken into consideration. The findings show that the MARS methodology provides more accurate results in comparison with other techniques. In addition, it is also concluded that probit and logit methodologies were less preferred in comparison with decision tree algorithms, artificial neural network methods, and Monte Carlo simulation analysis, especially in recent studies. Therefore, it is recommended that a new evaluation for investment analysis be performed with the MARS method, because this approach appears to provide better results.


2021 ◽  
Vol 6 (2) ◽  
pp. 128-133
Author(s):  
Ihor Koval ◽  

The problem of finding objects in images using modern computer vision algorithms has been considered. The main types of algorithms and methods for finding objects based on convolutional neural networks have been described. A comparative analysis and modeling of neural network algorithms for finding objects in images has been conducted. The results of testing neural network models with different architectures on the VOC2012 and COCO data sets have been presented. The dependence of recognition accuracy on different learning hyperparameters has been analyzed. The change in the time required to determine the location of an object depending on the neural network architecture has been investigated.


Sensors ◽  
2020 ◽  
Vol 20 (1) ◽  
pp. 322 ◽  
Author(s):  
Faraz Malik Awan ◽  
Yasir Saleem ◽  
Roberto Minerva ◽  
Noel Crespi

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets in order to extract relevant information and to make predictions. The performance and the outcomes of different ML/DL algorithms may vary depending upon the data sets being used, as well as on the suitability of algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors, Decision Tree, Random Forest, and Voting Classifier (or the Ensemble Learning Approach) for the prediction of parking space availability has been conducted. This comparison utilized Santander’s parking data set, initiated while working on the H2020 WISE-IoT project. The data set was used in order to evaluate the considered algorithms and to determine the one offering the best prediction. The results of this analysis show that, regardless of the data set size, the less complex algorithms like Decision Tree, Random Forest, and KNN outperform complex algorithms such as Multilayer Perceptron, in terms of higher prediction accuracy, while providing comparable information for the prediction of parking space availability. In addition, this paper provides Top-K parking space recommendations on the basis of the distance between the current position of vehicles and free parking spots.
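The Top-K recommendation step described at the end of the abstract can be sketched as a nearest-neighbor selection over free spots. The abstract does not state which distance measure the authors use, so the great-circle (haversine) distance below is an assumption, as are the function names, the record layout, and the sample coordinates.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def top_k_free_spots(vehicle, spots, k=3):
    """Return the k free spots closest to the vehicle, nearest first."""
    free = [s for s in spots if s["free"]]
    return heapq.nsmallest(
        k, free, key=lambda s: haversine_km(vehicle[0], vehicle[1], s["lat"], s["lon"])
    )


# Hypothetical spots and vehicle position, for illustration only.
spots = [
    {"id": "A", "lat": 43.4623, "lon": -3.8100, "free": True},
    {"id": "B", "lat": 43.4700, "lon": -3.8000, "free": False},
    {"id": "C", "lat": 43.4610, "lon": -3.8050, "free": True},
    {"id": "D", "lat": 43.4800, "lon": -3.7900, "free": True},
]
vehicle = (43.4620, -3.8090)

print([s["id"] for s in top_k_free_spots(vehicle, spots, k=2)])
```

In a full pipeline, the `free` flag would presumably come from the availability predictions rather than from fixed data, so only spots predicted to be free are ranked.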


Data Mining ◽  
2013 ◽  
pp. 1819-1834
Author(s):  
Alan Olinsky ◽  
Phyllis A. Schumacher ◽  
John Quinn

One way to enhance the likelihood that more students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This chapter details a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. Not only is the resulting model a good one for predicting success in the major, it also allows us the ability to better counsel students.


Author(s):  
Nidia Rodríguez-Mazahua ◽  
Lisbeth Rodríguez-Mazahua ◽  
Asdrúbal López-Chau ◽  
Giner Alor-Hernández ◽  
S. Gustavo Peláez-Camarena

Author(s):  
MAJURA F. SELEKWA ◽  
VALERIAN KWIGIZILE ◽  
RENATUS N. MUSSA

Many neural network methods used for efficient classification of populations work only when the population is globally separable. In situ classification of highway vehicles is one of the problems with globally nonseparable populations. This paper presents a systematic procedure for setting up a probabilistic neural network that can classify the globally nonseparable population of highway vehicles. The method is based on the simple concept that any set of classifiable data can be broken down into subclasses of locally separable data. Hence, if these locally separable data can be identified, the classification problem can be carried out in two hierarchical steps: step one classifies the data according to the local subclasses, and step two classifies the local subclasses into the global classes. The proposed approach was tested on the problem of classifying highway vehicles according to the US Federal Highway Administration standard, which is normally handled by decision tree methods that use vehicle axle information and a set of IF-THEN rules. Using a sample of 3326 vehicles, the proposed method showed improved classification results, with an overall misclassification rate of only 2.9% compared to 9.7% for the decision tree methods. A similar setup can be used with different neural networks such as recurrent neural networks, but they were not tested in this study, especially since the focus was on in situ applications where a high learning rate is desired.
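The two hierarchical steps can be sketched with a small Parzen-window scorer in the style of a probabilistic neural network. This is only an illustration of the idea, not the authors' procedure: the abstract does not give the features, the kernel width, or how the local subclasses are identified, so the one-dimensional patterns, the `sigma` value, and all names below are hypothetical.

```python
import math


def pnn_score(x, patterns, sigma):
    """Average Gaussian kernel response between x and one subclass's stored patterns."""
    total = 0.0
    for p in patterns:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq_dist / (2 * sigma ** 2))
    return total / len(patterns)


def classify_two_step(x, subclass_patterns, subclass_to_class, sigma=0.5):
    """Step one picks the best-scoring local subclass; step two maps it to a global class."""
    subclass = max(
        subclass_patterns, key=lambda s: pnn_score(x, subclass_patterns[s], sigma)
    )
    return subclass, subclass_to_class[subclass]


# Hypothetical data: global class "A" lies on both sides of class "B", so it is
# not globally separable, but each of its two subclasses is locally separable.
subclass_patterns = {
    "A_low": [(0.0,), (0.2,)],
    "B_mid": [(1.0,), (1.2,)],
    "A_high": [(2.0,), (2.2,)],
}
subclass_to_class = {"A_low": "A", "B_mid": "B", "A_high": "A"}

print(classify_two_step((2.1,), subclass_patterns, subclass_to_class))
print(classify_two_step((1.1,), subclass_patterns, subclass_to_class))
```

Because step two is a plain lookup, all of the learning effort sits in step one, where each subclass only needs to be separable from its neighbors.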

