Evaluating Score Normalization Methods in Data Fusion

Author(s):  
Shengli Wu ◽  
Fabio Crestani ◽  
Yaxin Bi
2021 ◽  
Vol 5 (1) ◽  
pp. 114-122
Author(s):  
Gde Agung Brahmana Suryanegara ◽  
Adiwijaya ◽  
Mahendra Dwifebri Purbolaksono

Diabetes is a disease caused by high blood sugar in the body or beyond normal limits. Diabetics in Indonesia have experienced a significant increase, Basic Health Research states that diabetics in Indonesia were 6.9% to 8.5% increased from 2013 to 2018 with an estimated number of sufferers more than 16 million people. Therefore, it is necessary to have a technology that can detect diabetes with good performance, accurate level of analysis, so that diabetes can be treated early to reduce the number of sufferers, disabilities, and deaths. The different scale values for each attribute in Gula Karya Medika’s data can complicate the classification process, for this reason the researcher uses two data normalization methods, namely min-max normalization, z-score normalization, and a method without data normalization with Random Forest (RF) as a classification method. Random Forest (RF) as a classification method has been tested in several previous studies. Moreover, this method is able to produce good performance with high accuracy. Based on the research results, the best accuracy is model 1 (Min-max normalization-RF) of 95.45%, followed by model 2 (Z-score normalization-RF) of 95%, and model 3 (without data normalization-RF) of 92%. From these results, it can be concluded that model 1 (Min-max normalization-RF) is better than the other two data normalization models and is able to increase the performance of classification Random Forest by 95.45%.  


2017 ◽  
Vol 5 (4) ◽  
pp. 319 ◽  
Author(s):  
Adel S. Eesa ◽  
Wahab Kh. Arabo

Neural Networks (NN) have been used by many researchers to solve problems in several domains including classification and pattern recognition, and Backpropagation (BP) which is one of the most well-known artificial neural network models. Constructing effective NN applications relies on some characteristics such as the network topology, learning parameter, and normalization approaches for the input and the output vectors. The Input and the output vectors for BP need to be normalized properly in order to achieve the best performance of the network. This paper applies several normalization methods on several UCI datasets and comparing between them to find the best normalization method that works better with BP. Norm, Decimal scaling, Mean-Man, Median-Mad, Min-Max, and Z-score normalization are considered in this study. The comparative study shows that the performance of Mean-Mad and Median-Mad is better than the all remaining methods. On the other hand, the worst result is produced with Norm method.


Author(s):  
Dwianti Westari ◽  

The diabetes classification system is very useful in the health sector. This paper discusses the classification system for diabetes using the K-Means algorithm. The Pima Indian Diabetes (PID) dataset is used to train and evaluate this algorithm. The unbalanced value range in the attributes affects the quality of the classification result, so it is necessary to preprocess the data which is expected to improve the accuracy of the PID dataset classification result. Two types of preprocessing methods are used that are min-max normalization and z-score normalization. These two normalization methods are used and the classification accuracies are compared. Before the data classification process is carried out, the data is divided into training data and test data. The result of the classification test using the K-Means algorithm has shown that the best accuracy lies in the PID dataset which has been normalized using the min-max normalization method, which 79% compared to z-score normalization.


2009 ◽  
Vol 6 (6) ◽  
pp. 10447-10477 ◽  
Author(s):  
L. Zhang ◽  
M. Xu ◽  
M. Huang ◽  
G. Yu

Abstract. Modeling ecosystem carbon cycle on the regional and global scales is crucial to the prediction of future global atmospheric CO2 concentration and thus global temperature which features large uncertainties due mainly to the limitations in our knowledge and in the climate and ecosystem models. There is a growing body of research on parameter estimation against available carbon measurements to reduce model prediction uncertainty at regional and global scales. However, the systematic errors with the observation data have rarely been investigated in the optimization procedures in previous studies. In this study, we examined the feasibility of reducing the impact of systematic errors on parameter estimation using normalization methods, and evaluated the effectiveness of three normalization methods (i.e. maximum normalization, min-max normalization, and z-score normalization) on inversing key parameters, for example the maximum carboxylation rate (Vcmax,25) at a reference temperature of 25°C, in a process-based ecosystem model for deciduous needle-leaf forests in northern China constrained by the leaf area index (LAI) data. The LAI data used for parameter estimation were composed of the model output LAI (truth) and various designated systematic errors and random errors. We found that the estimation of Vcmax,25 could be severely biased with the composite LAI if no normalization was taken. Compared with the maximum normalization and the min-max normalization methods, the z-score normalization method was the most robust in reducing the impact of systematic errors on parameter estimation. The most probable values of estimated Vcmax,25 inversed by the z-score normalized LAI data were consistent with the true parameter values as in the model inputs though the estimation uncertainty increased with the magnitudes of random errors in the observations. We concluded that the z-score normalization method should be applied to the observed or measured data to improve model parameter estimation, especially when the potential errors in the constraining (observation) datasets are unknown.


1995 ◽  
Vol 6 (1) ◽  
pp. 227-236 ◽  
Author(s):  
X. E. Gros ◽  
P. Strachan ◽  
D. W. Lowden
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document