scholarly journals Comparision Between Accuracy and MSE,RMSE by Using Proposed Method with Imputation Technique

2017 ◽  
Vol 10 (04) ◽  
pp. 773-779
Author(s):  
V.B. Kamble ◽  
S.N. Deshmukh

Presence of missing values in the dataset leads to difficult for data analysis in data mining task. In this research work, student dataset is taken contains marks of four different subjects in engineering college. Mean, Mode, Median Imputation were used to deal with challenges of incomplete data. By using MSE and RMSE on dataset using with proposed Method and imputation methods like Mean, Mode, and Median Imputation on the dataset and found out to be values of Mean Squared Error and Root Mean Squared Error for the dataset. Accuracy also found out to be using Proposed Method with Imputation Technique. Experimental observation it was found that, MSE and RMSE gradually decreases when size of the databases is gradually increases by using proposed Method. Also MSE and RMSE gradually increase when size of the databases is gradually increases by using simple imputation technique. Accuracy is also increases with increases size of the databases.

Author(s):  
Chisimkwuo John ◽  
Emmanuel J. Ekpenyong ◽  
Charles C. Nworu

This study assessed five approaches for imputing missing values. The evaluated methods include Singular Value Decomposition Imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-Linear Iterative Partial Least squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). A 5%, 10%, 15% and 20% missing data were created under a missing completely at random (MCAR) assumption using five (5) variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1), Private Sector Demand Deposits (PSDD) from Nigeria quarterly monetary aggregate dataset from 1981 to 2019 using R-software. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values. The performances of the PCA imputation approaches were evaluated based on the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The result suggests that the bPCA, llsPCA and pPCA methods performed better than other imputation methods with the bPCA being the more appropriate method and llsPCA, the best method as it appears to be more stable than others in terms of the proportion of missingness.


2018 ◽  
Vol 4 (1) ◽  
pp. 24
Author(s):  
Imam Halimi ◽  
Wahyu Andhyka Kusuma

Investasi saham merupakan hal yang tidak asing didengar maupun dilakukan. Ada berbagai macam saham di Indonesia, salah satunya adalah Indeks Harga Saham Gabungan (IHSG) atau dalam bahasa inggris disebut Indonesia Composite Index, ICI, atau IDX Composite. IHSG merupakan parameter penting yang dipertimbangkan pada saat akan melakukan investasi mengingat IHSG adalah saham gabungan. Penelitian ini bertujuan memprediksi pergerakan IHSG dengan teknik data mining menggunakan algoritma neural network dan dibandingkan dengan algoritma linear regression, yang dapat dijadikan acuan investor saat akan melakukan investasi. Hasil dari penelitian ini berupa nilai Root Mean Squared Error (RMSE) serta label tambahan angka hasil prediksi yang didapatkan setelah dilakukan validasi menggunakan sliding windows validation dengan hasil paling baik yaitu pada pengujian yang menggunakan algoritma neural network yang menggunakan windowing yaitu sebesar 37,786 dan pada pengujian yang tidak menggunakan windowing sebesar 13,597 dan untuk pengujian algoritma linear regression yang menggunakan windowing yaitu sebesar 35,026 dan pengujian yang tidak menggunakan windowing sebesar 12,657. Setelah dilakukan pengujian T-Test menunjukan bahwa pengujian menggunakan neural network yang dibandingkan dengan linear regression memiliki hasil yang tidak signifikan dengan nilai T-Test untuk pengujian dengan windowing dan tanpa windowing hasilnya sama, yaitu sebesar 1,000.


2014 ◽  
Vol 39 (2) ◽  
pp. 107-127 ◽  
Author(s):  
Artur Matyja ◽  
Krzysztof Siminski

Abstract The missing values are not uncommon in real data sets. The algorithms and methods used for the data analysis of complete data sets cannot always be applied to missing value data. In order to use the existing methods for complete data, the missing value data sets are preprocessed. The other solution to this problem is creation of new algorithms dedicated to missing value data sets. The objective of our research is to compare the preprocessing techniques and specialised algorithms and to find their most advantageous usage.


Author(s):  
V. Jinubala ◽  
P. Jeyakumar

Data Mining is an emerging research field in the analysis of agricultural data. In fact the most important problem in extracting knowledge from the agriculture data is the missing values of the attributes in the selected data set. If such deficiencies are there in the selected data set then it needs to be cleaned during preprocessing of the data in order to obtain a functional data. The main objective of this paper is to analyse the effectiveness of the various imputation methods in producing a complete data set that can be more useful for applying data mining techniques and presented a comparative analysis of the imputation methods for handling missing values. The pest data set of rice crop collected throughout Maharashtra state under Crop Pest Surveillance and Advisory Project (CROPSAP) during 2009-2013 was used for analysis. The different methodologies like Deleting of rows, Mean & Median, Linear regression and Predictive Mean Matching were analysed for Imputation of Missing values. The comparative analysis shows that Predictive Mean Matching Methodology was better than other methods and effective for imputation of missing values in large data set.


2014 ◽  
Vol 51 (2) ◽  
pp. 75-88 ◽  
Author(s):  
Sergio Arciniegas-Alarcón ◽  
Marisol García-Peña ◽  
Wojtek Janusz Krzanowski ◽  
Carlos Tadeu dos Santos Dias

Abstract A common problem in multi-environment trials arises when some genotypeby- environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.


2017 ◽  
Author(s):  
Runmin Wei ◽  
Jingye Wang ◽  
Mingming Su ◽  
Erik Jia ◽  
Tianlu Chen ◽  
...  

AbstractIntroductionMissing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection of methods can significantly affect following data analyses and interpretations. According to the definition, there are three types of missing values, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).ObjectivesThe aim of this study was to comprehensively compare common imputation methods for different types of missing values using two separate metabolomics data sets (977 and 198 serum samples respectively) to propose a strategy to deal with missing values in metabolomics studies.MethodsImputation methods included zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC). Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate the imputation accuracy for MCAR/MAR and MNAR correspondingly. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes sum of squared error were used to evaluate the overall sample distribution. Student’s t-test followed by Pearson correlation analysis was conducted to evaluate the effect of imputation on univariate statistical analysis.ResultsOur findings demonstrated that RF imputation performed the best for MCAR/MAR and QRILC was the favored one for MNAR.ConclusionCombining with “modified 80% rule”, we proposed a comprehensive strategy and developed a public-accessible web-tool for missing value imputation in metabolomics data.


Data mining is better choices in emerging research filed- soil data analysis. crop yield prediction is an important issue for selecting the crop. earlier prediction of crop is done by the experience of farmer on a particular type of field and crop. predicting the crop is done by the farmer’s experience based on the factors like soil types, climatic condition, seasons, and weather, rainfall and irrigation facilities. data mining techniques is the better choice for predicting the crop. the analysis of soil plays an important role in agricultural filed. soil fertility prediction is one of the very important factors in agriculture this research work implements to predict yield of crop, decision tree algorithm is used to find yield. the aim of this research to pinpoint the accuracy and to finding the yield of the crop using decision tree and c 4.5 algorithm is used to predict the yield of crop using rprogramming and also to find range of magnesium found in the collected soil data set. this prediction will be very useful for the farmer to predict the crop yield for cultivation


Author(s):  
SONALI R. MAHAKALE ◽  
NILESHSINGH V. THAKUR

This paper deals with the comparative study of research work done in the field of Image Filtering. Different noises can affect the image in different ways. Although various solutions are available for denoising them, a detail study of the research is required in order to design a filter which will fulfill the desire aspects along with handling most of the image filtering issues. An output image should be judged on the basis of Image Quality Metrics for ex-: Peak-Signal-to-Noise ratio (PSNR), Mean Squared Error (MSE) and Mean Absolute Error (MAE) and Execution Time.


Sign in / Sign up

Export Citation Format

Share Document