Comparison Between Accuracy and MSE, RMSE by Using Proposed Method with Imputation Technique

Missing values in a dataset make data analysis difficult in data mining tasks. In this research work, a student dataset containing marks in four different subjects at an engineering college is used. Mean, mode, and median imputation were used to deal with the challenges of incomplete data. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were computed on the dataset for the proposed method and for the mean, mode, and median imputation methods. Accuracy was also determined for the proposed method with the imputation technique. Experimental observation showed that MSE and RMSE gradually decrease as the size of the database increases when the proposed method is used, whereas MSE and RMSE gradually increase as the size of the database increases when simple imputation is used. Accuracy also increases with the size of the database.
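As a rough illustration of the comparison described above, the sketch below fills a gap with the mean, median, or mode of the observed marks and scores each fill with MSE and RMSE on the deleted position. The toy marks, the function names, and the choice to score only the imputed positions are assumptions for illustration; the abstract does not describe the proposed method or its data, so neither is reproduced here.

```python
import math
import statistics

# Simple fill rules named in the abstract; the mapping itself is illustrative.
FILL_RULES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}

def impute(marks, strategy):
    """Replace None entries with the mean, median, or mode of the observed marks."""
    observed = [m for m in marks if m is not None]
    fill = FILL_RULES[strategy](observed)
    return [fill if m is None else m for m in marks]

def imputation_errors(true_marks, marks_with_gaps, strategy):
    """Return (MSE, RMSE) computed only over the positions that were missing."""
    imputed = impute(marks_with_gaps, strategy)
    squared = [
        (t - e) ** 2
        for t, e, g in zip(true_marks, imputed, marks_with_gaps)
        if g is None
    ]
    mse = sum(squared) / len(squared)
    return mse, math.sqrt(mse)

# Hypothetical marks: the third value is hidden and then imputed.
true_marks = [60, 70, 80, 70, 90]
marks_with_gaps = [60, 70, None, 70, 90]
for strategy in FILL_RULES:
    print(strategy, imputation_errors(true_marks, marks_with_gaps, strategy))
```

Scoring only the deleted positions keeps the error from being diluted by entries that were never missing.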

Imputation of Missing Values in Economic and Financial Time Series Data Using Five Principal Component Analysis (PCA) Approaches

Central Bank of Nigeria Journal of Applied Statistics ◽

10.33429/cjas.10119.3/6 ◽

2019 ◽

pp. 51-73

Author(s):

Chisimkwuo John ◽

Emmanuel J. Ekpenyong ◽

Charles C. Nworu

Keyword(s):

Private Sector ◽

Least Squares ◽

Missing Values ◽

Time Series Data ◽

Mean Squared Error ◽

Forecast Error ◽

Series Data ◽

Imputation Methods ◽

Root Mean Squared Error ◽

Squared Error

This study assessed five approaches for imputing missing values. The evaluated methods include Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-Linear Iterative Partial Least Squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15% and 20% were created under a missing completely at random (MCAR) assumption using five variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1) and Private Sector Demand Deposits (PSDD)) from the Nigeria quarterly monetary aggregate dataset from 1981 to 2019, using R software. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values. The performance of the PCA imputation approaches was evaluated based on the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that the bPCA, llsPCA and pPCA methods performed better than the other imputation methods, with bPCA being the more appropriate method and llsPCA the best method, as it appears to be more stable than the others with respect to the proportion of missingness.
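The evaluation design described above (delete values under MCAR at several rates, impute, then score against the known truth) can be sketched as follows. This is a minimal sketch under stated assumptions: the study used R and PCA-based imputers, whereas this Python example swaps in plain mean imputation as a stand-in, uses toy data, and normalizes RMSE by the population standard deviation of the true series, which is only one of several NRMSE conventions. All function names are hypothetical.

```python
import math
import random
import statistics

def mcar_indices(n, rate, seed=0):
    """Choose round(n * rate) positions uniformly at random to delete (MCAR)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), round(n * rate)))

def mean_impute(series, missing_idx):
    """Stand-in imputer: fill deleted positions with the mean of the rest."""
    missing = set(missing_idx)
    observed = [v for i, v in enumerate(series) if i not in missing]
    fill = statistics.mean(observed)
    return [fill if i in missing else v for i, v in enumerate(series)]

def rmse(truth, imputed, missing_idx):
    """Root mean squared error over the deleted positions only."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))

def nrmse(truth, imputed, missing_idx):
    """RMSE divided by the population std. dev. of the true series (assumed convention)."""
    return rmse(truth, imputed, missing_idx) / statistics.pstdev(truth)

# Toy series standing in for one monetary aggregate.
series = [float(v) for v in range(1, 21)]
for rate in (0.05, 0.10, 0.15, 0.20):
    idx = mcar_indices(len(series), rate)
    filled = mean_impute(series, idx)
    print(rate, len(idx), round(nrmse(series, filled, idx), 3))
```

Fixing the seed makes the deleted positions reproducible, so different imputers can be compared on the same missingness pattern.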

Prediction of the Indonesia Composite Index (IHSG) Using a Neural Network Algorithm [Prediksi Indeks Harga Saham Gabungan (IHSG) Menggunakan Algoritma Neural Network]

Jurnal Edukasi dan Penelitian Informatika (JEPIN) ◽

10.26418/jp.v4i1.25384 ◽

2018 ◽

Vol 4 (1) ◽

pp. 24

Author(s):

Imam Halimi ◽

Wahyu Andhyka Kusuma

Keyword(s):

Neural Network ◽

Data Mining ◽

Linear Regression ◽

Mean Squared Error ◽

Composite Index ◽

T Test ◽

Sliding Windows ◽

Root Mean Squared Error ◽

Squared Error

Stock investment is a familiar activity, both to hear about and to take part in. There are various stock indices in Indonesia, one of which is the Indeks Harga Saham Gabungan (IHSG), known in English as the Indonesia Composite Index, ICI, or IDX Composite. The IHSG is an important parameter to consider when investing, given that it is a composite of stocks. This study aims to predict the movement of the IHSG with data mining techniques using a neural network algorithm, compared with a linear regression algorithm, so that the results can serve as a reference for investors when investing. The results of this study are Root Mean Squared Error (RMSE) values and an additional label holding the predicted values, obtained after validation using sliding windows validation. The best result is reported for the tests using the neural network algorithm, with an RMSE of 37.786 with windowing and 13.597 without windowing; for the linear regression algorithm, the RMSE was 35.026 with windowing and 12.657 without windowing. A T-Test showed that the difference between the neural network and linear regression was not significant, with the same T-Test value of 1.000 for both the windowed and non-windowed tests.
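The windowing step mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how the windows were built or which tool produced them, so the function below only shows the general idea, under the assumption that a fixed number of past index values is used to predict the next one. The function name and the toy values are hypothetical.

```python
def sliding_windows(series, window_size):
    """Turn a series into (inputs, target) pairs.

    Each pair holds window_size consecutive past values and the value
    that immediately follows them.
    """
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]

# Toy closing values standing in for an index series.
closes = [100, 102, 101, 105, 107]
for inputs, target in sliding_windows(closes, window_size=2):
    print(inputs, "->", target)
```

The resulting pairs can be fed to any regressor, which is how a neural network and a linear regression can be compared on identical inputs.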

The impact of missing values imputation methods in cDNA microarrays on downstream data analysis

2011 28th National Radio Science Conference (NRSC) ◽

10.1109/nrsc.2011.5873605 ◽

2011 ◽

Author(s):

Vidan Fathi Ghoneim ◽

Nahed H. Solouma ◽

Yasser M. Kadah

Keyword(s):

Data Analysis ◽

Missing Values ◽

Cdna Microarrays ◽

Imputation Methods ◽

The Impact

Imputation methods to deal with missing values when data mining trauma injury data

28th International Conference on Information Technology Interfaces, 2006. ◽

10.1109/iti.2006.1708480 ◽

2006 ◽

Cited By ~ 6

Author(s):

K.I. Penny ◽

T. Chesney

Keyword(s):

Data Mining ◽

Missing Values ◽

Imputation Methods ◽

Injury Data ◽

Trauma Injury

Comparison of Algorithms for Clustering Incomplete Data

Foundations of Computing and Decision Sciences ◽

10.2478/fcds-2014-0007 ◽

2014 ◽

Vol 39 (2) ◽

pp. 107-127 ◽

Cited By ~ 6

Author(s):

Artur Matyja ◽

Krzysztof Siminski

Keyword(s):

Data Analysis ◽

Incomplete Data ◽

Missing Values ◽

Real Data ◽

Complete Data ◽

The Other ◽

Data Sets ◽

Missing Value ◽

Comparison Of Algorithms ◽

New Algorithms

Missing values are not uncommon in real data sets. The algorithms and methods used for the analysis of complete data sets cannot always be applied to data with missing values. In order to use the existing methods for complete data, data sets with missing values are preprocessed. The other solution to this problem is the creation of new algorithms dedicated to data sets with missing values. The objective of our research is to compare the preprocessing techniques and the specialised algorithms and to find their most advantageous usage.

Methodologies for Imputation of Missing Values in Rice Pest Data

Current Journal of Applied Science and Technology ◽

10.9734/cjast/2021/v40i531304 ◽

2021 ◽

pp. 64-73

Author(s):

V. Jinubala ◽

P. Jeyakumar

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Missing Values ◽

Large Data ◽

Research Field ◽

Data Set ◽

Imputation Methods ◽

Predictive Mean Matching ◽

Rice Pest ◽

Better Than

Data mining is an emerging research field in the analysis of agricultural data. The most important problem in extracting knowledge from agricultural data is missing attribute values in the selected data set. If such deficiencies are present, the data set needs to be cleaned during preprocessing in order to obtain a functional data set. The main objective of this paper is to analyse the effectiveness of various imputation methods in producing a complete data set that is more useful for applying data mining techniques, and to present a comparative analysis of imputation methods for handling missing values. The rice pest data set collected throughout Maharashtra state under the Crop Pest Surveillance and Advisory Project (CROPSAP) during 2009-2013 was used for the analysis. Row deletion, mean and median imputation, linear regression and Predictive Mean Matching were analysed for the imputation of missing values. The comparative analysis shows that Predictive Mean Matching was better than the other methods and effective for imputing missing values in a large data set.

An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Biometrical Letters ◽

10.2478/bile-2014-0006 ◽

2014 ◽

Vol 51 (2) ◽

pp. 75-88 ◽

Cited By ~ 3

Author(s):

Sergio Arciniegas-Alarcón ◽

Marisol García-Peña ◽

Wojtek Janusz Krzanowski ◽

Carlos Tadeu dos Santos Dias

Keyword(s):

Missing Values ◽

Mean Squared Error ◽

Real Data ◽

Genotype By Environment Interaction ◽

Environment Interaction ◽

Simple Imputation ◽

Alternative Methodology ◽

Complete Set ◽

Rank Approximation ◽

Value Decomposition

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

10.1101/171967 ◽

2017 ◽

Cited By ~ 1

Author(s):

Runmin Wei ◽

Jingye Wang ◽

Mingming Su ◽

Erik Jia ◽

Tianlu Chen ◽

...

Keyword(s):

Mass Spectrometry ◽

Missing Values ◽

Pearson Correlation ◽

Imputation Accuracy ◽

Metabolomics Data ◽

Missing Value ◽

Sample Distribution ◽

Imputation Methods ◽

Missing Value Imputation ◽

Squared Error

Introduction: Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection of methods can significantly affect subsequent data analyses and interpretations. By definition, there are three types of missing values: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Objectives: The aim of this study was to comprehensively compare common imputation methods for different types of missing values using two separate metabolomics data sets (977 and 198 serum samples respectively) to propose a strategy to deal with missing values in metabolomics studies. Methods: Imputation methods included zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC). Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate the imputation accuracy for MCAR/MAR and MNAR correspondingly. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes sum of squared error was used to evaluate the overall sample distribution. Student’s t-test followed by Pearson correlation analysis was conducted to evaluate the effect of imputation on univariate statistical analysis. Results: Our findings demonstrated that RF imputation performed the best for MCAR/MAR and QRILC was the favored one for MNAR. Conclusion: Combining these findings with the “modified 80% rule”, we proposed a comprehensive strategy and developed a publicly accessible web tool for missing value imputation in metabolomics data.
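Two of the simplest methods listed in this abstract, zero and half-minimum (HM) imputation, can be sketched in a few lines. The sketch assumes each metabolite is handled as one column of intensities with missing entries marked as None; the function names and values are hypothetical, and the more involved methods (RF, SVD, kNN, QRILC) are not shown.

```python
def zero_impute(column):
    """Replace missing intensities with 0."""
    return [0.0 if v is None else v for v in column]

def half_minimum_impute(column):
    """Replace missing intensities with half of the smallest observed value."""
    observed = [v for v in column if v is not None]
    fill = min(observed) / 2
    return [fill if v is None else v for v in column]

# Hypothetical intensities for one metabolite across five samples.
intensities = [4.0, None, 10.0, 6.0, None]
print(zero_impute(intensities))
print(half_minimum_impute(intensities))
```

Both rules fill with small values, which reflects the usual rationale that values missing not at random in MS data often fall below the detection limit.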

Soil Data Analysis and Crop Yield Prediction in Data Mining using R – Programming

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8683.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1857-1860

Keyword(s):

Data Mining ◽

Data Analysis ◽

Decision Tree ◽

Crop Yield ◽

Climatic Condition ◽

Research Work ◽

Yield Prediction ◽

Decision Tree Algorithm ◽

Data Set ◽

R Programming

Data mining is a good choice in the emerging research field of soil data analysis. Crop yield prediction is an important issue in selecting a crop. Previously, crop prediction relied on the farmer's experience with a particular type of field and crop, based on factors such as soil type, climatic conditions, seasons, weather, rainfall and irrigation facilities. Data mining techniques are a better choice for predicting the crop. Soil analysis plays an important role in the agricultural field, and soil fertility prediction is one of the most important factors in agriculture. This research work predicts crop yield using a decision tree algorithm. The aim of this research is to determine the accuracy and to find the crop yield using the decision tree and C4.5 algorithms in R programming, and also to find the range of magnesium in the collected soil data set. This prediction will be very useful for farmers in predicting crop yield for cultivation.

A COMPARATIVE STUDY OF IMAGE FILTERING ON VARIOUS NOISY PIXELS

International Journal of Image Processing and Vision Science ◽

10.47893/ijipvs.2013.1029 ◽

2013 ◽

pp. 157-165

Author(s):

SONALI R. MAHAKALE ◽

NILESHSINGH V. THAKUR

Keyword(s):

Comparative Study ◽

Mean Squared Error ◽

Mean Absolute Error ◽

Signal To Noise Ratio ◽

Research Work ◽

Absolute Error ◽

Image Filtering ◽

Squared Error ◽

The Comparative Study ◽

Work Done

This paper deals with the comparative study of research work done in the field of image filtering. Different noises can affect an image in different ways. Although various solutions are available for denoising, a detailed study of the research is required in order to design a filter that fulfills the desired aspects while handling most image filtering issues. An output image should be judged on the basis of image quality metrics, for example Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE) and Mean Absolute Error (MAE), and on execution time.
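The three image quality metrics named above have standard definitions, sketched here for a pair of images given as flat lists of pixel values. The sketch assumes 8-bit grayscale images, so the peak value is 255; the function name, the dictionary layout, and the sample pixels are illustrative only.

```python
import math

def image_quality(original, filtered, max_value=255):
    """Compute MSE, MAE, and PSNR (in dB) between two equally sized images.

    Images are flat sequences of pixel intensities; max_value is the
    largest possible intensity (255 for 8-bit images).
    """
    diffs = [o - f for o, f in zip(original, filtered)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # PSNR is unbounded when the two images are identical.
    psnr = math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
    return {"mse": mse, "mae": mae, "psnr": psnr}

# Hypothetical 2x2 image, flattened, before and after filtering.
original = [0, 255, 128, 64]
filtered = [0, 250, 130, 64]
print(image_quality(original, filtered))
```

Lower MSE and MAE and higher PSNR indicate a filtered image closer to the reference.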
