Missing Value Imputation Using ANN Optimized by Genetic Algorithm

Missing values arise in almost all serious statistical analyses and create numerous problems when processing data in databases. In real-world applications, information may be missing because of instrument errors, optional fields, non-response to survey questions, data entry errors, and so on. Most data mining techniques require complete data without any missing information, which has prompted researchers to develop efficient methods for handling missing values. Missing data handling has long been an important research area across many domains. The objective of this article is to handle missing data using an evolutionary (genetic) algorithm, together with some relatively simple methodologies that can often yield reasonable results. The proposed method combines a genetic algorithm with a multi-layer perceptron (MLP) to predict missing data with higher accuracy.

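Optimasi Naive Bayes Menggunakan Algoritma Genetika Sebagai Seleksi Fitur Untuk Memprediksi Performa Siswa [Optimizing Naive Bayes Using a Genetic Algorithm as Feature Selection to Predict Student Performance]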

Jurnal Ilmiah Teknologi Informasi Asia ◽

10.32815/jitika.v14i1.400 ◽

2020 ◽

Vol 14 (1) ◽

pp. 31

Author(s):

Suhendro Busono

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Student Performance ◽

Parent Education ◽

Naive Bayes ◽

Electronic Media ◽

Parent Support ◽

Long Time ◽

Parent Relation

In this era of globalisation, the morality of teenagers is declining. This phenomenon can be seen in mass and electronic media, which report that negative incidents often occur among teenagers: brawls, drug use, gambling, rape, disobedience to parents, and others. Such incidents do not originate in the teenagers themselves but are triggered by bad habits. A lack of parental attention and a poor parent-child relationship can instil bad habits in children. Parents' education, parents' occupation, and parental support for education can influence a child's mindset, as can the time a child spends studying, in free time, with friends, and on the internet. These habits can be measured with science and technology. Data mining, a branch of computer science, can measure how well children or adults perform based on indicators of the factors that shape their habits. In a previous study of student performance using the Naive Bayes method, the number of attributes was large (33 attributes) and the accuracy was 91.15%. In this study, the researcher optimizes the attributes of the previous study using a genetic algorithm, which selects the relevant attributes. Selecting relevant attributes can increase accuracy: after applying the genetic algorithm, the accuracy is 97.21%.
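
The abstract above describes wrapper-style feature selection, where a genetic algorithm searches over attribute subsets and Naive Bayes accuracy serves as the fitness. A minimal sketch of that idea follows. It is not the author's implementation: the synthetic data, the function names `nb_accuracy` and `select_features`, the truncation selection, one-point crossover, and every parameter value are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict

def nb_accuracy(train, valid, mask):
    """Accuracy of a categorical Naive Bayes using only features where mask[i] == 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (class, column) -> counts of each value
    for features, label in train:
        for c in cols:
            value_counts[(label, c)][features[c]] += 1
    n_values = {c: len({f[c] for f, _ in train}) for c in cols}

    def predict(features):
        best, best_score = None, -math.inf
        for label, count in class_counts.items():
            score = math.log(count / len(train))
            for c in cols:
                # Laplace smoothing keeps unseen values from zeroing the likelihood.
                seen = value_counts[(label, c)][features[c]]
                score += math.log((seen + 1) / (count + n_values[c]))
            if score > best_score:
                best, best_score = label, score
        return best

    return sum(predict(f) == y for f, y in valid) / len(valid)

def select_features(train, valid, n_features, pop_size=20, generations=30,
                    p_mut=0.1, seed=0):
    """Toy genetic algorithm: bit-mask chromosomes, fitness = validation accuracy."""
    rng = random.Random(seed)
    fitness = lambda mask: nb_accuracy(train, valid, mask)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (keeps the elite)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def make_data(n, rng):
    """Synthetic records: feature 0 tracks the label, features 1-5 are pure noise."""
    data = []
    for _ in range(n):
        label = rng.choice(["pass", "fail"])
        informative = "high" if label == "pass" else "low"
        noise = [rng.choice(["a", "b", "c"]) for _ in range(5)]
        data.append(([informative] + noise, label))
    return data

rng = random.Random(1)
train, valid = make_data(200, rng), make_data(100, rng)
best_mask = select_features(train, valid, n_features=6)
print(best_mask, nb_accuracy(train, valid, best_mask))
```

Because fitness is measured on the same validation split that guides the search, a real experiment would need a separate held-out test set to report accuracy without optimistic bias.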

A New Paradigm for Development of Data Imputation Approach for Missing Value Estimation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.pp3222-3228 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3222

Author(s):

Madhu G ◽

Nagachandrika G

Keyword(s):

Data Mining ◽

Missing Data ◽

Nearest Neighbour ◽

Data Imputation ◽

New Paradigm ◽

Missing Value ◽

Complete Dataset ◽

Data Value ◽

Value Estimation ◽

Missing Value Estimation

Many real-world applications encounter a common issue in data analysis: the presence of missing data values, which is a challenging problem in areas such as wireless sensor networks, medical applications, and the psychological domain. Learning and prediction in the presence of missing values can be treacherous in machine learning, data mining, and statistical analysis. A missing value can signify important information about the dataset in the mining process. In this paper, we propose a new paradigm for developing a data imputation method for missing value estimation based on centroids and nearest neighbours. First, clusters are identified with the k-means algorithm, and the centroids and nearest-neighbour data records are calculated. Second, the distances from the centroids to the records of the complete and incomplete datasets are computed and the nearest data record is estimated, a step that tends to be affected by the curse of dimensionality. Finally, the missing value is imputed from the nearest-neighbour record using a statistical measure, the z-score. The experimental study demonstrates the strength of the proposed paradigm for missing value imputation. Tests were run on different types of datasets to validate our approach and to compare the results with other imputation methods such as KNNI, SVMI, WKNNI, KMI, and FKNNI. The proposed approach is geared towards maximizing the utility of imputation with respect to missing value estimation.
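
The three steps in the abstract above (cluster the complete records, locate the nearest record via the centroids, impute from it) can be sketched as follows. This is a simplified illustration, not the authors' method: the abstract does not say how the z-score enters the computation, so the sketch assumes it is used to standardize columns before distances are measured. The helper names, the toy data, and `k=2` are likewise assumptions.

```python
import math
import random
from statistics import mean, pstdev

def zscore_params(rows):
    """Column means and population standard deviations of the complete records."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def standardize(row, mus, sigmas):
    return [None if v is None else (v - m) / s for v, m, s in zip(row, mus, sigmas)]

def dist(a, b):
    """Euclidean distance over the coordinates that are observed in `a`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b) if x is not None))

def assign(points, centroids):
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [[mean(c) for c in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, assign(points, centroids)

def impute(complete, incomplete, k=2):
    mus, sigmas = zscore_params(complete)
    z_complete = [standardize(r, mus, sigmas) for r in complete]
    centroids, clusters = kmeans(z_complete, k)
    filled = []
    for row in incomplete:
        z = standardize(row, mus, sigmas)
        j = min(range(k), key=lambda i: dist(z, centroids[i]))  # nearest centroid
        candidates = clusters[j] or z_complete                  # fallback if cluster is empty
        donor = min(candidates, key=lambda c: dist(z, c))       # nearest complete record
        # Copy the donor's value back on the original scale where `row` is missing.
        filled.append([d * s + m if v is None else v
                       for v, d, m, s in zip(row, donor, mus, sigmas)])
    return filled

# Toy data: two well-separated groups; None marks a missing value.
complete = [[1.0, 1.1, 10.0], [1.2, 0.9, 10.5], [0.9, 1.0, 9.8],
            [5.0, 5.1, 50.0], [5.2, 4.9, 51.0], [4.9, 5.0, 49.5]]
incomplete = [[1.1, 1.0, None], [5.1, 5.0, None]]
filled = impute(complete, incomplete, k=2)
print(filled)
```

Restricting the donor search to one cluster reduces the number of distance computations per incomplete record, at the cost of depending on the quality of the clustering.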

Improving accuracy of missing data imputation in data mining

Kurdistan Journal of Applied Research ◽

10.24017/science.2017.3.30 ◽

2017 ◽

Vol 2 (3) ◽

pp. 66-73

Author(s):

Nzar A. Ali ◽

Zhyan M. Omer

Keyword(s):

Data Mining ◽

Missing Data ◽

Real World ◽

Missing Values ◽

Large Data ◽

Data Repository ◽

Data Sets ◽

Real World Data ◽

Missing Data Imputation ◽

Improving Accuracy

Raw data in the real world is dirty. Every large data repository contains various types of anomalous values that influence the result of the analysis. In data mining, good models usually need good data, yet real-world databases are not always clean and include noise, incomplete data, duplicate records, inconsistent data, and missing values. Missing data is a common drawback in many real-world data sets. In this paper, we propose an algorithm that improves the imputation procedure of the MIGEC algorithm for dealing with missing values. We apply grey relational analysis (GRA) to attribute values instead of instance values. The missing data are initially imputed by mean imputation and then estimated by our proposed algorithm (PA), and each estimated value is used as a complete value when imputing the next missing value. We compare our proposed algorithm with several other algorithms, namely MMS, HDI, KNNMI, FCMOCS, CRI, CMI, NIIA, and MIGEC, under different missingness mechanisms. Experimental results demonstrate that the proposed algorithm has lower RMSE values than the other algorithms under all missingness mechanisms.
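
The abstract above names two quantities without giving formulas: a grey relational measure computed between attributes, and RMSE as the evaluation metric. The sketch below shows the standard textbook forms of both, assuming Deng's grey relational coefficient with a distinguishing coefficient of 0.5 and min-max normalization of each attribute. It does not reproduce the authors' algorithm or MIGEC, and the function names and sample columns are hypothetical.

```python
import math

def normalize(column):
    """Min-max scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def grey_relational_grades(reference, others, rho=0.5):
    """Grey relational grade of each attribute column in `others` against `reference`.

    Coefficient at position k: (d_min + rho * d_max) / (d_k + rho * d_max),
    where d_k = |reference[k] - other[k]| and d_min, d_max are taken over all
    columns and positions. The grade is the mean coefficient of a column.
    """
    ref = normalize(reference)
    deltas = [[abs(r - x) for r, x in zip(ref, normalize(col))] for col in others]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(others)  # every attribute matches the reference exactly
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]

def rmse(true_values, estimates):
    """Root mean squared error between true and imputed values."""
    squared = [(t - e) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical attribute columns: the first tracks the reference, the second does not.
reference = [1.0, 2.0, 3.0, 4.0]
others = [[2.0, 4.0, 6.0, 8.0], [4.0, 1.0, 3.0, 2.0]]
grades = grey_relational_grades(reference, others)
print(grades)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A higher grade marks an attribute whose normalized values follow the reference attribute more closely, which is one plausible basis for choosing the attributes that inform an imputed value.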

A Thorough Theoretical Exploration of Intriguing Characteristics of Cyclo[18]carbon: Geometry, Bonding Nature, Aromaticity, Weak Interaction, Reactivity, Excited States, Vibrations, Molecular Dynamics and Various Molecular Properties

10.26434/chemrxiv.11320130.v1 ◽

2019 ◽

Cited By ~ 2

Author(s):

Tian Lu ◽

Qinxue Chen ◽

Zeyu Liu

Keyword(s):

Molecular Dynamics ◽

Excited States ◽

Electronic Excitation ◽

Ring Structure ◽

Molecular Properties ◽

Molecular Vibration ◽

Full Characterization ◽

Long Time ◽

Almost All

Although cyclo[18]carbon has been investigated theoretically and experimentally for a long time, only very recently was it prepared and directly observed by means of STM/AFM in the condensed phase (Kaiser et al., Science, 365, 1299 (2019)). Its unique ring structure and dual 18-center π delocalization give cyclo[18]carbon a variety of unusual characteristics and properties that are well worth exploring. In this work, we present an extremely comprehensive and detailed investigation of almost all aspects of cyclo[18]carbon, including (1) Geometric characteristics (2) Bonding nature (3) Electron delocalization and aromaticity (4) Intermolecular interaction (5) Reactivity (6) Electronic excitation and UV/Vis spectrum (7) Molecular vibration and IR/Raman spectrum (8) Molecular dynamics (9) Response to external field (10) Electron ionization, affinity and accompanying processes (11) Various molecular properties. We believe that our full characterization of cyclo[18]carbon will greatly deepen researchers' understanding of this system and thereby help them to utilize it in practice and to design its various valuable derivatives.

A Survey on Data Mining using Genetic Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i6.888891 ◽

2019 ◽

Vol 7 (6) ◽

pp. 888-891

Author(s):

Mariya Khatoon ◽

Abhay Kumar Agarwal

Keyword(s):

Data Mining ◽

Genetic Algorithm

Genetic Algorithm Based Approach For Predict Disease and Avoid Congestion in Data Mining

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i7.15371543 ◽

2018 ◽

Vol 6 (7) ◽

pp. 1537-1543

Author(s):

J. Adamkani ◽

M. Wasim Raja

Keyword(s):

Data Mining ◽

Genetic Algorithm

Missing value or behaviour: how to increase the signal of social media data

METRON ◽

10.1007/s40300-021-00216-7 ◽

2021 ◽

Author(s):

Paolo Mariani ◽

Andrea Marletta

Keyword(s):

Social Media ◽

Missing Data ◽

Everyday Life ◽

Processing Technique ◽

Missing Value ◽

Social Media Data ◽

Practical Strategy ◽

Specific Behaviour ◽

Complex Features ◽

Media Data

Social media has become a widespread element of people’s everyday life, which is used to communicate and generate content. Among the several ways to express a reaction to social media content, the “Likes” are critical. Indeed, they convey preferences, which drive existing markets or allow the creation of new ones. Nevertheless, the appreciation indicators have some complex features, such as the interpretation of the absence of “Likes”. In this case, the lack of approval may be considered a specific behaviour. The present study aimed to determine whether the absence of Likes may indicate the presence of a specific behaviour by contextualizing the treatment of missing data applied to real cases. We provided a practical strategy for extracting more knowledge from social media data, whose synthesis raises several measurement problems. We proposed an approach based on the disambiguation of missing data into two modalities: “Dislike” and “Nothing”. Finally, a data pre-processing technique was suggested to increase the signal of social media data.
