Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model

Author(s):  
Kumar Rahul ◽  
Rohitash Kumar Banyal

Every business enterprise requires noise-free, clean data. Dirty data tends to accumulate as the data warehouse continuously loads and refreshes large quantities of data from various sources. Hence, to avoid wrong conclusions, data cleaning becomes vital in data-driven projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The process involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and identification of suspected data. In the clustering process, the optimal selection of centroids is the key step and is carried out using an optimization concept. Once dirty data detection is finished, the subsequent dirty data cleaning process begins. The cleaning process in turn comprises a leveling step, Huffman coding, and the cleaning of the suspected data, which is also performed based on the optimization concept. Hence, to solve all of the optimization problems involved, a new hybridized algorithm called the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA) is proposed, which hybridizes the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. In the end, the performance of the implemented data cleaning method is scrutinized against other traditional methods, namely Particle Swarm Optimization (PSO), FF, the Grey Wolf Optimizer (GWO), and ROA, in terms of their positive and negative measures. From the results, it can be observed that at iteration 12, the performance of the proposed FU-ROA model for test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than the extant PSO, FF, GWO, and ROA models, respectively.
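
As a rough illustration of the detection stage described above, the following Python sketch normalizes the records, hashes them so exact duplicates collapse, and flags rows that lie far from every cluster centroid. The function names, the distance threshold, and the random stand-in for the FU-ROA-selected centroids are assumptions made for illustration, not the authors' implementation.

```python
# Hedged sketch of the dirty-data detection stage (normalize -> hash -> cluster
# -> flag suspects). Threshold and centroid choice are illustrative assumptions.
import hashlib
import numpy as np

def normalize(records):
    """Min-max normalize numeric fields to [0, 1]."""
    data = np.asarray(records, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0
    return (data - lo) / span

def hash_records(data):
    """Hash each row so exact duplicates collapse into one bucket."""
    return [hashlib.md5(row.tobytes()).hexdigest() for row in data]

def detect_suspects(data, centroids, threshold=0.6):
    """Flag rows whose distance to every cluster centroid exceeds the threshold."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return np.where(dists.min(axis=1) > threshold)[0]

# Toy usage: in the paper the centroids come from the FU-ROA search;
# randomly chosen rows stand in for that optimization step here.
rng = np.random.default_rng(0)
records = rng.random((100, 4))
clean = normalize(records)
print(len(set(hash_records(clean))), "unique records after hashing")
centroids = clean[rng.choice(len(clean), size=3, replace=False)]
print("suspected rows:", detect_suspects(clean, centroids))
```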

Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1190
Author(s):  
Mohammad Dehghani ◽  
Zeinab Montazeri ◽  
Štěpán Hubálovský

There are many optimization problems in the different disciplines of science that must be solved using the appropriate method. Population-based optimization algorithms are one of the most efficient ways to solve various optimization problems. Population-based optimization algorithms are able to provide appropriate solutions to optimization problems based on a random search of the problem-solving space, without the need for gradient or derivative information. In this paper, a new optimization algorithm called the Group Mean-Based Optimizer (GMBO) is presented; it can be applied to solve optimization problems in various fields of science. The main idea in designing the GMBO is to make more effective use of the information from different members of the algorithm population, based on two selected groups referred to as the good group and the bad group. Two new composite members are obtained by averaging each of these groups, which are used to update the population members. The various stages of the GMBO are described and mathematically modeled with the aim of being used to solve optimization problems. The performance of the GMBO in providing a suitable quasi-optimal solution is evaluated on a set of 23 standard objective functions of different types: unimodal, high-dimensional multimodal, and fixed-dimensional multimodal functions. In addition, the optimization results obtained from the proposed GMBO were compared with eight other widely used optimization algorithms, including the Marine Predators Algorithm (MPA), the Tunicate Swarm Algorithm (TSA), the Whale Optimization Algorithm (WOA), the Grey Wolf Optimizer (GWO), Teaching–Learning-Based Optimization (TLBO), the Gravitational Search Algorithm (GSA), Particle Swarm Optimization (PSO), and the Genetic Algorithm (GA). The optimization results indicated the acceptable performance of the proposed GMBO, and, based on the analysis and comparison of the results, it was determined that the GMBO is superior and much more competitive than the other eight algorithms.
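
A minimal numerical sketch of the grouping idea follows. The specific update rule used here (move toward the good-group mean, away from the bad-group mean, then keep a candidate only if it improves the member) is an assumed simplification for illustration and does not reproduce the paper's exact equations.

```python
# Sketch of one GMBO-style iteration under the simplified update rule
# described in the lead-in; group_size and the rule itself are assumptions.
import numpy as np

def gmbo_step(pop, fitness, rng, group_size=5):
    order = np.argsort(fitness)                       # minimization
    good_mean = pop[order[:group_size]].mean(axis=0)  # composite "good" member
    bad_mean = pop[order[-group_size:]].mean(axis=0)  # composite "bad" member
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    return pop + r1 * (good_mean - pop) - r2 * (bad_mean - pop)

rng = np.random.default_rng(0)
sphere = lambda x: (x ** 2).sum(axis=1)
pop = rng.uniform(-5, 5, size=(20, 2))
for _ in range(50):
    cand = gmbo_step(pop, sphere(pop), rng)
    better = sphere(cand) < sphere(pop)               # greedy replacement
    pop[better] = cand[better]
print(pop[sphere(pop).argmin()])
```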


2021 ◽  
Vol 41 (1) ◽  
pp. 1657-1675
Author(s):  
Luis Rodriguez ◽  
Oscar Castillo ◽  
Mario Garcia ◽  
Jose Soria

The main goal of this paper is to outline a new optimization algorithm based on String Theory, which is a relatively new area of physics. The String Theory Algorithm (STA) is a nature-inspired meta-heuristic based on the theory that all the elementary particles existing in the universe are strings, and that the vibrations of these strings create all the particles that exist today. The newly proposed algorithm uses equations based on the laws of physics stated in String Theory. The main contribution of the proposed method is the new techniques devised to generate potential solutions to optimization problems, and we present a detailed explanation of the equations involved in the new algorithm for solving optimization problems. We evaluate the new proposed meta-heuristic with three cases. The first case consists of 13 traditional benchmark mathematical functions, with a comparison against three different meta-heuristics: the Flower Pollination Algorithm (FPA), the Firefly Algorithm (FA), and the Grey Wolf Optimizer (GWO). The second case is the optimization of benchmark functions from the CEC 2015 Competition, for which we also present a statistical comparison of the results with respect to FA and GWO. In addition, we present a third case, the optimization of a fuzzy inference system (FIS), specifically finding the optimal design of a fuzzy controller, where the main goal is to optimize the membership functions of the FIS. It is important to mention that we used these study cases in order to analyze the proposed meta-heuristic on basic problems, complex problems, and control problems. Finally, we present the performance, results, and conclusions of the new proposed meta-heuristic.


2017 ◽  
Vol 12 (1) ◽  
pp. 32 ◽  
Author(s):  
Amjad A. Hudaib ◽  
Hussam N. Fakhouri

Bio- and nature-inspired algorithms and meta-heuristics provide solutions to optimization and premature-convergence problems. Their influence is wide and extends across many scientific fields, justifying the continued development of the many applications that rely on optimization algorithms, which allow the best solution to be found in the shortest possible time. It is therefore necessary to further study and develop new swarm intelligence optimization algorithms. This paper proposes a novel optimization algorithm called the supernova optimizer (SO), inspired by the supernova phenomenon in nature. SO mimics this natural phenomenon with the aim of improving the three main features of optimization: exploration, exploitation, and local-minima avoidance. The proposed meta-heuristic optimizer has been tested on 20 well-known benchmark functions, and the results have been verified by a comparative study with the state-of-the-art optimization algorithms Grey Wolf Optimizer (GWO), Sine Cosine Algorithm (SCA), Multi-Verse Optimizer (MVO), Moth-Flame Optimization (MFO), Whale Optimization Algorithm (WOA), Polar Particle Swarm Optimizer (PLOARPSO), and Particle Swarm Optimizer (PSO). The results showed that SO provided very competitive and effective results; it outperformed the compared state-of-the-art algorithms on most of the tested benchmark functions.


Author(s):  
Arif Hanafi ◽  
Sulaiman Harun ◽  
Sofika Enggari ◽  
Larissa Navia Rani

That email has extraordinary significance in modern business communication is certain. Every day, a bulk of emails is sent from organizations to clients and suppliers, from employees to their managers, and from one colleague to another. As a result, a vast amount of email accumulates in the data warehouse. Data cleaning is an activity performed on the data sets of a data warehouse to upgrade and maintain the quality and consistency of the data. This paper underlines the issues related to dirty data and the detection of duplicates in the email column. The paper examines the strategy of data cleaning from a different point of view. It provides an algorithm for the discovery of errors and duplicate entries in the data sets of an existing data warehouse. The paper characterizes alliance rules, based on the concept of mathematical association rules, to determine the duplicate entries in the email column of the data sets.
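
As a small illustration of the duplicate-detection task in an email column (not the paper's association-rule algorithm), the sketch below groups row indices by a normalized email key and reports the groups that occur more than once.

```python
# Illustrative duplicate-email finder; the normalization (trim + lowercase)
# is an assumption and much simpler than the paper's alliance/association rules.
from collections import defaultdict

def find_duplicate_emails(column):
    """Group row indices by normalized email; return only groups with >1 entry."""
    buckets = defaultdict(list)
    for i, email in enumerate(column):
        buckets[email.strip().lower()].append(i)
    return {k: v for k, v in buckets.items() if len(v) > 1}

emails = ["A.Smith@corp.com", "a.smith@corp.com ", "b.jones@corp.com"]
print(find_duplicate_emails(emails))   # {'a.smith@corp.com': [0, 1]}
```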


2021 ◽  
Author(s):  
Sehej Jain ◽  
Kusum Kumari Bharti

Abstract A novel meta-heuristic algorithm named the Cell Division Optimizer (CDO) is proposed. The proposed algorithm is inspired by reproduction methods at the cellular level, formulated around the well-known cell division processes of mitosis and meiosis. In the proposed model, meiosis and mitosis govern the exploration and exploitation aspects of the optimization algorithm, respectively. In the proposed method, the solutions are updated in two phases to achieve the global optimum solution. The proposed algorithm can also be easily adapted to solve combinatorial optimization problems. To evaluate the proposed method, 50 well-known benchmark test functions and 2 classical engineering optimization problems, including 1 mechanical engineering problem and 1 electrical engineering problem, are employed. The results of the proposed method are compared with the latest versions of state-of-the-art algorithms such as Particle Swarm Optimization, Cuckoo Search, the Grey Wolf Optimizer, Fruit Fly Optimization, the Whale Optimizer, and the Water-Wave Optimizer, as well as recently proposed variants of top-performing algorithms like SHADE (success-history-based adaptive differential evolution) and CMA-ES (covariance matrix adaptation evolution strategy). Moreover, the convergence speed of the proposed algorithm is better than that of the considered competitive methods in most cases.


2017 ◽  
Vol 12 (1) ◽  
pp. 148 ◽  
Author(s):  
Amjad A. Hudaib ◽  
Ahmad Kamel AL Hwaitat

Particle Swarm Optimization (PSO) is a well-known meta-heuristic that has been used in many applications for solving optimization problems, but it suffers from problems such as getting trapped in local minima. This paper proposes an optimization algorithm called Movement Particle Swarm Optimization (MPSO) that enhances the behavior of PSO by using a random movement function to search more points in the search space. The meta-heuristic has been tested on 23 benchmark functions and compared with state-of-the-art algorithms: the Multi-Verse Optimizer (MVO), the Sine Cosine Algorithm (SCA), the Grey Wolf Optimizer (GWO), and Particle Swarm Optimization (PSO). The results showed that the proposed algorithm enhances PSO over the tested benchmark functions.
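
The sketch below shows the general idea: a standard PSO velocity/position update followed by a random-movement probe around each particle that is kept only when it improves the particle. The uniform probe used here is an assumption; the exact movement function of MPSO is defined in the paper.

```python
# Standard PSO loop plus an assumed random-movement probe (see lead-in).
import numpy as np

rng = np.random.default_rng(1)
sphere = lambda x: (x ** 2).sum(axis=-1)

n, dim, w, c1, c2 = 30, 5, 0.7, 1.5, 1.5
x = rng.uniform(-10, 10, (n, dim))
v = np.zeros((n, dim))
pbest, pbest_f = x.copy(), sphere(x)
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(200):
    v = (w * v
         + c1 * rng.random((n, dim)) * (pbest - x)
         + c2 * rng.random((n, dim)) * (gbest - x))
    x = x + v
    # Random movement: probe a nearby random point, keep it if it is better.
    probe = x + rng.uniform(-1, 1, (n, dim))
    x = np.where((sphere(probe) < sphere(x))[:, None], probe, x)
    f = sphere(x)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(gbest, pbest_f.min())
```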


Author(s):  
Fariborz Masoumi ◽  
Sina Masoumzadeh ◽  
Negin Zafari ◽  
Mohammad Javad Emami-Skardi

Abstract Reservoir operation is a key issue in water resources systems. In this paper, the Shuffled Grey Wolf Optimizer (SGWO), a hybrid optimization algorithm inspired by the Shuffled Complex Evolution (SCE-UA) and Grey Wolf Optimizer (GWO) algorithms, is introduced. The main modification in the proposed algorithm is how it divides and shuffles the population to enhance the information exchange among individuals. The performance of the SGWO algorithm is compared to well-known evolutionary algorithms such as Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) in solving mathematical benchmark functions and multiple types of reservoir operation optimization problems at different scales. Two hypothetical 4- and 10-reservoir systems, and the Dez dam in Iran as a single-reservoir system, were selected as the case studies in this research. The capability of the algorithms was compared in terms of the accuracy of the derived optimum objective function values, convergence speed, and stability of the answers across different runs. The results showed that the SGWO can reach considerably better results (0.3% to 26% better than the closest rival algorithms) using a significantly lower number of function evaluations. It also showed the lowest standard deviation among the algorithms for all problems, which indicates the high reliability of this algorithm.
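
A rough sketch of the divide-and-shuffle idea follows: the population is shuffled, split into complexes, each complex takes one GWO-style step toward its three best wolves, and the complexes are merged again. The step equations, complex count, and the absence of elitism are simplifications assumed for illustration, not the authors' SGWO code.

```python
# Shuffle -> partition into complexes -> GWO-style move per complex -> merge.
# A deliberately simplified sketch (see lead-in), minimizing a sphere function.
import numpy as np

def gwo_step(pack, fitness, a, rng):
    leaders = pack[np.argsort(fitness)[:3]]            # alpha, beta, delta
    moves = []
    for leader in leaders:
        r1, r2 = rng.random(pack.shape), rng.random(pack.shape)
        A, C = 2 * a * r1 - a, 2 * r2
        moves.append(leader - A * np.abs(C * leader - pack))
    return np.mean(moves, axis=0)                      # average of the three pulls

rng = np.random.default_rng(2)
sphere = lambda x: (x ** 2).sum(axis=1)
pop = rng.uniform(-10, 10, (20, 4))
n_complexes, iters = 4, 100
for t in range(iters):
    a = 2 - 2 * t / iters                              # GWO's shrinking coefficient
    rng.shuffle(pop)                                   # shuffle before partitioning
    complexes = np.array_split(pop, n_complexes)
    pop = np.vstack([gwo_step(c, sphere(c), a, rng) for c in complexes])
print(pop[sphere(pop).argmin()], sphere(pop).min())
```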


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5214
Author(s):  
Mohammad Dehghani ◽  
Štěpán Hubálovský ◽  
Pavel Trojovský

Numerous optimization problems arising in different branches of science and the real world must be solved using appropriate techniques. Population-based optimization algorithms are some of the most important and practical techniques for solving optimization problems. In this paper, a new optimization algorithm called the Cat and Mouse-Based Optimizer (CMBO) is presented that mimics the natural behavior between cats and mice. In the proposed CMBO, the movement of cats towards mice as well as the escape of mice towards havens is simulated. Mathematical modeling and formulation of the proposed CMBO for implementation on optimization problems are presented. The performance of the CMBO is evaluated on a standard set of objective functions of three different types: unimodal, high-dimensional multimodal, and fixed-dimensional multimodal. The results of optimizing these objective functions show that the proposed CMBO has a good ability to solve various optimization problems. Moreover, the optimization results obtained from the CMBO are compared with the performance of nine other well-known algorithms, including the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Gravitational Search Algorithm (GSA), Teaching-Learning-Based Optimization (TLBO), the Grey Wolf Optimizer (GWO), the Whale Optimization Algorithm (WOA), the Marine Predators Algorithm (MPA), the Tunicate Swarm Algorithm (TSA), and the Teamwork Optimization Algorithm (TOA). The performance analysis of the proposed CMBO against the compared algorithms shows that CMBO is much more competitive than the other algorithms, providing more suitable quasi-optimal solutions that are closer to the global optimum.
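
The toy sketch below conveys only the cat/mouse interplay in terms close to the abstract: the worse half of the population (cats) moves toward randomly chosen mice, while the better half (mice) moves toward havens sampled from other members, with greedy replacement. These update rules are simplified assumptions and are not the CMBO equations from the paper.

```python
# Simplified cat-and-mouse interplay (see lead-in); not the paper's equations.
import numpy as np

rng = np.random.default_rng(4)
sphere = lambda x: (x ** 2).sum(axis=1)
pop = rng.uniform(-10, 10, (20, 3))

for _ in range(200):
    order = np.argsort(sphere(pop))
    mice, cats = pop[order[:10]], pop[order[10:]]      # better half, worse half
    targets = mice[rng.integers(0, 10, size=10)]       # cats chase random mice
    cats = cats + rng.random((10, 3)) * (targets - cats)
    havens = pop[rng.integers(0, 20, size=10)]         # mice flee toward havens
    mice = mice + rng.random((10, 3)) * (havens - mice)
    cand = np.vstack([mice, cats])
    keep = sphere(cand) < sphere(pop[order])           # greedy replacement
    pop = np.where(keep[:, None], cand, pop[order])
print(pop[sphere(pop).argmin()])
```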


Author(s):  
Achmad Fanany Onnilita Gaffar ◽  
Agusma Wajiansyah ◽  
Supriadi Supriadi

The shortest path problem is one of the optimization problems in which the value to be optimized is a distance. In general, the shortest-route search problem can be solved using two kinds of methods, namely conventional methods and heuristic methods. Ant Colony Optimization (ACO) is one of the optimization algorithms based on the heuristic approach. ACO is adopted from the behavior of ant colonies, which are naturally able to find the shortest route on the way from the nest to food sources. In this study, ACO is used to determine the shortest route from Bumi Senyiur Hotel (origin point) to the East Kalimantan Governor's Office (destination point). The selection of the origin and destination points is based on the large number of possible major roads connecting the two points. The data source used is the base map of Samarinda City, cropped at certain coordinates using the Google Earth app so that it covers the selected origin and destination points. Data pre-processing is performed on the acquired base map image to obtain its numerical data. ACO is then applied to these data to obtain the shortest path between the chosen origin and destination points. The results show that the number of ants used affects how likely the solutions are to approach the optimum. The number of tours affects the amount of pheromone left on each edge passed by an ant. With a global pheromone update on each tour, there is a possibility that a path an ant has passed will have run out of pheromone by the end of the tour. This can lead to inconsistent results when the number of ants is smaller than the number of tours.
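
To make the pheromone discussion concrete, the following compact sketch runs ACO on a toy five-node graph with evaporation and a global pheromone update once per tour. The graph, parameter values, and update constants are illustrative assumptions; the actual Samarinda road-network data are not reproduced here.

```python
# Compact ACO on a toy graph (origin node 0, destination node 4); the global
# pheromone update (evaporation + deposit) runs once per tour, as discussed above.
import numpy as np

rng = np.random.default_rng(3)
INF = np.inf
dist = np.array([[INF, 2, 9, INF, INF],     # toy road lengths (assumption)
                 [2, INF, 4, 3, INF],
                 [9, 4, INF, 1, 7],
                 [INF, 3, 1, INF, 5],
                 [INF, INF, 7, 5, INF]])
tau = np.ones_like(dist)                    # pheromone on each edge
n_ants, n_tours, rho, Q = 10, 30, 0.5, 1.0
best_path, best_len = None, INF

for _ in range(n_tours):
    completed = []
    for _ in range(n_ants):
        path, node = [0], 0
        while node != 4:
            allowed = [j for j in range(5) if dist[node, j] < INF and j not in path]
            if not allowed:                 # dead end: discard this ant
                break
            weights = np.array([tau[node, j] / dist[node, j] for j in allowed])
            node = rng.choice(allowed, p=weights / weights.sum())
            path.append(node)
        if path[-1] == 4:
            length = sum(dist[a, b] for a, b in zip(path, path[1:]))
            completed.append((path, length))
            if length < best_len:
                best_path, best_len = path, length
    tau *= (1 - rho)                        # global evaporation once per tour
    for path, length in completed:          # deposit pheromone on used edges
        for a, b in zip(path, path[1:]):
            tau[a, b] += Q / length
            tau[b, a] += Q / length
print(best_path, best_len)
```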


Author(s):  
Prachi Agrawal ◽  
Talari Ganesh ◽  
Ali Wagdy Mohamed

Abstract This article proposes a novel binary version of the recently developed Gaining-Sharing Knowledge-based optimization algorithm (GSK) to solve binary optimization problems. The GSK algorithm is based on the concept of how humans acquire and share knowledge during their life span. The binary version of GSK, named the novel binary Gaining-Sharing Knowledge-based optimization algorithm (NBGSK), depends mainly on two binary stages: a binary junior gaining-sharing stage and a binary senior gaining-sharing stage with knowledge factor 1. These two stages enable NBGSK to explore and exploit the search space efficiently and effectively to solve problems in binary space. Moreover, to enhance the performance of NBGSK and prevent the solutions from becoming trapped in local optima, NBGSK with population size reduction (PR-NBGSK) is introduced; it decreases the population size gradually with a linear function. The proposed NBGSK and PR-NBGSK are applied to a set of knapsack instances with small and large dimensions, which shows that NBGSK and PR-NBGSK are more efficient and effective in terms of convergence, robustness, and accuracy.
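
A tiny sketch of the linear population-size reduction schedule follows; the formula is an assumption modeled on common linear reduction schemes and is not taken from the paper.

```python
# Assumed linear population-size reduction: shrink from n_init to n_min
# in proportion to the fraction of the evaluation budget already spent.
def population_size(nfes, max_nfes, n_init, n_min):
    return round(n_init + (n_min - n_init) * nfes / max_nfes)

# Example: a run budgeted at 10,000 evaluations, shrinking 100 -> 12 members.
for nfes in (0, 2500, 5000, 7500, 10000):
    print(nfes, population_size(nfes, 10000, 100, 12))
```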

