fitness functions

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 242
Author(s):  
Oumaima Stitini ◽  
Soulaimane Kaloun ◽  
Omar Bencharef

Recommendation systems help users find what they want. They are used to recommend items in many areas, including the e-commerce, medical, education, tourism, and industry domains. E-commerce is the most active research area we found, where recommenders assist users in locating the things they want. A recommender system can also provide users with helpful knowledge about things that could be of interest. Sometimes users get bored with recommendations that closely match their profiles, which is known as the over-specialization problem. Over-specialization is caused by limited content data, under which content-based recommendation algorithms suggest goods directly related to the customer profile rather than new things. In this study, we are particularly interested in recommending surprising, new, and unexpected items that users are likely to enjoy and that mitigate this limited content. To recommend novel and serendipitous items along with familiar ones, we need to introduce additional hacks and a note of randomness, which can be achieved using genetic algorithms that bring diversity to the recommendations being made. This paper describes a Revolutionary Recommender System using a Genetic Algorithm, called RRSGA, which improves the fitness functions for recommending optimal results. The proposed approach employs a genetic algorithm to address the over-specialization issue of content-based filtering, bringing variety to recommendations and suggesting unpredictable and innovative things to the user. Experiments objectively demonstrate that our technology can recommend additional products that every consumer is likely to appreciate. The results of RRSGA have been compared against recommendation results from the content-based filtering approach. The results indicate the effectiveness of RRSGA and its capacity to make more accurate predictions than alternative approaches.


2022 ◽  
Vol 27 (2) ◽  
Author(s):  
Hussein Almulla ◽  
Gregory Gay

Search-based test generation is guided by feedback from one or more fitness functions, that is, scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals (such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage) do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current population of test suites. To test this hypothesis, we have implemented two reinforcement learning algorithms in the EvoSuite unit test generation framework, and used these algorithms to dynamically set the fitness functions used during generation for the three goals identified above. We have evaluated our framework, EvoSuiteFIT, on a set of Java case examples. EvoSuiteFIT techniques attain significant improvements for two of the three goals, and show limited improvements on the third when the number of generations of evolution is fixed. Additionally, for two of the three goals, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows strategic choices that efficiently produce more effective test suites, and examining these choices offers insight into how to attain our testing goals. We find that adaptive fitness function selection is a powerful technique to apply when an effective fitness function does not already exist for achieving a testing goal.
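
The idea of treating fitness function choice as a secondary optimization step can be illustrated with a small bandit. This is a minimal sketch only: it uses an epsilon-greedy rule as a stand-in and is not claimed to be either of the reinforcement learning algorithms implemented in EvoSuiteFIT. The names `select_fitness`, `update`, `run_bandit`, and the `reward_fns` callables (which would report how much a chosen fitness function advanced the testing goal) are all hypothetical.

```python
import random


def select_fitness(values, epsilon, rng):
    """Pick a fitness-function index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    # Exploit: index with the highest estimated value (ties go to the lowest index).
    return max(range(len(values)), key=lambda i: values[i])


def update(values, counts, arm, reward):
    """Fold a new reward into the running mean of the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]


def run_bandit(reward_fns, rounds=200, epsilon=0.1, seed=0):
    """Repeatedly choose a fitness function and learn from the observed reward.

    reward_fns[i](rng) is a hypothetical callback returning the goal improvement
    seen after running one generation guided by fitness function i.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_fns)
    counts = [0] * len(reward_fns)
    for _ in range(rounds):
        arm = select_fitness(values, epsilon, rng)
        update(values, counts, arm, reward_fns[arm](rng))
    return values, counts
```

In a real test generator the reward would come from measuring goal attainment (for example, newly thrown exceptions) on the current population, which is the part this sketch leaves abstract.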


2021 ◽  
Vol 119 (1) ◽  
pp. e2109649118
Author(s):  
David H. Brookes ◽  
Amirali Aghazadeh ◽  
Jennifer Listgarten

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters—sequence length, alphabet size, and assumed interactions between sequence positions—on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
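
To make the model's parameters concrete, the following is a minimal sketch of an NK-style fitness function with arbitrary neighborhoods. It assumes that each position contributes a sub-function value drawn from a standard normal distribution for every joint state of its neighborhood; that choice, the absence of any normalization, and the name `make_gnk` are illustrative assumptions rather than the paper's exact construction.

```python
import itertools
import random


def make_gnk(length, alphabet_size, neighborhoods, seed=0):
    """Build a random fitness function from per-neighborhood lookup tables.

    neighborhoods is a list of tuples of sequence positions; each tuple gets its
    own table mapping every joint state of those positions to a random value.
    """
    rng = random.Random(seed)
    tables = []
    for hood in neighborhoods:
        table = {
            states: rng.gauss(0.0, 1.0)  # assumed distribution of sub-function values
            for states in itertools.product(range(alphabet_size), repeat=len(hood))
        }
        tables.append(table)

    def fitness(seq):
        """Sum the sub-function values selected by the sequence's states."""
        if len(seq) != length:
            raise ValueError("sequence has the wrong length")
        return sum(
            table[tuple(seq[j] for j in hood)]
            for hood, table in zip(neighborhoods, tables)
        )

    return fitness
```

With single-position neighborhoods such as `[(0,), (1,), (2,)]` the resulting function is purely additive; larger neighborhoods introduce epistatic interactions among the positions they contain, which is what drives the sparsity discussed above.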


2021 ◽  
Author(s):  
J PRINCE JEROME CHRISTOPHER ◽  
K LINGADURAI ◽  
G SHANKAR

Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. In this paper, we investigate a novel approach to the binary coded testing process based on a genetic algorithm. This paper consists of two parts. The first part addresses the problem in the traditional way, using the decimal number system to define the fitness function and studying the variations of counts and of probability against the fitness functions. In the second part, the initial populations are defined using binary coded digits (genes). To evaluate high fitness function values, three genetic operators, namely reproduction, crossover, and mutation, are applied at random. The results show the importance of the mutation operator, which yields the peak values for the fitness function based on binary coded numbers performed in a new way.
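
The three operators named in this abstract can be sketched on binary coded individuals as follows. This is a generic textbook-style illustration, not the paper's procedure: the toy fitness (the square of the decoded integer), roulette-wheel reproduction, single-point crossover, and all function names and parameter values are assumptions.

```python
import random


def decode(bits):
    """Interpret a list of 0/1 genes as an unsigned integer."""
    return int("".join(map(str, bits)), 2)


def fitness(bits):
    """Toy fitness chosen for illustration: square of the decoded value."""
    return decode(bits) ** 2


def reproduce(pop, rng):
    """Roulette-wheel selection; +1 keeps weights positive for all-zero genes."""
    weights = [fitness(ind) + 1 for ind in pop]
    return rng.choices(pop, weights=weights, k=len(pop))


def crossover(a, b, rng):
    """Single-point crossover (requires at least two genes per individual)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Flip each gene independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=5, pop_size=8, generations=30, rate=0.05, seed=1):
    """Run the loop and return the best individual seen (pop_size must be even)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        parents = reproduce(pop, rng)
        children = []
        for i in range(0, pop_size - 1, 2):
            c1, c2 = crossover(parents[i], parents[i + 1], rng)
            children += [mutate(c1, rate, rng), mutate(c2, rate, rng)]
        pop = children
        best = max(pop + [best], key=fitness)
    return best
```

Setting `rate=0.0` removes mutation entirely, which gives a simple way to observe how much the search depends on that operator.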


2021 ◽  
Author(s):  
Wenlong Fu

<p>Edge detection is important in image processing. Extracting edge features is the main and necessary process in edge detection. Since features in edge detection are implicit, most of the existing edge features only work well on specific images. Using a moving window has a trade-off between noise rejection and localisation accuracy. Genetic Programming (GP) has been widely applied to image processing, and GP has potential for extracting edge features, although there is little work in GP for edge detection. The overall goal of this thesis is to investigate GP for automatic edge feature extraction using different amounts of existing knowledge from only using raw pixel intensities and ground truth to more advanced domain knowledge such as Gaussian filters.  First of all, this thesis conducts an investigation on fundamental low-level edge detector construction with very little prior edge knowledge. Search operators based on a single raw pixel, a block of pixels, and two blocks of pixels are proposed to construct edge detectors. Unlike most existing methods, this GP system automatically searches neighbours and avoids manually predefining a window size. The results show that the evolved edge detectors outperform some existing edge detectors, such as the Sobel edge detector.  Secondly, from the pixel and image views, localisation of detected edges, and observations of GP programs, new fitness functions are suggested in this thesis. It is found that the pixel view is better than the image view to design fitness functions without allowing a distance from predictions to ground truth. However, in terms of edge localisation, the pixel view is worse than the image view to design fitness functions. A new fitness function combining detection accuracy and localisation effectively improves the performance of evolved edge detectors. 
When utilising observations of GP programs to construct soft edge maps, two new fitness functions including a restriction on the range of observations are proposed to evolve edge detectors with good soft edge maps on test images.  Thirdly, pixels implicitly selected by the GP system based on full images are analysed. A set of pixels are extracted from the evolved programs and used to construct edge filters. A merge operation is proposed to extract six pixels to construct second-order edge filters. The results show that a rich but compact set of pixels can be extracted from the evolved edge detectors.  Fourthly, GP is utilised to evolve edge detectors based on the Gaussian-based technique. These GP evolved edge detectors are significantly better than the Gaussian gradient and the surround suppression technique. An efficient and effective sampling technique is proposed for evolving Gaussian-based edge detectors. From the results, there are no significant differences between the Gaussian-based edge detectors evolved by a full set of images and by the sampling technique on the training set.  Fifthly, GP is employed to construct features using an existing set of basic features. The distribution of observations of GP programs is estimated. Evolved composite features are proposed using known distribution models to indicate the probability of pixels being discriminated as edge points. It is found that the composite features effectively combine advantages of basic features and can richly indicate edge responses.  Finally, a Bayesian-based GP system is proposed to construct high-level edge features via employing two general algebraic operators and a function developed from a simple Bayesian model. The simple Bayesian model utilises a general multivariate normal density to combine basic features. Experiments show that the GP evolved programs perform better than the simple Bayesian model to obtain composite features.   
Overall, this thesis shows that GP has the capability to effectively extract edge features using different degrees of prior knowledge about edges.</p>


2021 ◽  
Author(s):  
Urvesh Bhowan

<p>In classification, machine learning algorithms can suffer a performance bias when data sets are unbalanced. Binary data sets are unbalanced when one class is represented by only a small number of training examples (called the minority class), while the other class makes up the rest (majority class). In this scenario, the induced classifiers typically have high accuracy on the majority class but poor accuracy on the minority class. As the minority class typically represents the main class-of-interest in many real-world problems, accurately classifying examples from this class can be at least as important as, and in some cases more important than, accurately classifying examples from the majority class. Genetic Programming (GP) is a promising machine learning technique based on the principles of Darwinian evolution to automatically evolve computer programs to solve problems. While GP has shown much success in evolving reliable and accurate classifiers for typical classification tasks with balanced data, GP, like many other learning algorithms, can evolve biased classifiers when data is unbalanced. This is because traditional training criteria such as the overall success rate in the fitness function in GP can be influenced by the larger number of examples from the majority class.  This thesis proposes a GP approach to classification with unbalanced data. The goal is to develop new internal cost-adjustment techniques in GP to improve classification performances on both the minority class and the majority class. By focusing on internal cost-adjustment within GP rather than the traditional data-balancing techniques, the unbalanced data can be used directly or "as is" in the learning process. This removes any dependence on a sampling algorithm to first artificially re-balance the input data prior to the learning process. 
This thesis shows that by developing a number of new methods in GP, genetic program classifiers with good classification ability on the minority and the majority classes can be evolved. This thesis evaluates these methods on a range of binary benchmark classification tasks with unbalanced data. This thesis demonstrates that unlike tasks with multiple balanced classes where some dynamic (non-static) classification strategies perform significantly better than the simple static classification strategy, either a static or dynamic strategy shows no significant difference in the performance of evolved GP classifiers on these binary tasks. For this reason, the rest of the thesis uses this static classification strategy.  This thesis proposes several new fitness functions in GP to perform cost adjustment between the minority and the majority classes, allowing the unbalanced data sets to be used directly in the learning process without sampling. Using the Area under the Receiver Operating Characteristics (ROC) curve (also known as the AUC) to measure how well a classifier performs on the minority and majority classes, these new fitness functions find genetic program classifiers with high AUC on the tasks on both classes, and with fast GP training times. These GP methods outperform two popular learning algorithms, namely, Naive Bayes and Support Vector Machines on the tasks, particularly when the level of class imbalance is large, where both algorithms show biased classification performances.  This thesis also proposes a multi-objective GP (MOGP) approach which treats the accuracies of the minority and majority classes separately in the learning process. The MOGP approach evolves a good set of trade-off solutions (a Pareto front) in a single run that perform as well as, and in some cases better than, multiple runs of canonical single-objective GP (SGP). 
In SGP, individual genetic program solutions capture the performance trade-off between the two objectives (minority and majority class accuracy) using an ROC curve; whereas in MOGP, this requirement is delegated to multiple genetic program solutions along the Pareto front.  This thesis also shows how multiple Pareto front classifiers can be combined into an ensemble where individual members vote on the class label. Two ensemble diversity measures are developed in the fitness functions which treat the diversity on both the minority and the majority classes as equally important; otherwise, these measures risk being biased toward the majority class. The evolved ensembles outperform their individual members on the tasks due to good cooperation between members.  This thesis further improves the ensemble performances by developing a GP approach to ensemble selection, to quickly find small groups of individuals that cooperate very well together in the ensemble. The pruned ensembles use much fewer individuals to achieve performances that are as good as larger (unpruned) ensembles, particularly on tasks with high levels of class imbalance, thereby reducing the total time to evaluate the ensemble.</p>
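
The bias of the overall success rate, and the effect of weighting both classes equally, can be shown with a short sketch. The average of the two per-class accuracies used here is one common cost-adjusted measure; it is offered as an illustration and is not claimed to be one of the fitness functions proposed in the thesis. The function names and the label convention (1 for the minority class) are assumptions.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly (the overall success rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, minority=1):
    """Average of minority-class and majority-class accuracy, weighted equally."""
    min_total = sum(t == minority for t in y_true)
    maj_total = len(y_true) - min_total
    if min_total == 0 or maj_total == 0:
        raise ValueError("both classes must be present")
    min_hits = sum(t == minority and p == t for t, p in zip(y_true, y_pred))
    maj_hits = sum(t != minority and p == t for t, p in zip(y_true, y_pred))
    return 0.5 * (min_hits / min_total + maj_hits / maj_total)


# A classifier that always predicts the majority class on a 95:5 data set
# looks strong on overall accuracy but is no better than chance on the
# balanced measure.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(overall_accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

Used as a fitness function, the balanced measure gives an evolved classifier no reward for ignoring the minority class, without re-sampling the training data.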


Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2156
Author(s):  
Juan M. Cebrian ◽  
Baldomero Imbernón ◽  
Jesús Soto ◽  
José M. Cecilia

Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, these algorithms are very computationally expensive, as they often involve expensive fitness functions that must be evaluated for all points in the dataset. This computational cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for state-of-the-art fuzzy clustering algorithms such as Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM), and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation, and the amount of data to be processed, each algorithm performs better on a different platform.
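
The cost described above is visible in the standard Fuzzy C-means membership update, where every point is compared against every cluster center. The following is a minimal sequential sketch of that one step, using Euclidean distance and fuzzifier `m`; it is not the paper's parallel implementation, and the function name and the handling of points that coincide with a center are assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return one membership row per point; each row sums to 1 across clusters.

    Implements u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)), with m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Assumed convention: a point lying on one or more centers
            # shares its full membership equally among those centers.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [
                1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships
```

Each row is independent of the others, which is why this step is a natural candidate for the data-parallel strategies the paper evaluates.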

