Randomized Search: Recently Published Documents

Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This research applies sentiment analysis to detect comments that contain cyberbullying. The dataset is the Toxic Comment Classification Challenge dataset from the Kaggle website. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word2Vec word embeddings are used to represent the comments for sentiment analysis. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model then produces multi-label predictions. The SVM model is tuned with Randomized Search CV. Evaluation uses the micro-averaged F1 score to assess prediction accuracy and Hamming loss to measure the proportion of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM model gives its best result on data that has passed through all four pre-processing stages (comment generalization, tokenization, stop-word removal, and lemmatization) and is stored as 100 features in the Word2Vec model. The tuned model produces a micro-averaged F1 of 83.40% and a Hamming loss of 15.13%. Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling
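The two evaluation measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the tiny label matrices are invented for illustration, and the function names `micro_f1` and `hamming_loss` are hypothetical. Micro-averaged F1 pools true positives, false positives, and false negatives over all labels before computing F1; Hamming loss is the fraction of (sample, label) pairs predicted incorrectly.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Invented multi-label example: 3 comments, 3 toxicity labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

f1 = micro_f1(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
print(f"micro F1 = {f1:.4f}, Hamming loss = {hl:.4f}")
```

In this toy case there are 3 true positives, 1 false positive, and 2 false negatives, so 3 of the 9 pairs are wrong.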

Time Complexity Analysis of Randomized Search Heuristics for the Dynamic Graph Coloring Problem

Algorithmica ◽

10.1007/s00453-021-00838-3 ◽

2021 ◽

Author(s):

Jakob Bossek ◽

Frank Neumann ◽

Pan Peng ◽

Dirk Sudholt

Keyword(s):

Vertex Coloring ◽

Theoretical Understanding ◽

Color Palette ◽

Original Algorithm ◽

Coloring Problem ◽

Mutation Operators ◽

Graph Classes ◽

Search Heuristics ◽

Abstract: We contribute to the theoretical understanding of randomized search heuristics for dynamic problems. We consider the classical vertex coloring problem on graphs and investigate the dynamic setting where edges are added to the current graph. We then analyze the expected time for randomized search heuristics to recompute high-quality solutions. The (1+1) Evolutionary Algorithm and RLS operate in a setting where the number of colors is bounded and we are minimizing the number of conflicts. Iterated local search algorithms use an unbounded color palette and aim to use the smallest colors and, consequently, the smallest number of colors. We identify classes of bipartite graphs where reoptimization is as hard as or even harder than optimization from scratch, i.e., starting with a random initialization. Even adding a single edge can lead to hard symmetry problems. However, graph classes that are hard for one algorithm turn out to be easy for others. In most cases our bounds show that reoptimization is faster than optimizing from scratch. We further show that tailoring mutation operators to parts of the graph where changes have occurred can significantly reduce the expected reoptimization time. In most settings the expected reoptimization time for such tailored algorithms is linear in the number of added edges. However, tailored algorithms cannot prevent exponential times in settings where the original algorithm is inefficient.
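The dynamic setting described in this abstract can be sketched in a few lines. This is a simplified illustration, not the algorithms analyzed in the paper: it assumes a plain randomized local search (RLS) that recolors one uniformly chosen vertex per step with a bounded palette and accepts the move if the number of conflicts does not increase. The example graph and the names `conflicts` and `rls_recolor` are hypothetical.

```python
import random


def conflicts(edges, coloring):
    """Number of edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def rls_recolor(edges, coloring, num_colors, rng, max_steps=10_000):
    """RLS: recolor one random vertex per step, keep the move if not worse."""
    coloring = list(coloring)
    current = conflicts(edges, coloring)
    steps = 0
    while current > 0 and steps < max_steps:
        v = rng.randrange(len(coloring))
        old_color = coloring[v]
        coloring[v] = rng.randrange(num_colors)
        new = conflicts(edges, coloring)
        if new <= current:
            current = new            # accept improvements and plateau moves
        else:
            coloring[v] = old_color  # reject worsening moves
        steps += 1
    return coloring, current, steps


# Two disjoint edges with a proper 2-coloring (bipartite graph).
edges = [(0, 1), (2, 3)]
coloring = [0, 1, 1, 0]
before = conflicts(edges, coloring)

# Dynamic change: one added edge joins two vertices of the same color.
edges.append((1, 2))
after_change = conflicts(edges, coloring)

# Reoptimize starting from the old coloring instead of from scratch.
repaired, remaining, steps = rls_recolor(edges, coloring, 2, random.Random(0))
print(before, after_change, remaining, steps)
```

Accepting moves that leave the conflict count unchanged matters here: with two colors, the single conflict has to be pushed along the path to an end vertex before it can be removed.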

Optimized Hyperparameter Tuned Random Forest Regressor Algorithm in Predicting Resale Car Value based on Grid Search Method

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1217 ◽

2021 ◽

pp. 106-113

Author(s):

Aruna M ◽

M Anjana ◽

Harshita Chauhan ◽

Deepa R

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Trees ◽

Search Method ◽

Grid Search ◽

Random Forest Regression ◽

Prediction Rate ◽

Grid Search Method ◽

Randomized Search ◽

The Cost

The price of a car depreciates from the moment it is bought. The resale value of a car depends on many factors and matters to both buyers and sellers, which makes predicting it a prominent problem in machine learning. Machine learning methods can combine these varied factors and process large amounts of data to predict the cost. On our dataset, the Random Forest Regression algorithm shows a significant increase in the prediction rate. To optimise the Random Forest Regressor model, the best hyperparameters can be found using hyperparameter tuning strategies. Comparing Grid Search and Randomized Search, Grid Search gives the better prediction rate. The resulting parameters are then passed to the algorithm, since hyperparameter tuning helps select the best set of decision trees in the random forest for the most optimised prediction rate.
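The difference between the two tuning strategies compared in this abstract can be sketched without any machine learning library. This is a toy under stated assumptions: the hyperparameter grid is invented, and `toy_score` is a hypothetical stand-in for a cross-validated prediction score, not anything measured in the paper. Grid search scores every combination; randomized search scores only a fixed number of sampled combinations, so on the same grid it can match the grid search result but never beat it.

```python
import itertools
import random

# Hypothetical grid for a random forest regressor.
grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, 16],
    "min_samples_split": [2, 5, 10],
}


def toy_score(params):
    """Stand-in for a cross-validated score; higher is better (assumption)."""
    return -(
        abs(params["n_estimators"] - 100) / 100
        + abs(params["max_depth"] - 8) / 8
        + abs(params["min_samples_split"] - 5) / 5
    )


def all_combinations(grid):
    """Every combination of the listed hyperparameter values."""
    keys = list(grid)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]


def grid_search(grid, score):
    """Exhaustive: evaluate all combinations and keep the best."""
    return max(all_combinations(grid), key=score)


def randomized_search(grid, score, n_iter, rng):
    """Budgeted: evaluate n_iter combinations sampled without replacement."""
    candidates = rng.sample(all_combinations(grid), n_iter)
    return max(candidates, key=score)


grid_best = grid_search(grid, toy_score)
random_best = randomized_search(grid, toy_score, n_iter=5, rng=random.Random(0))
print("grid search:      ", grid_best, toy_score(grid_best))
print("randomized search:", random_best, toy_score(random_best))
```

The trade-off is cost: here grid search needs 27 evaluations against 5 for randomized search, and the gap grows quickly as more hyperparameters are added.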

Error Analysis of Elitist Randomized Search Heuristics

Swarm and Evolutionary Computation ◽

10.1016/j.swevo.2021.100875 ◽

2021 ◽

pp. 100875

Author(s):

Cong Wang ◽

Yu Chen ◽

Jun He ◽

Chengwang Xie

Keyword(s):

Error Analysis ◽

Search Heuristics ◽

PREDICTING THE TYPE OF PHYSICAL ACTIVITY FROM TRI-AXIAL SMARTPHONE ACCELEROMETER DATA

Istrazivanja i projektovanja za privredu ◽

10.5937/jaes0-27166 ◽

2021 ◽

pp. 1-6

Author(s):

Katarina Pavlović

Keyword(s):

Physical Activity ◽

Data Science ◽

Human Movement ◽

Supervised Machine Learning ◽

Accelerometer Data ◽

Digital Devices ◽

Physical Mobility ◽

Digital Phenotyping ◽

Using Data ◽

The development of various statistical learning methods and their implementation in mobile device software enables moment-by-moment study of human social interactions, behavioral patterns, sleep, physical mobility, and gross motor activity. Recently, through the use of supervised machine learning, human activity recognition (HAR) has found numerous applications in biomedical engineering, especially in the field of digital phenotyping. To quantify human movement activity in situ using data from portable digital devices, we developed code that uses a Random Forest Classifier to predict the type of physical activity from tri-axial smartphone accelerometer data. The code is written in the Python programming language using the Anaconda distribution of data-science packages. Raw accelerometer data was collected with the Beiwe research platform, which is developed by the Onnela Lab at the Harvard T.H. Chan School of Public Health. Tuning was performed by defining a grid of hyperparameter ranges and using Scikit-Learn's Randomized Search CV method to sample randomly from the grid, performing K-Fold CV with each sampled combination of values. The results will enable the development of more robust models for predicting the type of physical activity with more subjects, different hardware, various test situations, and different environments.
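The tuning procedure described in this abstract might look roughly like the following sketch. It is not the author's code: the data is synthetic (three features standing in for the x, y, z accelerometer axes, not Beiwe recordings), and the grid values, number of sampled combinations, and number of folds are all assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic stand-in for tri-axial accelerometer features (assumption).
X, y = make_classification(
    n_samples=300,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Hypothetical grid of hyperparameter ranges (27 combinations in total).
param_grid = {
    "n_estimators": [10, 20, 40],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=5,  # sample 5 combinations from the grid instead of trying all 27
    cv=KFold(n_splits=3, shuffle=True, random_state=0),  # K-Fold CV per sample
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
n_tried = len(search.cv_results_["params"])
print(best_params, round(best_score, 3), n_tried)
```

Because every entry in `param_grid` is a list, scikit-learn samples the combinations without replacement, so the five candidates are distinct.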

Exponential upper bounds for the runtime of randomized search heuristics

Theoretical Computer Science ◽

10.1016/j.tcs.2020.09.032 ◽

2021 ◽

Vol 851 ◽

pp. 24-38

Author(s):

Benjamin Doerr

Keyword(s):

Upper Bounds ◽

Search Heuristics ◽

Tail bounds on hitting times of randomized search heuristics using variable drift analysis

Combinatorics Probability Computing ◽

10.1017/s0963548320000565 ◽

2020 ◽

pp. 1-20

Author(s):

P. K. Lehre ◽

C. Witt

Keyword(s):

Hitting Time ◽

Linear Functions ◽

Hitting Times ◽

Benchmark Problems ◽

Expected Time ◽

Search Heuristics ◽

Drift Analysis ◽

Number Of Cycles ◽

Abstract: Drift analysis is one of the state-of-the-art techniques for the runtime analysis of randomized search heuristics (RSHs) such as evolutionary algorithms (EAs), simulated annealing, etc. The vast majority of existing drift theorems yield bounds on the expected value of the hitting time for a target state, for example the set of optimal solutions, without making additional statements on the distribution of this time. We address this lack by providing a general drift theorem that includes bounds on the upper and lower tail of the hitting time distribution. The new tail bounds are applied to prove very precise sharp-concentration results on the running time of a simple EA on standard benchmark problems, including the class of general linear functions. On all these problems, the probability of deviating by an r-factor in lower-order terms of the expected time decreases exponentially with r. The usefulness of the theorem outside the theory of RSHs is demonstrated by deriving tail bounds on the number of cycles in random permutations. All these results handle a position-dependent (variable) drift that was not covered by previous drift theorems with tail bounds. Finally, user-friendly specializations of the general drift theorem are given.
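As a concrete picture of the hitting time this abstract refers to, here is a minimal sketch of a (1+1) EA on OneMax, a standard linear benchmark function. The choice of OneMax, the mutation rate 1/n, the problem size, and the function name `one_plus_one_ea` are illustrative assumptions; the sketch only simulates the process and measures how many iterations it takes to reach the optimum, it does not implement any drift theorem.

```python
import random


def one_plus_one_ea(n, rng, max_iters=100_000):
    """(1+1) EA maximizing OneMax (number of 1-bits) on bit strings of length n.

    Returns the final bit string and the number of iterations used, i.e. the
    hitting time of the all-ones optimum if it was reached within max_iters.
    """
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(x)
    iters = 0
    while fx < n and iters < max_iters:
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        y = [bit ^ 1 if rng.random() < 1 / n else bit for bit in x]
        fy = sum(y)
        if fy >= fx:  # elitist selection: keep the offspring if not worse
            x, fx = y, fy
        iters += 1
    return x, iters


# Repeat the run to see how the hitting time varies around its mean.
rng = random.Random(0)
n = 20
runs = [one_plus_one_ea(n, rng) for _ in range(20)]
hitting_times = [iters for _, iters in runs]
print(min(hitting_times), sum(hitting_times) / len(hitting_times), max(hitting_times))
```

Printing the minimum, mean, and maximum over repeated runs gives an empirical view of the spread that tail bounds describe.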

Fraudulent Credit Card Transactions Classification using Randomized Search CV with XGB Classifier

International Journal of Engineering and Techniques ◽

10.29126/23951303/ijet-v6i5p3 ◽

2020 ◽

Vol 6 (5) ◽

Author(s):

Mr. Kapil Dev Tripathi ◽

Mr. Vikash Singh Rajput

Keyword(s):

Credit Card ◽

A note to: A multiple-rule based constructive randomized search algorithm for solving assembly line worker assignment and balancing problem

Journal of Intelligent Manufacturing ◽

10.1007/s10845-020-01632-8 ◽

2020 ◽

Author(s):

Adalberto Sato Michels ◽

Alysson M. Costa

Keyword(s):

Assembly Line ◽

Search Algorithm ◽

Worker Assignment ◽

Rule Based ◽

More effective randomized search heuristics for graph coloring through dynamic optimization

Proceedings of the 2020 Genetic and Evolutionary Computation Conference ◽

10.1145/3377930.3390174 ◽

2020 ◽

Author(s):

Jakob Bossek ◽

Frank Neumann ◽

Pan Peng ◽

Dirk Sudholt

Keyword(s):

Dynamic Optimization ◽

Graph Coloring ◽

Search Heuristics ◽