randomized search
Recently Published Documents


TOTAL DOCUMENTS: 109 (FIVE YEARS: 20)
H-INDEX: 19 (FIVE YEARS: 2)

2021, Vol 8 (1), pp. 57-64
Author(s): Lionel Reinhart Halim, Alethea Suryadibrata

Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3rd September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This study applies sentiment analysis to detect comments that contain cyberbullying. The cyberbullying dataset is taken from the Kaggle Toxic Comment Classification Challenge. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word embeddings produced by Word2Vec serve as the input representation for sentiment analysis. A One-Against-All (OAA) scheme with Support Vector Machine (SVM) models then makes multi-label predictions, and the SVM hyperparameters are tuned with Randomized Search CV. The model is evaluated with the micro-averaged F1 score, to assess prediction accuracy, and the Hamming loss, to assess the fraction of sample-label pairs that are misclassified. The Word2Vec and OAA SVM combination gives the best results when the data pass through all four pre-processing stages (comment generalization, tokenization, stop-word removal, and lemmatization) and are represented by 100 Word2Vec features. The tuned model achieves a micro-averaged F1 score of 83.40% and a Hamming loss of 15.13%.
Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling
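
The four pre-processing stages can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the stop-word set and the lemma table are tiny hypothetical stand-ins for a full stop-word list and a dictionary-based lemmatizer (such as a WordNet lemmatizer), and tokenization is a plain whitespace split.

```python
import string

# Hypothetical, deliberately tiny stop-word set; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "you", "your", "and", "or", "to", "of"}

# Hypothetical lemma table standing in for a real lemmatizer; it knows only a few forms.
LEMMAS = {"idiots": "idiot", "comments": "comment", "hated": "hate"}


def generalize(comment):
    """Stage 1 (comment generalization): lowercase and strip punctuation."""
    return comment.lower().translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Stage 2: split on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Stage 3: drop tokens found in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Stage 4: map each token to its lemma when the table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(comment):
    return lemmatize(remove_stop_words(tokenize(generalize(comment))))


print(preprocess("You are SUCH idiots!!! Your comments are hated."))
# ['such', 'idiot', 'comment', 'hate']
```

The resulting token lists are what a Word2Vec model would then be trained on.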

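The One-Against-All step trains one binary classifier per label and predicts a comment's label set as every label whose classifier scores it positively. The sketch below assumes a linear SVM trained with Pegasos-style subgradient descent on the hinge loss, swapped in for the library SVM the authors tuned; the two-dimensional feature vectors stand in for the 100-dimensional Word2Vec features, and labels `0` and `1` are hypothetical (for example "toxic" and "insult").

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: feature vectors (callers append a constant 1.0 as a bias feature).
    y: labels in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1       # hinge loss is active
            w = [(1 - eta * lam) * wi for wi in w]   # shrink: gradient of the L2 term
            if violated:
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
    return w


def train_one_against_all(X, Y, n_labels, **svm_kwargs):
    """Train one binary SVM per label; Y holds a set of label indices per sample."""
    models = []
    for label in range(n_labels):
        y = [1 if label in labels else -1 for labels in Y]
        models.append(train_linear_svm(X, y, **svm_kwargs))
    return models


def predict_labels(models, x):
    """Multi-label prediction: every label whose classifier scores x positively."""
    return {label for label, w in enumerate(models) if dot(w, x) > 0}


# Toy data: label 0 applies when the first feature is positive,
# label 1 when the second is; the trailing 1.0 is the bias feature.
points = [(2, 1), (-2, 1), (2, -1), (-2, -1), (1, 2), (-1, 2), (1, -2), (-1, -2)]
X = [[float(a), float(b), 1.0] for a, b in points]
Y = [{lbl for lbl, v in enumerate((a, b)) if v > 0} for a, b in points]
models = train_one_against_all(X, Y, n_labels=2)
print([predict_labels(models, x) for x in X] == Y)
```

Training each label independently is what makes OAA a natural fit for multi-labelling: a comment can receive zero, one, or several labels.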

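Both evaluation metrics can be computed directly from per-label decisions. Micro-averaging pools true positives (TP), false positives (FP), and false negatives (FN) over all labels, giving F1 = 2TP / (2TP + FP + FN), while the Hamming loss is the fraction of sample-label pairs that are wrong. The sketch encodes each sample's labels as a set of label indices, an assumption made for illustration.

```python
def micro_f1(y_true, y_pred, n_labels):
    """Micro-averaged F1: pool TP, FP and FN over every sample-label pair."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        for label in range(n_labels):
            actual, predicted = label in true, label in pred
            if actual and predicted:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of sample-label pairs where prediction and truth disagree."""
    wrong = sum(
        (label in true) != (label in pred)
        for true, pred in zip(y_true, y_pred)
        for label in range(n_labels)
    )
    return wrong / (len(y_true) * n_labels)


y_true = [{0}, {0, 1}, set()]
y_pred = [{0}, {0}, {1}]
print(round(micro_f1(y_true, y_pred, 2), 3), round(hamming_loss(y_true, y_pred, 2), 3))
# 0.667 0.333
```

Here TP = 2, FP = 1 and FN = 1, so micro-F1 = 4/6, and 2 of the 6 sample-label pairs are wrong.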
Algorithmica, 2021
Author(s): Jakob Bossek, Frank Neumann, Pan Peng, Dirk Sudholt

We contribute to the theoretical understanding of randomized search heuristics for dynamic problems. We consider the classical vertex coloring problem on graphs and investigate the dynamic setting where edges are added to the current graph. We then analyze the expected time for randomized search heuristics to recompute high-quality solutions. The (1+1) Evolutionary Algorithm and randomized local search (RLS) operate in a setting where the number of colors is bounded and the number of conflicts is minimized. Iterated local search algorithms use an unbounded color palette and aim to use the smallest colors and, consequently, the smallest number of colors. We identify classes of bipartite graphs where reoptimization is as hard as, or even harder than, optimization from scratch, i.e., starting from a random initialization. Even adding a single edge can lead to hard symmetry problems. However, graph classes that are hard for one algorithm turn out to be easy for others. In most cases our bounds show that reoptimization is faster than optimizing from scratch. We further show that tailoring mutation operators to the parts of the graph where changes have occurred can significantly reduce the expected reoptimization time. In most settings the expected reoptimization time for such tailored algorithms is linear in the number of added edges. However, tailored algorithms cannot prevent exponential times in settings where the original algorithm is inefficient.
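
The dynamic setting can be illustrated with a small RLS for coloring with a bounded palette of k colors that minimizes the number of conflicting edges. This is a simplified sketch, not the paper's exact algorithms: each step recolors one uniformly chosen vertex with a different random color and keeps the change if the conflict count does not increase. The optional `focus` argument is a hypothetical stand-in for a tailored operator; with probability 1/2 it picks the vertex from a given list, such as the endpoints of newly added edges.

```python
import random


def conflicts(edges, coloring):
    """Count edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def rls_coloring(n, edges, k, coloring=None, focus=None, max_iters=100_000, seed=0):
    """RLS minimizing conflicts with a palette of k >= 2 colors.

    coloring: start point for reoptimization; None means a uniformly random
    coloring (optimization from scratch).
    focus: hypothetical tailored operator; these vertices are chosen with prob. 1/2.
    """
    rng = random.Random(seed)
    if coloring is None:
        coloring = [rng.randrange(k) for _ in range(n)]
    else:
        coloring = list(coloring)  # do not mutate the caller's list
    current = conflicts(edges, coloring)
    for _ in range(max_iters):
        if current == 0:
            break
        if focus and rng.random() < 0.5:
            v = rng.choice(focus)
        else:
            v = rng.randrange(n)
        old = coloring[v]
        coloring[v] = rng.choice([c for c in range(k) if c != old])
        new = conflicts(edges, coloring)
        if new <= current:
            current = new          # accept: conflicts did not increase
        else:
            coloring[v] = old      # reject: restore the previous color
    return coloring, current


# Two disjoint paths 0-1-2 and 3-4-5, colored from scratch with 2 colors.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
coloring, c0 = rls_coloring(6, edges, k=2)

# Dynamic change: add edge (2, 3) and reoptimize from the old coloring.
new_edges = edges + [(2, 3)]
recolored, c1 = rls_coloring(6, new_edges, k=2, coloring=coloring, focus=[2, 3], seed=1)
print(c0, c1)
```

If the two paths were colored so that vertices 2 and 3 clash, one whole path has to swap its colors before the conflict disappears, a tiny instance of the kind of symmetry problem the abstract describes. On this small graph both conflict counts should reach 0.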


Author(s): Aruna M, M Anjana, Harshita Chauhan, Deepa R

The price of a car depreciates from the moment it is bought. The resale value of a car is influenced by many factors and matters to both buyers and sellers, which makes its prediction a prominent problem in machine learning. Diverse machine-learning methods can use all of these varied factors and process large amounts of data to predict the price. For our dataset, the Random Forest Regression algorithm gives a significantly higher prediction rate. To optimise the Random Forest Regressor, its best hyperparameters are found with hyperparameter-tuning strategies. Comparing Grid Search with Randomized Search, Grid Search yields the better prediction rate. The selected hyperparameters are then passed to the algorithm, since tuning helps assemble the best set of decision trees in the random forest and thus the most optimised prediction rate.
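
The cost difference between the two tuning strategies is easy to see by enumerating candidates: grid search evaluates every combination in the grid, while randomized search evaluates only `n_iter` sampled combinations. The parameter names below mirror scikit-learn's `RandomForestRegressor` arguments, but the value ranges are hypothetical, and the sketch only generates candidates; fitting and scoring each one (for example with cross-validation) is left out.

```python
import itertools
import random

# Hypothetical search space; keys mirror RandomForestRegressor arguments.
PARAM_GRID = {
    "n_estimators": [100, 200, 400, 800],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2", 1.0],
}


def grid_candidates(grid):
    """Grid search: every combination of the listed values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*(grid[n] for n in names))]


def random_candidates(grid, n_iter, seed=0):
    """Randomized search: n_iter distinct combinations drawn uniformly from the grid."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)


full = grid_candidates(PARAM_GRID)
sampled = random_candidates(PARAM_GRID, n_iter=20)
print(len(full), len(sampled))  # 144 20
```

Here grid search would fit 4 × 4 × 3 × 3 = 144 models (times the number of CV folds), whereas the randomized search fits 20, so it can miss the best combination the full grid would find. When every hyperparameter is given as a list, scikit-learn's `ParameterSampler` (used by `RandomizedSearchCV`) likewise samples without replacement.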


Author(s): Katarina Pavlović

The development of statistical learning methods and their implementation in mobile-device software enable moment-by-moment study of human social interactions, behavioral patterns, and sleep, as well as physical mobility and gross motor activity. Recently, through supervised machine learning, human activity recognition (HAR) has found numerous applications in biomedical engineering, especially in digital phenotyping. With this in mind, and to quantify human movement activity in situ using data from portable digital devices, we developed code that uses a Random Forest Classifier to predict the type of physical activity from tri-axial smartphone accelerometer data. The code is written in Python using the Anaconda distribution of data-science packages. Raw accelerometer data were collected with the Beiwe research platform, developed by the Onnela Lab at the Harvard T.H. Chan School of Public Health. Hyperparameters were tuned with Scikit-Learn’s Randomized Search CV method: a grid of hyperparameter ranges is defined, combinations are sampled at random from it, and each sampled combination is evaluated with K-fold cross-validation. The results should enable the development of more robust models for predicting the type of physical activity across more subjects, different hardware, various test situations, and different environments.
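
The abstract does not list the features fed to the Random Forest Classifier. A common choice for tri-axial accelerometer data, assumed here purely for illustration, is to cut the stream into fixed-length windows and summarize the acceleration magnitude sqrt(x² + y² + z²) in each window. The window length and the three statistics (mean, population standard deviation, and range) are assumptions.

```python
import math
import statistics


def window_features(samples, window):
    """Summarize non-overlapping windows of tri-axial readings.

    samples: list of (x, y, z) accelerometer readings.
    Returns one (mean, std, range) tuple of the magnitude per full window;
    a trailing partial window is dropped.
    """
    features = []
    for start in range(0, len(samples) - window + 1, window):
        mags = [math.hypot(x, y, z) for x, y, z in samples[start:start + window]]
        features.append((statistics.mean(mags), statistics.pstdev(mags), max(mags) - min(mags)))
    return features


# A phone lying still measures roughly gravity on one axis;
# 12 samples with window=5 give two full windows.
still = [(0.0, 0.0, 9.81)] * 12
print(window_features(still, window=5))
```

Each tuple would become one row of the feature matrix, with the activity label of that window as the target.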

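The tuning procedure described above (sample hyperparameter combinations at random, then score each with K-fold cross-validation) can be sketched without any library. To keep the example short, a k-nearest-neighbour classifier stands in for the Random Forest Classifier, the only hyperparameter is `n_neighbors`, and the two-cluster dataset is synthetic; none of these come from the study. Each hyperparameter is sampled independently from its list (with replacement), a simplification of scikit-learn's `RandomizedSearchCV`.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n, n_splits, seed=0):
    """Shuffle indices once and deal them into n_splits folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def knn_predict(train, x, n_neighbors):
    """Majority label among the n_neighbors closest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cv_accuracy(data, params, folds):
    """Mean accuracy over folds, each fold held out once."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        correct = sum(knn_predict(train, x, params["n_neighbors"]) == y for x, y in test)
        scores.append(correct / len(test))
    return statistics.mean(scores)


def randomized_search_cv(data, param_space, n_iter, n_splits, seed=0):
    """Sample n_iter combinations at random and keep the best CV score."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(data), n_splits, seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = cv_accuracy(data, params, folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic, well-separated clusters of hypothetical window features.
data = (
    [((0.1 * i, 1.0 + 0.05 * i), "sitting") for i in range(10)]
    + [((5.0 + 0.1 * i, 8.0 + 0.05 * i), "walking") for i in range(10)]
)
space = {"n_neighbors": [1, 3, 5, 7, 9]}
best_params, best_score = randomized_search_cv(data, space, n_iter=4, n_splits=5)
print(best_params, best_score)
```

On these well-separated clusters every sampled `n_neighbors` value reaches a mean cross-validated accuracy of 1.0, so the search keeps the first combination it tried; on real accelerometer data the scores would differ between combinations.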

Author(s): P. K. Lehre, C. Witt

Drift analysis is one of the state-of-the-art techniques for the runtime analysis of randomized search heuristics (RSHs) such as evolutionary algorithms (EAs) and simulated annealing. The vast majority of existing drift theorems bound the expected hitting time of a target state, for example the set of optimal solutions, without making further statements about the distribution of this time. We address this gap by providing a general drift theorem that includes bounds on the upper and lower tails of the hitting-time distribution. The new tail bounds are applied to prove very precise sharp-concentration results on the running time of a simple EA on standard benchmark problems, including the class of general linear functions. On all these problems, the probability of deviating by an r-factor in lower-order terms of the expected time decreases exponentially with r. The usefulness of the theorem outside the theory of RSHs is demonstrated by deriving tail bounds on the number of cycles in random permutations. All these results handle a position-dependent (variable) drift that was not covered by previous drift theorems with tail bounds. Finally, user-friendly specializations of the general drift theorem are given.
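
The concentration phenomenon can be observed empirically with a simple EA. The sketch runs the (1+1) EA with standard bit mutation (each bit flips independently with probability 1/n) on OneMax, a linear function that counts one-bits, and records the number of iterations until the optimum is found. It is only a simulation for intuition, not the paper's drift theorem; the expected optimization time on OneMax is Θ(n log n), with leading term e·n·ln n.

```python
import math
import random
import statistics


def one_plus_one_ea_onemax(n, rng):
    """Iterations until the (1+1) EA finds the all-ones string of length n."""
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(x)
    iterations = 0
    while fx < n:
        iterations += 1
        # Standard bit mutation: flip each bit independently with probability 1/n.
        y = [1 - b if rng.random() < 1 / n else b for b in x]
        fy = sum(y)
        if fy >= fx:  # elitist selection: keep the offspring if it is not worse
            x, fx = y, fy
    return iterations


def runtime_sample(n, runs, seed=0):
    rng = random.Random(seed)
    return [one_plus_one_ea_onemax(n, rng) for _ in range(runs)]


n = 50
times = runtime_sample(n, runs=30)
print(statistics.mean(times), statistics.pstdev(times), math.e * n * math.log(n))
```

For n = 50 the leading term e·n·ln n is roughly 532; the sample mean typically comes out somewhat lower, because lower-order terms are still noticeable at this size.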

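The permutation application can also be checked numerically. A classical result is that the number of cycles of a uniformly random permutation of n elements has expectation equal to the harmonic number H_n = 1 + 1/2 + ... + 1/n, which is about ln n. The sketch counts cycles by following each unvisited element around its cycle and compares the empirical mean with H_n; it illustrates the quantity being bounded, not the tail bounds themselves.

```python
import random


def count_cycles(perm):
    """Number of cycles of a permutation given as a list mapping i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk the cycle containing start
                seen[j] = True
                j = perm[j]
    return cycles


def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))


def mean_cycles(n, trials, seed=0):
    """Empirical mean number of cycles over uniformly shuffled permutations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += count_cycles(perm)
    return total / trials


print(mean_cycles(100, 2000), harmonic(100))
```

For n = 100, H_n is about 5.19, and the empirical mean over 2,000 shuffles should land close to that value.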
