DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization

Modern machine learning algorithms crucially rely on several design decisions to achieve strong performance, making the problem of Hyperparameter Optimization (HPO) more important than ever. Here, we combine the advantages of the popular bandit-based HPO method Hyperband (HB) and the evolutionary search approach of Differential Evolution (DE) to yield a new HPO method which we call DEHB. Comprehensive results on a very broad range of HPO problems, as well as a wide range of tabular benchmarks from neural architecture search, demonstrate that DEHB achieves strong performance far more robustly than all previous HPO methods we are aware of, especially for high-dimensional problems with discrete input dimensions. For example, DEHB is up to 1000x faster than random search. It is also efficient in computational time, conceptually simple and easy to implement, positioning it well to become a new default HPO method.
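
As a rough illustration of the two ingredients named above, the following Python sketch shows a Hyperband-style budget schedule and one generation of DE (rand/1 mutation with binomial crossover and greedy selection). It is not the authors' implementation: the function names, the unit-hypercube encoding, the default values of `f`, `cr` and `eta`, and the toy objective in the usage note are all assumptions, and how DEHB passes populations between budgets is not described in this abstract.

```python
import random


def hyperband_budgets(min_budget, max_budget, eta=3):
    """Return one geometric budget ladder per Hyperband bracket (assumed layout)."""
    # Largest s such that min_budget * eta**s <= max_budget, found without float logs.
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget:
        s_max += 1
    # Bracket s starts at max_budget / eta**s and multiplies by eta up to max_budget.
    return [
        [max_budget / eta ** i for i in range(s, -1, -1)]
        for s in range(s_max, -1, -1)
    ]


def de_generation(population, scores, objective, rng, f=0.5, cr=0.5):
    """Run one DE generation (rand/1/bin) on configurations encoded in [0, 1]^d.

    Lower scores are treated as better. Requires at least 4 individuals.
    """
    dim = len(population[0])
    new_population, new_scores = [], []
    for i, (parent, parent_score) in enumerate(zip(population, scores)):
        # Pick three distinct individuals other than the parent.
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = rng.sample(others, 3)
        # rand/1 mutation, clipped back into the unit hypercube.
        mutant = [min(1.0, max(0.0, a[k] + f * (b[k] - c[k]))) for k in range(dim)]
        # Binomial crossover; one forced dimension guarantees the child differs
        # from the parent in at least one coordinate taken from the mutant.
        forced = rng.randrange(dim)
        child = [
            mutant[k] if (k == forced or rng.random() < cr) else parent[k]
            for k in range(dim)
        ]
        child_score = objective(child)
        # Greedy selection: the child replaces the parent only if it is not worse.
        if child_score <= parent_score:
            new_population.append(child)
            new_scores.append(child_score)
        else:
            new_population.append(parent)
            new_scores.append(parent_score)
    return new_population, new_scores
```

For example, `hyperband_budgets(1, 27, 3)` yields four brackets, the first of which climbs from budget 1 to 27 in factors of 3. In this sketch, discrete hyperparameters would have to be decoded from the continuous `[0, 1]` encoding before evaluation; that decoding step is left out here.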

Weighted Random Search for Hyperparameter Optimization

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2019.2.3514 ◽

2019 ◽

Vol 14 (2) ◽

pp. 154-169 ◽

Cited By ~ 2

Author(s):

Adrian-Catalin Florea ◽

Razvan Andonie

Keyword(s):

Machine Learning ◽

Optimization Problem ◽

Random Search ◽

Machine Learning Algorithms ◽

New Combinations ◽

Hyperparameter Optimization ◽

A Value ◽

Computational Budget ◽

Theoretical Results ◽

Discrete Domain

We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms. Unlike the standard RS, which generates new values for all hyperparameters in each trial, we generate a new value for each hyperparameter with a probability of change. The intuition behind our approach is that a value that already triggered a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than the standard RS. Our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective functions for this class of problems (the Griewank function) and for the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem defined on a discrete domain.
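
The idea of resampling each hyperparameter only with some probability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `weighted_random_search`, the fixed per-parameter probabilities in `p_change`, the minimization convention, and the use of the standard Griewank function (rather than the paper's variation of it) are all choices made here for the example.

```python
import math
import random


def griewank(x):
    """Standard Griewank function; its global minimum is 0 at the origin."""
    quadratic = sum(v * v for v in x) / 4000.0
    oscillation = math.prod(
        math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1)
    )
    return 1.0 + quadratic - oscillation


def weighted_random_search(objective, domains, p_change, n_trials, rng):
    """Minimize `objective` over discrete `domains` (one list of values per parameter).

    In each trial, parameter i is resampled with probability p_change[i];
    otherwise it keeps the value from the best configuration found so far.
    """
    # The first trial is an ordinary random draw over all parameters.
    best = [rng.choice(domain) for domain in domains]
    best_score = objective(best)
    for _ in range(n_trials - 1):
        candidate = [
            rng.choice(domain) if rng.random() < p else kept
            for domain, p, kept in zip(domains, p_change, best)
        ]
        # Note: a candidate may coincide with `best` when no parameter changes;
        # this sketch does not filter out such repeated evaluations.
        score = objective(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

Setting every entry of `p_change` to 1.0 recovers standard random search, which makes the two easy to compare under the same trial budget, for instance with `domains = [list(range(-10, 11))] * 2` and `rng = random.Random(1)`.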

Deep neural network-based automatic metasurface design with a wide frequency range

Scientific Reports ◽

10.1038/s41598-021-86588-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Fardin Ghorbani ◽

Sina Beyraghi ◽

Javad Shabanpour ◽

Homayoon Oraizi ◽

Hossein Soleimani ◽

...

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Cell Structure ◽

Machine Learning Algorithms ◽

Inverse Design ◽

Computational Time ◽

Network Architectures ◽

Average Accuracy ◽

Wide Range ◽

Working Frequency

Beyond the scope of conventional metasurface, which necessitates plenty of computational resources and time, an inverse design approach using machine learning algorithms promises an effective way for metasurface design. In this paper, benefiting from Deep Neural Network (DNN), an inverse design procedure of a metasurface in an ultra-wide working frequency band is presented in which the output unit cell structure can be directly computed by a specified design target. To reach the highest working frequency for training the DNN, we consider 8 ring-shaped patterns to generate resonant notches at a wide range of working frequencies from 4 to 45 GHz. We propose two network architectures. In one architecture, we restrict the output of the DNN, so the network can only generate the metasurface structure from the input of 8 ring-shaped patterns. This approach drastically reduces the computational time, while keeping the network’s accuracy above 91%. We show that our model based on DNN can satisfactorily generate the output metasurface structure with an average accuracy of over 90% in both network architectures. Determination of the metasurface structure directly without time-consuming optimization procedures, an ultra-wide working frequency, and high average accuracy equip an inspiring platform for engineering projects without the need for complex electromagnetic theory.

RepPer: Perception of Psychiatric Disorders on Twitter in French (Preprint)

10.2196/preprints.18539 ◽

2020 ◽

Author(s):

Sarah Delanys ◽

Farah Benamara ◽

Véronique Moriceau ◽

François Olivier ◽

Josiane Mothe

Keyword(s):

Social Media ◽

Psychiatric Disorders ◽

Digital Technology ◽

Psychotic Disorders ◽

Negative Polarity ◽

Machine Learning Algorithms ◽

Annotation Scheme ◽

Word Use ◽

Wide Range ◽

Initial Dataset

BACKGROUND With the advent of digital technology and specifically user-generated content in social media, new ways have emerged for studying possible stigma of people in relation to mental health. Several pieces of work studied the discourse conveyed about psychiatric pathologies on Twitter considering mostly tweets in English and a limited number of psychiatric disorder terms. This paper proposes the first study to analyze the use of a wide range of psychiatric terms in tweets in French. OBJECTIVE Our aim is to study how generic, nosographic and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has three complementary goals: (1) to analyze the types of psychiatric word use, namely medical, misuse, and irrelevant, (2) to analyze the polarity conveyed in the tweets that use these terms (positive/negative/neutral), and (3) to compare the frequency of these terms to those observed in related work (mainly in English). METHODS Our study has been conducted on a corpus of tweets in French posted between 01/01/2016 and 12/31/2018 and collected using dedicated keywords. The corpus has been manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. Two analyses have been performed: first a qualitative analysis to measure the reliability of the produced manual annotation, then a quantitative analysis considering mainly term frequency in each layer and exploring the interactions between them. RESULTS One of the first results is a resource in the form of an annotated dataset. The initial dataset is composed of 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, experts in psychiatry annotated 3,040 randomly selected tweets, which correspond to the resource resulting from our work. The second result is the analysis of the annotations; it shows that terms are misused in 45.3% of the tweets and that their associated polarity is negative in 86.2% of the cases. When considering the three types of term use, 59.5% of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (55.5%) are more frequent than those related to mood disorders (26.5%). CONCLUSIONS Some psychiatric terms are misused in the corpus we studied, which is consistent with the results reported in related work in other languages. Thanks to the great diversity of studied terms, this work highlighted a disparity in the representations and ways of using psychiatric terms. Moreover, our study is important to help psychiatrists be aware of term use in new communication media such as social networks, which are widely used. This study has the major advantage of being reproducible thanks to the framework and guidelines we produced, so that the study could be repeated in order to analyze the evolution of term usage. While the newly built dataset is a valuable resource for other analytical studies, it could also serve to train machine learning algorithms to automatically identify stigma in social media.

Classification of unlabeled online media

Scientific Reports ◽

10.1038/s41598-021-85608-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sakthi Kumar Arul Prakash ◽

Conrad Tucker

Keyword(s):

Social Media ◽

Real World ◽

Graphical Model ◽

Ground Truth ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Social Media Networks ◽

Online Social Media ◽

Wide Range

This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised learning manner.

Early Dropout Prediction in MOOCs through Supervised Learning and Hyperparameter Optimization

Electronics ◽

10.3390/electronics10141701 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1701

Author(s):

Theodor Panagiotakopoulos ◽

Sotiris Kotsiantis ◽

Georgios Kostopoulos ◽

Omiros Iatrellis ◽

Achilles Kameas

Keyword(s):

Online Education ◽

Online Courses ◽

Early Stage ◽

Activity Patterns ◽

Daily Basis ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Educational Institutions ◽

Student Dropout ◽

Wide Range

Over recent years, massive open online courses (MOOCs) have gained increasing popularity in the field of online education. Students with different needs and learning specificities are able to attend a wide range of specialized online courses offered by universities and educational institutions. As a result, large amounts of data regarding students’ demographic characteristics, activity patterns, and learning performances are generated and stored in institutional repositories on a daily basis. Unfortunately, a key issue in MOOCs is low completion rates, which directly affect student success. Therefore, it is of utmost importance for educational institutions and faculty members to find more effective practices and reduce non-completer ratios. In this context, the main purpose of the present study is to employ a plethora of state-of-the-art supervised machine learning algorithms for predicting student dropout in a MOOC for smart city professionals at an early stage. The experimental results show that accuracy exceeds 96% based on data collected during the first week of the course, thus enabling effective intervention strategies and support actions.

Detecting cybersecurity attacks across different network features and learners

Journal Of Big Data ◽

10.1186/s40537-021-00426-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Joffrey L. Leevy ◽

John Hancock ◽

Richard Zuech ◽

Taghi M. Khoshgoftaar

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Operating Characteristic ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Feature Selection Technique ◽

Impact Performance ◽

Detection Model ◽

Wide Range ◽

Research Questions

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.

Advanced CFD Simulations of free-surface flows around modern sailing yachts using a newly developed OpenFOAM solver

10.5957/csys-2016-013 ◽

2016 ◽

Author(s):

Janek Meyer ◽

Hannes Renzsch ◽

Kai Graf ◽

Thomas Slawig

Keyword(s):

Free Surface ◽

Large Scale ◽

Breaking Waves ◽

Free Surface Flows ◽

Body Motion ◽

Computational Time ◽

Efficient Computation ◽

Surface Flows ◽

Scale Free ◽

Wide Range

While plain vanilla OpenFOAM has strong capabilities with regard to quite a few typical CFD tasks, some problems require additional bespoke solvers and numerics for efficient computation of high-quality results. One of the fields requiring these additions is the computation of large-scale free-surface flows as found, e.g., in naval architecture. This holds especially for the flow around typical modern yacht hulls, often planing, sometimes with surface-piercing appendages. Particular challenges include, but are not limited to, breaking waves, sharpness of interface, numerical ventilation (aka streaking) and a wide range of flow phenomenon scales. A new OF-based application including newly implemented discretization schemes, gradient computation and rigid body motion computation is described. In the following, the new code will be validated against published experimental data; the effect on accuracy, computational time and solver stability will be shown by comparison to standard OF solvers (interFoam / interDyMFoam) and Star CCM+. The code’s capabilities to simulate complex “real-world” flows are shown on a well-known racing yacht design.

Thermoacoustic Stability Analysis of a Full-Annular Lean Combustor for Heavy-Duty Applications

10.1115/gt2021-59267 ◽

2021 ◽

Author(s):

Daniele Pampaloni ◽

Antonio Andreini ◽

Alessandro Marini ◽

Giovanni Riccio ◽

Gianni Ceccherini

Keyword(s):

Stability Analysis ◽

Gas Turbine ◽

Flame Temperature ◽

Numerical Procedure ◽

Pollutant Emissions ◽

Computational Time ◽

Heavy Duty ◽

Operational Parameters ◽

3D Fem ◽

Wide Range

Thermoacoustic characterization of gas turbine combustion systems is of primary importance for successful development of gas turbine technology, to meet the stringent targets on pollutant emissions. In this context, it becomes more and more necessary to develop reliable tools to be used in the industrial design process. The dynamics of a lean-premixed full-annular combustor for heavy-duty applications has been numerically studied in this work. The well-established CFD-SI method has been used to investigate the flame response varying operational parameters such as the flame temperature (global equivalence ratio) and the fuel split between premixed and pilot fuel injections: such a wide range experimental characterization represents an opportunity to validate the employed numerical methods and to give a deeper insight into the flame dynamics. URANS simulations have been performed, due to their affordable computational costs from the industrial perspective, after validating their accuracy through the comparison against LES results. Furthermore, an approach where the pilot and the premixed flame responses are analyzed separately is proposed, exploiting the independence of their evolution. The calculated FTFs have been implemented in a 3D FEM model of the chamber, in order to perform linear stability analysis and to validate the numerical approach. A boundary condition for rotational periodicity based on Bloch-Wave theory has been implemented into the Helmholtz solver and validated against full-annular chamber simulations, allowing a significant reduction in computational time. The reliability of the numerical procedure has been assessed through the comparison against full-annular experimental results.

Numerical Model for the Analysis of Thermal Transients in District Heating Networks

E3S Web of Conferences ◽

10.1051/e3sconf/202019701004 ◽

2020 ◽

Vol 197 ◽

pp. 01004

Author(s):

Martina Capone ◽

Elisa Guelpa ◽

Vittorio Verda

Keyword(s):

Carbon Dioxide Emissions ◽

Renewable Energy Sources ◽

Management Strategies ◽

District Heating ◽

Equivalent Model ◽

Computational Time ◽

Physical Parameters ◽

Thermal Transients ◽

Proposed Model ◽

Wide Range

As District Heating (DH) networks are experiencing an evolution towards the so-called 4th generation, there is a need to update the currently used models to take into account the ever-increasing complexity of this technology. Indeed, to further improve the reduction in energy consumption and carbon-dioxide emissions, a wide range of technologies and management strategies are being introduced within district heating, such as a large exploitation of Renewable Energy Sources (RES). As a consequence, thermal transients assume a major importance, posing the need to redefine the relevant physical parameters and to develop a model which accurately describes their behaviour. In this framework, this paper proposes a quantitative analysis of the influence of the pipe heat-capacity on the model. Moreover, an equivalent-model, which is able to take into account the two heat capacities of steel and water in just one equation, is proposed and compared with two commonly used approaches. One of the features of the proposed model is the suitability for application to large networks. To prove its capabilities, an application to the Turin district heating network, which is among the largest systems in Europe, is proposed. Results show significant improvements in terms of accuracy over computational time ratio.

Flexible Bayesian Nonlinear Model Configuration

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.13047 ◽

2021 ◽

Vol 72 ◽

pp. 901-942

Author(s):

Aliaksandr Hubin ◽

Geir Storvik ◽

Florian Frommlet

Keyword(s):

Regression Models ◽

Nonlinear Models ◽

Model Averaging ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Parametric Models ◽

Monte Carlo Algorithm ◽

Wide Range ◽

Nonlinear Features ◽

Interpretable Models

Regression models are used in a wide range of applications providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but have additional flexibility on the possible types of features to be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach, introducing priors for functions based on their complexity, is considered. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.
