Classifier-based constraint acquisition

AbstractModeling a combinatorial problem is a hard and error-prone task requiring significant expertise. Constraint acquisition methods attempt to automate this process by learning constraints from examples of solutions and (usually) non-solutions. Active methods query an oracle while passive methods do not. We propose a known but not widely-used application of machine learning to constraint acquisition: training a classifier to discriminate between solutions and non-solutions, then deriving a constraint model from the trained classifier. We discuss a wide range of possible new acquisition methods with useful properties inherited from classifiers. We also show the potential of this approach using a Naive Bayes classifier, obtaining a new passive acquisition algorithm that is considerably faster than existing methods, scalable to large constraint sets, and robust under errors.

Download Full-text

Boosting Combinatorial Problem Modeling with Machine Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/772 ◽

2018 ◽

Cited By ~ 11

Author(s):

Michele Lombardi ◽

Michela Milano

Keyword(s):

Machine Learning ◽

Combinatorial Problem ◽

Machine Learning Techniques ◽

Domain Experts ◽

Pervasive Technology ◽

Modeling Process ◽

Learning Techniques ◽

Wide Range ◽

Efficiency And Effectiveness ◽

Optimization Problem Solving

In the past few years, the area of Machine Learning (ML) has witnessed tremendous advancements, becoming a pervasive technology in a wide range of applications. One area that can significantly benefit from the use of ML is Combinatorial Optimization. The three pillars of constraint satisfaction and optimization problem solving, i.e., modeling, search, and optimization, can exploit ML techniques to boost their accuracy, efficiency and effectiveness. In this survey we focus on the modeling component, whose effectiveness is crucial for solving the problem. The modeling activity has been traditionally shaped by optimization and domain experts, interacting to provide realistic results. Machine Learning techniques can tremendously ease the process, and exploit the available data to either create models or refine expert-designed ones. In this survey we cover approaches that have been recently proposed to enhance the modeling process by learning either single constraints, objective functions, or the whole model. We highlight common themes to multiple approaches and draw connections with related fields of research.

Download Full-text

Intelligent system of English composition scoring model based on improved machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189235 ◽

2020 ◽

pp. 1-11

Author(s):

Jie Liu ◽

Lin Lin ◽

Xiufang Liang

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Intelligent System ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Assessment System ◽

English Composition ◽

Region Extraction ◽

Constraint Model

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is to score the English composition through the intelligent model. In order to improve the intelligence of English composition scoring, based on machine learning algorithms, this study combines intelligent image recognition technology to improve machine learning algorithms, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model proposed in this study is analyzed through design experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the algorithm proposed in this paper has a certain practical effect, and it can be applied to the English assessment system and the online assessment system of the homework evaluation system algorithm system.

Download Full-text

Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using DFT and Machine Learning

10.26434/chemrxiv.6254756.v1 ◽

2018 ◽

Author(s):

Sherif Tawfik ◽

Olexandr Isayev ◽

Catherine Stampfl ◽

Joseph Shapter ◽

David Winkler ◽

...

Keyword(s):

Machine Learning ◽

Band Gap ◽

Density Functional ◽

2D Materials ◽

Van Der Waals ◽

Building Blocks ◽

Machine Learning Techniques ◽

Interlayer Distance ◽

Computational Screening ◽

Wide Range

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complextiy, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.

Download Full-text

COVID-19 Outbreak Prediction with Machine Learning

10.34055/osf.io/xr4js ◽

2020 ◽

Author(s):

Sina Faizollahzadeh Ardabili ◽

Amir Mosavi ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

Annamaria R. Varkonyi-Koczy ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Fuzzy Inference ◽

Control Measures ◽

Future Research ◽

Complex Nature ◽

Inference System ◽

Wide Range ◽

Standard Models ◽

High Level

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed-decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention by authorities, and they are popular in the media. Due to a high level of uncertainty and lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the essential generalization and robustness abilities of existing models needs to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among a wide range of machine learning models investigated, two models showed promising results (i.e., multi-layered perceptron, MLP, and adaptive network-based fuzzy inference system, ANFIS). Based on the results reported here, and due to the highly complex nature of the COVID-19 outbreak and variation in its behavior from nation-to-nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. Paper further suggests that real novelty in outbreak prediction can be realized through integrating machine learning and SEIR models.

Download Full-text

Intelligent Techniques Analysis for Glycosylation Site Prediction

Current Bioinformatics ◽

10.2174/1574893615666210108094847 ◽

2021 ◽

Vol 15 ◽

Author(s):

Alhassan Alkuhlani ◽

Walaa Gad ◽

Mohamed Roushdy ◽

Abdel-Badeeh M. Salem

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Cell Interaction ◽

Glycosylation Site ◽

Machine Learning Classification ◽

Site Prediction ◽

Glycosylation Sites ◽

Wide Range ◽

Feature Extraction And Selection ◽

Computational Intelligent

Background: Glycosylation is one of the most common post-translation modifications (PTMs) in organism cells. It plays important roles in several biological processes including cell-cell interaction, protein folding, antigen’s recognition, and immune response. In addition, glycosylation is associated with many human diseases such as cancer, diabetes and coronaviruses. The experimental techniques for identifying glycosylation sites are time-consuming, extensive laboratory work, and expensive. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of the biotechnological (e.g., using artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. The computational intelligent techniques have shown efficient results for predicting N-linked, O-linked and C-linked glycosylation sites. In the last two decades, many studies have been conducted for glycosylation site prediction using these techniques. In this paper, we analyze and compare a wide range of intelligent techniques of these studies from multiple aspects. The current challenges and difficulties facing the software developers and knowledge engineers for predicting glycosylation sites are also included. Method: The comparison between these different studies is introduced including many criteria such as databases, feature extraction and selection, machine learning classification methods, evaluation measures and the performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more efforts are needed to get more accurate prediction models for the three basic types of glycosylation sites.

Download Full-text

Global soil moisture data derived through machine learning trained with in-situ measurements

Scientific Data ◽

10.1038/s41597-021-00964-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sungmin O. ◽

Rene Orth

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Large Scale ◽

Short Term Memory ◽

Temporal Dynamics ◽

Soil Moisture Data ◽

Wide Range ◽

Global Soil

AbstractWhile soil moisture information is essential for a wide range of hydrologic and climate applications, spatially-continuous soil moisture data is only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements, SoMo.ml. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. SoMo.ml complements the existing suite of modelled and satellite-based datasets given its distinct derivation, to support large-scale hydrological, meteorological, and ecological analyses.

Download Full-text

Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).

Download Full-text

Adaptive intelligent learning approach based on visual anti-spam email model for multi-natural language

Journal of Intelligent Systems ◽

10.1515/jisys-2021-0045 ◽

2021 ◽

Vol 30 (1) ◽

pp. 774-792

Author(s):

Mazin Abed Mohammed ◽

Dheyaa Ahmed Ibrahim ◽

Akbal Omran Salman

Keyword(s):

Natural Language ◽

Naive Bayes ◽

False Negative ◽

Naïve Bayes ◽

Final Decision ◽

Learning Approach ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Wide Range

Abstract Spam electronic mails (emails) refer to harmful and unwanted commercial emails sent to corporate bodies or individuals to cause harm. Even though such mails are often used for advertising services and products, they sometimes contain links to malware or phishing hosting websites through which private information can be stolen. This study shows how the adaptive intelligent learning approach, based on the visual anti-spam model for multi-natural language, can be used to detect abnormal situations effectively. The application of this approach is for spam filtering. With adaptive intelligent learning, high performance is achieved alongside a low false detection rate. There are three main phases through which the approach functions intelligently to ascertain if an email is legitimate based on the knowledge that has been gathered previously during the course of training. The proposed approach includes two models to identify the phishing emails. The first model has proposed to identify the type of the language. New trainable model based on Naive Bayes classifier has also been proposed. The proposed model is trained on three types of languages (Arabic, English and Chinese) and the trained model has used to identify the language type and use the label for the next model. The second model has been built by using two classes (phishing and normal email for each language) as a training data. The second trained model (Naive Bayes classifier) has been applied to identify the phishing emails as a final decision for the proposed approach. The proposed strategy is implemented using the Java environments and JADE agent platform. The testing of the performance of the AIA learning model involved the use of a dataset that is made up of 2,000 emails, and the results proved the efficiency of the model in accurately detecting and filtering a wide range of spam emails. The results of our study suggest that the Naive Bayes classifier performed ideally when tested on a database that has the biggest estimate (having a general accuracy of 98.4%, false positive rate of 0.08%, and false negative rate of 2.90%). This indicates that our Naive Bayes classifier algorithm will work viably on the off chance, connected to a real-world database, which is more common but not the largest.

Download Full-text

Accelerating the optimization of enzyme-catalyzed synthesis conditions via machine learning and reactivity descriptors

Organic & Biomolecular Chemistry ◽

10.1039/d1ob01066b ◽

2021 ◽

Author(s):

Zhongyu Wan ◽

Quan-De Wang ◽

Dongchang Liu ◽

Jinhu Liang

Keyword(s):

Machine Learning ◽

Optimal Synthesis ◽

Human Knowledge ◽

Crucial Importance ◽

Reactivity Descriptors ◽

Synthesis Conditions ◽

Wide Range ◽

Synthesis Reactions ◽

Selection Of

Enzyme-catalyzed synthesis reactions are of crucial importance for a wide range of applications. An accurate and rapid selection of optimal synthesis conditions is crucial and challenging for both human knowledge...

Download Full-text

A machine learning platform for the discovery of materials

Journal of Cheminformatics ◽

10.1186/s13321-021-00518-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Carl E. Belle ◽

Vural Aksakalli ◽

Salvy P. Russo

Keyword(s):

Machine Learning ◽

Density Functional ◽

Gw Approximation ◽

Dft Methods ◽

Functional Theory ◽

Unit Cell Size ◽

Learning Platform ◽

Photovoltaic Materials ◽

Wide Range ◽

Quantum Espresso

AbstractFor photovoltaic materials, properties such as band gap $$E_{g}$$ E g are critical indicators of the material’s suitability to perform a desired function. Calculating $$E_{g}$$ E g is often performed using Density Functional Theory (DFT) methods, although more accurate calculation are performed using methods such as the GW approximation. DFT software often used to compute electronic properties includes applications such as VASP, CRYSTAL, CASTEP or Quantum Espresso. Depending on the unit cell size and symmetry of the material, these calculations can be computationally expensive. In this study, we present a new machine learning platform for the accurate prediction of properties such as $$E_{g}$$ E g of a wide range of materials.

Download Full-text