Machine Learning for Adaptive Planning

This chapter is concerned with enhancing planning systems using Machine Learning techniques so that they automatically configure their planning parameters according to the morphology of the problem at hand. It presents two adaptive systems that set the planning parameters of a highly adjustable planner based on measurable characteristics of the problem instance. Both planners acquire their knowledge from a large data set of results from experiments on many problems from various domains. The first planner is a rule-based system that employs propositional rule learning to induce knowledge suggesting effective configurations of planning parameters based on the problem’s characteristics. The second planner employs instance-based learning to find problems with similar structure and adopt the planner configuration that has previously proved effective on those problems. The validity of the two adaptive systems is assessed through experimental results that demonstrate improved performance on problems from both known and unknown domains. Comparative experimental results for the two planning systems are presented along with a discussion of their advantages and disadvantages.
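
The instance-based planner can be pictured as a nearest-neighbour lookup: describe each previously solved problem by a vector of measurable features, find the k most similar past problems, and adopt the configuration that worked best on most of them. The sketch below is a minimal illustration under stated assumptions: the feature values, the Euclidean distance, the majority vote and the configuration labels (`config_A` and so on) are all hypothetical and are not the chapter's actual problem attributes or planner parameters.

```python
import math
from collections import Counter

# Hypothetical history: (normalised problem features, best-performing configuration).
# Real features might count objects, goals or operators; these values are invented.
HISTORY = [
    ((0.10, 0.20, 0.30), "config_A"),
    ((0.12, 0.18, 0.33), "config_A"),
    ((0.80, 0.75, 0.60), "config_B"),
    ((0.85, 0.70, 0.65), "config_B"),
    ((0.50, 0.90, 0.10), "config_C"),
]


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_config(features, history=HISTORY, k=3):
    """Return the configuration most common among the k nearest past problems."""
    neighbours = sorted(history, key=lambda item: euclidean(features, item[0]))[:k]
    votes = Counter(config for _, config in neighbours)
    # most_common keeps first-encountered order on ties, i.e. the nearest neighbour wins.
    return votes.most_common(1)[0][0]


print(suggest_config((0.11, 0.21, 0.31)))  # → config_A
```

A distance-weighted vote or a per-feature scaling step are natural variations; which one suits a given planner is an empirical question.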

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study predicts the Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional data sets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to choose the appropriate ones. Four methods, multiple linear regression (MLR), K-Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models from the selected descriptors. Results: The SVM model performed best, with R2 = 0.84 and MSE = 0.55 on the training set and R2 = 0.83 and MSE = 0.56 on the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be used for rapid estimation of the inhibitory constant, which should help in designing novel thrombin inhibitors.
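
The y-randomization test mentioned above checks that a model has learned a real structure–activity relationship rather than chance correlations: the response values are shuffled, the model is refitted, and the refitted scores should collapse. The sketch below is a minimal illustration that assumes a one-descriptor ordinary least-squares fit as a stand-in for the paper's SVM, and uses synthetic data rather than the thrombin data set.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a single descriptor, for illustration)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Training-set coefficient of determination of the fitted line."""
    a, b = fit_line(x, y)
    my = statistics.fmean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=50, seed=0):
    """Refit on shuffled responses; return the original R2 and the shuffled R2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        scores.append(r_squared(x, y_shuffled))
    return r_squared(x, y), scores


# Synthetic data (hypothetical): a linear trend in one descriptor plus alternating noise.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
original, shuffled = y_randomization(x, y)
print(f"original R2 = {original:.3f}, mean shuffled R2 = {statistics.fmean(shuffled):.3f}")
```

If the shuffled models scored nearly as well as the original one, the apparent predictive ability would be suspect.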

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽

10.1017/dsj.2020.31 ◽

2020 ◽

Vol 6 ◽

Author(s):

Jaime de Miguel Rodríguez ◽

Maria Eugenia Villafañe ◽

Luka Piškorec ◽

Fernando Sancho Caparrini

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Large Data ◽

Learning Model ◽

Large Data Sets ◽

Data Sets ◽

Connectivity Map ◽

Data Set ◽

3D Objects ◽

Machine Learning Model

This work presents a methodology for generating novel 3D objects resembling wireframes of building types. These objects result from reconstructing interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set uses a geometry representation scheme based on a ‘connectivity map’ that is especially suited to expressing the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150k input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for generating large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that generating interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
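
Hybrid geometries come from decoding points that lie between the latent codes of two encoded building types. A minimal sketch of the interpolation step follows, assuming straight-line interpolation in latent space (one common choice; the paper may use another scheme). The 2-D latent vectors are hypothetical, and the trained decoder that would turn each point back into a wireframe is not shown.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the segment from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0 inclusive
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical 2-D latent codes of two encoded building types.
z_house = [0.0, 1.0]
z_tower = [2.0, -1.0]
for z in interpolate_latents(z_house, z_tower, steps=5):
    print(z)  # each z would be passed to the trained decoder (not shown)
```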

Machine learning model for feature recognition of sports competition based on improved TLD algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189312 ◽

2020 ◽

pp. 1-12

Author(s):

Qinglong Ding ◽

Zhenfeng Ding

Keyword(s):

Machine Learning ◽

Feature Recognition ◽

Experimental Results ◽

Pedestrian Tracking ◽

Data Set ◽

Recognition Model ◽

Standard Data ◽

Machine Learning Model ◽

Environmental Background

Sports competition characteristics play an important role in judging the fairness of a game and improving the skills of athletes. At present, feature recognition in sports competitions is affected by the environmental background, which degrades recognition. To improve feature recognition in sports competitions, this study improves the TLD (Tracking-Learning-Detection) algorithm and uses machine learning to build a feature recognition model based on the improved algorithm. The TLD algorithm is applied to long-term pedestrian tracking with PTZ cameras, and its shortcomings in this setting are addressed by the proposed improvements. The improved algorithm is then analyzed and verified experimentally on a standard data set. Finally, the experimental results are presented visually using statistical methods. The results indicate that the proposed method is effective to a certain extent.

Precision-Recall versus Accuracy and the Role of Large Data Sets

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014039 ◽

2019 ◽

Vol 33 ◽

pp. 4039-4048 ◽

Cited By ~ 8

Author(s):

Brendan Juba ◽

Hai S. Le

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Imbalanced Data ◽

Large Data ◽

Constant Factor ◽

Data Sets ◽

Data Set ◽

Small Constant ◽

Classifier Performance ◽

Necessary And Sufficient

Practitioners of data mining and machine learning have long observed that class imbalance in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalance have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on class imbalance at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC-dimension) and the desired confidence; class imbalance does not enter these formulas anywhere. In this work, we consider classifier performance in terms of precision and recall, measures that are widely suggested as more appropriate for classifying imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that a larger number of examples is necessary and sufficient to address class imbalance, a finding we also illustrate empirically.
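
One direction of this relationship follows directly from the definitions. Let mu be the fraction of positive examples and e the error rate. False negatives and false positives each make up at most a fraction e of the data, so recall >= 1 - e/mu, and a short calculation shows precision >= 1 - e/mu as well (whenever at least one example is predicted positive). Hence min(precision, recall) >= 1 - e/mu: a low error rate only guarantees good precision and recall when it is small relative to mu. The sketch below checks this elementary bound on hypothetical confusion-matrix counts; it is an illustration, not the paper's theorem, which also covers the converse under a precision assumption.

```python
def rates(tp, fp, fn, tn):
    """Return precision, recall, error rate and positive-class fraction from counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    error = (fp + fn) / n
    base_rate = (tp + fn) / n  # fraction of positive examples (the class imbalance)
    return precision, recall, error, base_rate


# Hypothetical imbalanced test set: 1% positives out of 10,000 examples.
p, r, err, mu = rates(tp=80, fp=20, fn=20, tn=9880)
print(f"accuracy = {1 - err:.4f}")                   # 0.9960, looks excellent
print(f"precision = {p:.2f}, recall = {r:.2f}")      # 0.80 and 0.80
print(f"1 - error/base_rate = {1 - err / mu:.2f}")   # 0.60
print(min(p, r) >= 1 - err / mu)                     # True
```

The example shows why accuracy alone is misleading here: an error rate of 0.4% sounds negligible, but it amounts to 40% of the positive class.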

HOMPer: A new hybrid system for opinion mining in the Persian language

Journal of Information Science ◽

10.1177/0165551519827886 ◽

2019 ◽

Vol 46 (1) ◽

pp. 101-117 ◽

Cited By ~ 3

Author(s):

Mohammad Ehsan Basiri ◽

Arman Kabiri

Keyword(s):

Machine Learning ◽

Language Processing ◽

Opinion Mining ◽

Feature Selection Method ◽

Large Data ◽

Data Set ◽

Persian Language ◽

Rating Prediction ◽

Bayes Algorithm ◽

Component Feature

Opinion mining is a subfield of data mining and natural language processing that is concerned with extracting users’ opinions and attitudes towards products or services from their comments on the Web. Persian opinion mining, in contrast to its English counterpart, is a very new field of study and hence has not received the attention it deserves. Existing methods for opinion mining in the Persian language may be classified into machine learning– and lexicon-based approaches. These methods have been proposed and used successfully for the polarity-detection problem; however, their results on more complex tasks such as rating prediction are unsatisfactory. In this study, an exhaustive investigation of machine learning– and lexicon-based methods is first performed. Then, a new hybrid method is proposed for the rating-prediction problem in the Persian language. Finally, the effects of the machine learning component, feature-selection method, normalisation method and combination level are investigated. Experimental results on a large data set containing 16,000 Persian customer reviews show that the proposed system achieves higher performance than the Naïve Bayes algorithm and a pure lexicon-based method. Moreover, the results demonstrate that the proposed method may also be used successfully for polarity detection.
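
To make the idea of a hybrid lexicon/ML rating predictor concrete, the sketch below combines the two kinds of evidence at the score level: a lexicon-derived rating and the expected rating from a classifier's class probabilities are mixed with a weight and rounded to the 1–5 scale. Everything here is hypothetical and is not the HOMPer method: the tiny lexicon, the English tokens (used for readability), the linear mapping, the classifier output and the 0.5 weight are all assumptions.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1] (not the HOMPer lexicon).
LEXICON = {"excellent": 1.0, "good": 0.5, "slow": -0.5, "broken": -1.0}


def lexicon_rating(tokens):
    """Map the mean polarity of known words onto a 1-5 rating scale."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return 3.0  # neutral when no lexicon word is present
    mean = sum(scores) / len(scores)
    return 3.0 + 2.0 * mean  # -1 -> 1 star, 0 -> 3 stars, +1 -> 5 stars


def expected_rating(class_probs):
    """Expected rating from a classifier's probabilities over ratings 1..5."""
    return sum(rating * p for rating, p in class_probs.items())


def hybrid_rating(tokens, class_probs, weight=0.5):
    """Score-level combination: weighted mix of lexicon and ML estimates, clipped to 1-5."""
    combined = weight * lexicon_rating(tokens) + (1.0 - weight) * expected_rating(class_probs)
    return min(5, max(1, round(combined)))


tokens = ["good", "but", "slow"]
probs = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.30, 5: 0.10}  # hypothetical classifier output
print(hybrid_rating(tokens, probs))  # → 3
```

Combining at the feature level (feeding lexicon scores to the classifier as extra inputs) is the main alternative; the abstract reports that the combination level was one of the factors studied.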

Machine Learning for Predicting Mycotoxin Occurrence in Maize

Frontiers in Microbiology ◽

10.3389/fmicb.2021.661132 ◽

2021 ◽

Vol 12 ◽

Author(s):

Marco Camardo Leggieri ◽

Marco Mazzoni ◽

Paola Battilani

Keyword(s):

Machine Learning ◽

Mechanistic Model ◽

Cropping System ◽

Large Data ◽

Predictive Performance ◽

Added Value ◽

Linear Regression Models ◽

Data Set ◽

Input Variables ◽

Aflatoxin B

Meteorological conditions are the main driving variables for mycotoxin-producing fungi and the resulting contamination of maize grain, but the cropping system used can considerably mitigate this weather impact. Several researchers have investigated the role of cropping operations in mycotoxin contamination, but their findings were inconclusive, precluding their use in predictive modeling. In this study, a machine learning (ML) approach was adopted that used as input variables the predictions of the weather-based mechanistic models AFLA-maize and FER-maize [predicting aflatoxin B1 (AFB1) and fumonisins (FBs), respectively] together with cropping system factors. The occurrence of AFB1 and FBs in maize fields was recorded, and the corresponding cropping system data collected, over the years 2005–2018 in northern Italy. Two deep neural network (DNN) models were trained to predict, at harvest, which maize fields were contaminated beyond the legal limit with AFB1 and FBs. Both models reached an accuracy >75%, demonstrating the added value of the ML approach with respect to classical statistical approaches (i.e., simple or multiple linear regression models). The improvement in predictive performance over AFLA-maize and FER-maize alone was clearly demonstrated. This, coupled with the large data set used (a 13-year time series) and the good results for the statistical scores applied, confirmed the robustness of the models developed here.
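
The modelling setup can be sketched as a binary classifier whose inputs mix a mechanistic-model output with cropping-system factors and whose label marks fields above the legal limit. As a stated swap, the sketch below uses plain logistic regression trained by batch gradient descent instead of the paper's DNN, and the feature names (risk index, late sowing, irrigation) and all data rows are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns weights with the bias last."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
            grad[-1] += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, xi):
    """1 if the predicted probability of exceeding the limit is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) >= 0.5)


# Hypothetical rows: [mechanistic-model risk index (0-1), late sowing (0/1), irrigated (0/1)]
X = [[0.9, 1, 0], [0.8, 1, 0], [0.7, 0, 0], [0.2, 0, 1], [0.3, 0, 1], [0.1, 1, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = contamination above the legal limit
w = train_logistic(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy = {accuracy:.2f}")
```

Training accuracy on six toy rows says nothing about real performance; a study like the one above would evaluate on held-out fields or seasons.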

Prediction of Misclassification Data using Cognitive Bayes Computation Techniques (COBACO)

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c7975.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 928-932

Keyword(s):

Machine Learning ◽

Missing Data ◽

Large Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Accuracy Rate ◽

Data Set ◽

Predictive Values ◽

Time Operation

Missing data cause major problems for quantitative analysis in large databases. Because of these problems, computational inference produces biased results, data are further damaged, error rates can increase, and imputation becomes more difficult to accomplish. Predicting disguised missing data in large data sets is another major problem in real-time operation. Machine learning (ML) techniques combined with the classification of measurements can improve the accuracy of predicted values, and these techniques address various challenges posed by data loss. Recent work predicts misclassification using a supervised ML approach, that is, predicting an output for an unseen input from a limited number of parameters in a data set; as the number of parameters increases, the accuracy of the outcome decreases. This article presents a new approach, COBACO, an effective supervised machine learning technique. Several strategies for classifying predictive techniques for missing-data analysis with supervised machine learning are described. The proposed COBACO technique produced more precise and accurate results than the other predictive approaches. Experimental results on both real and synthetic data sets show that the proposed approach offers valuable and promising insight into the problem of predicting missing information.

Automated detection of glaucoma with interpretable machine learning using clinical data and multi-modal retinal images

10.1101/2020.02.26.967208 ◽

2020 ◽

Author(s):

Parmita Mehta ◽

Christine Petersen ◽

Joanne C. Wen ◽

Michael R. Banitt ◽

Philip P. Chen ◽

...

Keyword(s):

Machine Learning ◽

Model Performance ◽

Image Data ◽

Large Data ◽

Population Level ◽

High Rate ◽

Data Set ◽

Glaucoma Diagnosis ◽

Model Interpretation ◽

Glaucoma Detection

Glaucoma, the leading cause of irreversible blindness worldwide, is a disease that damages the optic nerve. Current machine learning (ML) approaches for glaucoma detection rely on features such as retinal thickness maps; however, the high rate of segmentation errors when creating these maps increases the likelihood of faulty diagnoses. This paper proposes a new, comprehensive, and more accurate ML-based approach for population-level glaucoma screening. Our contributions include: (1) a multi-modal model built upon a large data set that includes demographic, systemic and ocular data as well as raw image data taken from color fundus photos (CFPs) and macular Optical Coherence Tomography (OCT) scans, (2) model interpretation to identify and explain data features that lead to accurate model performance, and (3) model validation via comparison of model output with clinician interpretation of CFPs. We also validated the model on a cohort that was not diagnosed with glaucoma at the time of imaging but eventually received a glaucoma diagnosis. Results show that our model is highly accurate (AUC 0.97) and interpretable. It validated biological features known to be related to the disease, such as age, intraocular pressure and optic disc morphology. Our model also points to previously unknown or disputed features, such as pulmonary capacity and retinal outer layers.
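
The reported AUC has a direct probabilistic reading: it is the chance that a randomly chosen glaucoma case receives a higher model score than a randomly chosen non-glaucoma case, with ties counted as half. The sketch below computes it that way on hypothetical scores; it is a generic pairwise implementation (quadratic in the number of examples, fine for illustration), not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half, which matches the area under the empirical ROC curve.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six eyes (1 = glaucoma).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.50, 0.30, 0.10]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```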

Research on SVM environment performance of parallel computing based on large data set of machine learning

The Journal of Supercomputing ◽

10.1007/s11227-019-02894-7 ◽

2019 ◽

Vol 75 (9) ◽

pp. 5966-5983

Author(s):

Yunlu Gong ◽

Lianguo Jia

Keyword(s):

Machine Learning ◽

Parallel Computing ◽

Large Data ◽

Data Set ◽

Large Data Set ◽

Environment Performance

SMMPPI: a machine learning-based approach for prediction of modulators of protein–protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2

Briefings in Bioinformatics ◽

10.1093/bib/bbab111 ◽

2021 ◽

Author(s):

Priya Gupta ◽

Debasisa Mohanty

Keyword(s):

Machine Learning ◽

Small Molecule ◽

Protein Interactions ◽

Large Data ◽

Docking Studies ◽

Protein Protein Interactions ◽

Data Set ◽

Drug Candidates ◽

Test Sets ◽

Small Molecule Modulators

Small molecule modulators of protein–protein interactions (PPIs) are being pursued as novel anticancer, antiviral and antimicrobial drug candidates. We have utilized a large data set of experimentally validated PPI modulators and developed machine learning classifiers for predicting new small molecule modulators of PPIs. Our analysis reveals that, using a random forest (RF) classifier, general PPI modulators independent of PPI family can be predicted with ROC-AUC higher than 0.9 when training and test sets are generated by random split. The performance of the classifier on data sets very different from those used in training has also been estimated using several state-of-the-art protocols for removing various types of bias when dividing data into training and test sets. The family-specific PPI modulator (PPIM) predictors developed in this work for 11 clinically important PPI families also have prediction accuracies above 90% in the majority of cases. All these ML-based predictors have been implemented in freely available software named SMMPPI for predicting small molecule modulators of clinically relevant PPIs such as RBD:hACE2, Bromodomain_Histone, BCL2-Like_BAX/BAK, LEDGF_IN, LFA_ICAM, MDM2-Like_P53, RAS_SOS1, XIAP_Smac, WDR5_MLL1, KEAP1_NRF2 and CD4_gp120. We have identified novel chemical scaffolds as inhibitors of the RBD:hACE2 PPI involved in host cell entry of SARS-CoV-2. Docking studies for some of the compounds reveal that they can inhibit the RBD:hACE2 interaction by high-affinity binding to interaction hotspots on RBD. Some of these new scaffolds have also been found in recently reported SARS-CoV-2 viral growth inhibitors; however, it is not known whether these molecules inhibit the entry phase.
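
The contrast between a random split and a bias-reducing split can be illustrated with one common protocol family: grouping compounds (for example by chemical scaffold) and assigning whole groups to either the training or the test set, so that near-duplicates cannot appear on both sides. The sketch below assumes scaffold labels are already available; the compound IDs and labels are hypothetical, computing real scaffolds would require a cheminformatics toolkit that is not shown, and the specific protocols used for SMMPPI are not described here.

```python
import random


def random_split(items, test_fraction=0.25, seed=0):
    """Shuffle and cut: compounds sharing a scaffold can land on both sides."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def group_split(items, group_of, test_fraction=0.25, seed=0):
    """Assign whole groups (e.g. scaffolds) to one side so that no group is shared."""
    groups = sorted({group_of(it) for it in items})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test_groups = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test_groups])
    train = [it for it in items if group_of(it) not in test_groups]
    test = [it for it in items if group_of(it) in test_groups]
    return train, test


# Hypothetical compounds tagged with a scaffold label: (compound_id, scaffold).
compounds = [(f"cpd{i}", f"scaffold{i % 4}") for i in range(16)]
train, test = group_split(compounds, group_of=lambda c: c[1])
shared = {c[1] for c in train} & {c[1] for c in test}
print(len(train), len(test), shared)  # no scaffold appears on both sides
```

Scores measured on a group split are usually lower than on a random split, which is exactly the optimism such protocols are meant to expose.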
