Feature partitioning for robust tree ensembles and their certification in adversarial scenarios

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work, we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at inference time. The attacker aims at finding a perturbation of an instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its base models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently provides a lower bound on the accuracy of a forest in the presence of attacks on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed feature partitioning strategy provides a significant accuracy improvement over competitor algorithms and that the proposed certification method allows one to accurately estimate the effectiveness of a classifier where the brute-force approach would be infeasible.
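
The counting argument behind the majority guarantee can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes (the abstract does not say so) an attacker who can modify at most `budget` features and a round-robin split into `2 * budget + 1` disjoint feature groups, one per base model. All function names are hypothetical.

```python
from collections import Counter


def partition_features(n_features, budget):
    """Split feature indices into 2*budget + 1 disjoint groups (round-robin).

    Assumption: one base model is trained per group, on those features only.
    """
    k = 2 * budget + 1
    if n_features < k:
        raise ValueError("need at least 2*budget + 1 features")
    return [list(range(i, n_features, k)) for i in range(k)]


def affected_models(groups, attacked):
    """Indices of base models whose feature group contains an attacked feature."""
    attacked = set(attacked)
    return [i for i, group in enumerate(groups) if attacked & set(group)]


def majority_vote(predictions):
    """Most frequent label among the base-model predictions."""
    return Counter(predictions).most_common(1)[0][0]


# 10 features, attacker may change at most 2 of them -> 5 disjoint groups.
groups = partition_features(n_features=10, budget=2)
hit = affected_models(groups, attacked=[0, 7])

# Illustrative predictions only: every model is right on the clean instance,
# and the attacker flips exactly the models it can reach.
clean = [1, 1, 1, 1, 1]
attacked_preds = [0 if i in hit else p for i, p in enumerate(clean)]
print(len(hit), majority_vote(attacked_preds))
```

Because the groups are disjoint, each modified feature touches exactly one model, so at most `budget` of the `2 * budget + 1` models can change their output. The vote in this example stays stable because the three untouched models agree; in general the ensemble prediction also depends on how many of the unaffected models were correct to begin with.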

An Efficient SMOTE-Based Deep Learning Model for Heart Attack Prediction

Scientific Programming ◽

10.1155/2021/6621622 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Muhammad Waqar ◽

Hassan Dawood ◽

Hussain Dawood ◽

Nadeem Majeed ◽

Ameen Banjar ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Heart Attack ◽

High Reliability ◽

Learning Algorithms ◽

Research Work ◽

Machine Learning Algorithms ◽

Feature Engineering ◽

Unequal Distribution ◽

The Given

Cardiac disease treatment often involves acquiring and analyzing vast quantities of digital cardiac data. These data can be put to various beneficial uses, and their use becomes more important for critical conditions such as heart attack, where the patient's life is often at stake. Machine learning and deep learning are two well-known techniques that help make raw data useful. Some of the biggest problems that arise from using these techniques are massive resource utilization, extensive data preprocessing, the need for feature engineering, and ensuring reliable classification results. The proposed research work presents a cost-effective solution to predict heart attack with high accuracy and reliability. It uses a UCI dataset to predict heart attack via various machine learning algorithms without any feature engineering. Moreover, the given dataset has an unequal distribution of positive and negative classes, which can reduce performance. The proposed work uses the synthetic minority oversampling technique (SMOTE) to handle the imbalanced data. The proposed system removes the need for feature engineering when classifying the given dataset, which leads to an efficient solution because feature engineering is often a costly process. The results show that, among all the machine learning algorithms evaluated, a SMOTE-based artificial neural network, when tuned properly, outperformed all other models and many existing systems. The high reliability of the proposed system indicates that it can be used effectively for heart attack prediction.
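
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration using only the standard library, not the authors' pipeline or any library implementation; the function name `smote_sketch`, the default `k`, and the toy data are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class with two features per sample (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing rows.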

Introduction and Implementation of Machine Learning Algorithms in R

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch008 ◽

2019 ◽

pp. 126-147

Author(s):

S. R. Mani Sekhar ◽

G. M. Siddesh

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machine ◽

Discriminant Analysis ◽

Computer Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Linear Discriminant ◽

The Given

Machine learning is one of the important areas in the field of computer science. It helps to provide optimized solutions to real-world problems by using past knowledge or data from previous experience. There are different types of machine learning algorithms in computer science. This chapter provides an overview of selected machine learning algorithms: linear regression, linear discriminant analysis, support vector machine, naive Bayes classifier, neural networks, and decision trees. Each of these methods is illustrated in detail with an example and R code, which helps readers to build their own solutions to given problems.

Intelligent Malware Detection Using Deep Dilated Residual Networks for Cyber Security

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch050 ◽

2021 ◽

pp. 1085-1099

Author(s):

S. Abijah Roseline ◽

S. Geetha

Keyword(s):

Machine Learning ◽

Cyber Security ◽

Machine Learning Algorithms ◽

Human Interaction ◽

Machine Learning Techniques ◽

Detection Methods ◽

Security Threat ◽

Signature Detection ◽

Learning Techniques ◽

Feature Based

Malware is a most serious security threat that potentially targets billions of devices, such as personal computers and smartphones, across the world. Malware classification and detection is a challenging task due to the targeted, zero-day, and stealthy nature of advanced and new malware. Traditional signature detection methods, such as antivirus software, were effective for detecting known malware. At present, there are various solutions for detecting unknown malware that employ feature-based machine learning algorithms. Machine learning techniques detect known malware effectively but are not optimal and show a low accuracy rate for unknown malware. This chapter explores a novel deep learning model, the deep dilated residual network, for malware image classification. The proposed model showed a higher accuracy of 98.50% and 99.14% on the Kaggle Malimg and BIG 2015 datasets, respectively. New malware can be handled in real time with minimal human interaction using the proposed deep residual model.

Automatic Pulmonary Nodule Detection Applying Deep Learning or Machine Learning Algorithms to the LIDC-IDRI Database: A Systematic Review

Diagnostics ◽

10.3390/diagnostics9010029 ◽

2019 ◽

Vol 9 (1) ◽

pp. 29 ◽

Cited By ~ 20

Author(s):

Lea Pehrson ◽

Michael Nielsen ◽

Carsten Ammitzbøl Lauridsen

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Deep Learning ◽

Machine Learning Algorithms ◽

Ct Scans ◽

Lung Nodules ◽

Original Research ◽

Feature Based ◽

High Level ◽

Meta Analyses

The aim of this study was to provide an overview of the literature available on machine learning (ML) algorithms applied to the Lung Image Database Consortium Image Collection (LIDC-IDRI) database as a tool for optimizing the detection of lung nodules in thoracic CT scans. This systematic review was compiled according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Only original research articles concerning algorithms applied to the LIDC-IDRI database were included. The initial search yielded 1972 publications after removing duplicates, and 41 of these articles were included in this study. The articles were divided into two subcategories describing their overall architecture. The majority of feature-based algorithms achieved an accuracy >90%, compared to the deep learning (DL) algorithms, which achieved an accuracy in the range of 82.2%–97.6%. In conclusion, ML and DL algorithms are able to detect lung nodules with a high level of accuracy, sensitivity, and specificity when applied to an annotated archive of CT scans of the lung. However, there is no consensus on the method applied to determine the efficiency of ML algorithms.
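
For reference, the three measures named above follow directly from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are illustrative assumptions and are not taken from the review.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: nodules detected / missed; tn/fp: non-nodules rejected / flagged.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(metrics)
```

Reporting all three together matters because accuracy alone can look high on data where one class dominates, even when many true nodules are missed.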

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback about an entity, which can be an item, service, movie, person, or event. These opinions are mostly expressed as remarks or reviews. With social networks, forums, and websites, such reviews have become a significant factor in a client's decision whether to buy something. Today, large scalable computing environments provide sophisticated ways of carrying out data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One example is text classification, an effective method for predicting clients' sentiment. In this paper, we focus our sentiment analysis work on a movie review database. We examine sentiment expressions to classify the polarity of movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review its correct rating. This paper performs sentiment analysis using feature-based opinion mining and managed machine learning. The main focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem space. In our study, we used six distinct machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K nearest neighbors), and SoftMax Regression.

Exploring the Predictability of Temperatures in a Scaled Model of a Smarthome

Sensors ◽

10.3390/s21186052 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6052

Author(s):

Thomas Burns ◽

Gregory Fichthorn ◽

Jason Ling ◽

Sharare Zehtabian ◽

Salih S. Bacanlı ◽

...

Keyword(s):

Machine Learning ◽

Air Conditioning ◽

Smart Home ◽

Additional Data ◽

Machine Learning Algorithms ◽

Scaled Model ◽

Emergent Technologies ◽

Additional Information ◽

The Given ◽

Machine Learning Models

In modern smarthomes, temperature regulation is achieved through a mix of traditional and emergent technologies including air conditioning, heating, intelligent utilization of the effects of sun, wind, and shade as well as using stored heat and cold. To achieve the desired comfort for the inhabitants while minimizing environmental impact and cost, the home controller must predict how its actions will impact the temperature and other environmental factors in various parts of the home. The question we are investigating in this paper is whether the temperature values in different rooms in a home are predictable based on readings from sensors in the home. We are also interested in whether increased accuracy can be achieved by adding sensors to capture the state of doors and windows of the given room and/or the whole home, and what type of machine learning algorithms can take advantage of the additional information. As experimentation on real-world homes is highly expensive, we use ScaledHome, a 1:12 scale, IoT-enabled model of a smart home for data acquisition. Our experiments show that while additional data can improve the accuracy of the prediction, the type of machine learning models needs to be carefully adapted to the number of data features available.

Detection of Phishing Websites using an Efficient Feature-Based Machine Learning Framework

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5909.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2857-2862

Keyword(s):

Machine Learning ◽

Personal Information ◽

Machine Learning Algorithms ◽

Sensitive Information ◽

Cyber Attack ◽

Learning Framework ◽

Internet Users ◽

User Data ◽

Feature Based ◽

Classification Prediction

Phishing is a cyber-attack that is socially engineered to trick naive online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers fool Internet users by posing as a legitimate webpage to retrieve personal information. This can also be done by sending emails that pose as reputable companies or businesses. Phishing exploits several vulnerabilities effectively, and there is no single solution that protects users from all of them. A classification/prediction model is designed based on heuristic features extracted from the website domain, URL, web protocol, and source code, to eliminate the drawbacks of existing anti-phishing techniques. In the model, we combine existing solutions such as blacklisting and whitelisting, heuristics, and visual-based similarity, which provides a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours, and Random Forests, and compare the results to find the most efficient machine learning framework.

Product “In-Use” Context Identification Using Feature Learning Methods

Volume 1B: 36th Computers and Information in Engineering Conference ◽

10.1115/detc2016-59645 ◽

2016 ◽

Cited By ~ 2

Author(s):

Dipanjan D. Ghosh ◽

Andrew Olewnik ◽

Kemper Lewis

Keyword(s):

Machine Learning ◽

Product Design ◽

Learning Algorithms ◽

Feature Learning ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Product Usage ◽

Feature Based ◽

Product Interaction ◽

Usage Context

Usage context is considered a critical driving factor for customers’ product choices. In addition, the physical use of a product (i.e., user-product interaction) dictates a number of customer perceptions (e.g. level of comfort, ease-of-use or users’ physical fatigue). In the emerging Internet-of-Things (IoT), this work hypothesizes that it is possible to understand product usage while it is ‘in-use’ by capturing the user-product interaction data. Mining the data and understanding the comfort of the user adds a new dimension to the product design field. There has been tremendous progress in the field of data analytics, but the application in product design is still nascent. In this work, application of ‘feature learning’ methods for the identification of product usage context is demonstrated, where usage context is limited to the activity of the user. Two feature learning methods are applied for a walking activity classification using smartphone accelerometer data. Results are compared with feature-based machine learning algorithms (neural networks and support vector machines), and demonstrate the benefits of using the ‘feature learning’ methods over the feature based machine-learning algorithms.

Neuro-Based Prognosticative Analytics for Parkinson Disease using Random Forest Approach

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j7434.0991120 ◽

2020 ◽

Vol 9 (11) ◽

pp. 11-15

Keyword(s):

Machine Learning ◽

Random Forest ◽

Parkinson Disease ◽

Neurodegenerative Disorder ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Single Test ◽

Specificity And Sensitivity ◽

The World ◽

The Given

Parkinson’s disease is the most prevalent neurodegenerative disorder, affecting more than ten million people across the world. There is no single test that can be administered to diagnose Parkinson’s disease. Our aim is to analyze machine-learning-based techniques for identifying Parkinson’s disease in patients. Our machine-learning-based technique is used to predict the disease from human speech and handwriting patterns and to report the results in terms of best accuracy. In addition, we compare the performance of various machine learning algorithms on the given hospital dataset using an evaluation and classification report, and we identify the best result in terms of accuracy, precision, recall, F1 score, specificity, and sensitivity.

Build Orientation Optimization for Strength Enhancement of FDM Parts Using Machine Learning based Algorithm

10.31224/osf.io/3dh9s ◽

2019 ◽

Author(s):

Manoj Malviya ◽

Kaushal A Desai

Keyword(s):

Machine Learning ◽

Brute Force ◽

Initial Sample ◽

Bayesian Algorithm ◽

Build Orientation ◽

Optimization Framework ◽

Sample Data ◽

Artificial Neural Network Ann ◽

Brute Force Approach ◽

Orientation Optimization

The layered fabrication approach induces directional anisotropy and significantly impacts the mechanical strength of FDM components. This paper proposes a generalized machine-learning-based parameter optimization framework to determine the optimal build orientation for FDM components. The algorithm determines the ideal build orientation by maximizing the minimum Factor of Safety (FoS) for the component under prescribed loading conditions, ensuring its even distribution. An Artificial Neural Network (ANN) coupled with a Bayesian algorithm has been employed to accelerate the optimization process. The algorithm begins with initial sample data collected using a brute-force approach, uses a single-layered ANN for approximation, and achieves optimization using the Bayesian algorithm. A series of computational experiments considering five different test components has been devised to evaluate the performance and efficacy of the proposed algorithm. These experiments demonstrated that the proposed algorithm can determine the optimum build orientation effectively, with certain limitations.
