Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification

2018 ◽  
Vol 5 ◽  
pp. 13-30
Author(s):  
Gloria Re Calegari ◽  
Gioele Nasi ◽  
Irene Celino

Image classification is a classical task heavily studied in computer vision and widely required in many concrete scientific and industrial scenarios. Is it better to rely on human eyes, thus asking people to classify pictures, or to train a machine learning system to solve the task automatically? The answer largely depends on the specific case and the required accuracy: humans may be more reliable, especially if they are domain experts, but automatic processing can be cheaper, even if less capable of demonstrating an "intelligent" behaviour. In this paper, we present an experimental comparison of different Human Computation and Machine Learning approaches to solving the same image classification task on a set of pictures used in light pollution research. We illustrate the adopted methods and the obtained results, and we compare and contrast them in order to arrive at a long-term combined strategy for addressing the specific issue at scale: while it is hard to ensure the long-term user engagement needed to rely exclusively on the Human Computation approach, human classification is indispensable for overcoming the "cold start" problem of automated data modelling.
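The trade-off the abstract describes can be made concrete with a small sketch: crowd labels aggregated by majority vote on one side, a trained classifier on the other. This is a minimal illustration with synthetic data standing in for the light-pollution images; the worker count, error rate and features are hypothetical, not taken from the paper.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the image-classification task: feature vectors + labels.
X = rng.normal(size=(300, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def crowd_labels(y_true, n_workers=5, error_rate=0.2):
    # Human Computation side: each "worker" mislabels a picture with some
    # probability; per-image labels are aggregated by majority vote.
    flips = rng.random((len(y_true), n_workers)) < error_rate
    votes = np.where(flips, 1 - y_true[:, None], y_true[:, None])
    return np.array([Counter(row).most_common(1)[0][0] for row in votes])

hc_accuracy = (crowd_labels(y) == y).mean()

# Machine Learning side: a classifier trained on part of the data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
ml_accuracy = clf.score(X_te, y_te)

print(f"crowd majority vote accuracy: {hc_accuracy:.2f}")
print(f"trained classifier accuracy:  {ml_accuracy:.2f}")
```

With enough redundant workers the majority vote can beat a modest classifier, but at a per-image labelling cost, which is exactly the engagement problem the paper raises.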



Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2514
Author(s):  
Tharindu Kaluarachchi ◽  
Andrew Reis ◽  
Suranga Nanayakkara

After Deep Learning (DL) recently regained popularity, the Artificial Intelligence (AI) or Machine Learning (ML) field has undergone rapid growth in both research and real-world application development. Deep Learning has generated complexities in algorithms, and researchers and users have raised concerns regarding the usability and adoptability of Deep Learning systems. These concerns, coupled with increasing human-AI interactions, have created the emerging field of Human-Centered Machine Learning (HCML). We present this review paper as an overview and analysis of existing work in HCML related to DL. Firstly, we collaborated with field domain experts to develop a working definition for HCML. Secondly, through a systematic literature review, we analyze and classify 162 publications that fall within HCML. Our classification is based on aspects including contribution type, application area, and the human categories in focus. Finally, we analyze the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges, and presenting future HCML research opportunities.


2019 ◽  
Vol 9 (2) ◽  
pp. 129-143 ◽  
Author(s):  
Bjørn Magnus Mathisen ◽  
Agnar Aamodt ◽  
Kerstin Bach ◽  
Helge Langseth

Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR), where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case. Describing a similarity measure analytically is challenging, even for domain experts working with CBR experts. However, datasets are typically gathered as part of constructing a CBR or machine learning system. These datasets are assumed to contain the features that correctly identify the solution from the problem features; thus, they may also contain the knowledge needed to construct or learn such a similarity measure. The main motivation for this work is to automate the construction of similarity measures using machine learning, while keeping training time as low as possible. Working toward this, our objective is to investigate how to apply machine learning to effectively learn a similarity measure. Such a learned similarity measure could be used for CBR systems, but also for clustering data in semi-supervised learning or one-shot learning tasks. Recent work has advanced toward this goal, but relies on either very long training times or manually modeling parts of the similarity measure. We created a framework to help us analyze current methods for learning similarity measures. This analysis resulted in two novel similarity measure designs: the first uses a pre-trained classifier as the basis for a similarity measure, and the second uses as little modeling as possible while learning the similarity measure from data and keeping training time low. Both similarity measures were evaluated on 14 different datasets. The evaluation shows that using a classifier as the basis for a similarity measure gives state-of-the-art performance. Finally, the evaluation shows that our fully data-driven similarity measure design outperforms state-of-the-art methods while keeping training time low.
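The first design, a pre-trained classifier as the basis for a similarity measure, can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the classifier's class-probability outputs serve as case representations, and cosine similarity between them drives CBR-style retrieval. The dataset and model choices are placeholders.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# "Pre-trained classifier" on a small benchmark dataset.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def similarity(case_a, case_b):
    # Similarity of two cases = cosine similarity of the classifier's
    # class-probability vectors for each case.
    pa = clf.predict_proba([case_a])[0]
    pb = clf.predict_proba([case_b])[0]
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))

def retrieve(query, case_base):
    # CBR retrieval step: index of the stored case most similar to the query.
    return max(range(len(case_base)), key=lambda i: similarity(query, case_base[i]))
```

Cases that the classifier maps to the same class end up with nearly identical probability vectors and hence high similarity, which is what makes a classifier usable as a retrieval measure.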


2020 ◽  
Vol 9 (2) ◽  
pp. 111-118
Author(s):  
Shindy Arti ◽  
Indriana Hidayah ◽  
Sri Suning Kusumawardhani

Machine learning is commonly used for prediction, pattern recognition, and modelling the relationships between variables. Causal machine learning combines approaches for analyzing the causal impact of an intervention on an outcome, even when the relevant variables are considerably ambiguous. This combination of causality and machine learning is well suited to predicting and understanding the cause and effect behind results. The aim of this study is a systematic review to identify which causal machine learning approaches are generally used. This paper focuses on which data characteristics are applied in causal machine learning research and how the output of the algorithms used in this context is assessed. The review analyzes 20 papers with various approaches. This study categorizes data characteristics based on the type of data, attribute values, and data dimension. The Bayesian Network (BN) is commonly used in the context of causality, while the propensity score is the most extensively used technique in causality research. Variable values affect algorithm performance. This review can serve as a guide for selecting a causal machine learning system.
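Since the review singles out the propensity score as the most extensively used technique, a minimal sketch may help: estimate P(treatment | covariates) with logistic regression, then form an inverse-probability-weighted effect estimate. The data below are synthetic and the setup is generic, not drawn from any of the 20 reviewed papers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

# Synthetic observational data: a confounder x drives both treatment and outcome.
x = rng.normal(size=(n, 1))
p_treat = 1 / (1 + np.exp(-x[:, 0]))
treatment = (rng.random(n) < p_treat).astype(int)
outcome = 2.0 * treatment + x[:, 0] + rng.normal(scale=0.5, size=n)

# Step 1: propensity scores P(treatment = 1 | x) via logistic regression.
ps = LogisticRegression().fit(x, treatment).predict_proba(x)[:, 1]

# Step 2: inverse-probability-weighted estimate of the average treatment effect
# (the data above were generated with a true effect of 2.0).
ate = (np.mean(treatment * outcome / ps)
       - np.mean((1 - treatment) * outcome / (1 - ps)))
print(f"IPW estimate of the treatment effect: {ate:.2f}")
```

A naive difference of group means would be biased here because x shifts both treatment probability and outcome; weighting by the inverse propensity removes that confounding.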


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 313 ◽  
Author(s):  
Pengbo Gao ◽  
Yan Zhang ◽  
Linhuan Zhang ◽  
Ryozo Noguchi ◽  
Tofael Ahamed

Unmanned aerial vehicle (UAV)-based spraying systems have recently become important for the precision application of pesticides, using machine learning approaches. Therefore, the objective of this research was to develop a machine learning system that has the advantages of high computational speed and good accuracy for recognizing spray and non-spray areas for UAV-based sprayers. A machine learning system was developed by using the mutual subspace method (MSM) for images collected from a UAV. Two target land types, agricultural croplands and orchard areas, were considered in building two classifiers for distinguishing spray and non-spray areas. The field experiments were conducted in the target areas to train and test the system by using a commercial UAV (DJI Phantom 3 Pro) with an onboard 4K camera. The images were collected from low (5 m) and high (15 m) altitudes for croplands and orchards, respectively. The recognition system was divided into offline and online systems. In the offline recognition system, 74.4% accuracy was obtained for the classifiers in recognizing spray and non-spray areas for croplands. In the case of orchards, the average classifier recognition accuracy of spray and non-spray areas was 77%. On the other hand, the online recognition system had an average accuracy of 65.1% for croplands and 75.1% for orchards. The computational time for the online recognition system was minimal, with an average of 0.0031 s for classifier recognition. The developed machine learning system had an average recognition accuracy of 70%, which can be implemented in an autonomous UAV spray system for recognizing spray and non-spray areas in real-time applications.
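The mutual subspace method at the core of the system can be sketched in a few lines: represent each image set by a principal subspace and score two sets by the smallest canonical angle between their subspaces. This is a generic MSM sketch, not the authors' implementation; the subspace dimension `k` and the feature representation are placeholders.

```python
import numpy as np

def subspace_basis(samples, k=3):
    # Orthonormal basis of the k-dimensional principal subspace of an image set
    # (rows = flattened image feature vectors), via SVD of the centered data.
    _, _, vt = np.linalg.svd(samples - samples.mean(axis=0), full_matrices=False)
    return vt[:k].T                      # shape (d, k)

def msm_similarity(set_a, set_b, k=3):
    # Mutual subspace method: similarity = squared cosine of the smallest
    # canonical angle between the two subspaces, i.e. the squared largest
    # singular value of U_a^T U_b.
    ua, ub = subspace_basis(set_a, k), subspace_basis(set_b, k)
    s = np.linalg.svd(ua.T @ ub, compute_uv=False)
    return float(s[0] ** 2)
```

Classification then amounts to comparing a query image set against reference "spray" and "non-spray" subspaces and picking the higher similarity; because only small SVDs are involved, the per-query cost stays low, consistent with the millisecond-level recognition times reported.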


2020 ◽  
Vol 6 (9) ◽  
pp. 89
Author(s):  
Nicole Dalia Cilia ◽  
Claudio De Stefano ◽  
Francesco Fontanella ◽  
Claudio Marrocco ◽  
Mario Molinara ◽  
...  

In the framework of palaeography, the availability of both effective image analysis algorithms and high-quality digital images has favored the development of new applications for the study of ancient manuscripts and has provided new tools for decision-making support systems. The quality of the results provided by such applications, however, is strongly influenced by the selection of effective features, which should be able to capture the distinctive aspects in which the palaeography expert is interested. This process is very difficult to generalize because of the enormous variability in the types of ancient documents, produced in different historical periods with different languages and styles. The effect is that it is very difficult to define standard techniques general enough to be effective in any case, which is why ad-hoc systems, generally designed according to palaeographers' suggestions, have been developed for the analysis of ancient manuscripts. In recent years, there has been growing scientific interest in the use of techniques based on deep learning (DL) for the automatic processing of ancient documents. This interest is due not only to their capability for designing high-performance pattern recognition systems, but also to their ability to extract features automatically from raw data, without using any a priori knowledge. Moving from these considerations, the aim of this study is to verify whether DL-based approaches may actually represent a general methodology for automatically designing machine learning systems for palaeography applications. To this purpose, we compared the performance of a DL-based approach with that of a "classical" machine learning one in a case that is particularly unfavorable for DL, namely that of highly standardized schools. The rationale of this choice is to compare the obtainable results even when context information is present and discriminating: this information is ignored by DL approaches, while it is used by machine learning methods, making the comparison more significant. The experimental results refer to the use of a large set of digital images extracted from an entire 12th-century Bible, the "Avila Bible". This manuscript, produced by several scribes who worked in different periods and in different places, represents a severe test bed for evaluating the efficiency of scribe identification systems.


Polymers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 1768
Author(s):  
Chunhao Yang ◽  
Wuning Ma ◽  
Jianlin Zhong ◽  
Zhendong Zhang

The long-term mechanical properties of viscoelastic polymers are among their most important aspects. In the present research, a machine learning approach was proposed for predicting the creep properties of polyurethane elastomer, considering the effects of creep time, creep temperature, creep stress and the hardness of the material. The approaches are based on a multilayer perceptron network, random forest and support vector machine regression, respectively, while a genetic algorithm and k-fold cross-validation were used to tune the hyper-parameters. The results showed that all three models had excellent fitting ability on the training set. Moreover, the three models had different prediction capabilities on the testing set depending on the changing factor in focus. The correlation coefficient values between the predicted and experimental strains were larger than 0.913 (mostly larger than 0.998) on the testing set when the appropriate model was chosen.
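One of the three approaches, random forest regression evaluated with k-fold cross-validation, might look roughly like this. The feature columns mirror the four factors named in the abstract, but the data and functional form below are entirely synthetic, and the genetic-algorithm hyper-parameter tuning is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the creep dataset: creep time, temperature, stress
# and material hardness as inputs, creep strain as the target.
X = np.column_stack([
    rng.uniform(1, 1000, n),   # creep time
    rng.uniform(20, 80, n),    # creep temperature
    rng.uniform(1, 10, n),     # creep stress
    rng.uniform(60, 95, n),    # hardness
])
strain = (0.002 * X[:, 2] * np.log1p(X[:, 0]) * (1 + 0.01 * X[:, 1])
          / (0.01 * X[:, 3]) + rng.normal(scale=0.01, size=n))

# Out-of-fold predictions via 5-fold cross-validation, then the correlation
# coefficient between predicted and "experimental" strain.
model = RandomForestRegressor(n_estimators=200, random_state=0)
pred = cross_val_predict(model, X, strain, cv=KFold(5, shuffle=True, random_state=0))
r = np.corrcoef(strain, pred)[0, 1]
print(f"correlation coefficient r = {r:.3f}")
```

Using out-of-fold predictions keeps the reported correlation honest: every strain value is predicted by a model that never saw it during training.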


2021 ◽  
Vol 12 (1) ◽  
pp. 60
Author(s):  
Samuel Ndichu ◽  
Sangwook Kim ◽  
Seiichi Ozawa ◽  
Tao Ban ◽  
Takeshi Takahashi ◽  
...  

Attacks using Uniform Resource Locators (URLs) and their JavaScript (JS) code content to perpetrate malicious activities on the Internet are rampant and continuously evolving. Methods such as blocklisting, client honeypots, domain reputation inspection, and heuristic and signature-based systems are used to detect these malicious activities. Recently, machine learning approaches have been proposed; however, challenges still exist. First, blocklist systems are easily evaded by new URLs and JS code content, obfuscation, fast-flux, cloaking, and URL shortening. Second, heuristic and signature-based systems do not generalize well to zero-day attacks. Third, the Domain Name System allows cybercriminals to easily migrate their malicious servers to hide their Internet protocol addresses behind domain names. Finally, crafting fully representative features is challenging, even for domain experts. This study proposes a feature selection and classification approach for malicious JS code content using Shapley additive explanations and tree ensemble methods. The JS code features are obtained from the Abstract Syntax Tree form of the JS code, sample JS attack codes, and association rule mining. The malicious and benign JS code datasets obtained from Hynek Petrak and the Majestic Million Service were used for performance evaluation. We compared the performance of the proposed method to those of other feature selection methods in the task of malicious JS code content detection. With a recall of 0.9989, our experimental results show that the proposed approach is a better prediction model.
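A simplified version of the feature-selection-then-classify pipeline can be sketched with a tree ensemble. Note the stand-ins: impurity-based feature importances replace the SHAP values used in the study, and synthetic data replaces the AST-derived JS code features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the JS-code feature matrix (the real features come
# from the AST, sample attack codes and association rule mining).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           n_redundant=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Rank features with a tree ensemble; impurity-based importances stand in
# here for the per-feature Shapley values used in the paper.
ranker = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
top = np.argsort(ranker.feature_importances_)[::-1][:10]

# Retrain on the selected features only and evaluate recall, the paper's metric.
clf = GradientBoostingClassifier(random_state=0).fit(X_tr[:, top], y_tr)
recall = recall_score(y_te, clf.predict(X_te[:, top]))
print(f"recall with top-10 features: {recall:.3f}")
```

The appeal of Shapley-based selection over raw impurity importances is attribution consistency, but the overall select-then-retrain structure is the same.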


2021 ◽  
Vol 3 ◽  
Author(s):  
Muhammad Kaleem ◽  
Aziz Guergachi ◽  
Sridhar Krishnan

Analysis of long-term multichannel EEG signals for automatic seizure detection is an active area of research that has seen the application of methods from different domains of signal processing and machine learning. The majority of approaches developed in this context extract hand-crafted features that are used to train a classifier for eventual seizure detection. Approaches that are data-driven, do not use hand-crafted features, and use small amounts of patients' historical EEG data for classifier training are few in number. The approach presented in this paper falls in the latter category and is based on a signal-derived empirical dictionary approach, which utilizes empirical mode decomposition (EMD) and discrete wavelet transform (DWT) based dictionaries learned using a framework inspired by traditional methods of dictionary learning. Three features associated with traditional dictionary learning approaches, namely the projection coefficients, the coefficient vector and the reconstruction error, are extracted from both EMD- and DWT-based dictionaries for automated seizure detection. This is the first time these features have been applied to automatic seizure detection using an empirical dictionary approach. Small amounts of patients' historical multi-channel EEG data are used for classifier training, and multiple classifiers are used for seizure detection on newer data. In addition, the seizure detection results are validated using 5-fold cross-validation to rule out any bias. The CHB-MIT benchmark database, containing long-term EEG recordings of pediatric patients, is used for validation of the approach, and seizure detection performance comparable to the state of the art is obtained. Seizure detection is performed using five classifiers, thereby allowing a comparison of the dictionary approaches, features extracted, and classifiers used. The best seizure detection performance is obtained using the EMD-based dictionary, the reconstruction error feature and a support vector machine classifier, with accuracy, sensitivity and specificity values of 88.2%, 90.3%, and 88.1%, respectively. Comparison is also made with other recent studies using the same database. The methodology presented in this paper is shown to be computationally efficient and robust for patient-specific automatic seizure detection. A data-driven methodology utilizing a small amount of patients' historical data is hence demonstrated as a practical solution for automatic seizure detection.
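The reconstruction-error feature, which gave the best performance, can be illustrated in a generic form: project each EEG segment onto a learned dictionary and use the residual norm as the classifier input. Everything below (random orthonormal atoms in place of EMD/DWT dictionaries, synthetic segments, dimensions) is a hypothetical stand-in for the paper's pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def reconstruction_error(signal, dictionary):
    # Project a signal segment onto the span of the dictionary atoms (columns)
    # by least squares; the residual norm is the reconstruction-error feature.
    coeffs, *_ = np.linalg.lstsq(dictionary, signal, rcond=None)
    return float(np.linalg.norm(signal - dictionary @ coeffs))

rng = np.random.default_rng(0)
d, n_atoms = 64, 8

# A dictionary learned from a patient's historical EEG would capture normal
# activity; a random orthonormal basis stands in for the EMD/DWT atoms here.
D = np.linalg.qr(rng.normal(size=(d, n_atoms)))[0]

def make_segment(seizure):
    # Non-seizure segments lie near span(D); "seizure" segments deviate from it.
    base = D @ rng.normal(size=n_atoms)
    return base + (rng.normal(size=d) if seizure else 0.05 * rng.normal(size=d))

labels = rng.integers(0, 2, 200)
features = np.array([[reconstruction_error(make_segment(s), D)] for s in labels])

# One scalar feature per segment, classified with an SVM as in the paper's
# best-performing configuration.
clf = SVC().fit(features[:100], labels[:100])
accuracy = clf.score(features[100:], labels[100:])
print(f"held-out detection accuracy: {accuracy:.2f}")
```

The intuition is that activity the dictionary was built from reconstructs well, while seizure activity does not, so a single residual-norm feature already separates the two regimes.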

