SELM: Software Engineering of Machine Learning Models

Mapping Intimacies ◽

10.3233/faia210007 ◽

2021 ◽

Author(s):

Nafiseh Jafari ◽

Mohammad Reza Besharati ◽

Maryam Hourali

Keyword(s):

Machine Learning ◽

Software Engineering ◽

Interdisciplinary Approach ◽

Interdisciplinary Teams ◽

Process Efficiency ◽

Training Dataset ◽

Learning Models ◽

Machine Learning Model ◽

Machine Learning Models

One of the pillars of any machine learning model is its concepts. Using software engineering, we can engineer these concepts and then develop and expand them. In this article, we present SELM, a framework for the Software Engineering of machine Learning Models, and evaluate it through a case study. Using the SELM framework, we can improve the efficiency of a machine learning process and achieve higher learning accuracy with fewer hardware resources and a smaller training dataset. This result highlights the importance of an interdisciplinary approach to machine learning. We therefore also present proposals for interdisciplinary teams in machine learning.


Sargassum Detection Using Machine Learning Models: A Case Study with the First 6 Months of GOCI-II Imagery

Remote Sensing ◽

10.3390/rs13234844 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4844

Author(s):

Jisun Shin ◽

Jong-Seok Lee ◽

Lee-Hyun Jang ◽

Jinwook Lim ◽

Boo-Keun Khim ◽

...

Keyword(s):

Machine Learning ◽

Ground Truth ◽

Support Vector ◽

Atmospheric Conditions ◽

Learning Models ◽

Adaptive Boosting ◽

Machine Learning Model ◽

High Resolution Images ◽

Machine Learning Models

A record-breaking agglomeration of Sargassum accumulated along the northern coast of Jeju, Korea, in 2021, and removing it from the beach placed a heavy burden on laborers. If remote sensing can detect where Sargassum is accumulating in a timely and accurate manner, it can be removed before it arrives, reducing the damage it causes. This study aims to detect the Sargassum distribution on the coast of Jeju Island using imagery from the Geostationary Ocean Color Imager-II (GOCI-II) on Geostationary KOMPSAT 2B (GK2B), which was launched in February 2020, with measurements available since October 2020. For this, we used GOCI-II imagery from the first 6 months and machine learning models including Fine Tree, a Fine Gaussian support vector machine (SVM), and Gentle adaptive boosting (GentleBoost). We trained the models with the GOCI-II Rayleigh-corrected reflectance (RhoC) image as input and a ground truth map extracted from high-resolution images as output. Qualitative and quantitative assessments were carried out for the three machine learning models and for traditional methods such as Sargassum indexes. We found that GentleBoost showed fewer false positives (6.2%), a high F-measure (0.82), and a more appropriate Sargassum distribution than the other methods. The application of the machine learning model to GOCI-II images under various atmospheric conditions is therefore considered successful for mapping Sargassum extent quickly, reducing the effort laborers need to remove it.
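
The abstract reports a false positive figure and an F-measure but does not spell out how they were computed. As a minimal sketch, assuming the standard per-pixel definitions (F-measure as the harmonic mean of precision and recall, false positive rate as FP / (FP + TN)), the two scores could be obtained as follows. The function names and the toy labels are illustrative and do not come from the paper.

```python
def confusion_counts(truth, pred):
    """Count true/false positives/negatives for binary labels (1 = Sargassum)."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp, tn):
    """Share of non-Sargassum pixels wrongly flagged (assumed definition)."""
    return fp / (fp + tn) if fp + tn else 0.0


# Toy example: 8 pixels, ground truth versus a model's prediction.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(truth, pred)
print(f_measure(tp, fp, fn), false_positive_rate(fp, tn))
```

Reporting both scores together is useful because a detector can reach a high F-measure while still flagging many clear-water pixels, or the reverse.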


A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo

Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering ◽

10.1145/3368089.3417063 ◽

2020 ◽

Author(s):

Zi Peng ◽

Jinqiu Yang ◽

Tse-Hsun (Peter) Chen ◽

Lei Ma

Keyword(s):

Machine Learning ◽

Autonomous Driving ◽

Learning Models ◽

Machine Learning Models


Enhancing the understanding of hydrological responses induced by ecological water replenishment using improved machine learning models: A case study in Yongding River

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.145489 ◽

2021 ◽

Vol 768 ◽

pp. 145489

Author(s):

Kangning Sun ◽

Litang Hu ◽

Jianli Guo ◽

Zhengqiu Yang ◽

Yuanzheng Zhai ◽

...

Keyword(s):

Machine Learning ◽

Learning Models ◽

Hydrological Responses ◽

Yongding River ◽

Machine Learning Models


Comparison of two optimized machine learning models for predicting displacement of rainfall-induced landslide: A case study in Sichuan Province, China

Engineering Geology ◽

10.1016/j.enggeo.2017.01.022 ◽

2017 ◽

Vol 218 ◽

pp. 213-222 ◽

Cited By ~ 29

Author(s):

Xing Zhu ◽

Qiang Xu ◽

Minggao Tang ◽

Wen Nie ◽

Shuqi Ma ◽

...

Keyword(s):

Machine Learning ◽

Sichuan Province ◽

Learning Models ◽

Machine Learning Models


Validation Machine Learning Models To Predict Score On Graduate Tests Based On High School Test And Other Factors, Case Study: Colombia.

10.18687/laccei2021.1.1.343 ◽

2021 ◽

Author(s):

Maryori Sabalza Mejia ◽

Carolina Campillo Jimenez ◽

Juan Carlos Martinez Santos

Keyword(s):

Machine Learning ◽

High School ◽

Learning Models ◽

Machine Learning Models


Severity Analysis of Heavy Vehicle Crashes Using Machine Learning Models: A Case Study in New Jersey

International Conference on Transportation and Development 2021 ◽

10.1061/9780784483534.025 ◽

2021 ◽

Author(s):

Ahmed Sajid Hasan ◽

Md. Asif Bin Kabir ◽

Mohammad Jalayer

Keyword(s):

Machine Learning ◽

New Jersey ◽

Heavy Vehicle ◽

Vehicle Crashes ◽

Learning Models ◽

Machine Learning Models


Comparison of machine learning models based on time domain and frequency domain features for faults diagnosis in rotating machines

MATEC Web of Conferences ◽

10.1051/matecconf/201821117009 ◽

2018 ◽

Vol 211 ◽

pp. 17009

Author(s):

Natalia Espinoza Sepulveda ◽

Jyoti Sinha

Keyword(s):

Machine Learning ◽

Frequency Domain ◽

Time Domain ◽

Intelligent Systems ◽

Learning Models ◽

Machine Vibration ◽

Vibration Data ◽

Machine Learning Model ◽

The Time Domain ◽

Machine Learning Models

The development of technologies for the maintenance industry has taken on an important role in meeting demanding challenges. One important challenge is to predict defects, if any, in machines as early as possible in order to manage machine downtime. Vibration-based condition monitoring (VCM) is well known for this purpose but requires human experience and expertise. Machine learning models using intelligent systems and pattern recognition seem to be the future avenue for machine fault detection without human expertise. Several such studies have been published in the literature. This paper also addresses machine learning models for the classification and detection of different machine faults. Here, time domain and frequency domain features derived from measured machine vibration data are used separately to develop machine learning models using the artificial neural network method. The effectiveness of the time domain and frequency domain feature-based models is compared when they are applied to an experimental rig. The paper presents the proposed machine learning models and their performance in terms of the observations and results.
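
The abstract does not list which features were extracted from the vibration data. As a hedged illustration only, the sketch below computes a few features commonly used in vibration analysis: RMS, peak, crest factor and kurtosis in the time domain, and the dominant spectral bin from a plain discrete Fourier transform in the frequency domain. The feature choice and all function names are assumptions, not the authors' method.

```python
import cmath
import math


def time_features(x):
    """Common time-domain descriptors of a vibration signal (assumed feature set)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    var = sum((v - mean) ** 2 for v in x) / n
    # Non-excess kurtosis: fourth central moment over squared variance.
    kurt = (sum((v - mean) ** 4 for v in x) / n) / (var ** 2) if var > 0 else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": kurt,
    }


def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT, computed directly (O(n^2), fine for short signals)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def dominant_bin(x):
    """Index of the strongest non-DC frequency bin."""
    mags = dft_magnitudes(x)
    return max(range(1, len(mags)), key=lambda k: mags[k])


# Toy signal: a pure sine with 5 cycles over 64 samples.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(time_features(signal), dominant_bin(signal))
```

Either feature set could then be fed to a classifier on its own, which is the kind of side-by-side comparison the abstract describes.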


Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

Diagnostics ◽

10.3390/diagnostics12010040 ◽

2021 ◽

Vol 12 (1) ◽

pp. 40

Author(s):

Meike Nauta ◽

Ricky Walsh ◽

Adam Dubowski ◽

Christin Seifert

Keyword(s):

Machine Learning ◽

Clinical Practice ◽

Skin Cancer ◽

Cancer Diagnosis ◽

Image Inpainting ◽

Relevant Information ◽

Black Box ◽

Training Dataset ◽

Learning Models ◽

Machine Learning Models

Machine learning models have been successfully applied for analysis of skin images. However, due to the black box nature of such deep learning models, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in data can cause a model to base its predictions on such artefacts rather than on the true relevant information. These learned shortcuts can in turn cause incorrect performance estimates and can result in unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify this shortcut learning in trained classifiers for skin cancer diagnosis, since it is known that dermoscopy images can contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, for which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts those patches and uses inpainting to automatically remove patches from images to assess the changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable when used in clinical practice. With our results, we, therefore, want to increase awareness of the risks of using black box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias in the training dataset by exchanging coloured patches with benign skin tissue using image inpainting and re-training the classifier on this de-biased dataset.
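
The patch-insertion test described above can be illustrated with a toy sketch. It assumes images are small 2D lists of intensities and that a classifier is any function returning a label. The helper names and both toy classifiers are hypothetical stand-ins, not the VGG16 model or the inpainting pipeline from the paper.

```python
def insert_patch(image, patch_value, top, left, size):
    """Return a copy of the image with a square patch of constant value pasted in."""
    out = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = patch_value
    return out


def flip_rate(classify, images, **patch):
    """Fraction of images whose predicted label changes once the patch is inserted."""
    flipped = 0
    for img in images:
        if classify(img) != classify(insert_patch(img, **patch)):
            flipped += 1
    return flipped / len(images)


def shortcut_classifier(img):
    """Toy model that has learned the artefact: any pixel of value 9 means 'benign'."""
    return "benign" if any(v == 9 for row in img for v in row) else "malignant"


def robust_classifier(img):
    """Toy model that only looks at the lesion centre and ignores the image corner."""
    return "malignant" if img[2][2] > 5 else "benign"


# Five identical toy 'malignant' images of size 5x5.
images = [[[7] * 5 for _ in range(5)] for _ in range(5)]
patch = {"patch_value": 9, "top": 0, "left": 0, "size": 2}
print(flip_rate(shortcut_classifier, images, **patch))
print(flip_rate(robust_classifier, images, **patch))
```

A high flip rate on images that should be unaffected by the patch is the signal of shortcut learning; a model that relies on the lesion itself should leave the rate near zero.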


Automated Retraining of Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3322.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 445-452

Keyword(s):

Machine Learning ◽

Input Data ◽

Research Work ◽

Learning Models ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Crucial Component ◽

Conventional Machine ◽

Over Time ◽

Machine Learning Models

Data is the most crucial component of a successful ML system. Once a machine learning model is developed, it becomes obsolete over time because new input data is generated every second. To keep predictions accurate, we need a way to keep our models up to date. Our research work involves finding a mechanism that can retrain the model with new data automatically. This research also explores the possibilities of automating machine learning processes. We started this project by training and testing our model using conventional machine learning methods. The outcome was then compared with the outcome of experiments conducted using AutoML methods such as TPOT. This helped us find an efficient technique to retrain our models. These techniques can be used in areas where people do not deal with the inner workings of an ML model but only require the outputs of ML processes.
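
The abstract does not describe the retraining mechanism itself. One simple possibility, shown here purely as an assumption, is to monitor accuracy on a window of recent labelled data and refit when it drops below a threshold. The one-feature threshold "model", the function names and the 0.9 cut-off are all illustrative. This sketch does not use TPOT or any AutoML library.

```python
def fit_threshold(samples):
    """Toy 'training': place the decision threshold midway between the class means.

    samples is a list of (feature, label) pairs and must contain both classes.
    """
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, samples):
    """Share of samples where 'feature > threshold' agrees with the label."""
    return sum((x > threshold) == (y == 1) for x, y in samples) / len(samples)


def retrain_if_stale(threshold, recent, min_accuracy=0.9):
    """Refit on recent data when accuracy on it falls below min_accuracy."""
    if accuracy(threshold, recent) < min_accuracy:
        return fit_threshold(recent), True
    return threshold, False


# Initial training data, then a later window where the feature has drifted upward.
old = [(1, 0), (2, 0), (8, 1), (9, 1)]
drifted = [(11, 0), (12, 0), (18, 1), (19, 1)]
model = fit_threshold(old)
model, retrained = retrain_if_stale(model, drifted)
print(model, retrained)
```

In practice the check would run on a schedule or on each new batch, and the refit step could be replaced by any training routine, including an AutoML search.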


Explainable machine learning: A case study on impedance tube measurements

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2342 ◽

2021 ◽

Vol 263 (3) ◽

pp. 3223-3234

Author(s):

Merten Stender ◽

Mathies Wedler ◽

Norbert Hoffmann ◽

Christian Adams

Keyword(s):

Machine Learning ◽

Absorption Coefficient ◽

Specimen Thickness ◽

Learning Models ◽

Impedance Tube ◽

Frequency Regime ◽

Hidden Patterns ◽

Model Diagnosis ◽

Machine Learning Models

Machine learning (ML) techniques allow for finding hidden patterns and signatures in data. Currently, these methods are gaining increased interest in engineering in general and in vibroacoustics in particular. Although ML methods are successfully applied, it is hardly understood how these black box-type methods make their decisions. Explainable machine learning aims at overcoming this issue by deepening the understanding of the decision-making process through perturbation-based model diagnosis. This paper introduces machine learning methods and reviews recent techniques for explainability and interpretability. These methods are exemplified on sound absorption coefficient spectra of one sound absorbing foam material measured in an impedance tube. Variances of the absorption coefficient measurements as a function of the specimen thickness and the operator are modeled by univariate and multivariate machine learning models. In order to identify the driving patterns, i.e. how and in which frequency regime the measurements are affected by the setup specifications, Shapley additive explanations are derived for the ML models. It is demonstrated how explaining machine learning models can be used to discover and express complicated relations in experimental data, thereby paving the way to novel knowledge discovery strategies in evidence-based modeling.
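
To make the idea behind Shapley additive explanations concrete, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. It assumes a value function that replaces "absent" features with a fixed baseline, which is one common convention and not necessarily the one used in the paper. Practical SHAP tooling relies on approximations rather than this brute-force enumeration, and the toy model is hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature orderings.

    Features not yet 'switched on' take their baseline value (assumed convention).
    Cost grows factorially with the number of features, so this is for tiny inputs only.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its actual value
            new = model(current)
            phi[i] += new - prev  # marginal contribution of feature i in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(v):
    """Hypothetical model with two additive terms and one interaction term."""
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
print(phi)
```

The attributions sum to the difference between the model output at the input and at the baseline, and the interaction term is split evenly between the two features involved.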
