Enhancing robustness of machine learning systems via data transformations

Data collection and machine learning are changing the world. Whether in medicine, sports or education, companies and institutions are investing considerable time and money in systems that gather, process and analyse data. Likewise, to improve competitiveness, many countries are changing their educational policy to support STEM disciplines. It is therefore important to put effort into using various data sources to help students succeed in STEM. In this paper, we present a platform that can analyse students' activity on various contest and e-learning systems, combine and process the data, and then present it in various ways that are easy to understand. This in turn enables teachers and organizers to recognize talented and hardworking students, identify issues, and motivate students to practice and work on areas where they are weaker.

Development of Machine Learning Models to Predict Compressed Sward Height in Walloon Pastures Based on Sentinel-1, Sentinel-2 and Meteorological Data Using Multiple Data Transformations

Remote Sensing ◽

10.3390/rs13030408 ◽

2021 ◽

Vol 13 (3) ◽

pp. 408

Author(s):

Charles Nickmilder ◽

Anthony Tedde ◽

Isabelle Dufrasne ◽

Françoise Lessire ◽

Bernard Tychon ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Support System ◽

Cross Validation ◽

Learning Models ◽

Data Transformations ◽

Independent Validation ◽

Sward Height ◽

Machine Learning Models ◽

Sentinel 2

Accurate information about the available standing biomass on pastures is critical for the adequate management of grazing and its promotion to farmers. In this paper, machine learning models are developed to predict available biomass expressed as compressed sward height (CSH) from readily accessible meteorological, optical (Sentinel-2) and radar satellite data (Sentinel-1). This study assumed that combining heterogeneous data sources, data transformations and machine learning methods would improve the robustness and the accuracy of the developed models. A total of 72,795 records of CSH with a spatial positioning, collected in 2018 and 2019, were used and aggregated according to a pixel-like pattern. The resulting dataset was split into a training set with 11,625 pixellated records and an independent validation set with 4,952 pixellated records. The models were trained with 19-fold cross-validation. A wide range of performances was observed (with mean root mean square error (RMSE) of cross-validation ranging from 22.84 mm of CSH to infinite-like values), and the four best-performing models were a cubist, a glmnet, a neural network and a random forest. These models had an RMSE of independent validation lower than 20 mm of CSH at the pixel level. To simulate the behavior of the model in a decision support system, performances at the paddock level were also studied. These were computed according to two scenarios: either the predictions were made at a sub-parcel level and then aggregated, or the data were aggregated at the parcel level and the predictions were made for these aggregated data. The results obtained in this study were more accurate than those found in the literature concerning pasture budgeting and grassland biomass evaluation. The training of the 124 models resulting from the described framework was part of the realization of a decision support system to help farmers in their daily decision making.
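The two paddock-level scenarios and the RMSE metric mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `toy_model` function, the single numeric feature, the paddock labels and the observed heights are invented and are not the authors' models or data. The sketch only shows that, for a nonlinear model, predicting per pixel and then averaging can give a different paddock value than averaging the inputs first and predicting once.

```python
import math
from collections import defaultdict


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def toy_model(feature):
    # Hypothetical nonlinear stand-in for a trained model (not from the paper).
    return 20.0 + 0.5 * feature ** 2


def predict_then_aggregate(records, model):
    """Scenario 1: predict for each pixel, then average per paddock."""
    per_paddock = defaultdict(list)
    for paddock, feature in records:
        per_paddock[paddock].append(model(feature))
    return {paddock: mean(preds) for paddock, preds in per_paddock.items()}


def aggregate_then_predict(records, model):
    """Scenario 2: average the inputs per paddock, then predict once."""
    per_paddock = defaultdict(list)
    for paddock, feature in records:
        per_paddock[paddock].append(feature)
    return {paddock: model(mean(feats)) for paddock, feats in per_paddock.items()}


# Invented pixel records: (paddock label, feature value).
records = [("A", 2.0), ("A", 4.0), ("B", 6.0), ("B", 6.0)]
# Invented observed paddock-level heights in mm.
observed = {"A": 25.0, "B": 40.0}

scenario_1 = predict_then_aggregate(records, toy_model)
scenario_2 = aggregate_then_predict(records, toy_model)

paddocks = sorted(observed)
rmse_1 = rmse([scenario_1[p] for p in paddocks], [observed[p] for p in paddocks])
rmse_2 = rmse([scenario_2[p] for p in paddocks], [observed[p] for p in paddocks])
print(scenario_1, scenario_2, rmse_1, rmse_2)
```

In this toy data the two scenarios agree for paddock B, whose pixels are identical, and differ for paddock A, whose pixels vary. That is the kind of discrepancy a paddock-level comparison would need to quantify.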

A Review of Recent Deep Learning Approaches in Human-Centered Machine Learning

Sensors ◽

10.3390/s21072514 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2514

Author(s):

Tharindu Kaluarachchi ◽

Andrew Reis ◽

Suranga Nanayakkara

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Review Paper ◽

Learning Systems ◽

Learning Approaches ◽

Application Development ◽

Research Gaps ◽

Domain Experts ◽

Working Definition ◽

Real World Application

Since Deep Learning (DL) regained popularity, the Artificial Intelligence (AI) or Machine Learning (ML) field has been undergoing rapid growth in research and real-world application development. Deep Learning has increased the complexity of algorithms, and researchers and users have raised concerns regarding the usability and adoptability of Deep Learning systems. These concerns, coupled with increasing human-AI interactions, have created the emerging field of Human-Centered Machine Learning (HCML). We present this review paper as an overview and analysis of existing work in HCML related to DL. Firstly, we collaborated with field domain experts to develop a working definition for HCML. Secondly, through a systematic literature review, we analyze and classify 162 publications that fall within HCML. Our classification is based on aspects including contribution type, application area, and focused human categories. Finally, we analyze the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges, and presenting future HCML research opportunities.

Explainable AI: A Review of Machine Learning Interpretability Methods

Entropy ◽

10.3390/e23010018 ◽

2020 ◽

Vol 23 (1) ◽

pp. 18

Author(s):

Pantelis Linardatos ◽

Vasilis Papastefanopoulos ◽

Sotiris Kotsiantis

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Black Box ◽

Learning Systems ◽

Model Complexity ◽

Learning Models ◽

New Methods ◽

Industrial Adoption ◽

Machine Learning Models ◽

The Way

Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption, with machine learning systems demonstrating superhuman performance in a significant number of tasks. However, this surge in performance has often been achieved through increased model complexity, turning such systems into “black box” approaches and causing uncertainty regarding the way they operate and, ultimately, the way that they come to decisions. This ambiguity has made it problematic for machine learning systems to be adopted in sensitive yet critical domains, where their value could be immense, such as healthcare. As a result, scientific interest in the field of Explainable Artificial Intelligence (XAI), a field that is concerned with the development of new methods that explain and interpret machine learning models, has been tremendously reignited over recent years. This study focuses on machine learning interpretability methods; more specifically, a literature review and taxonomy of these methods are presented, as well as links to their programming implementations, in the hope that this survey will serve as a reference point for both theorists and practitioners.

An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) ◽

10.1109/icse43902.2021.00033 ◽

2021 ◽

Author(s):

Yiming Tang ◽

Raffi Khatchadourian ◽

Mehdi Bagherzadeh ◽

Rhia Singh ◽

Ajani Stewart ◽

...

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Learning Systems ◽

Technical Debt

Derivation of Constituent Problem Characteristics for the Application of Machine Learning Systems

10.1109/icict52872.2021.00014 ◽

2021 ◽

Author(s):

Gunther Schuh ◽

Paul Scholz ◽

Timon Burger

Keyword(s):

Machine Learning ◽

Learning Systems

Cardiovascular Diseases Classification Via Machine Learning Systems

10.1109/3ict53449.2021.9581384 ◽

2021 ◽

Author(s):

Fadheela Hussain ◽

Mustafa Hammad ◽

Wael El-Medany ◽

Riadh Ksantini

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Learning Systems

From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems

Product-Focused Software Process Improvement - Lecture Notes in Computer Science ◽

10.1007/978-3-030-64148-1_12 ◽

2020 ◽

pp. 185-201

Author(s):

Lucy Ellen Lwakatare ◽

Ivica Crnkovic ◽

Ellinor Rånge ◽

Jan Bosch

Keyword(s):

Machine Learning ◽

Data Science ◽

Learning Systems ◽

Continuous Delivery

Contextual Integrity Up and Down the Data Food Chain

Theoretical Inquiries in Law ◽

10.1515/til-2019-0008 ◽

2019 ◽

Vol 20 (1) ◽

pp. 221-256 ◽

Cited By ~ 9

Author(s):

Helen Nissenbaum

Keyword(s):

Machine Learning ◽

Food Chain ◽

Lower Order ◽

Higher Order ◽

Learning Systems ◽

Civic Life ◽

Information Flows ◽

Contextual Integrity ◽

Information Type ◽

Motion Detectors

According to the theory of contextual integrity (CI), privacy norms prescribe information flows with reference to five parameters: sender, recipient, subject, information type, and transmission principle. Because privacy is grasped contextually (e.g., health, education, civic life), the values of these parameters range over contextually meaningful ontologies of information types (or topics) and actors (subjects, senders, and recipients), in contextually defined capacities. As an alternative to predominant approaches to privacy, which were ineffective against novel information practices enabled by IT, CI was able both to pinpoint sources of disruption and provide grounds for either accepting or rejecting them. Mounting challenges from a burgeoning array of networked, sensor-enabled devices (IoT) and data-ravenous machine learning systems, similar in form though magnified in scope, call for renewed attention to theory. This Article introduces the metaphor of a data (food) chain to capture the nature of these challenges. With motion up the chain, where data of higher order is inferred from lower-order data, the crucial question is whether privacy norms governing lower-order data are sufficient for the inferred higher-order data. While CI has a response to this question, a greater challenge comes from data primitives, such as digital impulses of mouse clicks, motion detectors, and bare GPS coordinates, because they appear to have no meaning. Absent a semantics, they escape CI’s privacy norms entirely.
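As a rough illustration of the five-parameter description of an information flow, the sketch below models a flow as a record and a context's norms as a set of permitted flows. The class name `InformationFlow`, the function `conforms`, the exact-match rule and the example values are all assumptions made here for illustration; the article does not define a formal data model, and real contextual norms are far richer than set membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    """One information flow described by the five CI parameters."""
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


def conforms(flow, norms):
    # Simplifying assumption: a flow is appropriate only if it exactly
    # matches one of the flows the context's norms permit.
    return flow in norms


# Hypothetical norms for a health context.
health_norms = {
    InformationFlow("patient", "doctor", "patient", "diagnosis", "confidentiality"),
}

permitted = InformationFlow("patient", "doctor", "patient", "diagnosis", "confidentiality")
violating = InformationFlow("doctor", "employer", "patient", "diagnosis", "sale")

print(conforms(permitted, health_norms), conforms(violating, health_norms))
```

A data primitive such as a bare GPS coordinate would be hard to place in the `information_type` field of such a record, which is one way to picture the difficulty the abstract raises.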

Machine Learning at the Network Edge: A Survey

ACM Computing Surveys ◽

10.1145/3469029 ◽

2022 ◽

Vol 54 (8) ◽

pp. 1-37

Author(s):

M. G. Sarwar Murshed ◽

Christopher Murphy ◽

Daqing Hou ◽

Nazar Khan ◽

Ganesh Ananthanarayanan ◽

...

Keyword(s):

Machine Learning ◽

Learning Systems ◽

Major Research ◽

Sensors And Actuators ◽

Computing Systems ◽

Privacy Concerns ◽

Iot Devices ◽

Cloud Servers ◽

Typical Solution ◽

Operational Aspects

Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous in recent years. This has led to the generation of large quantities of data in real time, which is an appealing target for AI systems. However, deploying machine learning models on such end-devices is nearly impossible. A typical solution involves offloading data to external computing systems (such as cloud servers) for further processing, but this worsens latency, increases communication costs, and adds to privacy concerns. To address this issue, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated. Deploying machine learning systems on such edge computing devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning systems have been deployed at the edge of computer networks, focusing on the operational aspects including compression techniques, tools, frameworks, and hardware used in successful applications of intelligent edge systems.
