Differential Replication for Credit Scoring in Regulated Environments

Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 407
Author(s):  
Irene Unceta ◽  
Jordi Nin ◽  
Oriol Pujol

Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that allows differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring. We use a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as that of the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but also of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time-to-market delivery of these solutions.
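To make the copying step concrete, the minimal sketch below trains a stand-in "original" classifier, labels synthetic samples with its predictions (no access to the training data), and fits a shallow decision tree to them as the copy. The dataset, model choices, and Gaussian sampling scheme are illustrative assumptions; the paper's private mortgage data and exact pipeline are not reproduced here.

```python
# Minimal sketch of differential replication through copying,
# assuming a scikit-learn-style setup (illustrative, not the paper's).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the original (possibly opaque) credit-scoring model.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
original = RandomForestClassifier(random_state=0).fit(X, y)

# Copying: sample synthetic points from the feature space and label
# them with the original model's hard predictions.
rng = np.random.default_rng(0)
X_synth = rng.normal(loc=X.mean(axis=0), scale=X.std(axis=0),
                     size=(20000, X.shape[1]))
y_synth = original.predict(X_synth)

# Project the decision behavior onto a new hypothesis space, here a
# shallow decision tree chosen for decomposable explanations.
copy_model = DecisionTreeClassifier(max_depth=5).fit(X_synth, y_synth)

# Fidelity: agreement between the copy and the original on fresh points.
X_test = rng.normal(loc=X.mean(axis=0), scale=X.std(axis=0),
                    size=(5000, X.shape[1]))
fidelity = (copy_model.predict(X_test) == original.predict(X_test)).mean()
print(f"copy/original agreement: {fidelity:.3f}")
```

Note that the copy is evaluated on fidelity to the original's decisions rather than on accuracy against ground-truth labels, which is what allows copying to work when the training data is unavailable.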

Author(s):  
Julien Siebert ◽  
Lisa Joeckel ◽  
Jens Heidrich ◽  
Adam Trendowicz ◽  
Koji Nakamichi ◽  
...  

Nowadays, systems containing components based on machine learning (ML) methods are becoming more widespread. In order to ensure the intended behavior of a software system, there are standards that define necessary qualities of the system and its components (such as ISO/IEC 25010). Due to the different nature of ML, we have to re-interpret existing qualities for ML systems or add new ones (such as trustworthiness). We have to be very precise about which quality property is relevant for which entity of interest (such as completeness of training data or correctness of trained model), and how to objectively evaluate adherence to quality requirements. In this article, we present how to systematically construct quality models for ML systems based on an industrial use case. This quality model enables practitioners to specify and assess qualities for ML systems objectively. In addition to the overall construction process described, the main outcomes include a meta-model for specifying quality models for ML systems, reference elements regarding relevant views, entities, quality properties, and measures for ML systems based on existing research, an example instantiation of a quality model for a concrete industrial use case, and lessons learned from applying the construction process. We found that it is crucial to follow a systematic process in order to come up with measurable quality properties that can be evaluated in practice. In the future, we want to learn how the term quality differs between different types of ML systems and come up with reference quality models for evaluating qualities of ML systems.
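To make the meta-model idea tangible, here is a toy encoding of a quality model as entities, quality properties, and measures. All names, thresholds, and measurement functions below are hypothetical placeholders and do not reproduce the article's actual meta-model or industrial use case.

```python
# Toy sketch of a quality model for an ML system: entities carry quality
# properties, each evaluated through measures with acceptance thresholds.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Measure:
    name: str
    evaluate: Callable[[], float]   # returns the measured value
    threshold: float                # minimum acceptable value

    def passes(self) -> bool:
        return self.evaluate() >= self.threshold

@dataclass
class QualityProperty:
    name: str                       # e.g. "correctness of trained model"
    entity: str                     # e.g. "trained model", "training data"
    measures: List[Measure] = field(default_factory=list)

@dataclass
class QualityModel:
    system: str
    properties: List[QualityProperty] = field(default_factory=list)

    def assess(self) -> dict:
        # A property holds only if all of its measures pass.
        return {p.name: all(m.passes() for m in p.measures)
                for p in self.properties}

# Example instantiation with placeholder measurement functions.
qm = QualityModel(
    system="demo ML system",
    properties=[
        QualityProperty("completeness of training data", "training data",
                        [Measure("label coverage", lambda: 0.97, 0.95)]),
        QualityProperty("correctness of trained model", "trained model",
                        [Measure("test accuracy", lambda: 0.91, 0.90)]),
    ],
)
print(qm.assess())
```

The point of such a structure is that each quality claim is tied to a specific entity and backed by an objectively evaluable measure, mirroring the article's emphasis on measurable quality properties.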


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Alberto Hernandez ◽  
Adarsh Balasubramanian ◽  
Fenglin Yuan ◽  
Simon A. M. Mason ◽  
Tim Mueller

The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine-learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine-learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient many-body potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton-Chen embedded-atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded-atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy, and speed.
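The Lennard-Jones rediscovery test can be illustrated with a drastically simplified sketch: instead of genetic programming over many-body models, the code below enumerates a tiny hypothesis space of two-term power-law pair potentials and selects by accuracy with a simplicity tiebreak, recovering the Lennard-Jones exponents from data generated by the exact potential. The grammar and scoring are assumptions for illustration only, not the paper's algorithm.

```python
# Simplified symbolic-regression-style search: fit pair potentials of the
# form E(r) = A*r^-m + B*r^-n and pick the best by accuracy + simplicity.
import itertools
import numpy as np

# "Training data" from the exact Lennard-Jones potential
# (epsilon = sigma = 1), mirroring the rediscovery test in the paper.
r = np.linspace(0.95, 2.5, 200)
E_true = 4.0 * (r**-12 - r**-6)

def fit_and_score(m, n):
    """Least-squares fit of A*r^-m + B*r^-n; returns (mse, coefficients)."""
    basis = np.column_stack([r**-float(m), r**-float(n)])
    coef, *_ = np.linalg.lstsq(basis, E_true, rcond=None)
    mse = np.mean((basis @ coef - E_true) ** 2)
    return mse, coef

# Enumerate exponent pairs; a tiny penalty on exponent size acts as a
# crude simplicity term, so equally accurate, simpler forms win.
best = None
for m, n in itertools.combinations(range(1, 15), 2):
    mse, coef = fit_and_score(m, n)
    score = mse + 1e-12 * (m + n)   # accuracy first, simplicity tiebreak
    if best is None or score < best[0]:
        best = (score, m, n, coef)

_, m, n, (A, B) = best
print(f"recovered form: {A:.3f} * r^-{m} + {B:.3f} * r^-{n}")
# Expected: the Lennard-Jones exponents 6 and 12, with coefficients
# approximately -4 and 4.
```

The selection criterion here caricatures the accuracy-speed-simplicity trade-off from the abstract: among candidates of equal fit, the penalty steers the search toward the simplest form, which is what guards against overfitting.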


2008 ◽  
pp. 132-156 ◽  
Author(s):  
Jim Wong ◽  
Laurence Kang-Por Fung ◽  
Tom Pak-Wing Fong ◽  
Angela Sze

2010 ◽  
Vol 25 (5) ◽  
pp. 647-669 ◽  
Author(s):  
Mercury Wai-Yin Tam ◽  
Eddie Hui ◽  
Xian Zheng
