A Unified Definition of Mutual Information with Applications in Machine Learning

There are various definitions of mutual information. Essentially, these definitions can be divided into two classes: (1) definitions with random variables and (2) definitions with ensembles. However, there are some mathematical flaws in these definitions. For instance, Class 1 definitions either neglect the probability spaces or assume the two random variables have the same probability space. Class 2 definitions redefine marginal probabilities from the joint probabilities. In fact, the marginal probabilities are given by the ensembles and should not be redefined from the joint probabilities. Both Class 1 and Class 2 definitions assume a joint distribution exists. Yet they all ignore an important fact: the joint distribution (or joint probability measure) is not unique. In this paper, we first present a new unified definition of mutual information to cover all the various definitions and to fix their mathematical flaws. Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration. Next, we establish some properties of the newly defined mutual information. We then propose a method to calculate mutual information in machine learning. Finally, we apply our newly defined mutual information to credit scoring.
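
The non-uniqueness of the joint distribution can be illustrated with a small discrete example. The sketch below is not the paper's unified definition; it uses the textbook discrete formula for mutual information, and the two 2x2 tables are hypothetical. Both tables have the same uniform marginals, yet they yield different mutual information.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint pmf given as joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                         # skip zero cells: 0 * log 0 := 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Two hypothetical joints sharing the marginals (0.5, 0.5) for both X and Y.
independent = [[0.25, 0.25], [0.25, 0.25]]
coupled = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_coupled = mutual_information(coupled)
print(mi_independent, mi_coupled)
```

Because the marginals alone do not pin down the joint, any value between these two extremes is attainable with the same marginals, which is why a definition has to state which joint distribution it means.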

Fintech Credit Scoring Techniques for Evaluating P2P Loan Applications – A Python Machine Learning Ensemble Approach

International Journal of Smart Business and Technology ◽

10.21742/ijsbt.2018.6.1.04 ◽

2018 ◽

Vol 6 (1) ◽

Keyword(s):

Machine Learning ◽

Credit Scoring ◽

Ensemble Approach

A novel multi-stage ensemble model with multiple K-means-based selective undersampling: An application in credit scoring

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201954 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9471-9484

Author(s):

Yilun Jin ◽

Yanan Liu ◽

Wenyu Zhang ◽

Shuai Zhang ◽

Yu Lou

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Credit Scoring ◽

Imbalanced Data ◽

Ensemble Model ◽

Selective Sampling ◽

Machine Learning Methods ◽

Multi Stage ◽

Proposed Model ◽

New Feature

With the advancement of machine learning, credit scoring can be performed better. As one of the widely recognized machine learning methods, ensemble learning has demonstrated significant improvements in the predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with the imbalanced data. Then, a new selective sampling mechanism is proposed to select the better-performing base classifiers adaptively. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model by composing the shortlisted base classifiers. In the experiments, four datasets with four evaluation indicators are used to evaluate the performance of the proposed model, and the experimental results prove the superiority of the proposed model over other benchmark models.

Beneficial and harmful explanatory machine learning

Machine Learning ◽

10.1007/s10994-020-05941-0 ◽

2021 ◽

Author(s):

Lun Ai ◽

Stephen H. Muggleton ◽

Céline Hocquette ◽

Mark Gromowski ◽

Ute Schmid

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Beneficial Effect ◽

Empirical Evidence ◽

Task Performance ◽

Human Performance ◽

Human Learning ◽

Science Literature ◽

Definition Of ◽

Self Learning

Given the recent successes of Deep Learning in AI there has been increased interest in the role and need for explanations in machine learned theories. A distinct notion in this context is that of Michie’s definition of ultra-strong machine learning (USML). USML is demonstrated by a measurable increase in human performance of a task following provision to the human of a symbolic machine learned theory for task performance. A recent paper demonstrates the beneficial effect of a machine learned logic theory for a classification task, yet no existing work to our knowledge has examined the potential harmfulness of machine’s involvement for human comprehension during learning. This paper investigates the explanatory effects of a machine learned theory in the context of simple two person games and proposes a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature. The approach involves a cognitive window consisting of two quantifiable bounds and it is supported by empirical evidence collected from human trials. Our quantitative and qualitative results indicate that human learning aided by a symbolic machine learned theory which satisfies a cognitive window has achieved significantly higher performance than human self learning. Results also demonstrate that human learning aided by a symbolic machine learned theory that fails to satisfy this window leads to significantly worse performance than unaided human learning.

Application of explainable machine learning based on Catboost in credit scoring

Journal of Physics Conference Series ◽

10.1088/1742-6596/1955/1/012039 ◽

2021 ◽

Vol 1955 (1) ◽

pp. 012039

Author(s):

Ji Qi ◽

Ruicheng Yang ◽

Pucong Wang

Keyword(s):

Machine Learning ◽

Credit Scoring

Conditional Rényi Entropy and the Relationships between Rényi Capacities

Entropy ◽

10.3390/e22050526 ◽

2020 ◽

Vol 22 (5) ◽

pp. 526

Author(s):

Gautam Aishwarya ◽

Mokshay Madiman

Keyword(s):

Mutual Information ◽

Rényi Entropy ◽

Basic Properties ◽

Reference Measure ◽

Definition Of ◽

Discrete Setting

The analogues of Arimoto’s definition of conditional Rényi entropy and Rényi mutual information are explored for abstract alphabets. These quantities, although dependent on the reference measure, have some useful properties similar to those known in the discrete setting. In addition to laying out some such basic properties and the relations to Rényi divergences, the relationships between the families of mutual informations defined by Sibson, Augustin-Csiszár, and Lapidoth-Pfister, as well as the corresponding capacities, are explored.
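
For orientation, the discrete-setting quantities that the abstract refers to are commonly written as follows. This is a sketch of the standard textbook forms for a finite alphabet and order $\alpha > 0$, $\alpha \neq 1$; the notation is an assumption and is not taken from the paper, which treats abstract alphabets with a reference measure.

```latex
% Rényi entropy of order \alpha for a discrete random variable X with pmf P_X
H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x} P_X(x)^{\alpha}

% Arimoto's conditional Rényi entropy for a joint pmf P_{XY}
H_\alpha(X \mid Y) = \frac{\alpha}{1-\alpha} \log \sum_{y} \Bigl( \sum_{x} P_{XY}(x,y)^{\alpha} \Bigr)^{1/\alpha}

% Arimoto's Rényi mutual information
I_\alpha(X;Y) = H_\alpha(X) - H_\alpha(X \mid Y)
```

As $\alpha \to 1$, these reduce to the Shannon entropy, conditional entropy, and mutual information, respectively.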

Estimating Discrete Joint Probability Distributions for Demographic Characteristics at the Store Level Given Store Level Marginal Distributions and a City-Wide Joint Distribution

Quantitative Marketing and Economics ◽

10.1007/s11129-005-0259-9 ◽

2005 ◽

Vol 3 (1) ◽

pp. 71-93 ◽

Cited By ~ 9

Author(s):

Charles J. Romeo

Keyword(s):

Joint Distribution ◽

Probability Distributions ◽

Joint Probability ◽

Demographic Characteristics ◽

Marginal Distributions ◽

Joint Probability Distributions

Evaluation of invulnerability of scale-free networks using total information of local sub-graph

International Journal of Modern Physics C ◽

10.1142/s0129183118500754 ◽

2018 ◽

Vol 29 (08) ◽

pp. 1850075

Author(s):

Tingyuan Nie ◽

Xinling Guo ◽

Mengda Lin ◽

Kun Zhao

Keyword(s):

Mutual Information ◽

Complex Network ◽

Fundamental Problem ◽

The Self ◽

Practical Significance ◽

Scale Free ◽

Total Information ◽

Scale Free Networks ◽

Influential Nodes ◽

Definition Of

Quantifying the invulnerability of complex networks is a fundamental problem in which identifying influential nodes is of theoretical and practical significance. In this paper, we propose a novel definition of centrality named total information (TC), derived from the local sub-graph formed by a node and its neighbors. The centrality is then defined as the sum of the self-information of the node and the mutual information of its neighbor nodes. We use the proposed centrality to identify the importance of nodes through the evaluation of the invulnerability of scale-free networks. The results show that the proposed centrality improves both efficiency and effectiveness compared with traditional centralities.

Algorithmic fairness in credit scoring

Oxford Review of Economic Policy ◽

10.1093/oxrep/grab020 ◽

2021 ◽

Vol 37 (3) ◽

pp. 585-617

Author(s):

Teresa Bono ◽

Karen Croxson ◽

Adam Giles

Keyword(s):

Machine Learning ◽

Credit Scoring ◽

Large Data ◽

Error Rates ◽

The Past ◽

Ensemble Machine Learning ◽

Hidden Patterns ◽

Credit Scoring Model ◽

Distributional Impacts ◽

Modelling Approach

The use of machine learning as an input into decision-making is on the rise, owing to its ability to uncover hidden patterns in large data and improve prediction accuracy. Questions have been raised, however, about the potential distributional impacts of these technologies, with one concern being that they may perpetuate or even amplify human biases from the past. Exploiting detailed credit file data for 800,000 UK borrowers, we simulate a switch from a traditional (logit) credit scoring model to ensemble machine-learning methods. We confirm that machine-learning models are more accurate overall. We also find that they do as well as the simpler traditional model on relevant fairness criteria, where these criteria pertain to overall accuracy and error rates for population subgroups defined along protected or sensitive lines (gender, race, health status, and deprivation). We do observe some differences in the way credit-scoring models perform for different subgroups, but these manifest under a traditional modelling approach and switching to machine learning neither exacerbates nor eliminates these issues. The paper discusses some of the mechanical and data factors that may contribute to statistical fairness issues in the context of credit scoring.
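
The fairness criteria mentioned above (error rates per population subgroup) can be sketched in a few lines. This is an illustrative computation only, not the paper's procedure: the function name `group_error_rates`, the labels, the predictions, and the subgroup tags "A" and "B" are all hypothetical toy values.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)} for binary labels."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1:
            c["pos"] += 1
            if p == 0:
                c["fn"] += 1      # actual positive predicted negative
        else:
            c["neg"] += 1
            if p == 1:
                c["fp"] += 1      # actual negative predicted positive
    rates = {}
    for g, c in counts.items():
        # NaN marks a rate that is undefined because the group has no such cases.
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[g] = (fpr, fnr)
    return rates


# Hypothetical toy data: 1 = default, 0 = repaid; two subgroups "A" and "B".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_error_rates(y_true, y_pred, groups)
print(rates)
```

Comparing the resulting pairs across subgroups is one simple way to see whether a model's errors fall unevenly, for example more missed defaults in one group and more false alarms in another.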

Central Limit Theorems for Interchangeable Processes

Canadian Journal of Mathematics ◽

10.4153/cjm-1958-026-0 ◽

1958 ◽

Vol 10 ◽

pp. 222-229 ◽

Cited By ~ 40

Author(s):

J. R. Blum ◽

H. Chernoff ◽

M. Rosenblatt ◽

H. Teicher

Keyword(s):

Stochastic Process ◽

Probability Measure ◽

Central Limit ◽

Joint Distribution ◽

Limit Theorems ◽

Random Variables ◽

Central Limit Theorems ◽

Probability Measures ◽

Image Position ◽

Positive Integers

Let $\{X_n\}$ ($n = 1, 2, \ldots$) be a stochastic process. The random variables comprising it, or the process itself, will be said to be interchangeable if, for any choice of distinct positive integers $i_1, i_2, \ldots, i_k$, the joint distribution of $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ depends merely on $k$ and is independent of the integers $i_1, i_2, \ldots, i_k$. It was shown by De Finetti (3) that the probability measure for any interchangeable process is a mixture of probability measures of processes each consisting of independent and identically distributed random variables.
