Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination

The increasing impact of algorithmic decisions on people’s lives compels us to scrutinize their fairness and, in particular, the disparate impacts that ostensibly color-blind algorithms can have on different groups. Examples include credit decisioning, hiring, advertising, criminal justice, personalized medicine, and targeted policy making, where in some cases legislative or regulatory frameworks for fairness exist and define specific protected classes. In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data. This is particularly a problem in lending and healthcare. We consider the use of an auxiliary data set, such as the U.S. census, to construct models that predict the protected class from proxy variables, such as surname and geolocation. We show that even with such data, a variety of common disparity measures are generally unidentifiable, providing a new perspective on the documented biases of popular proxy-based methods. We provide exact characterizations of the tightest possible set of all possible true disparities that are consistent with the data (and possibly additional assumptions). We further provide optimization-based algorithms for computing and visualizing these sets and statistical tools to assess sampling uncertainty. Together, these enable reliable and robust assessments of disparities—an important tool when disparity assessment can have far-reaching policy implications. We demonstrate this in two case studies with real data: mortgage lending and personalized medicine dosing. This paper was accepted by Hamid Nazerzadeh, Guest Editor for the Special Issue on Data-Driven Prescriptive Analytics.

Download Full-text

Discovery of Knowledge by using Data Warehousing as well as ETL Processing

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1180.0782s619 ◽

2019 ◽

Vol 8 (2S6) ◽

pp. 936-945

Keyword(s):

Data Warehouse ◽

Real Life ◽

Real Data ◽

Automated Testing ◽

Quality Of Data ◽

Data Set ◽

The Real ◽

Using Data ◽

Data Warehouse Quality

Testing is very essential in Data warehouse systems for decision making because the accuracy, validation and correctness of data depends on it. By looking to the characteristics and complexity of iData iwarehouse, iin ithis ipaper, iwe ihave itried ito ishow the scope of automated testing in assuring ibest data iwarehouse isolutions. Firstly, we developed a data set generator for creating synthetic but near to real data; then in isynthesized idata, with ithe help of hand icoded Extraction, Transformation and Loading (ETL) routine, anomalies are classified. For the quality assurance of data for a Data warehouse and to give the idea of how important the iExtraction, iTransformation iand iLoading iis, some very important test cases were identified. After that, to ensure the quality of data, the procedures of automated testing iwere iembedded iin ihand icoded iETL iroutine. Statistical analysis was done and it revealed a big enhancement in the quality of data with the procedures of automated testing. It enhances the fact that automated testing gives promising results in the data warehouse quality. For effective and easy maintenance of distributed data,a novel architecture was proposed. Although the desired result of this research is achieved successfully and the objectives are promising, but still there's a need to validate the results with the real life environment, as this research was done in simulated environment, which may not always give the desired results in real life environment. Hence, the overall potential of the proposed architecture can be seen until it is deployed to manage the real data which is distributed globally.

Download Full-text

The economic reality of Islamic banks’ transactions: a qualitative inquiry

International Journal of Islamic and Middle Eastern Finance and Management ◽

10.1108/imefm-04-2020-0172 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Bassam Mohammad Maali ◽

Usama Adnan Fendi ◽

Muhannad Ahmad Atmeh

Keyword(s):

Accounting Standards ◽

Qualitative Inquiry ◽

Policy Implications ◽

Islamic Banks ◽

Content Type ◽

Regulatory Frameworks ◽

Economic Reality ◽

New Perspective ◽

Conventional Banks ◽

Islamic Economic

Purpose This paper aims to investigate the economic substance of Islamic banks’ transaction as perceived by the employees and regulators of banks and the effect of such substance on the need for special accounting standards for Islamic banks. If there is a distinctive “Islamic economic substance”, then special accounting practices may be necessary such as the standards of the Accounting and Auditing Organization for Islamic Financial Institutions. Design/methodology/approach A qualitative inquiry on one of the leading Islamic banks in the Middle East was conducted to investigate the economic substance of the bank’s main two transactions; the deposit system and Murabaha financing, as perceived by informants within one of the earliest Islamic banks and its regulators. Findings It is found that despite the belief that the transactions under examination were different from equivalents within conventional banking, practice within the bank was not consistent with such a belief. Informants largely perceived the economic reality of the investigated transaction as being not different from conventional banks’ transactions, and this would affect the need for special accounting and regulatory frameworks. Research limitations/implications This investigation is confined to informants working within one Islamic bank; their views and perceptions may not coincide with those working in other Islamic banks in the world. Practical implications The results of this investigation provide policy implications for Islamic banks, regulators and standards setters in regard to the need for special accounting standards for Islamic banks. Originality/value The paper is one of the first papers that uses a qualitative inquiry on the main transactions of Islamic banks and the related need for special accounting practices. The paper provides a new perspective on the debate over whether Islamic banking is genuinely innovative or is merely a replicate for conventional banking.

Download Full-text

EXPONENTIATED HALF-LOGISTIC LOMAX DISTRIBUTION WITH PROPERTIES AND APPLICATION

NED University Journal of Research ◽

10.35453/nedjr-ascn-2018-0033 ◽

2019 ◽

Vol XVI (2) ◽

pp. 1-11

Author(s):

Farrukh Jamal ◽

Hesham Mohammed Reyad ◽

Soha Othman Ahmed ◽

Muhammad Akbar Ali Shah ◽

Emrah Altun

Keyword(s):

Real Data ◽

Continuous Model ◽

Model Parameters ◽

Data Set ◽

Lomax Distribution ◽

Mathematical Properties ◽

Proposed Model ◽

Probability Weighted Moment ◽

Record Statistics ◽

Maximum Likelihood Criterion

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties for the proposed model were investigated which include raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moment, stress strength model, order statistics, and record statistics. The model parameters were estimated by using the maximum likelihood criterion and the behaviours of these estimates were examined by conducting a simulation study. The applicability of the new model is illustrated by applying it on a real data set.

Download Full-text

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. This paper, the estimation of the probability density function and the cumulative distribution function of with five different estimation methods: uniformly minimum variance unbiased(UMVU), maximum likelihood(ML), least squares(LS), weighted least squares (WLS) and percentile(PC) estimators are considered. The performance of these estimation procedures, based on the mean squared error (MSE) by numerical simulations are compared. Simulation studies express that the UMVU estimator performs better than others and when the sample size is large enough the ML and UMVU estimators are almost equivalent and efficient than LS, WLS and PC. Finally, the result using a real data set are analyzed.

Download Full-text

HCVS: Pinpointing Chromatin States Through Hierarchical Clustering and Visualization Scheme

Current Bioinformatics ◽

10.2174/1574893613666180402141107 ◽

2019 ◽

Vol 14 (2) ◽

pp. 148-156

Author(s):

Nighat Noureen ◽

Sahar Fazal ◽

Muhammad Abdul Qadir ◽

Muhammad Tanvir Afzal

Keyword(s):

Hierarchical Clustering ◽

Real Data ◽

Cell Types ◽

Computational Scheme ◽

Data Set ◽

Chromatin States ◽

Functional Regions ◽

Visualization Strategy ◽

Hidden States ◽

Next Generation Sequencing Ngs

Background: Specific combinations of Histone Modifications (HMs) contributing towards histone code hypothesis lead to various biological functions. HMs combinations have been utilized by various studies to divide the genome into different regions. These study regions have been classified as chromatin states. Mostly Hidden Markov Model (HMM) based techniques have been utilized for this purpose. In case of chromatin studies, data from Next Generation Sequencing (NGS) platforms is being used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. The number of states being predicted so far by the HMM tools have been justified biologically till now. Objective: The present study aimed at providing a computational scheme to identify the underlying hidden states in the data under consideration. </P><P> Methods: We proposed a computational scheme HCVS based on hierarchical clustering and visualization strategy in order to achieve the objective of study. Results: We tested our proposed scheme on a real data set of nine cell types comprising of nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results have been compared with one of the existing models as well which showed quite good correlation. Conclusion: The HCVS model not only helps in deciding the optimal state numbers for a particular data but it also justifies the results biologically thereby correlating the computational and biological aspects.

Download Full-text

Factors Affecting Driver Injury Severity in the Wrong-Way Crash: Accounting for Potential Heterogeneity in Means and Variances of Random Parameters

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211009882 ◽

2021 ◽

pp. 036119812110098

Author(s):

Miao Yu ◽

Jinxing Shen ◽

Changxi Ma

Keyword(s):

High Speed ◽

Injury Severity ◽

Unobserved Heterogeneity ◽

Research Effort ◽

Random Parameter ◽

Policy Implications ◽

Contributing Factors ◽

Severe Injuries ◽

Data Set ◽

Severity Levels

Because of the high percentage of fatalities and severe injuries in wrong-way driving (WWD) crashes, numerous studies have focused on identifying contributing factors to the occurrence of WWD crashes. However, a limited number of research effort has investigated the factors associated with driver injury-severity in WWD crashes. This study intends to bridge the gap using a random parameter logit model with heterogeneity in means and variances approach that can account for the unobserved heterogeneity in the data set. Police-reported crash data collected from 2014 to 2017 in North Carolina are used. Four injury-severity levels are defined: fatal injury, severe injury, possible injury, and no injury. Explanatory variables, including driver characteristics, roadway characteristics, environmental characteristics, and crash characteristics, are used. Estimation results demonstrate that factors, including the involvement of alcohol, rural area, principal arterial, high speed limit (>60 mph), dark-lighted conditions, run-off-road collision, and head-on collision, significantly increase the severity levels in WWD crashes. Several policy implications are designed and recommended based on findings.

Download Full-text

Implementation of a Modified Faster R-CNN for Target Detection Technology of Coastal Defense Radar

Remote Sensing ◽

10.3390/rs13091703 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1703

Author(s):

He Yan ◽

Chao Chen ◽

Guodong Jin ◽

Jindong Zhang ◽

Xudong Wang ◽

...

Keyword(s):

False Alarm ◽

False Alarm Rate ◽

Target Detection ◽

Real Data ◽

Detection Performance ◽

Detection Accuracy ◽

Constant False Alarm Rate ◽

Data Set ◽

Detection Technology ◽

Coastal Defense

The traditional method of constant false-alarm rate detection is based on the assumption of an echo statistical model. The target recognition accuracy rate and the high false-alarm rate under the background of sea clutter and other interferences are very low. Therefore, computer vision technology is widely discussed to improve the detection performance. However, the majority of studies have focused on the synthetic aperture radar because of its high resolution. For the defense radar, the detection performance is not satisfactory because of its low resolution. To this end, we herein propose a novel target detection method for the coastal defense radar based on faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) the Faster R-CNN is selected as the sea-surface target detector because of its high target detection accuracy; (2) a modified Faster R-CNN based on the characteristics of sparsity and small target size in the data set is employed; and (3) soft non-maximum suppression is exploited to eliminate the possible overlapped detection boxes. Furthermore, detailed comparative experiments based on a real data set of coastal defense radar are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.

Download Full-text

Comparison study in statistical estimation of gene effects based on a real data set

Journal of Physics Conference Series ◽

10.1088/1742-6596/1978/1/012047 ◽

2021 ◽

Vol 1978 (1) ◽

pp. 012047

Author(s):

Xiaona Sheng ◽

Yuqiu Ma ◽

Jiabin Zhou ◽

Jingjing Zhou

Keyword(s):

Statistical Estimation ◽

Real Data ◽

Comparison Study ◽

Data Set ◽

Gene Effects

Download Full-text

Intuitionistic fuzzy two-factor variance analysis of movie ticket sales

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219212 ◽

2021 ◽

pp. 1-11

Author(s):

Velichka Traneva ◽

Stoyan Tranev

Keyword(s):

Comparative Analysis ◽

Real Data ◽

Intuitionistic Fuzzy Sets ◽

Real Numbers ◽

Data Set ◽

Important Method ◽

Ticket Sales ◽

Intuitionistic Fuzzy ◽

The One ◽

First Time

Analysis of variance (ANOVA) is an important method in data analysis, which was developed by Fisher. There are situations when there is impreciseness in data In order to analyze such data, the aim of this paper is to introduce for the first time an intuitionistic fuzzy two-factor ANOVA (2-D IFANOVA) without replication as an extension of the classical ANOVA and the one-way IFANOVA for a case where the data are intuitionistic fuzzy rather than real numbers. The proposed approach employs the apparatus of intuitionistic fuzzy sets (IFSs) and index matrices (IMs). The paper also analyzes a unique set of data on daily ticket sales for a year in a multiplex of Cinema City Bulgaria, part of Cineworld PLC Group, applying the two-factor ANOVA and the proposed 2-D IFANOVA to study the influence of “ season ” and “ ticket price ” factors. A comparative analysis of the results, obtained after the application of ANOVA and 2-D IFANOVA over the real data set, is also presented.

Download Full-text

A Rank-Based Nonparametric Method for Mapping Quantitative Trait Loci in Outbred Half-Sib Pedigrees: Application to Milk Production in a Granddaughter Design

Genetics ◽

10.1093/genetics/149.3.1547 ◽

1998 ◽

Vol 149 (3) ◽

pp. 1547-1555 ◽

Cited By ~ 2

Author(s):

Wouter Coppieters ◽

Alexandre Kvasz ◽

Frédéric Farnir ◽

Juan-Jose Arranz ◽

Bernard Grisart ◽

...

Keyword(s):

Quantitative Trait Loci ◽

Quantitative Trait ◽

Real Data ◽

Interval Mapping ◽

Mapping Method ◽

Residual Variance ◽

Data Set ◽

Wilcoxon Rank Sum Test ◽

Holstein Friesian ◽

Trait Loci

Abstract We describe the development of a multipoint nonparametric quantitative trait loci mapping method based on the Wilcoxon rank-sum test applicable to outbred half-sib pedigrees. The method has been evaluated on a simulated dataset and its efficiency compared with interval mapping by using regression. It was shown that the rank-based approach is slightly inferior to regression when the residual variance is homoscedastic normal; however, in three out of four other scenarios envisaged, i.e., residual variance heteroscedastic normal, homoscedastic skewed, and homoscedastic positively kurtosed, the latter outperforms the former one. Both methods were applied to a real data set analyzing the effect of bovine chromosome 6 on milk yield and composition by using a 125-cM map comprising 15 microsatellites and a granddaughter design counting 1158 Holstein-Friesian sires.

Download Full-text