mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers

Author(s):  
Xavier Gitiaux ◽  
Huzefa Rangwala

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents a new tool, mdfa, that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that for individuals with little criminal history, identified African-Americans are three times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.
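The reduction can be pictured with a small sketch. The following is a minimal illustration of the idea rather than the authors' exact algorithm: an auditor model is trained to predict where the sensitive attribute and the black-box outcome coincide, after reweighting so the two sensitive groups are balanced. The reweighting scheme and the decision-tree hypothesis class are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def audit_fairness(X, sensitive, outcome, max_depth=3):
    """Return an auditor whose above-chance accuracy signals a violation.

    X         -- non-sensitive features, shape (n_samples, n_features)
    sensitive -- binary sensitive attribute, shape (n_samples,)
    outcome   -- binary black-box outcomes, shape (n_samples,)
    """
    # Label each individual by whether the outcome and the sensitive
    # attribute coincide -- the prediction target of the reduction.
    agreement = (sensitive == outcome).astype(int)

    # Crude stand-in for the paper's distribution-matching step: reweight
    # so both sensitive groups carry equal total weight.
    p = max(min(sensitive.mean(), 1 - 1e-9), 1e-9)
    weights = np.where(sensitive == 1, 0.5 / p, 0.5 / (1 - p))

    auditor = DecisionTreeClassifier(max_depth=max_depth)
    auditor.fit(X, agreement, sample_weight=weights)

    # Weighted accuracy far above 0.5 means some region of the feature
    # space leaks the sensitive attribute through the classifier's outcomes.
    violation = auditor.score(X, agreement, sample_weight=weights) - 0.5
    return auditor, violation
```

The leaves of the fitted tree then describe the subgroups where the leakage, and hence the worst-case violation, is strongest.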

The field of biosciences has advanced to a great extent and has generated large amounts of information from Electronic Health Records. This has given rise to an acute need for knowledge generation from this enormous amount of data. Data mining methods and machine learning play a major role in this aspect of the biosciences. Chronic Kidney Disease (CKD) is a condition in which the kidneys are damaged and cannot filter blood as they should. A family history of kidney disease or failure, high blood pressure, or type 2 diabetes may lead to CKD. The damage to the kidney is lasting, and the chances of it worsening over time are high. The most common complications that result from kidney failure are heart disease, anemia, bone disease, and elevated potassium and calcium levels. The worst case leads to complete kidney failure and necessitates a kidney transplant for survival. Early detection of CKD can greatly improve quality of life, which calls for a good prediction algorithm to detect CKD at an early stage. The literature shows a wide range of machine learning algorithms employed for the prediction of CKD. This paper uses data preprocessing, data transformation, and various classifiers to predict CKD, and also proposes the best-performing prediction framework for CKD. The results of the framework show promising, more accurate prediction at an early stage of CKD.
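As a concrete illustration of such a framework, the sketch below wires preprocessing, transformation, and candidate classifiers into a single cross-validated pipeline. The file name ckd.csv, the class column, and the choice of logistic regression and random forest are assumptions for illustration; the paper's actual classifiers are not specified here.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("ckd.csv")                             # hypothetical file path
y = (df["class"] == "ckd").astype(int)                  # 1 = CKD, 0 = not CKD
X = df.drop(columns=["class"]).select_dtypes("number")  # numeric features only

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

for name, model in candidates.items():
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # data preprocessing
        ("scale", StandardScaler()),                   # data transformation
        ("clf", model),
    ])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean 5-fold accuracy = {scores.mean():.3f}")
```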


2021 ◽  
Author(s):  
Jason Williams ◽  
Sally Potter-McIntyre ◽  
Justin Filiberto ◽  
Shaunna Morrison ◽  
Daniel Hummer

Indicator minerals have special physical and chemical properties that can be analyzed to glean information concerning the composition of host rocks and formational (or altering) fluids. Clay, zeolite, and tourmaline mineral groups are all ubiquitous at the Earth’s surface and shallow crust and distributed through a wide variety of sedimentary, igneous, metamorphic, and hydrothermal systems. Traditional studies of indicator mineral-bearing deposits have provided a wealth of data that could be integral to discovering new insights into the formation and evolution of naturally occurring systems. This study evaluates the relationships that exist between different environmental indicator mineral groups through the implementation of machine learning algorithms and network diagrams. Mineral occurrence data for thousands of localities hosting clay, zeolite, and tourmaline minerals were retrieved from mineral databases. Clustering techniques (e.g., agglomerative hierarchical clustering and density-based spatial clustering of applications with noise) combined with network analyses were used to analyze the compiled dataset in an effort to characterize and identify geological processes operating at different localities across the United States. Ultimately, this study evaluates the ability of machine learning algorithms to act as supplementary diagnostic and interpretive tools in geoscientific studies.
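A minimal sketch of how such a workflow might look, assuming the input is a locality-by-mineral presence/absence matrix; the synthetic data, clustering parameters, and co-occurrence threshold are illustrative assumptions, not the study's actual settings.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import AgglomerativeClustering, DBSCAN

# Stand-in data: rows = localities, columns = mineral species
# (1 = mineral occurs at the locality, 0 = absent).
rng = np.random.default_rng(42)
occurrence = rng.integers(0, 2, size=(500, 40))

# Agglomerative hierarchical clustering groups localities with similar
# mineral assemblages.
hier_labels = AgglomerativeClustering(n_clusters=6).fit_predict(occurrence)

# DBSCAN finds dense groups and marks sparse localities as noise (-1).
dbscan_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(occurrence)

# Network diagram: minerals are nodes; an edge connects two minerals that
# co-occur at many localities.
cooccurrence = occurrence.T @ occurrence
G = nx.Graph()
for i in range(cooccurrence.shape[0]):
    for j in range(i + 1, cooccurrence.shape[1]):
        if cooccurrence[i, j] >= 130:   # threshold chosen for illustration
            G.add_edge(i, j, weight=int(cooccurrence[i, j]))

print(np.bincount(hier_labels), set(dbscan_labels), G.number_of_edges())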


2021 ◽  
Author(s):  
Kate Bentley ◽  
Kelly Zuromski ◽  
Rebecca Fortgang ◽  
Emily Madsen ◽  
Daniel Kessler ◽  
...  

Background: Interest in developing machine learning algorithms that use electronic health record data to predict patients’ risk of suicidal behavior has recently proliferated. Whether and how such models might be implemented and useful in clinical practice, however, remains unknown. In order to ultimately make automated suicide risk prediction algorithms useful in practice, and thus better prevent patient suicides, it is critical to partner with key stakeholders (including the frontline providers who will be using such tools) at each stage of the implementation process.

Objective: The aim of this focus group study was to inform ongoing and future efforts to deploy suicide risk prediction models in clinical practice. The specific goals were to better understand hospital providers’ current practices for assessing and managing suicide risk; determine providers’ perspectives on using automated suicide risk prediction algorithms; and identify barriers, facilitators, recommendations, and factors to consider for initiatives in this area.

Methods: We conducted 10 two-hour focus groups with a total of 40 providers from psychiatry, internal medicine and primary care, emergency medicine, and obstetrics and gynecology departments within an urban academic medical center. Audio recordings of open-ended group discussions were transcribed and coded for relevant and recurrent themes by two independent study staff members. All coded text was reviewed and discrepancies were resolved in consensus meetings with doctoral-level staff.

Results: Though most providers reported using standardized suicide risk assessment tools in their clinical practices, existing tools were commonly described as unhelpful, and providers indicated dissatisfaction with current suicide risk assessment methods. Overall, providers’ general attitudes toward the practical use of automated suicide risk prediction models and corresponding clinical decision support tools were positive. Providers were especially interested in the potential to identify high-risk patients who might be missed by traditional screening methods. Some expressed skepticism about the potential usefulness of these models in routine care; specific barriers included concerns about liability, alert fatigue, and increased demand on the healthcare system. Key facilitators included presenting specific patient-level features contributing to risk scores, emphasizing changes in risk over time, and developing systematic clinical workflows and provider trainings. Participants also recommended considering risk-prediction windows, timing of alerts, who will have access to model predictions, and variability across treatment settings.

Conclusions: Providers were dissatisfied with current suicide risk assessment methods and open to the use of a machine learning-based risk prediction system to inform clinical decision-making. They also raised multiple concerns about potential barriers to the usefulness of this approach and suggested several possible facilitators. Future efforts in this area will benefit from incorporating systematic qualitative feedback from providers, patients, administrators, and payers on the use of new methods in routine care, especially given the complex, sensitive, and unfortunately still stigmatized nature of suicide risk.


2020 ◽  
pp. 97-102
Author(s):  
Benjamin Wiggins

Can risk assessment be made fair? The conclusion of Calculating Race returns to actuarial science’s foundations in probability. The roots of probability rest in a pair of problems posed to Blaise Pascal and Pierre de Fermat in the summer of 1654: “the Dice Problem” and “the Division Problem.” From their very foundation, the mathematics of probability offered the potential not only to gain an advantage (as in the case of the Dice Problem) but also to divide stakes fairly (as in the case of the Division Problem). As the United States and the world enter an age driven by Big Data, algorithms, artificial intelligence, and machine learning, and characterized by an actuarialization of everything, we must remember that risk assessment need not be put to use for individual, corporate, or government advantage but, rather, that it has always been capable of guiding how to distribute risk equitably instead.
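The Division Problem (the “problem of points”) makes this concrete. Pascal and Fermat’s resolution divides the stakes in proportion to each player’s probability of winning had play continued; the numbers below are a standard worked instance, not taken from the book.

```latex
% The Division Problem: the first player to win three fair rounds takes
% the pot, but play stops at a score of 2--1 in favor of player A.
% B wins only by taking both of the next two rounds; otherwise A wins:
\[
  P(B \text{ wins}) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4},
  \qquad
  P(A \text{ wins}) = 1 - \frac{1}{4} = \frac{3}{4},
\]
% so the fair division of the stakes is $3:1$ in favor of $A$ ---
% risk apportioned by probability rather than by the current score.
```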


10.29007/lt5p ◽  
2019 ◽  
Author(s):  
Sophie Siebert ◽  
Frieder Stolzenburg

Commonsense reasoning is an everyday task that is intuitive for humans but hard to implement for computers. It requires large knowledge bases from which to draw the required data, although such data are often incomplete or even inconsistent. While machine learning algorithms perform rather well on these tasks, the reasoning process remains a black box. To close this gap, we are building CoRg, an explainable and well-performing system that consists of both an explainable deductive derivation process and a machine learning component. We conduct our experiments on the COPA question-answering benchmark using the ontologies WordNet, Adimen-SUMO, and ConceptNet. The knowledge is fed into the theorem prover Hyper, and in the end the resulting models are analyzed using machine learning algorithms to derive the most probable answer.
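A rough sketch of the final, learning-based step might look as follows; the featurization of prover models and the interface shown here are illustrative assumptions, not CoRg's actual implementation.

```python
from sklearn.ensemble import RandomForestClassifier

def featurize(model_output: str) -> list:
    """Turn a prover's model (one literal per line) into crude features."""
    lines = [ln for ln in model_output.splitlines() if ln.strip()]
    return [
        len(lines),                               # model size
        sum("not " in ln for ln in lines),        # negated literals
        len({ln.split("(")[0] for ln in lines}),  # distinct predicates
    ]

# Placeholder training data: one prover model per COPA answer candidate,
# labeled 1 if that candidate was the correct alternative.
train_models = ["holds(cause1)\nentails(p,c)", "holds(cause2)\nnot entails(p,c)"]
train_labels = [1, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit([featurize(m) for m in train_models], train_labels)

def answer(model_a: str, model_b: str) -> int:
    """Pick the COPA alternative whose model looks more like a correct one."""
    scores = clf.predict_proba([featurize(model_a), featurize(model_b)])[:, 1]
    return int(scores.argmax())
```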


Author(s):  
Marley Bacelar

Introduction: Machine learning algorithms are quickly gaining traction in both the private and public sectors for their ability to automate both simple and complex decision-making processes. The vast majority of economic sectors, including transportation, retail, advertisement, and energy, are being disrupted by widespread data digitization and the emerging technologies that leverage it. Computerized systems are being introduced in government operations to improve accuracy and objectivity, and AI is having an impact on democracy and governance [1]. Numerous businesses are using machine learning to analyze massive quantities of data, from calculating credit for loan applications to scanning legal contracts for errors to analyzing employee interactions with customers to detect inappropriate behavior. New tools make it easier than ever for developers to design and deploy machine-learning algorithms [2][3].


2021 ◽  
Author(s):  
Saya R Dennis ◽  
Tanya Simuni ◽  
Yuan Luo

Parkinson's Disease is the second most common neurodegenerative disorder in the United States and is characterized by a largely irreversible worsening of motor and non-motor symptoms as the disease progresses. A prominent characteristic of the disease is its high heterogeneity in manifestation as well as in progression rate. For sporadic Parkinson's Disease, which comprises ~90% of all diagnoses, the relationship between the patient genome and disease onset or progression subtype remains largely elusive. Machine learning algorithms are increasingly adopted to study the genomics of diseases due to their ability to capture patterns within the vast feature space of the human genome that might contribute to the phenotype of interest. In our study, we develop two machine learning models that predict the onset as well as the progression subtype of Parkinson's Disease based on subjects' germline mutations. Our best models achieved an area under the ROC curve of 0.77 and 0.61 for disease onset and subtype prediction, respectively. To the best of our knowledge, our models present state-of-the-art prediction performance for PD onset and subtype based solely on subjects' germline variants. The genes with high importance in our best-performing models were enriched for several canonical pathways related to signaling, the immune system, and protein modifications, all of which have been previously associated with PD symptoms or pathogenesis. These high-importance gene sets provide promising candidate genes for future biomedical and clinical research.
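A minimal sketch of the onset-prediction setup, with synthetic data standing in for a real cohort; the per-gene variant-burden encoding, the gradient-boosting model, and all numbers are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: rows = subjects, columns = genes; each entry is a
# per-gene burden count of germline variants (an assumed encoding).
rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(1000, 500)).astype(float)
y = rng.integers(0, 2, size=1000)   # 1 = PD onset, 0 = control (placeholder)

clf = GradientBoostingClassifier(random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUROC: {auc:.2f}")   # ~0.5 on random labels

# Gene-level importances from the fitted model supply candidate genes
# for downstream pathway-enrichment analysis.
clf.fit(X, y)
top_genes = np.argsort(clf.feature_importances_)[::-1][:20]
print("top candidate gene indices:", top_genes)
```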


10.2196/18401 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e18401
Author(s):  
Jane M Zhu ◽  
Abeed Sarker ◽  
Sarah Gollust ◽  
Raina Merchant ◽  
David Grande

Background: Twitter is a potentially valuable tool for public health officials and state Medicaid programs in the United States, which provide public health insurance to 72 million Americans.

Objective: We aim to characterize how Medicaid agencies and managed care organization (MCO) health plans are using Twitter to communicate with the public.

Methods: Using Twitter’s public application programming interface, we collected 158,714 public posts (“tweets”) from active Twitter profiles of state Medicaid agencies and MCOs, spanning March 2014 through June 2019. Manual content analyses identified 5 broad categories of content, and these coded tweets were used to train supervised machine learning algorithms to classify all collected posts.

Results: We identified 15 state Medicaid agencies and 81 Medicaid MCOs on Twitter. The mean number of followers was 1784, the mean number of those followed was 542, and the mean number of posts was 2476. Approximately 39% of tweets came from just 10 accounts. Of all posts, 39.8% (63,168/158,714) were classified as general public health education and outreach; 23.5% (n=37,298) were about specific Medicaid policies, programs, services, or events; 18.4% (n=29,203) were organizational promotion of staff and activities; and 11.6% (n=18,411) contained general news and news links. Only 4.5% (n=7142) of posts were responses to specific questions, concerns, or complaints from the public.

Conclusions: Twitter has the potential to enhance community building, beneficiary engagement, and public health outreach, but appears to be underutilized by the Medicaid program.
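The supervised classification step can be sketched as follows; the TF-IDF features, logistic regression model, category names, and example tweets are illustrative assumptions rather than the study's documented setup.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder hand-coded examples stand in for the manual content analysis;
# the label names paraphrase the 5 categories described in the abstract.
coded_tweets = [
    "Flu shots are free at county clinics this week",
    "Our CEO spoke at the annual community health summit",
    "New dental benefits start July 1 for all members",
]
coded_labels = ["education_outreach", "org_promotion", "medicaid_policy"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(coded_tweets, coded_labels)

# The trained model then labels the full corpus of collected posts.
print(clf.predict(["Enroll before the Medicaid renewal deadline"]))
```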


2020 ◽  
Vol 12 (18) ◽  
pp. 3076
Author(s):  
Ju-Young Shin ◽  
Bu-Yo Kim ◽  
Junsang Park ◽  
Kyu Rang Kim ◽  
Joo Wan Cha

Leaf wetness duration (LWD) and plant diseases are strongly associated with each other. Therefore, LWD is a critical ecological variable for plant disease risk assessment. However, LWD is rarely used in the analysis of plant disease epidemiology and risk assessment because it is a non-standard meteorological variable. The application of satellite observations may facilitate the prediction of LWD, as they capture important related parameters and are particularly useful for meteorologically ungauged locations. In this study, the applicability of geostationary satellite observations for LWD prediction was investigated. GEO-KOMPSAT-2A satellite observations were used as inputs, and six machine learning (ML) algorithms were employed to produce hourly leaf wetness (LW) predictions. The performances of these models were compared with that of a physical model through systematic evaluation. Results indicated that LWD could be predicted using satellite observations and ML. A random forest model exhibited higher accuracy (0.82) than the physical model (0.79) in leaf wetness prediction, and the performance of the proposed approach was comparable to that of the physical model in predicting LWD. Overall, the artificial intelligence (AI) models exhibited good performance in predicting LWD in South Korea.
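A minimal sketch of the random forest variant of this setup, with synthetic stand-ins for the satellite predictors; the listed channels are assumptions, and the actual GEO-KOMPSAT-2A products and the physical model are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for satellite-derived predictors of hourly wetness.
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([
    rng.uniform(200, 320, n),   # brightness temperature (K), assumed channel
    rng.uniform(0, 1, n),       # cloud fraction, assumed product
    rng.uniform(0, 24, n),      # local solar hour
])
y = rng.integers(0, 2, n)       # 1 = leaf surface wet that hour (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("hourly wetness accuracy:", accuracy_score(y_te, rf.predict(X_te)))

# Summing consecutive predicted wet hours per day would yield the LWD estimate.
```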

