Enabling Equal Opportunity in Logistic Regression Algorithm

Author(s):  
Sandro Radovanović ◽  
Marko Ivić

Research Question: This paper aims to adjust the logistic regression algorithm to mitigate unwanted discrimination with respect to race, gender, and similar attributes. Motivation: Decades of research in algorithm design have been dedicated to building better prediction models. Many algorithms have been designed and improved to the point where they outperform the judgments of people, including experts. However, in recent years it has been discovered that predictive models can discriminate in unwanted ways. Such unwanted discrimination can lead to legal consequences. To mitigate this problem, we propose enforcing equal opportunity between privileged and discriminated groups in the logistic regression algorithm. Idea: Our idea is to add a regularization term to the goal function of the logistic regression, so that the predictive model addresses both the social problem and the predictive problem. More specifically, our model will provide fair and accurate predictions. Data: The data used in this research are U.S. census data describing individuals by personal characteristics, with the goal of building a binary classification model that predicts whether an individual has an annual salary above $50k. The dataset is known for disparate impact regarding female individuals. In addition, we used the COMPAS dataset, which is aimed at predicting recidivism. COMPAS is biased against African-Americans. Tools: We developed a novel regularization technique for equal opportunity in the logistic regression algorithm. The proposed regularization is compared against classical logistic regression and fairness-constrained logistic regression using ten-fold cross-validation. Findings: The results suggest that equal opportunity logistic regression manages to create a fair prediction model. More specifically, our model improved both disparate impact and equal opportunity compared to classical logistic regression, with a minor loss in prediction accuracy. Compared to the disparate impact constrained logistic regression, our approach has higher prediction accuracy and equal opportunity, while having a lower disparate impact. By inspecting the coefficients of our approach and of classical logistic regression, one can see that proxy attribute coefficients are reduced to very low values. Contribution: The main contribution of this paper is methodological. More specifically, we implemented equal opportunity in the logistic regression algorithm.
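
The regularization idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the regularization term, so the penalty below is an assumption: the squared gap in mean predicted probability between two groups, measured only on truly positive rows, weighted by a hypothetical strength `lam`. The function names, the 0/1 encoding of `group`, and the plain gradient-descent training loop are all illustrative choices rather than the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def positive_gap(p, y, group):
    """Gap in mean predicted probability between groups, among y == 1 rows."""
    pos = y == 1
    return p[pos & (group == 0)].mean() - p[pos & (group == 1)].mean()


def fit_eo_logreg(X, y, group, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + lam * positive_gap**2.

    The penalty is an assumed stand-in for an equal-opportunity regularizer;
    it is not taken from the paper.
    """
    n, k = X.shape
    w, b = np.zeros(k), 0.0
    pos0 = (y == 1) & (group == 0)
    pos1 = (y == 1) & (group == 1)
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        err = (p - y) / n              # d(mean log-loss)/d(logit)
        s = p * (1.0 - p)              # d(sigmoid)/d(logit)
        d = positive_gap(p, y, group)
        c = np.zeros(n)                # d(gap)/d(logit), row by row
        c[pos0] = s[pos0] / pos0.sum()
        c[pos1] = -s[pos1] / pos1.sum()
        total = err + 2.0 * lam * d * c
        w -= lr * (X.T @ total)
        b -= lr * total.sum()
    return w, b
```

Setting `lam=0.0` recovers ordinary logistic regression, which gives a baseline for comparing the gap and the accuracy of the penalized fit.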

2021 ◽  
Vol 9 ◽  
Author(s):  
Keiko Ogawa ◽  
Seikou Nakamura ◽  
Haruka Oguri ◽  
Kaori Ryu ◽  
Taichi Yoneda ◽  
...  

Natural products are an excellent source of skeletons for medicinal seeds. Triterpenes and saponins are representative natural products that exhibit anti-herpes simplex virus type 1 (HSV-1) activity. However, comprehensive information on the anti-HSV-1 activity of triterpenes has been lacking. Therefore, expanding this information and improving the efficiency of their exploration are urgently required. To improve the efficiency of developing anti-HSV-1 active compounds, we used machine learning methods to construct a predictive model for the anti-HSV-1 activity of triterpenes from information obtained in previous studies. Specifically, we constructed a binary classification model (i.e., active or inactive) using a logistic regression algorithm. In the evaluation of the predictive model, the accuracy on the test data was 0.79 and the area under the curve (AUC) was 0.86. Additionally, to enrich the information on the anti-HSV-1 activity of triterpenes, a plaque reduction assay was performed on 20 triterpenes. As a result, chikusetsusaponin IVa (11: IC50 = 13.06 μM) was found to have potent anti-HSV-1 activity, along with three potentially anti-HSV-1 active triterpenes. The assay results were further used for external validation of the predictive model. The prediction for the test compounds in the activity test showed high accuracy (0.83) and AUC (0.81). We also found that this predictive model could successfully narrow down the active compounds. This study provides more information on the anti-HSV-1 activity of triterpenes. Moreover, the predictive model can improve the efficiency of the development of active triterpenes by integrating many previous studies to clarify potential relationships.


2019 ◽  
Vol 11 (13) ◽  
pp. 3525 ◽  
Author(s):  
Han He ◽  
Sicheng Li ◽  
Lin Hu ◽  
Nelson Duarte ◽  
Otilia Manta ◽  
...  

To investigate the factors influencing the sustainable guarantee network and how they differ across spatial and temporal scales, a logistic regression algorithm is used to analyze data on listed companies in 31 provinces, municipalities and autonomous regions in China from 2008 to 2017 (excluding Hong Kong, Macau and Taiwan). The study finds that, overall, companies with better profitability, poor solvency, poor operational capability and higher levels of economic development are more likely to join the guarantee network. On the temporal scale, solvency and regional economic development exert an increasingly higher impact on companies' accession to the guarantee network, while operational capacity has an increasingly smaller impact. On the spatial scale, a less close link between company executives and companies in the western region suggests a higher possibility of joining the guarantee network. The predictive accuracy test results of the logistic regression algorithm show that the training model of the western sample enterprises has the highest accuracy when predicting that enterprises join the guarantee network, while the accuracy is the lowest in the central region. When forecasting enterprises' failure to join the guarantee network, the training model of the central sample enterprises has the highest accuracy, while the accuracy is the lowest in the eastern region. This paper discusses the internal and external factors influencing guarantee network risk from the perspective of spatial and temporal differences in the guarantee network, and compares the prediction accuracy of the training models, which offers guidance for listed company management, banks and government in identifying and controlling guarantee network risk.


2021 ◽  
Author(s):  
Xiaoli Lei ◽  
Junli Wang ◽  
Lijie Kou ◽  
Zhigang Yang

Abstract Background: Because of the lack of compelling evidence for predicting the duration of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA shedding, the purpose of this retrospective study was to establish a predictive model for long-term SARS-CoV-2 RNA shedding in non-death hospitalized patients with coronavirus disease-19 (COVID-19). Methods: 97 non-death hospitalized patients with COVID-19 admitted to two hospitals in Henan province of China from February 3, 2020 to March 31, 2020 were retrospectively enrolled. Multivariate logistic regression was performed to identify the high risk factors associated with long-term SARS-CoV-2 RNA shedding and a predictive model was established and represented by a nomogram. Its performance was assessed with discrimination and calibration. Results: 97 patients were divided into the long-term (>21 days) group (n = 27, 27.8%) and the short-term (≤ 21 days) group (n = 70, 72.2%) based on their viral shedding duration. Multivariate logistic regression analysis showed that time from illness onset to diagnosis (OR 1.224, 95% CI 1.070-1.400, P = 0.003) and interstitial opacity in chest computerized tomography (CT) scan (OR 6.516, 95% CI 2.041-20.798, P = 0.002) were independent risk factors for long-term SARS-CoV-2 RNA shedding. A prediction model, which is presented with a nomogram, was established by incorporating the two risk factors. The goodness-of-fit statistics for the nomogram was not statistically significant (χ2 = 8.292; P = 0.406), and its area under the receiver operator characteristic curve was 0.834 (95% CI 0.731-0.936; P < 0.001). Conclusion: The established model has a good predictive performance on the long-term viral RNA shedding in non-death hospitalized patients with COVID-19, but it still needs further validation by independent data set of large samples in the future.
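
As a reading aid, the reported odds ratios, confidence intervals and P values can be checked against each other. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale with a critical value of 1.96; the abstract does not state how they were computed, and the function name is illustrative.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumes a Wald interval on the log scale (an assumption, not stated
    in the abstract).
    """
    # Standard error of log(OR), recovered from the CI width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2.0))


p_onset = p_from_or_ci(1.224, 1.070, 1.400)     # time from onset to diagnosis
p_opacity = p_from_or_ci(6.516, 2.041, 20.798)  # interstitial opacity on CT
```

Under that assumption both values should land close to the reported P = 0.003 and P = 0.002, which suggests the quoted figures are internally consistent.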


Traffic accidents are among the most life-threatening dangers to human beings. Deaths and injuries due to traffic accidents have a great impact on society. Traffic accident information and data provided by the public can be used to classify accidents by type and severity and, consequently, to build a predictive model. Detecting and identifying injury severity in traffic accidents in real time is essential for speeding up post-accident protocols as well as for developing general road safety policies. In this project we use the logistic regression algorithm to classify accident data. The data to be analysed are collected from various sources, are both structured and unstructured, and have several attributes. We detect and analyse the data together to generate decision trees that give insights into previous accidents.


2019 ◽  
Vol 9 (19) ◽  
pp. 3981 ◽  
Author(s):  
Bin Deng ◽  
Ren Jie Chin ◽  
Yao Tang ◽  
Changbo Jiang ◽  
Sai Hin Lai

Under the action of gravity, buoyancy, and surface tension, bubbles generated by wave breaking will rupture and polymerize, causing the occurrence of high-speed jets and strong turbulence in nearby water bodies, which in turn affects sea–air exchange, sediment transport, and pollutant movement. These interactions are closely related to the shape and velocity changes in single bubbles. Therefore, understanding the motion characteristics of single bubbles is essential. In this research, a large number of experiments were carried out to serve this purpose. The experimental data were used to develop three machine learning models for the bubble final velocity, bubble drag coefficient, and bubble shape, respectively. The performance of the feed forward back propagation neural network (FBNN) models for the final velocity and drag coefficient were evaluated. The coefficient of determination (R2) and root mean squared error (RMSE) value of final velocity prediction model was recorded at 0.83 and 0.0518, respectively. Meanwhile, for the drag coefficient prediction model, the values are 0.92 for R2 and 0.1534 for RMSE. The models can provide a more accurate output if compared to that from the empirical formulas. K-nearest neighbours (KNN), logistic regression, and random forest were applied as the algorithm while developing the bubble shape classification model. The best performance is achieved by the logistic regression.


2019 ◽  
Vol 7 (3) ◽  
pp. 1255
Author(s):  
Ahmad Shaker Abdalrada ◽  
Omar Hashim Yahya ◽  
Abdul Hadi M. Alaidi ◽  
Nasser Ali Hussein ◽  
Haider TH. Alrikabi ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xichao Dai ◽  
Yumei Ding

In order to improve the accuracy of the evaluation results of multiperception intelligent wearable devices, mathematical statistical features based on speech, behavior, environment, and physical signs are proposed. First, the PCA feature compression algorithm was used to reduce the dimension of these features, and the differences among training samples were compared and analyzed. Then, three weak classifiers were designed using the logistic regression algorithm. Finally, a strong classifier with higher prediction accuracy was designed according to the boosting decision fusion method and the ensemble learning idea. The results showed that the accuracy of the logistic regression model trained with the voice PCA feature data was 0.964, but the recall rate and crossover results were significantly reduced to 0.844 and 0.846, respectively. The accuracy and recall of the decision fusion model based on the boosting method and integrated learning are 0.969, and the prediction accuracy of K-fold cross-validation is also as high as 0.956; the superposition fusion of the three weak classifiers achieves a better classification effect.


2014 ◽  
Vol 989-994 ◽  
pp. 1517-1521
Author(s):  
Min Jiang ◽  
Na Chu ◽  
Xiao Ming Bi

At present, competition among securities companies is increasingly fierce. Effectively preventing the loss of users and reducing the loss rate is a difficult problem that every securities company urgently needs to solve. Based on the principles of data mining, this paper proposes a prediction method that uses the logistic regression algorithm. A prediction model is built with the logistic regression algorithm, and its validity and accuracy are verified by experiment, providing a new method and line of thinking for customer churn prediction in securities companies.


2019 ◽  
Vol 2019 ◽  
pp. 1-6 ◽  
Author(s):  
Luca Giannella ◽  
Lillo Bruno Cerami ◽  
Tiziano Setti ◽  
Ezio Bergamini ◽  
Fausto Boselli

Objective. To create a prediction model including clinical variables for the prediction of premalignant/malignant endometrial pathology in premenopausal women with abnormal uterine bleeding (AUB). Methods. This is an observational retrospective study including 240 premenopausal women with AUB referred to diagnostic hysteroscopy. Based on the presence of endometrial hyperplasia (EH) or cancer (EC), the women were divided into cases (EH/EC) and controls (no EH/EC). Univariate, stepwise logistic regression and ROC curve analysis were performed. Results. 12 women had EH/EC (5%). Stepwise logistic regression analysis showed that EH/EC was significantly associated with BMI ≥ 30 (OR=7.70, 95% CI 1.90 to 31.17), diabetes (OR=9.71, 95% CI 1.63 to 57.81), and a thickened endometrium (OR=1.20, 95% CI 1.08 to 1.34, criterion > 11 mm). The AUC was 0.854 (95% CI 0.803 to 0.896, p<0.0001). Considering the pretest probability for EH/EC of 5%, the prediction model with a positive likelihood ratio of 8.14 showed a posttest probability of 30%. The simultaneous presence of two or three risk factors was significantly more common in women with EH/EC than controls (50% vs. 6.6% and 25% vs. 0%, respectively, p<0.0001). Conclusion. When premenopausal vaginal bleeding occurs in diabetic obese women with ET > 11 mm, the percentage of premalignant/malignant endometrial pathology increases by 25%. It is likely that the simultaneous presence of several risk factors is necessary to significantly increase the probability of endometrial pathology.
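
The step from a 5% pretest probability and a positive likelihood ratio of 8.14 to a posttest probability of about 30% follows from the standard odds form of Bayes' rule. A minimal worked sketch is given below; the function name is illustrative and not taken from the paper.

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Values quoted in the abstract: pretest 5%, positive likelihood ratio 8.14.
result = posttest_probability(0.05, 8.14)
print(round(result, 3))  # prints 0.3
```

Pretest odds of 0.05/0.95 multiplied by 8.14 give posttest odds of roughly 0.43, which corresponds to a probability of roughly 0.30, matching the 30% reported.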

