Better Practices in the Development and Validation of Recidivism Risk Assessments: The Minnesota Sex Offender Screening Tool–4

Grant Duwe

doi:10.1177/0887403417718608

Criminal Justice Policy Review ◽

10.1177/0887403417718608 ◽

2017 ◽

Vol 30 (4) ◽

pp. 538-564 ◽

Cited By ~ 9

Author(s):

Grant Duwe

Keyword(s):

Sex Offenders ◽

Sex Offender ◽

Screening Tool ◽

Performance Metrics ◽

Area Under The Curve ◽

Predictive Performance ◽

Test Set ◽

And Performance ◽

Logistic Regression Algorithm ◽

Development And Validation

This study examines the development and validation of the Minnesota Sex Offender Screening Tool–4 (MnSOST-4) on a dataset consisting of 5,745 sex offenders released from Minnesota prisons between 2003 and 2012. Bootstrap resampling was used to select predictors, and k-fold and split-sample methods were used to internally validate the MnSOST-4. Using sex offense reconviction within 4 years of release from prison as the failure criterion, the data showed that 130 (2.3%) offenders in the overall sample were recidivists. Multiple classification methods and performance metrics were used to develop the MnSOST-4 and evaluate its predictive performance on the test set. The results from the regularized logistic regression algorithm showed that the MnSOST-4 performed well in predicting sexual recidivism in the test set, achieving an area under the curve (AUC) of 0.835. Additional analyses on the test set revealed that the MnSOST-4 outperformed the Minnesota Sex Offender Screening Tool–3 (MnSOST-3), Minnesota Sex Offender Screening Tool–Revised (MnSOST-R), and Static-99 in predicting sexual reoffending.
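
The AUC reported above can be read as the probability that a randomly chosen recidivist receives a higher risk score than a randomly chosen non-recidivist. The following is a minimal sketch of that pairwise definition, not the study's code; the function name `auc_from_scores` and the scores and labels are hypothetical and do not come from the MnSOST-4 data.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores; 1 marks a recidivist, 0 a non-recidivist.
scores = [0.9, 0.2, 0.4, 0.1, 0.5, 0.05]
labels = [1, 0, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)  # 7 of 8 pairs ranked correctly
```

Because the measure depends only on how cases are ranked, it is unaffected by the base rate, which is one reason it is commonly reported for rare outcomes such as the 2.3% recidivism rate in this sample.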

A robust framework for shoulder implant X-ray image classification

Data Technologies and Applications ◽

10.1108/dta-08-2021-0210 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Minh Thanh Vo ◽

Anh H. Vo ◽

Tuong Le

Keyword(s):

Image Classification ◽

Performance Metrics ◽

State Of The Art ◽

Area Under The Curve ◽

Predictive Performance ◽

Extraction Process ◽

Content Type ◽

X Ray ◽

Robust Model ◽

Input Dataset

Purpose: Medical images are increasingly common, so deep-learning-based analysis of these images to help diagnose diseases is becoming more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps in selecting the correct equipment and procedures for the upcoming surgery. Design/methodology/approach: This study proposes a robust model named X-Net to improve the predictability for shoulder implant X-ray image classification on the SIXIC dataset. The X-Net model integrates the Squeeze-and-Excitation (SE) block into the Residual Network (ResNet) module. The SE module weighs each feature map extracted from ResNet, which aids in improving the performance. Feature extraction in X-Net is performed by both modules, ResNet and SE. The final feature is obtained by combining the features extracted in the above steps, which captures more of the important characteristics of the X-ray images in the input dataset. Next, X-Net uses this fine-grained feature to classify the input images into four classes (Cofield, Depuy, Zimmer and Tornier) in the SIXIC dataset. Findings: Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the other methods tested in terms of several performance metrics. In addition, the proposed approach provides new state-of-the-art results in all performance metrics, such as accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset. Originality/value: The proposed method, with its high predictive performance, can be used to assist in the treatment of injured shoulder joints.
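
The weighting of feature maps described above can be illustrated with a generic Squeeze-and-Excitation step. This is a rough sketch in plain Python, assuming the standard SE recipe (global average pooling, a bottleneck layer with ReLU, a sigmoid gate, then channel-wise scaling). The abstract does not say how X-Net wires the block into ResNet, so the function `squeeze_excite`, the weight matrices and the tiny input are all hypothetical, and biases and the residual shortcut are omitted.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Apply a generic SE step to C feature maps (each an H x W nested list).

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    Returns the rescaled maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in channel c by that channel's gate.
    scaled = [[[g * v for v in row] for row in fmap]
              for fmap, g in zip(feature_maps, gates)]
    return scaled, gates


# Hypothetical input: two 2x2 feature maps and hand-picked weights.
maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
w1 = [[0.5, -0.5]]      # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]    # expands back to 2 channel gates
scaled, gates = squeeze_excite(maps, w1, w2)
```

In this toy input the first channel receives the larger gate, so its values are attenuated less than those of the second channel. In a trained network the weights are learned, and the gates express how informative each feature map is for the classification task.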

Similar Predictive Accuracy of the Static-99R Risk Tool for White, Black, and Hispanic Sex Offenders in California

Criminal Justice and Behavior ◽

10.1177/0093854817711477 ◽

2017 ◽

Vol 44 (9) ◽

pp. 1125-1140 ◽

Cited By ~ 3

Author(s):

Seung C. Lee ◽

R. Karl Hanson

Keyword(s):

Sex Offenders ◽

Ethnic Groups ◽

Predictive Validity ◽

Sex Offender ◽

Predictive Accuracy ◽

Area Under The Curve ◽

Cultural Bias ◽

Recidivism Rates ◽

Assessment Procedures

Although considerable research has found overall moderate predictive validity of Static-99R, a sex offender risk prediction tool, relatively little research has addressed its potential for cultural bias. This prospective study evaluated the predictive validity of Static-99R across the three major ethnic groups (White, n = 789; Black, n = 466; Hispanic, n = 719) in the state of California. Static-99R was able to discriminate recidivists from nonrecidivists among White, Black, and Hispanic sex offenders (all area under the curve [AUC] values >.70; odds ratios >1.39). Base rates (at a Static-99R score of 2) with a fixed 5-year follow-up across ethnic groups were very similar (2.4%-3.0%) but were significantly lower than the norms (5.6%). The current findings support the use of Static-99R in risk assessment procedures for sex offenders of White, Black, and Hispanic heritage, but it should be used with caution in estimating absolute sexual recidivism rates, particularly for Hispanic sex offenders.

Minimum Relevant features to Obtain Explainable Systems for Predicting Cardiovascular Disease Using the Statlog Dataset

10.20944/preprints202012.0318.v1 ◽

2020 ◽

Author(s):

Roberto Porto ◽

Jose M. Molina ◽

Antonio Berlanga ◽

Miguel A. Patricio

Keyword(s):

Cardiovascular Disease ◽

Performance Metrics ◽

Predictive Performance ◽

Large Set ◽

Data Set ◽

Error Metrics ◽

Automated Learning ◽

And Performance ◽

The Cost

Learning systems have focused heavily on creating models capable of obtaining the best results in error metrics. Recently, the focus has shifted toward interpreting and explaining their results. The need for interpretation is greater when these models are used to support decision making. In some areas, such as medicine, this becomes an indispensable requirement. This paper focuses on the prediction of cardiovascular disease by analyzing the well-known Statlog (Heart) Data Set from the UCI’s Automated Learning Repository. The study analyzes the cost of making predictions easier to interpret, by reducing the number of features that explain the classification of health status, versus the cost in accuracy. The analysis covers a large set of classification techniques and performance metrics, demonstrating that it is possible to build explainable and reliable models that retain good predictive performance.

The Development and Validation of a Classification System Predicting Severe and Frequent Prison Misconduct

The Prison Journal ◽

10.1177/0032885519894587 ◽

2019 ◽

Vol 100 (2) ◽

pp. 173-200

Author(s):

Grant Duwe

Keyword(s):

Area Under The Curve ◽

Predictive Performance ◽

Assessment System ◽

Female Prisoner ◽

Prison Misconduct ◽

Test Sets ◽

High Level ◽

Multiple Metrics ◽

Development And Validation ◽

Gender Specific

This study presents the results from the development and validation of a fully automated, gender-specific risk assessment system designed to predict severe and frequent prison misconduct on a recurring, semiannual basis. K-fold and split-population methods were applied to train and test the predictive models. Regularized logistic regression was the classifier used on the training and test sets that contained 35,506 males and 3,849 females who were released from Minnesota prisons between 2006 and 2011. Using multiple metrics, the results showed the models achieved a relatively high level of predictive performance. For example, the average area under the curve (AUC) was 0.832 for the female prisoner models and 0.836 for the male prisoner models. The findings provide support for the notion that better predictive performance can be obtained by developing assessments that are customized to the population on which they will be used.
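
The abstract names regularized logistic regression as the classifier but does not state the penalty or the fitting procedure. As a hedged illustration only, the sketch below assumes an L2 (ridge) penalty fitted by batch gradient descent; the function names, the learning rate, the step count and the one-feature data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_l2(X, y, lam, lr=0.1, steps=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The intercept b is left unpenalized, a common but not universal choice.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The penalty adds lam * w to the gradient of the weights.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical one-feature data with overlapping classes.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
w_weak, b_weak = fit_logistic_l2(X, y, lam=0.01)
w_strong, b_strong = fit_logistic_l2(X, y, lam=1.0)
```

Comparing `w_weak` with `w_strong` shows the effect of the penalty: a larger `lam` shrinks the coefficient toward zero, trading some fit on the training set for stability on new data.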

Minimum Relevant Features to Obtain Explainable Systems for Predicting Cardiovascular Disease Using the Statlog Data Set

Applied Sciences ◽

10.3390/app11031285 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1285

Author(s):

Roberto Porto ◽

José M. Molina ◽

Antonio Berlanga ◽

Miguel A. Patricio

Keyword(s):

Cardiovascular Disease ◽

Performance Metrics ◽

Predictive Performance ◽

Large Set ◽

Data Set ◽

Automated Learning ◽

Prediction Systems ◽

And Performance ◽

The Cost ◽

The University

Learning systems have been focused on creating models capable of obtaining the best results in error metrics. Recently, the focus has shifted to improvement in the interpretation and explanation of the results. The need for interpretation is greater when these models are used to support decision making. In some areas, this becomes an indispensable requirement, such as in medicine. The goal of this study was to define a simple process to construct a system that could be easily interpreted based on two principles: (1) reduction of attributes without degrading the performance of the prediction systems and (2) selecting a technique to interpret the final prediction system. To describe this process, we selected a problem, predicting cardiovascular disease, by analyzing the well-known Statlog (Heart) data set from the University of California’s Automated Learning Repository. We analyzed the cost of making predictions easier to interpret by reducing the number of features that explain the classification of health status versus the cost in accuracy. We performed an analysis on a large set of classification techniques and performance metrics, demonstrating that it is possible to construct explainable and reliable models that provide high quality predictive performance.

Development of a Minimally Invasive Screening Tool to Identify Obese Pediatric Population at Risk of Obstructive Sleep Apnea/Hypopnea Syndrome

Bioengineering ◽

10.3390/bioengineering7040131 ◽

2020 ◽

Vol 7 (4) ◽

pp. 131

Author(s):

José Miguel Calderón ◽

Julio Álvarez-Pitti ◽

Irene Cuenca ◽

Francisco Ponce ◽

Pau Redon

Keyword(s):

Obstructive Sleep Apnea ◽

Logistic Regression ◽

Sleep Apnea ◽

Screening Tool ◽

Pediatric Population ◽

Sleep Apnea Syndrome ◽

Area Under The Curve ◽

Support Vector ◽

Obstructive Sleep ◽

Logistic Regression Algorithm

Obstructive sleep apnea syndrome is a reduction of the airflow during sleep which not only produces a reduction in sleep quality but also has major health consequences. The prevalence in the obese pediatric population can surpass 50%, and polysomnography is the current gold standard method for its diagnosis. Unfortunately, it is expensive, disturbing and time-consuming for experienced professionals. The objective is to develop a patient-friendly screening tool for the obese pediatric population to identify those children at higher risk of suffering from this syndrome. Three supervised learning classifier algorithms (i.e., logistic regression, support vector machine and AdaBoost) common in the field of machine learning were trained and tested on two very different datasets where oxygen saturation raw signal was recorded. The first dataset was the Childhood Adenotonsillectomy Trial (CHAT) consisting of 453 individuals, with ages between 5 and 9 years old and one-third of the patients being obese. Cross-validation was performed on the second dataset from an obesity assessment consult at the Pediatric Department of the Hospital General Universitario of Valencia. A total of 27 patients were recruited between 5 and 17 years old; 42% were girls and 63% were obese. The performance of each algorithm was evaluated based on key performance indicators (e.g., area under the curve, accuracy, recall, specificity and positive predictive value). The logistic regression algorithm outperformed (accuracy = 0.79, specificity = 0.96, area under the curve = 0.9, recall = 0.62 and positive predictive value = 0.94) the support vector machine and the AdaBoost algorithm when trained with the CHAT dataset. Cross-validation tests, using the Hospital General de Valencia (HG) dataset, confirmed the higher performance of the logistic regression algorithm in comparison with the others. In addition, only a minor loss of performance (accuracy = 0.75, specificity = 0.88, area under the curve = 0.85, recall = 0.62 and positive predictive value = 0.83) was observed despite the differences between the datasets. The proposed minimally invasive screening tool has shown promising performance when it comes to identifying children at risk of suffering obstructive sleep apnea syndrome. Moreover, it is ideal to be implemented in an outpatient consult in primary and secondary care.
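
The threshold-based indicators listed above (accuracy, recall, specificity and positive predictive value) all derive from the four cells of a confusion matrix. The sketch below shows those standard definitions; the function name and the counts are hypothetical and are not taken from the CHAT or HG datasets.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard indicators computed from confusion-matrix counts."""
    if min(tp + fn, tn + fp, tp + fp) == 0:
        raise ValueError("each denominator needs at least one case")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),          # sensitivity: share of true cases found
        "specificity": tn / (tn + fp),     # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),             # share of positive calls that are right
    }


# Hypothetical screening result: 12 affected and 20 unaffected children.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=4)
```

With these counts the classifier clears most unaffected children (specificity 0.9) while missing a third of the affected ones (recall about 0.67). High specificity paired with moderate recall is the same qualitative pattern the abstract reports for logistic regression.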

Crossvalidation in brain imaging analysis

10.1101/017418 ◽

2015 ◽

Cited By ~ 3

Author(s):

Nikolaus Kriegeskorte

Keyword(s):

Brain Imaging ◽

Empirical Test ◽

Predictive Performance ◽

Training Data ◽

Multiple Models ◽

Imaging Analysis ◽

Training Set ◽

Test Set ◽

And Performance ◽

Performance Estimates

Crossvalidation is a method for estimating predictive performance and adjudicating between multiple models. On each of k folds of the process, k-1 of k independent subsets of the data (training set) are used to fit the parameters of each model and the left-out subset (test set) is used to estimate predictive performance. The method is statistically efficient, because training data are reused for testing and performance estimates combined across folds. The method requires no assumptions, provides nearly unbiased (slightly conservative) estimates of predictive performance, and is generally applicable because it amounts to a direct empirical test of each model.
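
The k-fold procedure described above can be sketched in a few lines of plain Python. This is an illustrative outline rather than the author's implementation: the helper names, the toy mean-predictor model and the data are hypothetical, and in practice the observations would usually be shuffled (or split by independent units) before the folds are formed.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, loss):
    """Fit on k-1 folds, score on the left-out fold, and average over the k folds."""
    fold_losses = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [loss(model(xs[i]), ys[i]) for i in test_idx]
        fold_losses.append(sum(errors) / len(errors))
    return sum(fold_losses) / k


def fit_mean(train_x, train_y):
    """Toy model: always predict the mean of the training targets."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def squared_error(pred, target):
    return (pred - target) ** 2


# Hypothetical data: y is exactly twice x.
xs = list(range(10))
ys = [2.0 * x for x in xs]
cv_mse = cross_validate(xs, ys, k=5, fit=fit_mean, loss=squared_error)
```

Every observation lands in a test fold exactly once and in a training set k-1 times, which is the reuse of data that the abstract calls statistically efficient. Comparing `cv_mse` across candidate `fit` functions is how the procedure adjudicates between models.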

Prospective predictive performance comparison between clinical gestalt and validated COVID-19 mortality scores

Journal of Investigative Medicine ◽

10.1136/jim-2021-002037 ◽

2021 ◽

pp. jim-2021-002037

Author(s):

Adrian Soto-Mota ◽

Braulio Alejandro Marfil-Garza ◽

Santiago Castiello-de Obeso ◽

Erick Jose Martinez Rodriguez ◽

Daniel Alberto Carrillo Vazquez ◽

...

Keyword(s):

Clinical Outcomes ◽

Mexico City ◽

Predictive Accuracy ◽

Area Under The Curve ◽

Predictive Performance ◽

Tertiary Hospital ◽

Performance Comparison ◽

Evidence Based ◽

Likelihood Ratios ◽

And Performance

Most COVID-19 mortality scores were developed at the beginning of the pandemic and clinicians now have more experience and evidence-based interventions. Therefore, we hypothesized that the predictive performance of COVID-19 mortality scores is now lower than originally reported. We aimed to prospectively evaluate the current predictive accuracy of six COVID-19 scores and compare it with the accuracy of clinical gestalt predictions. A total of 200 patients with COVID-19 were enrolled in a tertiary hospital in Mexico City between September and December 2020. The area under the curve (AUC) of the LOW-HARM, qSOFA, MSL-COVID-19, NUTRI-CoV, and NEWS2 scores and the AUC of clinical gestalt predictions of death (as a percentage) were determined. In total, 166 patients (106 men and 60 women aged 56±9 years) with confirmed COVID-19 were included in the analysis. The AUC of all scores was significantly lower than originally reported: LOW-HARM 0.76 (95% CI 0.69 to 0.84) vs 0.96 (95% CI 0.94 to 0.98), qSOFA 0.61 (95% CI 0.53 to 0.69) vs 0.74 (95% CI 0.65 to 0.81), MSL-COVID-19 0.64 (95% CI 0.55 to 0.73) vs 0.72 (95% CI 0.69 to 0.75), NUTRI-CoV 0.60 (95% CI 0.51 to 0.69) vs 0.79 (95% CI 0.76 to 0.82), NEWS2 0.65 (95% CI 0.56 to 0.75) vs 0.84 (95% CI 0.79 to 0.90), and neutrophil to lymphocyte ratio 0.65 (95% CI 0.57 to 0.73) vs 0.74 (95% CI 0.62 to 0.85). Clinical gestalt predictions were non-inferior to mortality scores, with an AUC of 0.68 (95% CI 0.59 to 0.77). Adjusting scores with locally derived likelihood ratios did not improve their performance; however, some scores outperformed clinical gestalt predictions when clinicians’ confidence of prediction was <80%. Despite its subjective nature, clinical gestalt has relevant advantages in predicting COVID-19 clinical outcomes. The need for and performance of most COVID-19 mortality scores should be evaluated regularly.
