Incorporating Robustness to Imaging Physics into Radiomic Feature Selection for Breast Cancer Risk Estimation

Digital mammography has seen an explosion in the number of radiomic features used for risk-assessment modeling. However, having more features is not necessarily beneficial, as some features may be overly sensitive to imaging physics (contrast, noise, and image sharpness). To measure the effects of imaging physics, we analyzed the feature variation across imaging acquisition settings (kV, mAs) using an anthropomorphic phantom. We also analyzed the intra-woman variation (IWV), a measure of how much a feature varies between breasts with similar parenchymal patterns—a woman’s left and right breasts. From 341 features, we identified “robust” features that minimized the effects of imaging physics and IWV. We also investigated whether robust features offered better case-control classification in an independent data set of 575 images, all with an overall BI-RADS® assessment of 1 (negative) or 2 (benign); 115 images (cases) were of women who developed cancer at least one year after that screening image, matched to 460 controls. We modeled cancer occurrence via logistic regression, using cross‑validated area under the receiver-operating-characteristic curve (AUC) to measure model performance. Models using features from the most-robust quartile of features yielded an AUC = 0.59, versus 0.54 for the least-robust, with p < 0.005 for the difference among the quartiles.
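The AUC used above to compare feature quartiles can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. A minimal Python sketch of that pairwise definition follows; the function name `auc` and the toy scores are illustrative assumptions, not the authors' code or data, and the cross-validation step is omitted.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores for illustration only (not data from the study).
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3, 0.6, 0.1]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values such as 0.54 and 0.59 indicate only modest separation of cases from controls.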

One-year mortality in patients with advanced hepatocellular carcinoma on immunotherapy: Prediction using machine learning models (Preprint)

10.2196/preprints.32281 ◽

2021 ◽

Author(s):

Thomas Ka-Luen Lui ◽

Ka Shing, Michael Cheung ◽

Wai Keung Leung

Keyword(s):

Machine Learning ◽

Hepatocellular Carcinoma ◽

Characteristic Curve ◽

False Negative ◽

False Negative Rate ◽

Absolute Error ◽

Advanced Hepatocellular Carcinoma ◽

Data Set ◽

One Year ◽

Related Mortality

BACKGROUND Immunotherapy is a promising new treatment for patients with advanced hepatocellular carcinoma (HCC), but it is costly and potentially associated with considerable side effects. OBJECTIVE This study aimed to evaluate the role of machine learning (ML) models in predicting one-year cancer-related mortality in advanced HCC patients treated with immunotherapy. METHODS A total of 395 HCC patients who had received immunotherapy (including nivolumab, pembrolizumab or ipilimumab) in 2014-2019 in Hong Kong were included. The whole data set was randomly divided into training (n=316) and validation (n=79) sets. The data set, including 45 clinical variables, was used to construct six different ML models for predicting the risk of one-year mortality. The performance of the ML models was measured by the area under the receiver operating characteristic curve (AUC) and the mean absolute error (MAE) using calibration analysis. RESULTS The overall one-year cancer-related mortality was 51.1%. Of the six ML models, the random forest (RF) had the highest AUC of 0.93 (95% CI: 0.86-0.98), which was better than logistic regression (0.82, p=0.01) and XGBoost (0.86, p=0.04). RF also had the lowest false positive rate (6.7%) and false negative rate (2.8%). High baseline AFP, bilirubin and alkaline phosphatase were three common risk factors identified by all ML models. CONCLUSIONS ML models could predict one-year cancer-related mortality of HCC patients treated with immunotherapy, which may help to select patients who would benefit most from this new treatment option.
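The false positive and false negative rates reported above are conventionally derived from a model's binary predictions on the validation set. The sketch below assumes the usual definitions (false positives divided by all true negatives-class patients, false negatives divided by all true positive-class patients); the abstract does not state which definitions were used, and the function name and toy labels are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for 0/1 labels.

    Assumes label 1 means the event occurred (here, death within one year).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
fpr, fnr = error_rates(y_true, y_pred)
print(fpr, fnr)
```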

Using Machine Learning and the Electronic Health Record to Predict Complicated Clostridium difficile Infection

Open Forum Infectious Diseases ◽

10.1093/ofid/ofz186 ◽

2019 ◽

Vol 6 (5) ◽

Cited By ~ 14

Author(s):

Benjamin Y Li ◽

Jeeheh Oh ◽

Vincent B Young ◽

Krishna Rao ◽

Jenna Wiens

Keyword(s):

Machine Learning ◽

Electronic Health Record ◽

Characteristic Curve ◽

Model Performance ◽

Health Record ◽

Data Set ◽

Icu Admission ◽

Clostridioides Difficile ◽

Electronic Health ◽

Health Care Associated Infection

Abstract Background Clostridium (Clostridioides) difficile infection (CDI) is a health care–associated infection that can lead to serious complications. Potential complications include intensive care unit (ICU) admission, development of toxic megacolon, need for colectomy, and death. However, identifying the patients most likely to develop complicated CDI is challenging. To this end, we explored the utility of a machine learning (ML) approach for patient risk stratification for complications using electronic health record (EHR) data. Methods We considered adult patients diagnosed with CDI between October 2010 and January 2013 at the University of Michigan hospitals. Cases were labeled complicated if the infection resulted in ICU admission, colectomy, or 30-day mortality. Leveraging EHR data, we trained a model to predict subsequent complications on each of the 3 days after diagnosis. We compared our EHR-based model to one based on a small set of manually curated features. We evaluated model performance using a held-out data set in terms of the area under the receiver operating characteristic curve (AUROC). Results Of 1118 cases of CDI, 8% became complicated. On the day of diagnosis, the model achieved an AUROC of 0.69 (95% confidence interval [CI], 0.55–0.83). Using data extracted 2 days after CDI diagnosis, performance increased (AUROC, 0.90; 95% CI, 0.83–0.95), outperforming a model based on a curated set of features (AUROC, 0.84; 95% CI, 0.75–0.91). Conclusions Using EHR data, we can accurately stratify CDI cases according to their risk of developing complications. Such an approach could be used to guide future clinical studies investigating interventions that could prevent or mitigate complicated CDI.

Non-Spatial Impairments Affect False-Positive Neglect Diagnosis Based on Cancellation Tasks

Journal of the International Neuropsychological Society ◽

10.1017/s1355617720000041 ◽

2020 ◽

Vol 26 (7) ◽

pp. 668-678 ◽

Cited By ~ 1

Author(s):

Hanne Huygelier ◽

Margaret Jane Moore ◽

Nele Demeyere ◽

Céline R. Gillebert

Keyword(s):

Test Performance ◽

False Positive ◽

Standard Procedure ◽

Model Performance ◽

False Positives ◽

Spatial Bias ◽

Multiple Tests ◽

Left And Right ◽

The Difference ◽

The Impact

Abstract Objective: To diagnose egocentric neglect after stroke, the spatial bias of performance on cancellation tasks is typically compared to a single cutoff. This standard procedure relies on the assumption that the measurement error of cancellation performance does not depend on non-spatial impairments affecting the total number of cancelled targets. Here we assessed the impact of this assumption on false-positive diagnoses. Method: We estimated false positives by simulating cancellation data using a binomial model. Performance was summarised by the difference in left and right cancelled targets (R-L) and the Centre of Cancellation (CoC). Diagnosis was based on a fixed cutoff versus cutoffs adjusted for the total number of cancelled targets and on single test performance versus unanimous or proportional agreement across multiple tests. Finally, we compared the simulation findings to empirical cancellation data acquired from 651 stroke patients. Results: Using a fixed cutoff, the rate of false positives depended on the total number of cancelled targets and ranged from 10% to 30% for R-L scores and from 10% to 90% for CoC scores. The rate of false positives increased even further when diagnosis was based on proportional agreement across multiple tests. Adjusted cutoffs and unanimous agreement across multiple tests were effective at controlling false positives. For empirical data, fixed versus adjusted cutoffs differ in estimation of neglect prevalence by 13%, and this difference was largest for patients with non-spatial impairments. Conclusions: Our findings demonstrate the importance of considering non-spatial impairments when diagnosing neglect based on cancellation performance.
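The binomial simulation described in the Method can be illustrated with a small Python sketch. It simulates patients with no spatial bias, so any diagnosis is a false positive, and shows how a fixed R-L cutoff behaves as the overall hit probability drops. The number of targets per side, the cutoff value, the two-sided decision rule, and all names are assumptions made for illustration; they are not taken from the paper.

```python
import random


def simulate_false_positive_rate(p_hit, n_per_side=25, cutoff=2,
                                 n_sim=5000, seed=0):
    """Estimate how often an unbiased patient exceeds a fixed R-L cutoff.

    Each target on the left and on the right is cancelled independently
    with the same probability p_hit, so there is no true spatial bias.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_sim):
        left = sum(rng.random() < p_hit for _ in range(n_per_side))
        right = sum(rng.random() < p_hit for _ in range(n_per_side))
        # Two-sided rule (assumption): flag if the sides differ by more
        # than the fixed cutoff in either direction.
        if abs(right - left) > cutoff:
            flagged += 1
    return flagged / n_sim


for p in (0.99, 0.9, 0.7, 0.5):
    print(p, simulate_false_positive_rate(p))
```

Because the variance of a binomial count grows as the hit probability moves away from 1, the same cutoff flags more unbiased patients when fewer targets are cancelled overall, which is the dependence on non-spatial impairment that the abstract describes.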

Modeling sleep onset misperception in insomnia

SLEEP ◽

10.1093/sleep/zsaa014 ◽

2020 ◽

Vol 43 (8) ◽

Cited By ~ 2

Author(s):

Lieke W A Hermans ◽

Merel M van Gilst ◽

Marta Regis ◽

Leonie C E van den Heuvel ◽

Hanneke Langen ◽

...

Keyword(s):

Sleep Onset ◽

Model Performance ◽

Minimum Length ◽

Model Parameters ◽

Sleep Onset Latency ◽

Sleep State ◽

Data Set ◽

Sleep Diaries ◽

The Difference ◽

Mean Square Errors

Abstract Objectives To extend and validate a previously suggested model of the influence of uninterrupted sleep bouts on sleep onset misperception in a large independent data set. Methods Polysomnograms and sleep diaries of 139 insomnia patients and 92 controls were included. We modeled subjective sleep onset as the start of the first uninterrupted sleep fragment longer than Ls minutes, where parameter Ls reflects the minimum length of a sleep fragment required to be perceived as sleep. We compared the so-defined sleep onset latency (SOL) for various values of Ls. Model parameters were compared between groups, and across insomnia subgroups with respect to sleep onset misperception, medication use, age, and sex. Next, we extended the model to incorporate the length of wake fragments. Model performance was assessed by calculating root mean square errors (RMSEs) of the difference between estimated and perceived SOL. Results Participants with insomnia needed a median of 34 minutes of undisturbed sleep to perceive sleep onset, while healthy controls needed 22 minutes (Mann–Whitney U = 4426, p < 0.001). Similar statistically significant differences were found between sleep onset misperceivers and non-misperceivers (median 40 vs. 20 minutes, Mann–Whitney U = 984.5, p < 0.001). Model outcomes were similar across other subgroups. Extended models including wake bout lengths resulted in only marginal improvements of model outcome. Conclusions Patients with insomnia, particularly sleep misperceivers, need larger continuous sleep bouts to perceive sleep onset. The modeling approach yields a parameter for which we coin the term Sleep Fragment Perception Index, providing a useful measure to further characterize sleep state misperception.
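The core rule of the model, taking subjective sleep onset as the start of the first uninterrupted sleep fragment longer than Ls minutes, and the RMSE used to score it can be sketched in Python as follows. The representation of the hypnogram as one boolean per epoch, the one-minute epoch length, and the function names are assumptions for illustration; the extended model with wake fragment lengths is not covered.

```python
import math


def estimated_sol(asleep, ls_minutes, epoch_minutes=1.0):
    """Start (minutes after lights-off) of the first uninterrupted sleep
    run longer than ls_minutes, or None if no run is long enough.

    `asleep` holds one boolean per epoch (assumed representation).
    """
    run_start = None
    # A trailing False acts as a sentinel that closes a final sleep run.
    for i, is_asleep in enumerate(list(asleep) + [False]):
        if is_asleep and run_start is None:
            run_start = i
        elif not is_asleep and run_start is not None:
            if (i - run_start) * epoch_minutes > ls_minutes:
                return run_start * epoch_minutes
            run_start = None
    return None


def rmse(estimated, perceived):
    """Root mean square error between estimated and perceived SOL."""
    n = len(estimated)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(estimated, perceived)) / n)


# Toy night: 5 min awake, 3 min asleep, 2 min awake, 30 min asleep.
night = [False] * 5 + [True] * 3 + [False] * 2 + [True] * 30
print(estimated_sol(night, ls_minutes=0))
print(estimated_sol(night, ls_minutes=10))
```

With a larger Ls the short first fragment is ignored and the estimated onset moves later, which is how a higher Ls can account for a longer perceived sleep onset latency.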

Uncertainty in the Number of Calibration Repetitions of a Hydrologic Model in Varying Climatic Conditions

Water ◽

10.3390/w12092362 ◽

2020 ◽

Vol 12 (9) ◽

pp. 2362

Author(s):

Patrik Sleziak ◽

Ladislav Holko ◽

Michal Danko ◽

Juraj Parajka

Keyword(s):

Differential Evolution Algorithm ◽

Model Performance ◽

Hydrologic Model ◽

Climatic Conditions ◽

Data Set ◽

Climate Conditions ◽

Daily Data ◽

Model Efficiency ◽

The Difference ◽

The Impact

The objective of this study is to examine the impact of the number of calibration repetitions on hydrologic model performance and parameter uncertainty in varying climatic conditions. The study is performed in a pristine alpine catchment in the Western Tatra Mountains (the Jalovecký Creek catchment, Slovakia) using daily data from the period 1989–2018. The entire data set has been divided into five 6-year-long periods; the division was based on the wavelet analysis of precipitation, air temperature and runoff data. A lumped conceptual hydrologic model TUW (“Technische Universität Wien”) was calibrated by automatic optimisation using the differential evolution algorithm. To test the effect of the number of calibrations in the optimisation procedure, we conducted 10, 50, 100, 300, and 500 repetitions of calibrations in each period and validated them against selected runoff and snow-related model efficiency criteria. The results showed that while the medians of different groups of calibration repetitions were similar, the ranges (max–min) of model efficiency criteria and parameter values differed. An increasing number of calibration repetitions tends to increase the ranges of model efficiency criteria during model validation, particularly for the runoff volume error and snow error, which were not directly used in model calibration. Comparison of model efficiencies in climate conditions that varied among the five periods documented changes in model performance in different periods, but the difference between 10 and 500 calibration repetitions did not change much between the selected time periods. The results suggest that ten repetitions of model calibration provided the same median of model efficiency criteria as a greater number of calibration repetitions, while model parameter variability and uncertainty were smaller.

Difference Fourier Analysis of Glucose Embedded and Frozen Hydrated Purple Membrane

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100053164 ◽

1982 ◽

Vol 40 ◽

pp. 74-75

Author(s):

Jules S. Jaffe ◽

Robert M. Glaeser

Keyword(s):

Purple Membrane ◽

Data Set ◽

High Resolution Data ◽

X Ray ◽

X Ray Crystallography ◽

Fourier Techniques ◽

Versus Protein ◽

The Difference ◽

Difference Fourier ◽

Ideal Method

Although difference Fourier techniques are standard in X-ray crystallography it has only been very recently that electron crystallographers have been able to take advantage of this method. We have combined a high resolution data set for frozen glucose embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen hydrated state in order to visualize any differences in structure due to the different methods of preparation. The increased contrast between protein-ice versus protein-glucose may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.

Data Augmentation Using Generative Adversarial Network for Automatic Machine Fault Detection Based on Vibration Signals

Applied Sciences ◽

10.3390/app11052166 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2166

Author(s):

Van Bui ◽

Tung Lam Pham ◽

Huy Nguyen ◽

Yeong Min Jang

Keyword(s):

Fault Detection ◽

Data Augmentation ◽

Model Performance ◽

Original Data ◽

Fault Classification ◽

Training Process ◽

Generative Adversarial Network ◽

Data Set ◽

Adversarial Network ◽

Machine Fault

In the last decade, predictive maintenance has attracted a lot of attention in industrial factories because of its wide use of the Internet of Things and artificial intelligence algorithms for data management. However, in the early phases, when abnormal and faulty machines rarely appeared in factories, there were limited sets of machine fault samples. With limited fault samples, it is difficult to perform a training process for fault classification due to the imbalance of input data. Therefore, data augmentation was required to increase the accuracy of the learning model. However, there were limited methods to generate and evaluate the data applied for data analysis. In this paper, we introduce a method of using the generative adversarial network as the fault signal augmentation method to enrich the data set. The enhanced data set could increase the accuracy of the machine fault detection model in the training process. We also performed fault detection using a variety of preprocessing approaches and classified the models to evaluate the similarities between the generated data and authentic data. The generated fault data has high similarity with the original data and significantly improves the accuracy of the model. The accuracy of machine fault detection reaches 99.41% when 20% of the original machine fault data set is used and 93.1% when 0% of the original machine fault data set is used (generated data only). Based on this, we concluded that the generated data could be mixed with original data to improve the model performance.

Event detection of different English data sources based on transfer learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189798 ◽

2021 ◽

pp. 1-11

Author(s):

Yanan Huang ◽

Yuji Miao ◽

Zhenjing Da

Keyword(s):

Transfer Learning ◽

Event Detection ◽

Visual Analysis ◽

Learning Algorithm ◽

Data Sources ◽

Data Set ◽

Data Source ◽

Single Data Source ◽

The Difference ◽

Single Data

The methods of multi-modal English event detection under a single data source and isomorphic event detection of different English data sources based on transfer learning still need to be improved. In order to improve the efficiency of English event detection across data sources, this paper proposes, based on the transfer learning algorithm, multi-modal event detection under a single data source and isomorphic event detection for different data sources. Moreover, by stacking multiple classification models, this paper fuses the features with one another and conducts adversarial training through the difference between the two classifiers to further make the distributions of data from different sources similar. In addition, in order to verify the proposed algorithm, a multi-source English event detection data set is collected. Finally, this paper uses the data set to verify the proposed method and compares it with the current mainstream transfer learning methods. Through experimental analysis, convergence analysis, visual analysis and parameter evaluation, the effectiveness of the proposed algorithm is demonstrated.

Deep Learning Versus Logistic Regression to Predict Opioid Dose Reduction After Spinal Cord Stimulation

Neurosurgery ◽

10.1093/neuros/nyaa447_638 ◽

2020 ◽

Vol 67 (Supplement_1) ◽

Author(s):

Syed M Adil ◽

Lefko T Charalambous ◽

Kelly R Murphy ◽

Shervin Rahimpour ◽

Stephen C Harward ◽

...

Keyword(s):

Neural Network ◽

Spinal Cord ◽

Deep Learning ◽

Dose Reduction ◽

Spinal Cord Stimulation ◽

Network Architecture ◽

Characteristic Curve ◽

Model Performance ◽

Health Crisis ◽

Opioid Dose

Abstract INTRODUCTION Opioid misuse persists as a public health crisis affecting approximately one in four Americans.1 Spinal cord stimulation (SCS) is a neuromodulation strategy to treat chronic pain, with one goal being decreased opioid consumption. Accurate prognostication about SCS success is key in optimizing surgical decision making for both physicians and patients. Deep learning, using neural network models such as the multilayer perceptron (MLP), enables accurate prediction of non-linear patterns and has widespread applications in healthcare. METHODS The IBM MarketScan® (IBM) database was queried for all patients ≥ 18 years old undergoing SCS from January 2010 to December 2015. Patients were categorized into opioid dose groups as follows: No Use, ≤ 20 morphine milligram equivalents (MME), 20–50 MME, 50–90 MME, and >90 MME. We defined “opiate weaning” as moving into a lower opioid dose group (or remaining in the No Use group) during the 12 months following permanent SCS implantation. After pre-processing, there were 62 predictors spanning demographics, comorbidities, and pain medication history. We compared an MLP with four hidden layers to the LR model with L1 regularization. Model performance was assessed using area under the receiver operating characteristic curve (AUC) with 5-fold nested cross-validation. RESULTS Ultimately, 6,124 patients were included, of which 77% had used opioids for >90 days within the 1-year pre-SCS and 72% had used >5 types of medications during the 90 days prior to SCS. The mean age was 56 ± 13 years old. Collectively, 2,037 (33%) patients experienced opiate weaning. The AUC was 0.74 for the MLP and 0.73 for the LR model. CONCLUSION To our knowledge, we present the first use of deep learning to predict opioid weaning after SCS. Model performance was slightly better than regularized LR. Future efforts should focus on optimization of neural network architecture and hyperparameters to further improve model performance. Models should also be calibrated and externally validated on an independent dataset. Ultimately, such tools may assist both physicians and patients in predicting opioid dose reduction after SCS.
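The outcome definition in the Methods (dose groups and "opiate weaning") can be written as a short labeling rule. The Python sketch below is one possible reading: the abstract does not say which group the boundary doses of exactly 20, 50, or 90 MME fall into, so assigning them to the lower group is an assumption, as are the function names.

```python
# Ordered dose groups from the abstract; a higher index means a higher dose.
DOSE_GROUPS = ["No Use", "<=20 MME", "20-50 MME", "50-90 MME", ">90 MME"]


def dose_group(mme):
    """Map a daily dose in MME to a group index.

    Boundary handling (20, 50, 90 go to the lower group) is an assumption.
    """
    if mme <= 0:
        return 0
    if mme <= 20:
        return 1
    if mme <= 50:
        return 2
    if mme <= 90:
        return 3
    return 4


def weaned(pre_mme, post_mme):
    """True if the patient moved to a lower dose group or stayed in No Use."""
    pre, post = dose_group(pre_mme), dose_group(post_mme)
    return post < pre or (pre == 0 and post == 0)


print(DOSE_GROUPS[dose_group(100)], "->", DOSE_GROUPS[dose_group(60)],
      weaned(100, 60))
```

Note that under this rule a dose reduction within the same group (for example 60 to 55 MME) does not count as weaning, since the definition is based on group membership rather than on the raw dose.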

Application of portrait recognition system for emergency evacuation in mass emergencies

Journal of Intelligent Systems ◽

10.1515/jisys-2021-0052 ◽

2021 ◽

Vol 30 (1) ◽

pp. 893-902

Author(s):

Ke Xu

Keyword(s):

Detection Rate ◽

Characteristic Curve ◽

Recognition Rate ◽

Recognition System ◽

Emergency Evacuation ◽

Shopping Mall ◽

Single Shot ◽

Data Set ◽

Linear Discriminant ◽

Adaboost Algorithm

Abstract A portrait recognition system can play an important role in emergency evacuation in mass emergencies. This paper designed a portrait recognition system, analyzed the overall structure of the system and the method of image preprocessing, and used the Single Shot MultiBox Detector (SSD) algorithm for portrait detection. It also designed an improved algorithm combining principal component analysis (PCA) with linear discriminant analysis (LDA) for portrait recognition and tested the system by applying it in a shopping mall to collect and monitor portraits and establish a data set. The results showed that the missed detection rate and false detection rate of the SSD algorithm were 0.78% and 2.89%, respectively, which were lower than those of the AdaBoost algorithm. Comparisons with the PCA, LDA, and PCA + LDA algorithms demonstrated that the improved PCA + LDA algorithm had the highest recognition rate (95.8%), the largest area under the receiver operating characteristic curve, and the shortest recognition time (465 ms). The experimental results show that the improved PCA + LDA algorithm is reliable in portrait recognition and can be used for emergency evacuation in mass emergencies.
