Time Windows Voting Classifier for COVID-19 Mortality Prediction

Author(s):  
Tiong Goh ◽  
MengJun Liu

The ability to predict COVID-19 patients' level of severity (death or survival) enables clinicians to prioritise treatment. Recently, using three blood biomarkers, an interpretable machine learning model was developed to predict the mortality of COVID-19 patients. The method was reported to suffer from performance instability because the identified biomarkers are not consistent predictors over an extended duration. To sustain performance, the proposed method partitioned the data into three different time windows. For each window, an end-classifier, a mid-classifier and a front-classifier were designed, respectively, using the XGBoost single-tree approach. These time-window classifiers were integrated into a majority-vote classifier and tested on an isolated test data set. The voting classifier extends the window over which 90% cumulative accuracy is maintained from 14 days to 21 days before the outcome. An additional 7 days of prediction window can have a considerable impact on a patient's chance of survival. This study validated the feasibility of the time-window voting classifier and further supports the selection of the biomarker feature set for the early prognosis of patients with a higher risk of mortality.
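A minimal sketch of the time-window majority-vote idea described above (not the authors' code): three XGBoost single-tree classifiers, each trained on records from a different window before the outcome, combined by a hard majority vote. The helper names, window boundaries and hyper-parameters are illustrative assumptions.

```python
import numpy as np
from xgboost import XGBClassifier

def train_window_classifiers(X, y, days_to_outcome, windows=((0, 7), (7, 14), (14, 21))):
    """Train one single-tree XGBoost model per time window (hypothetical helper)."""
    models = []
    for lo, hi in windows:
        mask = (days_to_outcome >= lo) & (days_to_outcome < hi)
        clf = XGBClassifier(n_estimators=1, max_depth=2, eval_metric="logloss")
        clf.fit(X[mask], y[mask])
        models.append(clf)
    return models

def majority_vote(models, X_new):
    """Hard majority vote over the front-, mid- and end-window classifiers."""
    votes = np.stack([m.predict(X_new) for m in models], axis=0)
    return (votes.sum(axis=0) >= (len(models) + 1) // 2).astype(int)
```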

2020 ◽  
pp. 089686082097693
Author(s):  
Alix Clarke ◽  
Pietro Ravani ◽  
Matthew J Oliver ◽  
Mohamed Mahsin ◽  
Ngan N Lam ◽  
...  

Background: Technique failure is an important outcome measure in research and quality improvement in peritoneal dialysis (PD) programs, but there is a lack of consistency in how it is reported. Methods: We used data collected about incident dialysis patients from 10 Canadian dialysis programs between 1 January 2004 and 31 December 2018. We identified four main steps that are required when calculating the risk of technique failure. We changed one step at a time, and then all steps simultaneously, to determine the impact on the observed risk of technique failure at 24 months. Results: A total of 1448 patients received PD. Selecting different cohorts of PD patients changed the observed risk of technique failure at 24 months by 2%. More than one-third of patients who switched to hemodialysis returned to PD; 90% returned within 180 days. The use of different time windows of observation for a return to PD resulted in risks of technique failure that differed by 16%. The way in which exit events were handled during the time window affected the risk of technique failure by 4%, and the choice of statistical method changed results by 4%. Overall, the observed risk of technique failure at 24 months differed by 20%, simply by applying different approaches to the same data set. Conclusions: The approach to reporting technique failure has an important impact on the observed results. We present a robust and transparent methodology to track technique failure over time and to compare performance between programs.
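As an illustration only (not the study's code), one common way to obtain the 24-month risk of technique failure is a Kaplan–Meier estimate; the analytic choices discussed above (cohort definition, return-to-PD window, exit-event handling, statistical method) all enter before or in place of this step. The column names are assumptions.

```python
from lifelines import KaplanMeierFitter

def technique_failure_risk_24m(duration_months, failed):
    """Return 1 - S(24): estimated probability of technique failure by 24 months.
    duration_months: follow-up time per patient; failed: 1 if technique failure occurred."""
    kmf = KaplanMeierFitter()
    kmf.fit(duration_months, event_observed=failed)
    return 1.0 - float(kmf.predict(24))
```

A competing-risks (cumulative incidence) estimator would be an alternative statistical method and, as the abstract notes, can shift the observed risk by several percentage points.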


1993 ◽  
Vol 115 (4A) ◽  
pp. 396-403 ◽  
Author(s):  
J. T. Baldwin ◽  
S. Deutsch ◽  
H. L. Petrie ◽  
J. M. Tarbell

The purpose of this study was to develop a method to accurately determine mean velocities and Reynolds stresses in pulsatile flows. The pulsatile flow used to develop this method was produced within a transparent model of a left ventricular assist device (LVAD). Velocity measurements were taken at locations within the LVAD using a two-component laser Doppler anemometry (LDA) system. At each measurement location, as many as 4096 realizations of two coincident orthogonal velocity components were collected during preselected time windows over the pump cycle. The number of realizations was varied to determine how the number of data points collected affects the accuracy of the results. The duration of the time windows was varied to determine the maximum window size consistent with an assumption of pseudostationary flow. Erroneous velocity realizations were discarded from individual data sets by implementing successive elliptical filters on the velocity components. The mean velocities and principal Reynolds stresses were determined for each of the filtered data sets. The filtering technique, while eliminating less than 5 percent of the original data points, significantly reduced the computed Reynolds stresses. The results indicate that, with proper filtering, reasonable accuracy can be achieved using a velocity data set of 250 points, provided the time window is small enough to ensure pseudostationary flow (typically 20 to 40 ms). The results also reveal that the time window which is required to assume pseudostationary flow varies with location and cycle time and can range from 100 ms to less than 20 ms. Rotation of the coordinate system to the principal stress axes can lead to large variations in the computed Reynolds stresses, up to 2440 dynes/cm² for the normal stress and 7620 dynes/cm² for the shear stress.
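A rough sketch of the successive elliptical filtering and the Reynolds stress computation, under the assumption of a simple n-sigma ellipse in the (u, v) plane applied repeatedly until no further points are removed; the exact filter geometry used in the paper may differ.

```python
import numpy as np

def elliptical_filter(u, v, n_sigma=3.0, max_iter=10):
    """Iteratively drop realizations outside an n-sigma ellipse centred on the means."""
    keep = np.ones(u.size, dtype=bool)
    for _ in range(max_iter):
        mu_u, mu_v = u[keep].mean(), v[keep].mean()
        su, sv = u[keep].std(), v[keep].std()
        inside = ((u - mu_u) / (n_sigma * su)) ** 2 + ((v - mu_v) / (n_sigma * sv)) ** 2 <= 1.0
        if np.array_equal(inside, keep):
            break
        keep = inside
    return keep

def reynolds_stresses(u, v):
    """Mean velocities and Reynolds stresses (per unit density) from filtered samples."""
    up, vp = u - u.mean(), v - v.mean()
    return u.mean(), v.mean(), (up * up).mean(), (vp * vp).mean(), (up * vp).mean()
```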


Author(s):  
Lisa Rienesl ◽  
Negar Khayatzadeh ◽  
Astrid Köck ◽  
Laura Dale ◽  
Andreas Werner ◽  
...  

Mid-infrared (MIR) spectroscopy is the method of choice in the standard milk recording system to determine milk components including fat, protein, lactose and urea. Since milk composition is related to the health and metabolic status of a cow, MIR spectra could potentially be used for disease detection. In dairy production, mastitis is one of the most prevalent diseases. The aim of this study was to develop a calibration equation to predict mastitis events from routinely recorded MIR spectral data. A further aim was to evaluate the use of test day somatic cell score (SCS) as a covariate on the accuracy of the prediction model. The data for this study are from the Austrian milk recording system and its health monitoring system (GMON). Test day data, including MIR spectra, were merged with diagnosis data of Fleckvieh, Brown Swiss and Holstein Friesian cows. As prediction variables, first-derivative MIR absorbances at selected wavenumbers, corrected for days in milk, were used. The data set contained roughly 600,000 records and was split into calibration and validation sets by farm. Calibration sets were balanced (as many healthy cases as mastitis cases), while the validation set was kept large and realistic. Prediction was done with partial least squares discriminant analysis; key indicators of model fit were sensitivity and specificity. Results were extracted for the association between spectra and diagnosis using different time windows (days between diagnosis and test day) in validation. The comparison of different sets of predictor variables (MIR, SCS, MIR + SCS) showed an advantage in prediction for MIR + SCS. For this prediction model, specificity was 0.79 and sensitivity was 0.68 in the time window of -7 to +7 days (calibration and validation). Corresponding values for MIR were 0.71 and 0.61, and for SCS they were 0.81 and 0.62. In general, prediction of mastitis performed better with a shorter distance between test day and mastitis event, yet even for time windows of -21 to +21 days prediction accuracies were still reasonable, with sensitivities ranging from 0.50 to 0.57 and specificities remaining unchanged (0.71 to 0.85). Additional research to further improve the prediction equation, and studies on genetic correlations among clinical mastitis, SCS and MIR-predicted mastitis, are planned.
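A simplified sketch, not the authors' pipeline: PLS-DA approximated with scikit-learn's PLSRegression on first-derivative MIR spectra, with SCS optionally appended as an extra predictor column and a 0/1 mastitis label thresholded at 0.5. The derivative approximation and the number of components are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def fit_plsda(spectra, label, scs=None, n_components=10):
    X = np.gradient(spectra, axis=1)      # approximate first derivative over wavenumbers
    if scs is not None:
        X = np.column_stack([X, scs])     # MIR + SCS predictor set
    pls = PLSRegression(n_components=n_components)
    pls.fit(X, label.astype(float))
    return pls

def predict_mastitis(pls, spectra, scs=None):
    X = np.gradient(spectra, axis=1)
    if scs is not None:
        X = np.column_stack([X, scs])
    return (pls.predict(X).ravel() >= 0.5).astype(int)
```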


2020 ◽  
Vol 91 (3) ◽  
pp. 1646-1659 ◽  
Author(s):  
Fajun Miao ◽  
N. Seth Carpenter ◽  
Zhenming Wang ◽  
Andrew S. Holcomb ◽  
Edward W. Woolery

Abstract The manual separation of natural earthquakes from mine blasts in data sets recorded by local or regional seismic networks can be a labor-intensive process. An artificial neural network (ANN) applied to automate discriminating earthquakes from quarry and mining blasts in eastern Kentucky suggests that the analyst effort in this task can be significantly reduced. Based on a data set of 152 local and regional earthquake recordings and 4192 blast recordings over a three-year period in and around eastern Kentucky, ANNs of different configurations were trained and tested on amplitude spectra parameters. The parameters were extracted from different time windows of three-component broadband seismograms to learn the general characteristics of analyst-classified regional earthquake and blast signals. There was little variation in the accuracies and precisions of the various models and ANN configurations. The best result used a network with two hidden layers of 256 neurons, trained on an input set of 132 spectral amplitudes extracted from the P-wave time window and three overlapping time windows running from the global maximum amplitude on all three components through the coda. For this configuration and input feature set, 97% of all recordings were accurately classified by our trained model. Furthermore, 96.7% of earthquakes in our data set were correctly classified with mean-event probabilities greater than 0.7. Almost all blasts (98.2%) were correctly classified by mean-event probabilities of at least 0.7. Our technique should greatly reduce the time required for manual inspection of blast recordings. Additionally, our technique circumvents the need for an analyst, or automatic locator, to locate the event ahead of time, a task that is difficult due to the emergent nature of P-wave arrivals induced by delay-fire mine blasts.
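A sketch under assumptions (a scikit-learn MLP standing in for the authors' ANN): a two-hidden-layer network of 256 neurons trained on 132 spectral-amplitude features per recording, with a mean-event probability computed by averaging over all recordings of one event. Function names and training settings are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500)

def train(features_132, labels):
    """features_132: (n_recordings, 132) spectral amplitudes; labels: 0 = blast, 1 = earthquake."""
    clf.fit(features_132, labels)

def mean_event_probability(event_recordings):
    """Average the per-recording earthquake probability over all recordings of one event."""
    return float(clf.predict_proba(event_recordings)[:, 1].mean())
```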


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: The aim of this work is to develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to remove missing values from the data set and then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into training and testing sets; 70% of the data are given as input to four widely used classifiers (support vector machine, k-nearest neighbour, logistic regression and decision tree) to train them, and the remaining 30% are used to evaluate the machine learning models using performance metrics such as the confusion matrix, accuracy, precision, sensitivity, F1-score and the AUC-ROC curve. Results: The classifier outputs were evaluated using these performance measures; logistic regression provided higher accuracy than the k-NN, SVM and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the proposed classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that this proposed machine learning model can act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine learning based prediction models, and that it is capable of predicting the presence of acute myocardial infarction from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
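A hedged sketch of the described pipeline (parameter choices and the number of PCA components are illustrative): mean imputation, PCA, a 70/30 split, and the four classifiers compared on accuracy and ROC-AUC.

```python
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

def compare_classifiers(X, y, n_components=10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=0)
    models = {
        "SVM": SVC(probability=True),
        "k-NN": KNeighborsClassifier(),
        "LogReg": LogisticRegression(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(),
    }
    results = {}
    for name, model in models.items():
        # Mean imputation -> PCA feature extraction -> classifier
        pipe = make_pipeline(SimpleImputer(strategy="mean"), PCA(n_components=n_components), model)
        pipe.fit(X_train, y_train)
        proba = pipe.predict_proba(X_test)[:, 1]
        results[name] = (accuracy_score(y_test, pipe.predict(X_test)), roc_auc_score(y_test, proba))
    return results
```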


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

Abstract The predictive performance of a machine learning model depends heavily on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data have to be transferred through low-bandwidth connections, this reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows the optimization to be efficiently parallelized in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with improvements in mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.
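A minimal sketch of the MBO idea, not the MODES implementation: a Gaussian-process surrogate (scikit-optimize) proposes hyper-parameters, and the objective is the combined (mean) accuracy of per-node random forests trained only on their local data, loosely mirroring the MODES-I "clones" view. The search space and data layout are assumptions.

```python
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def tune_across_nodes(node_datasets, n_calls=20):
    """node_datasets: list of (X, y) pairs, one per embedded node (hypothetical layout)."""
    def objective(params):
        n_estimators, max_depth = params
        accs = []
        for X, y in node_datasets:
            rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
            accs.append(cross_val_score(rf, X, y, cv=3).mean())
        return -sum(accs) / len(accs)   # MBO minimizes, so negate the mean accuracy
    space = [Integer(10, 200, name="n_estimators"), Integer(2, 20, name="max_depth")]
    return gp_minimize(objective, space, n_calls=n_calls)
```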


Author(s):  
Hongguang Wu ◽  
Yuelin Gao ◽  
Wanting Wang ◽  
Ziyu Zhang

Abstract In this paper, we address the vehicle routing problem with time windows (TWVRP). In this problem, we consider a hard time constraint: the fleet can only serve customers within their specific time windows. To solve this problem, a hybrid ant colony (HACO) algorithm is proposed, based on the ant colony algorithm and a mutation operation. The proposed HACO algorithm has three innovations: the first is a new method for updating pheromones; the second is the introduction of adaptive parameters; and the third is the addition of a mutation operation. A well-known Solomon instance is used to evaluate the performance of the proposed algorithm. Experimental results show that the HACO algorithm is effective at solving the vehicle routing problem with time windows. Moreover, the proposed algorithm has practical implications for the vehicle routing problem, and the results show that it is applicable and effective in practical problems.
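An illustrative sketch of the ant-colony ingredients described above: pheromone-biased customer selection under hard time windows, plus a simple swap mutation. Parameter values, the data layout (depot at index 0, travel-time matrix, per-customer ready/due/service times) and the mutation form are assumptions, not the paper's exact design.

```python
import random

def feasible(route, cand, travel, ready, due, service):
    """Check that appending `cand` keeps every visit within its hard time window."""
    t, prev = 0.0, 0                      # start at the depot (index 0) at time 0
    for c in route + [cand]:
        t = max(t + travel[prev][c], ready[c])
        if t > due[c]:
            return False
        t += service[c]
        prev = c
    return True

def next_customer(route, unvisited, pheromone, travel, ready, due, service, alpha=1.0, beta=2.0):
    """Pick the next customer with probability ~ pheromone^alpha * (1/travel_time)^beta."""
    last = route[-1] if route else 0
    cands = [c for c in unvisited if feasible(route, c, travel, ready, due, service)]
    if not cands:
        return None
    weights = [(pheromone[last][c] ** alpha) * ((1.0 / max(travel[last][c], 1e-9)) ** beta)
               for c in cands]
    return random.choices(cands, weights=weights, k=1)[0]

def mutate(route):
    """Swap two random customers (a simple stand-in for the mutation operation)."""
    if len(route) < 2:
        return route
    i, j = random.sample(range(len(route)), 2)
    route = route[:]
    route[i], route[j] = route[j], route[i]
    return route
```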


OR Spectrum ◽  
2021 ◽  
Author(s):  
Christian Tilk ◽  
Katharina Olkis ◽  
Stefan Irnich

Abstract The ongoing rise in e-commerce comes along with an increasing number of first-time delivery failures due to the absence of the customer at the delivery location. Failed deliveries result in rework which in turn has a large impact on the carriers' delivery cost. In the classical vehicle routing problem (VRP) with time windows, each customer request has only one location and one time window describing where and when shipments need to be delivered. In contrast, we introduce and analyze the vehicle routing problem with delivery options (VRPDO), in which some requests can be shipped to alternative locations with possibly different time windows. Furthermore, customers may prefer some delivery options. The carrier must then select, for each request, one delivery option such that the carriers' overall cost is minimized and a given service level regarding customer preferences is achieved. Moreover, when delivery options share a common location, e.g., a locker, capacities must be respected when assigning shipments. To solve the VRPDO exactly, we present a new branch-price-and-cut algorithm. The associated pricing subproblem is a shortest-path problem with resource constraints that we solve with a bidirectional labeling algorithm on an auxiliary network. We focus on the comparison of two alternative modeling approaches for the auxiliary network and present optimal solutions for instances with up to 100 delivery options. Moreover, we provide 17 new optimal solutions for the benchmark set for the VRP with roaming delivery locations.
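A small data-model sketch only; the paper's branch-price-and-cut and bidirectional labeling algorithm are not reproduced here. It shows one way to represent delivery options per request and to check that a chosen assignment respects shared locker capacities; all field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DeliveryOption:
    request_id: int
    location: str          # e.g. a home address or a shared locker id
    tw_start: float        # start of this option's time window
    tw_end: float          # end of this option's time window
    preference: int        # 1 = most preferred by the customer

def assignment_respects_capacity(chosen_options, locker_capacity):
    """chosen_options: one selected DeliveryOption per request;
    locker_capacity: dict mapping shared locations to their slot counts."""
    load = {}
    for opt in chosen_options:
        load[opt.location] = load.get(opt.location, 0) + 1
    return all(load.get(loc, 0) <= cap for loc, cap in locker_capacity.items())
```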


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Diego Benavent ◽  
Diana Peiteado ◽  
María Ángeles Martinez-Huedo ◽  
María Hernandez-Hurtado ◽  
Alejandro Balsa ◽  
...  

Abstract To analyze the epidemiology, clinical features and costs of hospitalized patients with gout during the last decade in Spain. Retrospective observational study based on data from the Minimum Basic Data Set (MBDS) of the Spanish National Health Service database. Patients ≥ 18 years with any gout diagnosis at discharge who had been admitted to public or private hospitals between 2005 and 2015 were included. Patients were divided into two periods, p1 (2005–2010) and p2 (2011–2015), to compare the number of hospitalizations, mean costs and mortality rates. Data from 192,037 patients with gout were analyzed. There was an increase in the number of hospitalized patients with gout (p < 0.001). The most frequent comorbidities were diabetes (27.6% of patients), kidney disease (26.6%) and heart failure (19.3%). Liver disease (OR 2.61), dementia (OR 2.13), cerebrovascular diseases (OR 1.57), heart failure (OR 1.41), and kidney disease (OR 1.34) were associated with a higher mortality risk. Women had a lower risk of mortality than men (OR 0.85). Overall mortality rates in these hospitalized patients progressively increased over the years (p < 0.001). In addition, costs gradually rose, with a significant increase in p2 even after adjusting for inflation (p = 0.001). A progressive increase in hospitalizations, mortality rates and costs in hospitalized patients with gout was observed. This harmful trend in a preventable illness highlights the need for change and the search for new healthcare strategies.


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
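A hedged sketch in PyTorch (dimensions and layer sizes are placeholders, not the paper's architecture): a minimal VAE over flattened 'connectivity map' vectors, with latent-space interpolation between two encoded samples to obtain the kind of hybrid geometries discussed above.

```python
import torch
import torch.nn as nn

class WireframeVAE(nn.Module):
    def __init__(self, input_dim=1024, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def encode(self, x):
        h = self.encoder(x)
        return self.to_mu(h), self.to_logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(z), mu, logvar

def interpolate(vae, x_a, x_b, steps=5):
    """Decode evenly spaced points on the line between the latent means of two samples."""
    with torch.no_grad():
        z_a, _ = vae.encode(x_a)
        z_b, _ = vae.encode(x_b)
        return [vae.decoder(z_a + (z_b - z_a) * t) for t in torch.linspace(0, 1, steps)]
```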

