Comparing regression modeling strategies for predicting hometime

Abstract Background Hometime, the total number of days a person is living in the community (not in a healthcare institution) in a defined period of time after a hospitalization, is a patient-centred outcome metric increasingly used in healthcare research. Hometime exhibits several properties which make its statistical analysis difficult: it has a highly non-normal distribution, excess zeros, and is bounded by both a lower and upper limit. The optimal methodology for the analysis of hometime is currently unknown. Methods Using administrative data we identified adult patients diagnosed with stroke between April 1, 2010 and December 31, 2017 in Ontario, Canada. 90-day hometime and clinically relevant covariates were determined through administrative data linkage. Fifteen different statistical and machine learning models were fit to the data using a derivation sample. The models’ predictive accuracy and bias were assessed using an independent validation sample. Results Seventy-five thousand four hundred seventy-five patients were identified (divided into a derivation set of 49,402 and a test set of 26,073). In general, the machine learning models had lower root mean square error and mean absolute error than the statistical models. However, some statistical models resulted in lower (or equal) bias than the machine learning models. Most of the machine learning models constrained predicted values between the minimum and maximum observable hometime values but this was not the case for the statistical models. The machine learning models also allowed for the display of complex non-linear interactions between covariates and hometime. No model captured the non-normal bucket shaped hometime distribution. Conclusions Overall, no model clearly outperformed the others. However, it was evident that machine learning methods performed better than traditional statistical methods. Among the machine learning methods, generalized boosting machines using the Poisson distribution as well as random forests regression were the best performing. No model was able to capture the bucket shaped hometime distribution and future research on factors which are associated with extreme values of hometime that are not available in administrative data is warranted.

Download Full-text

TOPICAL ISSUES OF APPLICATION OF MACHINE LEARNING METHODS IN ECONOMY

Инновационные аспекты развития науки и техники. Сборник статей VIII Международной научно-практической конференции: сборник статей, [электронное издание сетевого распространения] / Под ред. Н.В. Емельянова. – М.: “КДУ”, “Добросвет”, 2021. – 149 с. ◽

10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33 ◽

2021 ◽

Author(s):

Natalia Pavlovna Persteneva ◽

◽

Darya Dmitrievn Skryleva ◽

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Model ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Popular Classes ◽

Machine Learning Models

The article discusses machine learning methods. Using the example of two popular classes: supervised learning and unsupervised learning. Variants of the main types of machine learning models for each method are presented. A generalized algorithm for building any machine learning model is formed.

Download Full-text

Improved Interpretability of Machine Learning Model Using Unsupervised Clustering: Predicting Time to First Treatment in Chronic Lymphocytic Leukemia

JCO Clinical Cancer Informatics ◽

10.1200/cci.18.00137 ◽

2019 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

David Chen ◽

Gaurav Goyal ◽

Ronald S. Go ◽

Sameer A. Parikh ◽

Che G. Ngufor

Keyword(s):

Machine Learning ◽

Risk Stratification ◽

Unsupervised Clustering ◽

Support Vector ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Continuous Output ◽

Time To First Treatment ◽

Machine Learning Models

PURPOSE Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic lymphoma (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many methods are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL. PATIENTS AND METHODS A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared predictive abilities for two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). Probability of treatment was then stratified. RESULTS Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering was able to better differentiate patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques. CONCLUSION Clustering the posterior probabilities of machine learning models provides a way to better interpret machine learning models.

Download Full-text

P1925Machine learning versus classic electrocardiographic criteria for left ventricular hypertrophy in a young pre-participation cohort: results from the SAFE protocol study

European Heart Journal ◽

10.1093/eurheartj/ehz748.0672 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Author(s):

G Sng ◽

D Y Z Lim ◽

C H Sia ◽

J S W Lee ◽

X Y Shen ◽

...

Keyword(s):

Machine Learning ◽

Left Ventricular Hypertrophy ◽

Ventricular Hypertrophy ◽

Left Ventricular ◽

Superior Performance ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Ecg Criteria ◽

Machine Learning Models

Abstract Background/Introduction Classic electrocardiographic (ECG) criteria for left ventricular hypertrophy (LVH) have been well studied in Western populations, particularly in hypertensive patients. However, their utility in Asian populations is not well studied, and their applicability to young pre-participation cohorts is unclear. We sought to evaluate the performance of classical criteria against that of machine learning models. Aims We sought to evaluate the performance of classical criteria against the performance of novel machine learning models in the identification of LVH. Methodology Between November 2009 and December 2014, pre-participation screening ECG and subsequent echocardiographic data was collected from 13,954 males aged 16 to 22, who reported for medical screening prior to military conscription. Final diagnosis of LVH was made on echocardiography, with LVH defined as a left ventricular mass index >115g/m2. The continuous and binary forms of classical criteria were compared against machine learning models using receiver-operating characteristics (ROC) curve analysis. An 80:20 split was used to divide the data into training and test sets for the machine learning models, and three fold cross validation was used in training the models. We also compared the important variables identified by machine learning models with the input variables of classical criteria. Results Prevalence of echocardiographic LVH in this population was 0.91% (127 cases). Classical ECG criteria had poor performance in predicting LVH, with the best predictions achieved by the continuous Sokolow-Lyon (AUC = 0.63, 95% CI = 0.58–0.68) and the continuous Modified Cornell (AUC = 0.63, 95% CI = 0.58–0.68). Machine learning methods achieved superior performance – Random Forest (AUC = 0.74, 95% CI = 0.66–0.82), Gradient Boosting Machines (AUC = 0.70, 95% CI = 0.61–0.79), GLMNet (AUC = 0.78, 95% CI = 0.70–0.86). Novel and less recognized ECG parameters identified by the machine learning models as being predictive of LVH included mean QT interval, mean QRS interval, R in V4, and R in I. ROC curves of models studies Conclusion The prevalence of LVH in our population is lower than that previously reported in other similar populations. Classical ECG criteria perform poorly in this context. Machine learning methods show superior predictive performance and demonstrate non-traditional predictors of LVH from ECG data. Further research is required to improve the predictive ability of machine learning models, and to understand the underlying pathology of the novel ECG predictors identified.

Download Full-text

Empirical asset pricing via machine learning: evidence from the European stock market

Journal of Asset Management ◽

10.1057/s41260-021-00237-x ◽

2021 ◽

Author(s):

Wolfgang Drobetz ◽

Tizian Otto

Keyword(s):

Machine Learning ◽

Stock Returns ◽

Network Architecture ◽

Risk Measures ◽

Predictive Performance ◽

Support Vector ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Models

AbstractThis paper evaluates the predictive performance of machine learning methods in forecasting European stock returns. Compared to a linear benchmark model, interactions and nonlinear effects help improve the predictive performance. But machine learning models must be adequately trained and tuned to overcome the high dimensionality problem and to avoid overfitting. Across all machine learning methods, the most important predictors are based on price trends and fundamental signals from valuation ratios. However, the models exhibit substantial variation in statistical predictive performance that translate into pronounced differences in economic profitability. The return and risk measures of long-only trading strategies indicate that machine learning models produce sizeable gains relative to our benchmark. Neural networks perform best, also after accounting for transaction costs. A classification-based portfolio formation, utilizing a support vector machine that avoids estimating stock-level expected returns, performs even better than the neural network architecture.

Download Full-text

Field Development Optimization Using Machine Learning Methods to Identify the Optimal Water Flooding Regime

10.2118/206533-ms ◽

2021 ◽

Author(s):

Alexey Vasilievich Timonov ◽

Arturas Rimo Shabonas ◽

Sergey Alexandrovich Schmidt

Keyword(s):

Machine Learning ◽

Optimization Algorithms ◽

Machine Learning Algorithms ◽

Water Flooding ◽

Learning Models ◽

Learning Methods ◽

Injection Wells ◽

Field Development ◽

Machine Learning Methods ◽

Machine Learning Models

Abstract The main technology used to optimize field development is hydrodynamic modeling, which is very costly in terms of computing resources and expert time to configure the model. And in the case of brownfields, the complexity increases exponentially. The paper describes the stages of developing a hybrid geological-physical-mathematical proxy model using machine learning methods, which allows performing multivariate calculations and predicting production including various injection well operating regimes. Based on the calculations, we search for the optimal ratio of injection volume distribution to injection wells under given infrastructural constraints. The approach implemented in this work takes into account many factors (some features of the geological structure, history of field development, mutual influence of wells, etc.) and can offer optimal options for distribution of injection volumes of injection wells without performing full-scale or sector hydrodynamic simulation. To predict production, we use machine learning methods (based on decision trees and neural networks) and methods for optimizing the target functions. As a result of this research, a unified algorithm for data verification and preprocessing has been developed for feature extraction tasks and the use of deep machine learning models as input data. Various machine learning algorithms were tested and it was determined that the highest prediction accuracy is achieved by building machine learning models based on Temporal Convolutional Networks (TCN) and gradient boosting. Developed and tested an algorithm for finding the optimal allocation of injection volumes, taking into account the existing infrastructure constraints. Different optimization algorithms are tested. It is determined that the choice and setting of boundary conditions is critical for optimization algorithms in this problem. An integrated approach was tested on terrigenous formations of the West Siberian field, where the developed algorithm showed effectiveness.

Download Full-text

Combining electronic and structural features in machine learning models to predict organic solar cells properties

Materials Horizons ◽

10.1039/c8mh01135d ◽

2019 ◽

Vol 6 (2) ◽

pp. 343-349 ◽

Cited By ~ 39

Author(s):

Daniele Padula ◽

Jack D. Simpson ◽

Alessandro Troisi

Keyword(s):

Machine Learning ◽

Solar Cells ◽

Organic Solar Cells ◽

Structural Similarity ◽

Structural Features ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Models

Combining electronic and structural similarity between organic donors in kernel based machine learning methods allows to predict photovoltaic efficiencies reliably.

Download Full-text

Exploring machine learning methods for absolute configuration determination with vibrational circular dichroism

Physical Chemistry Chemical Physics ◽

10.1039/d1cp02428k ◽

2021 ◽

Vol 23 (35) ◽

pp. 19781-19789

Author(s):

Tom Vermeyen ◽

Jure Brence ◽

Robin Van Echelpoel ◽

Roy Aerts ◽

Guillaume Acke ◽

...

Keyword(s):

Machine Learning ◽

Circular Dichroism ◽

Absolute Configuration ◽

Vibrational Circular Dichroism ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

The Absolute ◽

Circular Dichroïsm ◽

Machine Learning Models

The capabilities of machine learning models to extract the absolute configuration of a series of compounds from their vibrational circular dichroism spectra have been demonstrated. The important spectral areas are identified.

Download Full-text

Face Recognition System with Machine Learning

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012047 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012047

Author(s):

Vuppu Padmakar ◽

B V Ramana Murthy

Keyword(s):

Machine Learning ◽

Face Recognition ◽

Open Source ◽

Programming Language ◽

Recognition System ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Face Recognition System ◽

Machine Learning Models

Abstract This venture plans to give improved security by enabling a client to realize who is actually getting to the framework utilizing facial acknowledgment. The framework enables just approved clients to get entrance. Python is a programming language utilized alongside Machine learning methods and an open source library which is utilized to configuration, construct and train Machine learning models. Interface component is additionally accommodated unapproved clients to enroll to obtain entrance with the earlier authorization from the Admin.

Download Full-text

Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data

10.5194/hess-2019-648 ◽

2020 ◽

Author(s):

Mo Zhang ◽

Wenjiao Shi

Keyword(s):

Machine Learning ◽

Soil Texture ◽

Texture Classification ◽

Learning Models ◽

Learning Methods ◽

Particle Size Fractions ◽

Machine Learning Methods ◽

Transformation Methods ◽

Log Ratio ◽

Machine Learning Models

Abstract. Soil texture and soil particle size fractions (PSFs) play an increasing role in physical, chemical and hydrological processes. Many previous studies have used machine-learning and log ratio transformation methods for soil texture classification and soil PSFs interpolation to improve the prediction accuracy. However, few reports systematically compared the performance of them in both classification and interpolation. Here, a total of 45 evaluation models generated from five machine-learning models – K-nearest neighbor (KNN), multilayer perceptron neural network (MLP), random forest (RF), support vector machines (SVM), extreme gradient boosting (XGB), combined with original and three log ratio methods – additive log ratio (ALR), centered log ratio (CLR) and isometric log ratio (ILR), were applied to evaluate and compare both of them using 640 soil samples in the Heihe River Basin in China. The results demonstrated that log ratio transformation methods decreased skewness of distributions of soil PSFs data. For soil texture classification, RF and XGB showed better performance with the overall accuracy and kappa coefficients, they were also recommended to evaluate classification capacity of imbalanced data according to the area under the precision-recall curve (AUPRC) analysis. For soil PSFs interpolation, RF delivered the best performance among five machine-learning models with the lowest root mean squared error (RMSE, sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE, sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R2, sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved using log ratio methods, especially CLR and ILR. For the comparison of direct and indirect classification, prediction maps were similar on the middle and upper reaches and different on the lower reaches of the HRB. Moreover, indirect classification maps based on log ratio transformed data had more detailed information. There is a pronounced improvement with 21.3 % of kappa coefficient using indirect methods for soil texture classification compared to the direct ones. RF was recommended as the best strategy among these five machine-learning models according to the accuracy evaluation of soil PSFs interpolation and soil texture classification, and ILR was recommended for component-wise machine-learning methods without multivariate treatment considering the constrained nature of compositional data. In addition, XGB was preferred than other models when trade-off of accuracy and time was considered. Our findings can provide a reference for other research of spatial prediction of soil PSFs and texture using machine-learning methods with skewed distribution soil PSFs data in a large area.

Download Full-text

Electronic Tongue Recognition with Feature Specificity Enhancement

Sensors ◽

10.3390/s20030772 ◽

2020 ◽

Vol 20 (3) ◽

pp. 772 ◽

Cited By ~ 2

Author(s):

Tao Liu ◽

Yanbing Chen ◽

Dongqi Li ◽

Tao Yang ◽

Jianhua Cao

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Extraction Method ◽

Sensor Array ◽

Electronic Tongue ◽

Learning Models ◽

Learning Methods ◽

Feature Extraction Method ◽

Machine Learning Methods ◽

Machine Learning Models

As a kind of intelligent instrument, an electronic tongue (E-tongue) realizes liquid analysis with an electrode-sensor array and certain machine learning methods. The large amplitude pulse voltammetry (LAPV) is a regular E-tongue type that prefers to collect a large amount of response data at a high sampling frequency within a short time. Therefore, a fast and effective feature extraction method is necessary for machine learning methods. Considering the fact that massive common-mode components (high correlated signals) in the sensor-array responses would depress the recognition performance of the machine learning models, we have proposed an alternative feature extraction method named feature specificity enhancement (FSE) for feature specificity enhancement and feature dimension reduction. The proposed FSE method highlights the specificity signals by eliminating the common mode signals on paired sensor responses. Meanwhile, the radial basis function is utilized to project the original features into a nonlinear space. Furthermore, we selected the kernel extreme learning machine (KELM) as the recognition part owing to its fast speed and excellent flexibility. Two datasets from LAPV E-tongues have been adopted for the evaluation of the machine-learning models. One is collected by a designed E-tongue for beverage identification and the other one is a public benchmark. For performance comparison, we introduced several machine-learning models consisting of different combinations of feature extraction and recognition methods. The experimental results show that the proposed FSE coupled with KELM demonstrates obvious superiority to other models in accuracy, time consumption and memory cost. Additionally, low parameter sensitivity of the proposed model has been demonstrated as well.

Download Full-text