Does Machine Learning Offer Added Value Vis-à-Vis Traditional Statistics? An Exploratory Study on Retirement Decisions Using Data from the Survey of Health, Ageing, and Retirement in Europe (SHARE)

Mathematics ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 152
Author(s):  
Montserrat González Garibay ◽  
Andrej Srakar ◽  
Tjaša Bartolj ◽  
Jože Sambt

Do machine learning algorithms perform better than statistical survival analysis when predicting retirement decisions? This exploratory article addresses the question by constructing a pseudo-panel with retirement data from the Survey of Health, Ageing, and Retirement in Europe (SHARE). The analysis consists of two methodological steps prompted by the nature of the data. First, a discrete Cox survival model of transitions to retirement with time-dependent covariates is compared to a Cox model without time-dependent covariates and a survival random forest. Second, the best-performing model (Cox with time-dependent covariates) is compared to random forests adapted to time-dependent covariates by means of simulations. The results from the analysis do not clearly favor a single method: whereas machine learning algorithms have stronger predictive power, the variables they use in their predictions do not necessarily display causal relationships with the outcome variable. Therefore, the two methods should be seen as complements rather than substitutes. In addition, simulations shed new light on the role of some variables, such as education and health, in retirement decisions. This amounts to both substantive and methodological contributions to the literature on the modeling of retirement.

Author(s):  
R. Suganya ◽  
Rajaram S. ◽  
Kameswari M.

Currently, thyroid disorders are increasingly common and widespread among women worldwide. In India, seven out of ten women suffer from thyroid problems. Various studies in the research literature report that about 35% of Indian women who are examined have goiter. It is essential to take preventive measures at the early stages; otherwise, the disorder causes infertility problems among women. This review discusses various analytics models that are used to handle different types of thyroid problems in women. The chapter analyzes and compares different classification models, both machine learning algorithms and deep learning algorithms, for classifying different thyroid problems. Literature on both machine learning and deep learning algorithms is considered. This literature review on thyroid problems will help to analyze the causes and characteristics of thyroid disorders. The dataset used to build and validate the algorithms was provided by the UCI Machine Learning Repository.


2021 ◽  
Author(s):  
Yiqi Jack Gao ◽  
Yu Sun

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper takes a two-pronged approach to analyzing COVID-19. First, the model uses the feature significance scores of a random forest regressor to select eight of the most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas for achieving efficient pandemic responses. Then it uses machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases from a combination of this diverse range of predictors. These models proved competent at generating predictions with reasonable accuracy.


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
H Lea ◽  
E Hutchinson ◽  
A Meeson ◽  
S Nampally ◽  
G Dennis ◽  
...  

Abstract Background and introduction Accurate identification of clinical outcome events is critical to obtaining reliable results in cardiovascular outcomes trials (CVOTs). Current processes for event adjudication are expensive and hampered by delays. As part of a larger project to more reliably identify outcomes, we evaluated the use of machine learning to automate event adjudication using data from the SOCRATES trial (NCT01994720), a large randomized trial comparing ticagrelor and aspirin in reducing risk of major cardiovascular events after acute ischemic stroke or transient ischemic attack (TIA). Purpose We studied whether machine learning algorithms could replicate the outcome of the expert adjudication process for clinical events of ischemic stroke and TIA. Could classification models be trained on historical CVOT data and demonstrate performance comparable to human adjudicators? Methods Using data from the SOCRATES trial, multiple machine learning algorithms were tested using grid search and cross validation. Models tested included Support Vector Machines, Random Forest and XGBoost. Performance was assessed on a validation subset of the adjudication data not used for training or testing in model development. Metrics used to evaluate model performance were Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient, Precision and Recall. The contribution of features (attributes of the data used by the algorithm as it is trained to classify an event) to a classification was examined using both Mutual Information and Recursive Feature Elimination. Results Classification models were trained on historical CVOT data using adjudicator consensus decision as the ground truth. Best performance was observed on models trained to classify ischemic stroke (ROC 0.95) and TIA (ROC 0.97). Top ranked features that contributed to classification of ischemic stroke or TIA corresponded to site investigator decision or variables used to define the event in the trial charter, such as duration of symptoms. Model performance was comparable across the different machine learning algorithms tested, with XGBoost demonstrating the best ROC on the validation set for correctly classifying both stroke and TIA. Conclusions Our results indicate that machine learning may augment or even replace clinician adjudication in clinical trials, with potential to gain efficiencies, speed up clinical development, and retain reliability. Our current models demonstrate good performance at binary classification of ischemic stroke and TIA within a single CVOT with high consistency and accuracy between automated and clinician adjudication. Further work will focus on harmonizing features between multiple historical clinical trials and training models to classify several different endpoint events across trials. Our aim is to utilize these clinical trial datasets to optimize the delivery of CVOTs in further cardiovascular drug development. Funding Acknowledgement Type of funding sources: Private company. Main funding source(s): AstraZeneca Plc
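
Three of the metrics named in the abstract above (Precision, Recall, and the Matthews Correlation Coefficient) can be computed directly from the counts of a binary confusion matrix. The following is a minimal sketch of those standard formulas only. The function name and the counts are hypothetical and are not taken from the SOCRATES trial, and the abstract does not say how the authors computed the metrics.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from binary confusion-matrix counts."""
    # Precision: share of predicted positives that are true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC ranges from -1 to 1; defined here as 0 when any marginal total is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc

# Hypothetical counts for illustration only.
precision, recall, mcc = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

Unlike precision and recall, the MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside them when classes are imbalanced.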


2017 ◽  
Vol 135 (3) ◽  
pp. 234-246 ◽  
Author(s):  
André Rodrigues Olivera ◽  
Valter Roesler ◽  
Cirano Iochpe ◽  
Maria Inês Schmidt ◽  
Álvaro Vigo ◽  
...  

ABSTRACT CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes using easily obtained clinical data.
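
The "tenfold cross-validation, repeated three times" scheme mentioned in the methods above can be illustrated with a short index-splitting sketch. This shows only the general resampling idea. The function name, the seed, and the sample size are hypothetical, and the abstract does not describe the authors' actual tooling.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Each repeat reshuffles the sample indices before splitting.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Strided slicing spreads samples across folds as evenly as possible.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx

# Hypothetical sample size, for illustration only.
splits = list(repeated_kfold(n_samples=50))
```

Each repeat uses every sample exactly once as test data, so three repeats of ten folds give 30 train/test pairs over which a performance metric can be averaged.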


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1806
Author(s):  
Silvio Semanjski ◽  
Ivana Semanjski ◽  
Wim De Wilde ◽  
Sidharta Gautama

Global Navigation Satellite System (GNSS) meaconing and spoofing are considered key threats to Safety-of-Life (SoL) applications, which mostly rely on open service (OS) signals without signal- or data-level protection. While a number of pre- and post-correlation techniques have been proposed so far, the possible use of supervised machine learning algorithms to detect GNSS meaconing and spoofing is currently being examined. One of the supervised machine learning algorithms, the Support Vector Machine classification (C-SVM), is proposed for use at the GNSS receiver level due to the fact that, at that stage of signal processing, a number of measurements and observables exist. It is possible to establish the correlation pattern among those GNSS measurements and observables and monitor it with the use of the C-SVM classification, the results of which we present in this paper. By adding the real-world spoofing and meaconing datasets to the laboratory-generated spoofing datasets at the training stage of the C-SVM, we complement the experiments and results obtained in Part I of this paper, where the training was conducted solely with the use of laboratory-generated spoofing datasets. In two experiments presented in this paper, the C-SVM algorithm was cross-fed with the real-world meaconing and spoofing datasets, such that the meaconing addition to the training was validated by the spoofing dataset, and vice versa. The comparative analysis of all four experiments presented in this paper shows promising results in two aspects: (i) the added value of the training dataset enrichment seems to be relevant for real-world GNSS signal manipulation attempt detection and (ii) the C-SVM-based approach seems to be promising for GNSS signal manipulation attempt detection, as well as in the context of potential federated learning applications.

