Recurrent Stroke Prediction using Machine Learning Algorithms with Clinical Public Datasets: An Empirical Performance Evaluation

2021
Vol 18 (4, Suppl.)
pp. 1406
Author(s):
Fadratul Hafinaz Hassan
Mohd Adib Omar

Recurrent strokes can be devastating, often resulting in severe disability or death. However, nearly 90% of the causes of recurrent stroke are modifiable, which means recurrent strokes can be averted by controlling risk factors, which are mainly behavioral and metabolic in nature. Previous work therefore suggests that a recurrent stroke prediction model could help minimize the likelihood of recurrent stroke. Previous works have shown promising results in predicting first-time stroke cases with machine learning approaches, but there is limited work on recurrent stroke prediction using machine learning methods. Hence, this work performs an empirical analysis of machine learning algorithms in recurrent stroke prediction models. This research aims to investigate and compare the performance of machine learning algorithms on recurrent stroke clinical public datasets. In this study, an Artificial Neural Network (ANN), a Support Vector Machine (SVM) and a Bayesian Rule List (BRL) are applied and their performance compared in the domain of recurrent stroke prediction. The empirical experiments show that the ANN scores the highest accuracy at 80.00%, followed by the BRL with 75.91% and the SVM with 60.45%.
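The comparison protocol the abstract describes (train several classifiers, compare test-set accuracy) can be sketched as follows. This is an illustration only: the clinical data is not reproduced here, so a synthetic stand-in dataset is used, scikit-learn's MLPClassifier stands in for the ANN, and a shallow decision tree is a rough proxy for the Bayesian Rule List, which has no scikit-learn implementation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the recurrent stroke dataset.
X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "rule list (tree proxy)": DecisionTreeClassifier(max_depth=4, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.2%}")
```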

Author(s):
Sheela Rani P
Dhivya S
Dharshini Priya M
Dharmila Chowdary A

Machine learning is an analysis discipline that uses data to improve learning, optimizing the training process and the environment in which learning happens. There are two types of machine learning approaches, supervised and unsupervised, which are used to extract the knowledge that helps decision-makers take the correct interventions in the future. This paper introduces a model for predicting the factors that influence students' academic performance, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes and logistic regression. The results of the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving improved accuracy compared to the other algorithms. The final prediction model in this paper achieves fairly high prediction accuracy. The objective is not just to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.
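The second goal above, identifying the most impactful features, can be approached with permutation importance. A minimal sketch on synthetic stand-in data (the student dataset itself is not reproduced here), using the Naïve Bayes classifier the abstract names:

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8 candidate features, 3 of which carry signal.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

clf = GaussianNB().fit(X_tr, y_tr)
# Shuffle each feature in turn and measure the drop in held-out accuracy.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=6)
ranking = result.importances_mean.argsort()[::-1]   # most impactful first
print("feature ranking:", ranking[:3])
```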


Author(s):
Ruchika Malhotra
Anuradha Chug

Software maintenance is an expensive activity that consumes a major portion of the total project cost. Activities carried out during maintenance include the addition of new features, deletion of obsolete code, correction of errors, etc. Software maintainability means the ease with which these operations can be carried out. If maintainability can be measured in the early phases of software development, it helps in better planning and optimum resource utilization. Measurement of design properties such as coupling and cohesion in early phases of development often lets us derive the corresponding maintainability with the help of prediction models. In this paper, we performed a systematic review of the existing studies related to software maintainability from January 1991 to October 2015. In total, 96 primary studies were identified, of which 47 were from journals, 36 from conference proceedings and 13 from other sources. All studies were compiled in structured form and analyzed from numerous perspectives such as the use of design metrics, prediction models, tools, data sources, prediction accuracy, etc. According to the review results, we found that the use of machine learning algorithms in predicting maintainability has increased since 2005. The use of evolutionary algorithms has also begun in related sub-fields since 2010. We have observed that design metrics are still the most favored option for capturing the characteristics of a given software system before deploying it in a prediction model for determining the corresponding software maintainability. A significant increase in the use of public datasets for building prediction models has also been observed; in this regard, two public datasets, User Interface Management System (UIMS) and Quality Evaluation System (QUES), proposed by Li and Henry, are quite popular among researchers.
Although machine learning algorithms are still the most popular methods, we suggest that researchers working in the software maintainability area experiment with open source datasets and hybrid algorithms. In this regard, more empirical studies are also required on a large number of datasets so that a generalized theory can be formed. The current paper will be beneficial for practitioners, researchers and developers, as they can use these models and metrics for creating benchmarks and standards. The findings of this extensive review would also be useful for novices in the field of software maintainability, as it not only provides explicit definitions but also lays a foundation for further research by providing a quick link to all important studies in the field. Finally, this study also compiles current trends, emerging sub-fields and identifies various opportunities for future research in the field of software maintainability.


Author(s):
Nabil Mohamed Eldakhly
Magdy Aboul-Ela
Areeg Abdalla

The particulate matter air pollutant of diameter less than 10 micrometers (PM10), a category of pollutants including solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissue and the throat, aggravate asthma and increase respiratory illness. Accurate prediction models of PM10 concentrations are essential for proper management, control, and public warning strategies. Machine learning techniques can develop methods or tools to discover unseen patterns in given data for solving a particular task or problem. Chance theory offers advanced concepts for treating cases where randomness and fuzziness play simultaneous roles. The main objective is to study the modification of a single machine learning algorithm, the support vector machine (SVM), by applying the chance weight of the target variable, based on chance theory, to the corresponding dataset point, so as to be superior to ensemble machine learning algorithms. The results of this study show that the SVM algorithm, when modified and combined with the right theory/technique, especially chance theory, outperforms other modern ensemble learning algorithms.
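The mechanism the modification above relies on is weighting individual training points in an SVM. The chance-theory weight itself is study-specific and not reproduced here; in the sketch below, a placeholder weight (inverse distance of each sample from the feature mean) stands in purely for illustration, passed through scikit-learn's `sample_weight` argument.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the PM10 dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Placeholder per-sample weights in (0, 1]; a real implementation would derive
# these from the chance distribution of the target variable.
weights = 1.0 / (1.0 + np.linalg.norm(X_tr - X_tr.mean(axis=0), axis=1))

clf = SVC(kernel="rbf")
clf.fit(X_tr, y_tr, sample_weight=weights)   # per-point weighting hook
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"weighted SVM accuracy: {acc:.2%}")
```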


2019
Vol 27 (1)
pp. 13-21
Author(s):
Qiang Wei
Zongcheng Ji
Zhiheng Li
Jingcheng Du
Jingqi Wang
...

Objective: This article presents our approaches to the extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task. Materials and Methods: The clinical corpus used in this study was from the MIMIC-III database, and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) component and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches that combine outputs from multiple approaches. Results: Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction. Conclusion: In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their use in broader NER and RC tasks in the medical domain.
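The ensembling step described above combines label sequences produced by several taggers. A minimal token-level majority-vote sketch, with hypothetical BIO labels (the actual n2c2 systems and label set are far richer):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-token BIO label sequences from several taggers by majority vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

# Hypothetical outputs of three taggers over the same five tokens.
crf_out    = ["O", "B-Drug", "I-Drug", "O", "B-ADE"]
bilstm_out = ["O", "B-Drug", "I-Drug", "O", "O"]
joint_out  = ["O", "B-Drug", "O",      "O", "B-ADE"]

merged = majority_vote([crf_out, bilstm_out, joint_out])
print(merged)  # → ['O', 'B-Drug', 'I-Drug', 'O', 'B-ADE']
```

A production ensemble would also have to repair any ill-formed BIO sequences the vote produces (e.g. an `I-` tag with no preceding `B-`).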


Author(s):
Henock M. Deberneh
Intaek Kim

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.
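The feature-selection stage described above (a univariate filter such as ANOVA, followed by recursive feature elimination) can be sketched with scikit-learn. Synthetic data stands in for the EHR records, and the feature counts here are illustrative, not the study's:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 20 candidate clinical variables, 3 outcome classes
# (normal / prediabetes / diabetes).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=2)

# Stage 1: keep the 12 features with the strongest ANOVA F-scores.
X_anova = SelectKBest(f_classif, k=12).fit_transform(X, y)

# Stage 2: recursively eliminate features down to a final set of 6.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=6)
X_final = rfe.fit_transform(X_anova, y)
print(X_final.shape)  # → (300, 6)
```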


2021
Author(s):
Nuno Moniz
Susana Barbosa

The Dansgaard-Oeschger (DO) events are one of the most striking examples of abrupt climate change in the Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies. Examples include the Greenland NGRIP record of oxygen isotopic composition. This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k with the GICC05 timescale and 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three different machine learning algorithms (random forests, support vector machines, and logistic regression) in training windows of 5 kyr. We perform validation on subsequent test windows of 5 kyr, based on the timestamps of previous DO events classified in Greenland by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows. Results show that predictions on sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The three algorithms' predictive performance is similar, with slightly better performance from random forest models at shorter forecast horizons. The models' predictive capability decreases as the forecasting horizon grows but remains reasonable up to 120 years. Performance degradation is mostly related to imprecision in determining the start and end times of events, and to identifying some periods as DO events when none occurred.
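The sliding- and growing-window schemes above differ only in where the training window starts. A sketch of the index bookkeeping, in plain Python (5 kyr at 20-year resolution is 250 samples per window; the series length here is illustrative):

```python
def windows(n_samples, window=250, sliding=True):
    """Yield (train_indices, test_indices) pairs over a time series.

    sliding=True keeps a fixed-size training window just before the test
    window; sliding=False ("growing") trains on all history seen so far.
    """
    for start in range(0, n_samples - 2 * window + 1, window):
        train_from = start if sliding else 0   # growing keeps all history
        train = range(train_from, start + window)
        test = range(start + window, start + 2 * window)
        yield train, test

pairs = list(windows(1000, sliding=True))
print(len(pairs))               # → 3
print(len(list(pairs[-1][0])))  # → 250 (sliding window stays fixed-size)

grown = list(windows(1000, sliding=False))
print(len(list(grown[-1][0])))  # → 750 (growing window accumulates history)
```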


2021
Vol 309
pp. 01043
Author(s):
L. Chandrika
K. Madhavi

Cardiovascular diseases (CVDs) are the primary cause of sudden death in the world today; over the past few years the disease has emerged as a highly unpredictable problem, not only in India but across the whole planet. There is therefore a desperate need for a valid, accurate and practical solution or application to diagnose CVD problems in time for the necessary treatment. Predicting CVD is a great challenge in the health care domain of clinical data analysis. Machine learning algorithms (MLA) and techniques have been vastly developed and proven to be effective and efficient in predicting problems using past data. Previous studies have applied these MLA techniques to clinical datasets provided by the healthcare industry, but each addresses only a small part of predicting CVD with ML algorithms. In this work, we propose a novel methodology that concentrates on finding appropriate features using MLA techniques, with the aim of identifying the most accurate model for predicting CVD. In this prediction model we implement models with different combinations of features and several known classification techniques, namely Deep Learning, Random Forest, Generalised Linear Model, Naïve Bayes, Logistic Regression, Decision Tree, Gradient Boosted Trees, Support Vector Machine, Vote and HRFLM, obtaining accuracy levels of 75.8%, 85.1%, 82.9%, 87.4%, 85%, 86.1%, 78.3%, 86.1%, 87.41%, and 88.4% respectively; the highest accuracy, 88.4%, is achieved by the hybrid random forest with a linear model (HRFLM).
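HRFLM is described above as a hybrid of a random forest and a linear model. One common way to realize that hybrid pattern is stacking, where a linear model combines the forest's predictions; the sketch below illustrates that pattern with scikit-learn on synthetic stand-in data, and is not the authors' exact model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 13 features, as in the common heart disease datasets.
X, y = make_classification(n_samples=400, n_features=13, random_state=3)

hybrid = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=3))],
    final_estimator=LogisticRegression(max_iter=1000),  # linear combiner
)
acc = cross_val_score(hybrid, X, y, cv=5).mean()
print(f"hybrid RF + linear model accuracy: {acc:.2%}")
```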


2021
pp. 096032712199191
Author(s):
B Behnoush
E Bazmi
SH Nazari
S Khodakarim
MA Looha
...

Introduction: This study was designed to develop and evaluate machine learning algorithms for predicting seizure due to acute tramadol poisoning, identifying high-risk patients and facilitating appropriate clinical decision-making. Methods: Several characteristics of acute tramadol poisoning cases were collected in the Emergency Department (ED) (2013–2019). After selecting important variables with the random forest method, prediction models were developed using the Support Vector Machine (SVM), Naïve Bayes (NB), Artificial Neural Network (ANN) and K-Nearest Neighbor (K-NN) algorithms. The Area Under the Curve (AUC) and other diagnostic criteria were used to assess the performance of the models. Results: Of 909 patients, 544 (59.8%) experienced seizures. The important predictors of seizure were sex, pulse rate, arterial blood oxygen pressure, blood bicarbonate level and pH. The SVM (AUC = 0.68), NB (AUC = 0.71) and ANN (AUC = 0.70) models outperformed the K-NN model (AUC = 0.58). The NB model had higher sensitivity and negative predictive value, and the K-NN model had higher specificity and positive predictive value, than the other models. Conclusion: An accurate prediction model may help improve clinicians' decision-making and clinical care at EDs in hospitals and medical settings. The SVM, ANN and NB models showed no significant differences in performance and accuracy; however, validated logistic regression (LR) was the superior model for predicting seizure due to acute tramadol poisoning.
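The diagnostic criteria reported above (AUC, sensitivity, specificity, predictive values) can all be derived from model scores and a confusion matrix. A minimal sketch with made-up predictions, purely to show the arithmetic:

```python
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical outcomes (1 = seizure) and model scores for 10 patients.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.55, 0.35]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]   # threshold at 0.5

auc = roc_auc_score(y_true, y_score)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value
print(f"AUC={auc:.2f} Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f}")
```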


2020
Vol 7 (Supplement_1)
pp. S281-S281
Author(s):
Chengbo Zeng
Yunyu Xiao

Abstract Background: More than 360,000 people were infected with COVID-19 in New York State (NYS) by the end of May 2020. Although expanded testing could effectively control the statewide COVID-19 outbreak, the county-level factors predicting the number of tests are unknown. Accurately identifying the county-level predictors of testing may contribute to more effective testing allocation across counties in NYS. This study leveraged multiple public datasets and machine learning algorithms to construct and compare county-level prediction models of COVID-19 testing in NYS. Methods: Testing data through May 15th was extracted from the Department of Health in NYS. A total of 28 county-level predictors derived from multiple public datasets (e.g., American Community Survey and US Health Data) were used to construct the prediction models. Three machine learning algorithms, namely generalized linear regression with the least absolute shrinkage and selection operator (LASSO), ridge regression, and regression tree, were used to identify the most important county-level predictors, adjusting for prevalence and incidence. Model performance was assessed using the mean squared error (MSE), with smaller MSE indicating better performance. Results: The testing rate was 70.3 per 1,000 people in NYS. Counties close to the epicenter (Rockland and Westchester) had high testing rates, while counties located at the boundary of NYS and far from the epicenter (Chautauqua and Clinton) had low testing rates. The MSEs of linear regression with the LASSO penalty, ridge regression, and regression tree were 123.60, 40.59, and 298.0, respectively. Ridge regression was selected as the final model and revealed that the mental health provider rate was positively associated with testing (β=5.11, p=.04), while the proportion of religious adherents (β=-3.91, p=.05) was inversely related to the variation of testing rates across counties.
Conclusion: This study identified healthcare resources and the religious environment as the strongest predictors of spatial variations in COVID-19 testing across NYS. Structural or policy efforts should address the spatial variations and target the relevant county-level predictors to promote statewide testing. Disclosures: All authors: no reported disclosures.
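The model comparison in the abstract above (LASSO vs. ridge vs. a regression tree, ranked by MSE) can be sketched as follows. The county-level data is not reproduced here, so synthetic regression data stands in, with 28 features as in the study:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the 28 county-level predictors.
X, y = make_regression(n_samples=62, n_features=28, noise=10.0, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)

models = {"LASSO": Lasso(alpha=1.0), "ridge": Ridge(alpha=1.0),
          "tree": DecisionTreeRegressor(random_state=4)}
mse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse[name] = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: MSE = {mse[name]:.1f}")

best = min(mse, key=mse.get)   # smaller MSE = better model
print("selected model:", best)
```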


2015
Vol 5 (1)
pp. 56-73
Author(s):
Nicholas Ampazis

Managing inventory in a multi-level supply chain structure is a difficult task for big retail stores, as it is particularly complex to predict demand for the majority of the items. This paper aims to highlight the potential of machine learning approaches as effective forecasting methods for predicting customer demand at the first level of organization of a supply chain, where products are presented and sold to customers. For this purpose, we utilize Artificial Neural Networks (ANNs) trained with an effective second-order algorithm, and Support Vector Machines (SVMs) for regression. We evaluated the effectiveness of the proposed approach using public data from the Netflix movie rental online DVD store in order to predict the demand for movie rentals during the Christmas holiday season, an especially critical period for sales. In our analysis we also integrated data from two other sources of information, namely an aggregator of movie reviews (Rotten Tomatoes) and a movie-oriented social network (Flixster). Consequently, the approach presented in this paper combines the integration of data from various sources of information and the power of advanced machine learning algorithms for lowering the uncertainty barrier in forecasting supply chain demand.
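The regression setup described above, support vector regression over past demand, can be sketched as follows. Synthetic data stands in for the Netflix rental series, and the lag structure is an illustrative assumption, not the paper's feature set:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Synthetic weekly-demand stand-in: trend-free seasonal signal plus noise.
rng = np.random.default_rng(5)
demand = 100 + 10 * np.sin(np.arange(120) / 7.0) + rng.normal(0, 2, 120)

# Use the previous 4 observations of demand as features for the next value.
lags = 4
X = np.array([demand[i:i + lags] for i in range(len(demand) - lags)])
y = demand[lags:]
X_tr, X_te, y_tr, y_te = X[:90], X[90:], y[:90], y[90:]  # time-ordered split

model = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"holdout MAE: {mae:.2f}")
```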

