Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions

Journal of Engineering for Gas Turbines and Power ◽

10.1115/1.4045516 ◽

2020 ◽

Vol 142 (6) ◽

Author(s):

Michael McCartney ◽

Matthias Haeringer ◽

Wolfgang Polifke

Keyword(s):

Machine Learning ◽

Gaussian Processes ◽

Spline Interpolation ◽

Learning Algorithms ◽

Predictive Performance ◽

Test Time ◽

Minimal Amount ◽

Data Points ◽

Abstract This paper examines and compares commonly used machine learning algorithms in their performance in interpolation and extrapolation of flame describing functions (FDFs), based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of the resulting errors on the limit cycle amplitudes is then evaluated using the extended FDF (xFDF) framework. The best algorithms in interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian process (GP) regressor. The data themselves were found to be an important factor in determining the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian processes was demonstrated. The aim is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. Extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points that expand the covered domain. Gaussian processes also give an indication of confidence in their predictions and are used to carry out uncertainty quantification in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.
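
The idea of letting a Gaussian process choose the next measurement point can be illustrated with a minimal sketch. It assumes a one-dimensional input, a squared-exponential kernel with a fixed length scale, and a "largest posterior variance" selection rule; the abstract does not state which kernel or criterion the authors used, so the names `rbf`, `posterior_variance` and `select_next` and all parameter values are hypothetical.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def posterior_variance(x_new, x_train, noise=1e-6):
    """GP posterior variance at x_new: k(x*, x*) - k*^T K^{-1} k*."""
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [rbf(x_new, a) for a in x_train]
    weights = solve(gram, k_star)
    return rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))


def select_next(candidates, x_train):
    """Pick the candidate where the model is least certain."""
    return max(candidates, key=lambda x: posterior_variance(x, x_train))


# Two existing measurements; the candidate outside the covered domain wins.
measured = [0.0, 1.0]
print(select_next([0.0, 0.5, 1.0, 2.0], measured))
```

With fixed kernel hyperparameters the posterior variance depends only on where measurements were taken, not on the measured values, which is why a point can be chosen before it is measured. The variance also grows quickly outside the measured interval, which is consistent with the abstract's remark about extrapolation.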

The Impact of Undersampling on the Predictive Performance of Logistic Regression and Machine Learning Algorithms

Epidemiology ◽

10.1097/ede.0000000000001198 ◽

2020 ◽

Vol 31 (5) ◽

pp. e42-e44

Author(s):

Abigail R. Cartus ◽

Lisa M. Bodnar ◽

Ashley I. Naimi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithms ◽

Predictive Performance ◽

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.

A machine learning-based predictor for the identification of the recurrence of patients with gastric cancer after operation

Scientific Reports ◽

10.1038/s41598-021-81188-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chengmao Zhou ◽

Junhong Hu ◽

Ying Wang ◽

Mu-Huo Ji ◽

Jianhua Tong ◽

...

Keyword(s):

Machine Learning ◽

Gastric Cancer ◽

Learning Algorithms ◽

Test Group ◽

Operation Time ◽

Predictive Performance ◽

Original Data ◽

Postoperative Recurrence ◽

Gastric Cancer Patients

Abstract The aim was to explore the predictive performance of machine learning on the recurrence of gastric cancer in patients after operation. The available data are divided into two parts: the first part is used as a training set (such as 80% of the original data), and the second part is used as a test set (the remaining 20% of the data). Fivefold cross-validation is used. The weights of the recurrence factors show that the top four factors are, in order, BMI, operation time, WGT and age. In the training group, among the five machine learning models, the accuracy of gbm was 0.891, followed by the gbm algorithm (0.876); the AUC values of the five machine learning algorithms, from high to low, were forest (0.962), gbm (0.922), GradientBoosting (0.898), DecisionTree (0.790) and Logistic (0.748). The precision of forest was the highest (0.957), followed by the GradientBoosting algorithm (0.878). In the test group, the highest accuracy was that of Logistic (0.801), followed by the forest algorithm and gbm; the AUC values of the five algorithms, from high to low, were forest (0.795), GradientBoosting (0.774), DecisionTree (0.773), Logistic (0.771) and gbm (0.771). Among the five machine learning algorithms, the highest precision was that of Logistic (1.000), followed by gbm (0.487). Machine learning can predict the recurrence of gastric cancer in patients after an operation. In addition, the top four factors affecting postoperative recurrence of gastric cancer were BMI, operation time, WGT and age.
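
The data handling described in this abstract (an 80/20 hold-out split plus fivefold cross-validation) can be sketched as index bookkeeping. The abstract does not say whether the folds were drawn from the training portion only; the sketch below assumes so, and the function names and the fixed seed are hypothetical.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = train_test_split_indices(100)
for fold_train, fold_val in k_fold(train_idx, k=5):
    # A model would be fitted on fold_train and scored on fold_val here.
    print(len(fold_train), len(fold_val))
```

Each training-set sample lands in exactly one validation fold, and the held-out test indices are never seen during cross-validation.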

Analyze the impact of the epidemic on New York taxis by machine learning algorithms and recommendations for optimal prediction algorithms

10.1145/3475851.3475861 ◽

2021 ◽

Author(s):

Zheng Liu ◽

Xinjing Xia ◽

Haipeng Zhang ◽

Zihui Xie

Keyword(s):

Machine Learning ◽

New York ◽

Learning Algorithms ◽

Optimal Prediction ◽

Prediction Algorithms ◽

An analytical survey on the role of machine learning algorithms in case of intrusion detection

ACCENTS Transactions on Information Security ◽

10.19101/tis.2020.517002 ◽

2020 ◽

Vol 5 (19) ◽

pp. 32-35

Author(s):

Anand Vijay ◽

Kailash Patidar ◽

Manoj Yadav ◽

Rishi Kushwah

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Learning Algorithms ◽

Handling Mechanism ◽

This paper presents and discusses an analytical survey of the role of machine learning algorithms in intrusion detection. It covers the analytical aspects of developing an efficient intrusion detection system (IDS). The related work on the development of such a system is presented in terms of computational methods: data mining, artificial intelligence and machine learning. These methods are discussed along with the attack parameters and attack types. The paper also elaborates on the impact of different attacks and handling mechanisms, based on previous papers.

Teleconsultations between Patients and Healthcare Professionals in Primary Care in Catalonia: the Evaluation of Text Classification Algorithms Using Machine Learning

10.20944/preprints201912.0220.v1 ◽

2019 ◽

Author(s):

Francesc López Seguí ◽

Ricardo Ander Egg Aguilar ◽

Gabriel de Maeztu ◽

Anna García-Altés ◽

Francesc García Cuyàs ◽

...

Keyword(s):

Machine Learning ◽

Primary Care ◽

Text Classification ◽

Learning Strategy ◽

Care Service ◽

Learning Algorithms ◽

Face To Face ◽

Classification Tool ◽

Background: the primary care service in Catalonia has operated an asynchronous teleconsulting service between GPs and patients since 2015 (eConsulta), which has generated some 500,000 messages. New developments in big data analysis tools, particularly those involving natural language, can be used to accurately and systematically evaluate the impact of the service. Objective: the study was intended to examine the predictive potential of eConsulta messages through different combinations of vector representation of text and machine learning algorithms and to evaluate their performance. Methodology: 20 machine learning algorithms (based on 5 types of algorithms and 4 text representation techniques) were trained using a sample of 3,559 messages (169,102 words) corresponding to 2,268 teleconsultations (1.57 messages per teleconsultation) in order to predict the three variables of interest (avoiding the need for a face-to-face visit, increased demand and type of use of the teleconsultation). The performance of the various combinations was measured in terms of precision, sensitivity, F-value and the ROC curve. Results: the best-trained algorithms are generally effective, proving themselves to be more robust when approximating the two binary variables "avoiding the need of a face-to-face visit" and "increased demand" (precision = 0.98 and 0.97, respectively) rather than the variable "type of query" (precision = 0.48). Conclusion: to the best of our knowledge, this study is the first to investigate a machine learning strategy for text classification using primary care teleconsultation datasets. The study illustrates the possible capacities of text analysis using artificial intelligence. The development of a robust text classification tool could be feasible by validating it with more data, making it potentially more useful for decision support for health professionals.
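
For the binary variables, the precision, sensitivity and F-value named in the methodology can be computed from predicted and true labels as in the sketch below. It assumes labels coded as 0/1 and takes the F-value to be the balanced F1 score, which the abstract does not specify; the function name and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, sensitivity (recall) and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Toy labels, not data from the study.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
guess = [1, 1, 0, 0, 1, 0, 1, 0]
print(classification_metrics(truth, guess))
```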

Predicting Obstetric Disease With Machine Learning Applied to Patient-Reported Data (Preprint)

10.2196/preprints.11766 ◽

2018 ◽

Cited By ~ 1

Author(s):

Danielle Bradley ◽

Erin Landau ◽

Adam Wolfberg ◽

Alex Baron

Keyword(s):

Machine Learning ◽

At Risk ◽

Mobile Apps ◽

Learning Algorithms ◽

Supervised Machine Learning ◽

Obstetric Outcomes ◽

Patient Reported ◽

Data Points ◽

Reported Data

BACKGROUND The rise of highly engaging digital health mobile apps over the past few years has created repositories containing billions of patient-reported data points that have the potential to inform clinical research and advance medicine. OBJECTIVE To determine if self-reported data could be leveraged to create machine learning algorithms to predict the presence of, or risk for, obstetric outcomes and related conditions. METHODS More than 10 million women have downloaded Ovia Health’s three mobile apps (Ovia Fertility, Ovia Pregnancy, and Ovia Parenting). Data points logged by app users can include information about menstrual cycle, health history, current health status, nutrition habits, exercise activity, symptoms, or moods. Machine learning algorithms were developed using supervised machine learning methodologies, specifically, Gradient Boosting Decision Tree algorithms. Each algorithm was developed and trained using anywhere from 385 to 5770 features and data from 77,621 to 121,740 app users. RESULTS Algorithms were created to detect the risk of developing preeclampsia, gestational diabetes, and preterm delivery, as well as to identify the presence of existing preeclampsia. The positive predictive value (PPV) was set to 0.75 for all of the models, as this was the threshold where the researchers felt a clinical response—additional screening or testing—would be reasonable, due to the likelihood of a positive outcome. Sensitivity ranged from 24% to 75% across all models. When PPV was adjusted from 0.75 to 0.52, the sensitivity of the preeclampsia prediction algorithm rose from 24% to 85%. When PPV was adjusted from 0.75 to 0.65, the sensitivity of the preeclampsia detection or diagnostic algorithm increased from 37% to 79%. CONCLUSIONS Algorithms based on patient-reported data can predict serious obstetric conditions with accuracy levels sufficient to guide clinical screening by health care providers and health plans. 
Further research is needed to determine whether such an approach can improve outcomes for at-risk patients and reduce the cost of screening those not at risk. Presenting the results of these models to patients themselves could also provide important insight into otherwise unknown health risks.
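
The trade-off reported in the results, where relaxing the required PPV raises sensitivity, follows from how a decision threshold on a risk score is chosen. The sketch below is an illustration only: it assumes each user receives a score in [0, 1] and that the operating point is the lowest threshold meeting a target PPV. The abstract does not describe the actual procedure, and the function names and toy scores are hypothetical.

```python
def ppv_and_sensitivity(scores, labels, threshold):
    """PPV and sensitivity when every score >= threshold is flagged."""
    flagged = [label for s, label in zip(scores, labels) if s >= threshold]
    true_pos = sum(flagged)
    ppv = true_pos / len(flagged) if flagged else 0.0
    sensitivity = true_pos / sum(labels)
    return ppv, sensitivity


def threshold_for_ppv(scores, labels, target_ppv):
    """Lowest threshold whose PPV meets the target (highest sensitivity)."""
    for threshold in sorted(set(scores)):
        ppv, sensitivity = ppv_and_sensitivity(scores, labels, threshold)
        if ppv >= target_ppv:
            return threshold, ppv, sensitivity
    return None


# Toy scores and outcomes, not data from the study.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 1, 0, 1, 0, 1, 0]
print(threshold_for_ppv(scores, labels, 0.75))
print(threshold_for_ppv(scores, labels, 0.60))
```

Sensitivity can only fall as the threshold rises, so scanning thresholds from low to high and stopping at the first one that meets the PPV target yields the most sensitive admissible operating point.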

The impact of Negative to Positive Training Dataset Ratio on Atrial Fibrillation Classification Machine Learning Algorithms Performance

Journal of Physics Conference Series ◽

10.1088/1742-6596/1500/1/012131 ◽

2020 ◽

Vol 1500 ◽

pp. 012131

Author(s):

Firdaus ◽

Andre Herviant Juliano ◽

Naufal Rachmatullah ◽

Sarifah Putri Rafflesia ◽

Dinna Yunika Hardiyanti ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Learning Algorithms ◽

Training Dataset ◽

Identifying the Main Risk Factors for Cardiovascular Diseases Prediction Using Machine Learning Algorithms

Mathematics ◽

10.3390/math9202537 ◽

2021 ◽

Vol 9 (20) ◽

pp. 2537

Author(s):

Luis Rolando Guarneros-Nolasco ◽

Nancy Aracely Cruz-Ramos ◽

Giner Alor-Hernández ◽

Lisbeth Rodríguez-Mazahua ◽

José Luis Sánchez-Cervantes

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Performance Metrics ◽

Learning Algorithms ◽

Predictive Performance ◽

Algorithm Performance ◽

Body Regions ◽

Risks Factors ◽

Fold Cross Validation

Cardiovascular diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. As an effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, f1-score, and roc-auc) using the train-test split technique and k-fold cross-validation. Our study identifies the top-two and top-four attributes from CVD datasets by analyzing performance on the accuracy metric to determine that they are the best for predicting and diagnosing CVD. As our main findings, the ten ML classifiers exhibited appropriate diagnosis in classification and predictive performance on the accuracy metric with top-two attributes, identifying three main attributes for diagnosis and prediction of a CVD such as arrhythmia and tachycardia; hence, they can be successfully implemented for improving current CVD diagnosis efforts and helping patients around the world, especially in regions where medical staff is lacking.
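
Of the five metrics listed, roc-auc is the least obvious to compute by hand. A minimal sketch, assuming 0/1 labels and real-valued classifier scores, uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the toy data are hypothetical and not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy scores: one positive is ranked below one negative.
print(roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

The quadratic loop is fine for small datasets; rank-based formulations give the same value more efficiently on large ones.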