A new framework based on features modeling and ensemble learning to predict query performance

Mohamed Zaghloul; Mofreh Salem; Amr Ali-Eldin

doi:10.1371/journal.pone.0258439

PLoS ONE ◽

10.1371/journal.pone.0258439 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0258439

Author(s):

Mohamed Zaghloul ◽

Mofreh Salem ◽

Amr Ali-Eldin

Keyword(s):

Prediction Model ◽

Ensemble Learning ◽

Experimental Work ◽

Performance Prediction ◽

Missing Values ◽

Empirical Work ◽

Feature Modeling ◽

Training Dataset ◽

Query Performance ◽

New Framework

A query optimizer attempts to predict a performance metric based on elapsed time. In theory, this imposes significant overhead on the core engine, which must supply the necessary query optimization statistics. Machine learning, in particular regression modeling, is increasingly used to improve query performance. To predict the response time of a query, most query performance approaches rely on DBMS optimizer statistics and on the estimated cost of each operator in the query execution plan, with a focus on resource utilization (CPU, I/O). Modeling query features is thus a critical step in developing a robust query performance prediction model. In this paper, we propose a new framework based on query feature modeling and ensemble learning to predict query performance, and we use this framework as a query performance predictor simulator to optimize the query features that influence query performance. For query feature modeling, we propose five dimensions: syntax, hardware, software, data architecture, and historical performance logs. These features are used to build the training datasets for the performance prediction model, which employs ensemble learning. Ensemble learning lets the query performance prediction model deal with missing values and handle overfitting via regularization. The experimental section describes how the proposed framework is applied. The training dataset in this paper consists of performance data logs from various real-world environments. The actual and predicted performance were compared to evaluate the proposed prediction model. Empirical work shows the effectiveness of the proposed approach compared to related work.
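The five feature dimensions named in the abstract can be pictured as groups of a fixed-order feature vector in which absent measurements stay explicitly missing. The sketch below is only an illustration under stated assumptions: the individual feature names (`num_joins`, `ram_gb`, and so on) and the `to_vector` helper are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# The five dimensions come from the abstract; every feature name listed
# under them is a hypothetical placeholder, not the paper's feature set.
SCHEMA: Dict[str, List[str]] = {
    "syntax": ["num_joins", "num_predicates"],
    "hardware": ["cpu_cores", "ram_gb"],
    "software": ["dbms_version"],
    "data_architecture": ["table_rows", "num_indexes"],
    "history": ["mean_past_runtime_ms"],
}


def to_vector(record: Dict[str, Dict[str, float]]) -> List[Optional[float]]:
    """Flatten a per-dimension record into a fixed-order vector.

    Features absent from the record are emitted as None, so a downstream
    learner that tolerates missing values can see which ones are missing.
    """
    vector: List[Optional[float]] = []
    for dimension, names in SCHEMA.items():
        group = record.get(dimension, {})
        for name in names:
            vector.append(group.get(name))
    return vector


# A log entry that only captured syntax and hardware information.
example = {
    "syntax": {"num_joins": 3},
    "hardware": {"cpu_cores": 8, "ram_gb": 32},
}
print(to_vector(example))
```

Keeping missing entries as `None` rather than filling them with zeros leaves the choice of how to handle them to the learner, which is the property the abstract attributes to its ensemble model.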

Reducing structured Big data benchmark cycle time using query performance prediction model

2015 International Conference on Computing, Communication and Security (ICCCS) ◽

10.1109/cccs.2015.7374126 ◽

2015 ◽

Author(s):

Rekha Singhal

Keyword(s):

Big Data ◽

Prediction Model ◽

Performance Prediction ◽

Cycle Time ◽

Query Performance

Project performance prediction model linking agility and flexibility demands to project type

Expert Systems ◽

10.1111/exsy.12675 ◽

2021 ◽

Author(s):

Marco Aurélio Oliveira ◽

Luiz V. O. Dalla Valentina ◽

André Hideto Futami ◽

Osmar Possamai ◽

Carlos Alberto Flesch

Keyword(s):

Prediction Model ◽

Performance Prediction ◽

Project Performance

Diabetic Retinopathy Prediction by Ensemble Learning Based on Biochemical and Physical Data

Sensors ◽

10.3390/s21113663 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3663

Author(s):

Zun Shen ◽

Qingfeng Wu ◽

Zhi Wang ◽

Guoyi Chen ◽

Bin Lin

Keyword(s):

Diabetic Retinopathy ◽

Prediction Model ◽

Ensemble Learning ◽

Positive Impact ◽

Reduction Rate ◽

Developed Countries ◽

Small Sample ◽

Feature Reduction ◽

Physical Data ◽

Diabetes Retinopathy

(1) Background: Diabetic retinopathy, one of the most serious complications of diabetes, is the primary cause of blindness in developed countries. Therefore, the prediction of diabetic retinopathy has a positive impact on its early detection and treatment. The prediction of diabetic retinopathy based on high-dimensional and small-sample-structured datasets (such as biochemical data and physical data) was the problem to be solved in this study. (2) Methods: This study proposed the XGB-Stacking model, built on XGBoost and stacking. First, a wrapper feature selection algorithm, XGBIBS (Improved Backward Search Based on XGBoost), was used to reduce data feature redundancy and improve the effect of a single ensemble learning classifier. Second, in view of the slight limitation of a single classifier, a stacking model fusion method, Sel-Stacking (Select-Stacking), which keeps Label-Proba as the input matrix of the meta-classifier and determines the optimal combination of learners by a global search, was used in the XGB-Stacking model. (3) Results: XGBIBS greatly improved the prediction accuracy and the feature reduction rate of a single classifier. Compared to a single classifier, the accuracy of the Sel-Stacking model was improved to varying degrees. Experiments showed that the XGB-Stacking prediction model, based on the XGBIBS algorithm and the Sel-Stacking method, made effective predictions of diabetic retinopathy. (4) Conclusion: The XGB-Stacking prediction model of diabetic retinopathy based on biochemical and physical data had outstanding performance. This is highly significant for improving the screening efficiency of diabetic retinopathy and reducing the cost of diagnosis.
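The general idea of a backward search for feature selection can be sketched as a plain greedy backward elimination. This is not the XGBIBS algorithm itself, whose details the abstract does not give; the `backward_select` function and the toy scoring function are assumptions made for illustration. In practice the score would be something like the cross-validated accuracy of a classifier trained on the candidate subset.

```python
from typing import Callable, FrozenSet, Iterable


def backward_select(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Greedy backward elimination.

    Starting from all features, repeatedly drop the single feature whose
    removal gives the best score, as long as the score does not decrease.
    """
    current = frozenset(features)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        top_score, drop = max(candidates)
        if top_score >= best:  # ties favour the smaller subset
            current, best = current - {drop}, top_score
            improved = True
    return current


# Toy score (hypothetical): reward useful features, penalise the rest.
USEFUL = {"a", "b"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


print(sorted(backward_select(["a", "b", "c", "d"], toy_score)))
```

Accepting ties (`>=`) is a design choice that pushes the search toward smaller feature sets, which matches the abstract's interest in the feature reduction rate.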

When is query performance prediction effective?

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09 ◽

10.1145/1571941.1572150 ◽

2009 ◽

Cited By ~ 4

Author(s):

Claudia Hauff ◽

Leif Azzopardi

Keyword(s):

Performance Prediction ◽

Query Performance

Ensemble Learning Approach with LASSO for Predicting Catalytic Reaction Rates

Synlett ◽

10.1055/a-1304-4878 ◽

2020 ◽

Author(s):

Akira Yada ◽

Kazuhiko Sato ◽

Tarojiro Matsumura ◽

Yasunobu Ando ◽

Kenji Nagata ◽

...

Keyword(s):

Ensemble Learning ◽

Reaction Rates ◽

Initial Reaction Rate ◽

Training Dataset ◽

Initial Reaction ◽

Learning Approach ◽

Learning Framework ◽

Machine Learning Approach ◽

Reasonable Prediction ◽

Epoxidation Of Alkenes

The prediction of the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes by using a machine learning approach is demonstrated. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach enables us to obtain a reasonable prediction model that avoids the problem of overfitting, even when analyzing a small dataset.
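The three steps described in this abstract (sampling with replacement, fitting several weak learners, combining their outputs) correspond to bagging, which can be sketched in a few lines. The weak learner below is an ordinary one-variable least-squares line chosen for brevity; it stands in for, and is not, the LASSO models named in the paper's title. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of y = a*x + b; serves as the weak learner."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x equal): predict the mean
        return 0.0, y_bar
    a = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return a, y_bar - a * x_bar


def bagged_fit(points: List[Point], n_models: int = 25, seed: int = 0):
    """Fit one weak learner per bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Random sampling with replacement, same size as the original set.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_line(sample))
    return models


def bagged_predict(models, x: float) -> float:
    """Combine the weak learners by averaging their predictions."""
    return mean(a * x + b for a, b in models)


# Small synthetic dataset lying exactly on y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
models = bagged_fit(data)
print(bagged_predict(models, 10.0))
```

Averaging over resampled fits reduces the variance of any single fit, which is the usual explanation for why this kind of ensemble is less prone to overfitting on small datasets.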

Student Performance Prediction with Optimum Multilabel Ensemble Model

Journal of Intelligent Systems ◽

10.1515/jisys-2021-0016 ◽

2021 ◽

Vol 30 (1) ◽

pp. 511-523

Author(s):

Ephrem Admasu Yekun ◽

Abrahaley Teklay Haile

Keyword(s):

High School Students ◽

Prediction Model ◽

Student Performance ◽

Performance Prediction ◽

Transformation Method ◽

Classification Task ◽

Support Vector ◽

School Students ◽

K Nearest Neighbors ◽

Classifier Chains

One of the important measures of the quality of education is the performance of students in academic settings. Educational institutions now store abundant data about students, which can yield insight into how students are learning and, through data mining techniques, help improve their performance ahead of time. In this paper, we developed a student performance prediction model that predicts the performance of high school students for the next semester in five courses. We modeled our prediction system as a multi-label classification task and used Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) as base classifiers to train our model. We further improved the performance of the prediction model by using a state-of-the-art partitioning scheme to divide the label space into smaller spaces and by using the Label Powerset (LP) transformation method to transform each labelset into a multi-class classification task. The proposed model achieved better performance in terms of different evaluation metrics when compared to other multi-label learning approaches such as binary relevance and classifier chains.
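The Label Powerset transformation mentioned in this abstract treats each distinct combination of labels as one class, so that any multi-class classifier can be trained on the result. A minimal sketch follows; the function names are hypothetical, and the label-space partitioning step the authors describe is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Labelset = Tuple[int, ...]


def label_powerset_fit(label_rows: Sequence[Sequence[int]]) -> Dict[Labelset, int]:
    """Assign a class id to every distinct labelset, in order of first appearance."""
    classes: Dict[Labelset, int] = {}
    for row in label_rows:
        classes.setdefault(tuple(row), len(classes))
    return classes


def label_powerset_transform(
    label_rows: Sequence[Sequence[int]], classes: Dict[Labelset, int]
) -> List[int]:
    """Replace each labelset by its class id (multi-label -> multi-class)."""
    return [classes[tuple(row)] for row in label_rows]


def label_powerset_inverse(
    class_ids: Sequence[int], classes: Dict[Labelset, int]
) -> List[List[int]]:
    """Map predicted class ids back to the original binary label vectors."""
    inverse = {cid: labels for labels, cid in classes.items()}
    return [list(inverse[c]) for c in class_ids]


# Four samples, three binary labels each (for example pass/fail per course).
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
classes = label_powerset_fit(Y)
y_multiclass = label_powerset_transform(Y, classes)
print(y_multiclass)
print(label_powerset_inverse(y_multiclass, classes))
```

Because the number of distinct labelsets can grow quickly with the number of labels, splitting the label space into smaller groups before applying the transformation, as the abstract describes, keeps the number of classes manageable.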

Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) and Performance Prediction Model

7th International Energy Conversion Engineering Conference ◽

10.2514/6.2009-4576 ◽

2009 ◽

Cited By ~ 11

Author(s):

Thomas Hammel ◽

Russell Bennett ◽

W. Otting ◽

S. Fanale

Keyword(s):

Prediction Model ◽

Performance Prediction ◽

Thermoelectric Generator ◽

And Performance

A performance prediction model for BIC-OFDM transmissions with AMC over nonlinear fading channels

2011 8th International Workshop on Multi-Carrier Systems & Solutions ◽

10.1109/mc-ss.2011.5910726 ◽

2011 ◽

Author(s):

Filippo Giannetti ◽

Ivan Stupia ◽

Vincenzo Lottici ◽

Riccardo Andreotti ◽

Aldo N. D'Andrea ◽

...

Keyword(s):

Prediction Model ◽

Fading Channels ◽

Performance Prediction ◽

A Performance

Forward and backward feature selection for query performance prediction

Proceedings of the 35th Annual ACM Symposium on Applied Computing ◽

10.1145/3341105.3373904 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sébastien Déjean ◽

Radu Tudor Ionescu ◽

Josiane Mothe ◽

Md Zia Ullah

Keyword(s):

Feature Selection ◽

Performance Prediction ◽

Query Performance ◽

Selection For

A PRACTICAL STUDY ON PAVEMENT PERFORMANCE PREDICTION MODEL AND EVALUATION METHOD FOR REHABILITATION STRATEGIES

JOURNAL OF PAVEMENT ENGINEERING JSCE ◽

10.2208/journalpe.12.219 ◽

2007 ◽

Vol 12 ◽

pp. 219-226

Author(s):

Naoki UESUGI ◽

Yoshikazu SUEHIRO ◽

Kouji HASHIMOTO ◽

Katsura ENDO

Keyword(s):

Prediction Model ◽

Performance Prediction ◽

Evaluation Method ◽

Pavement Performance ◽

Pavement Performance Prediction ◽

Rehabilitation Strategies