Resource utilization prediction technique in cloud using knowledge based ensemble random forest with LSTM model

Concurrent Engineering ◽

10.1177/1063293x211032622 ◽

2021 ◽

pp. 1063293X2110326

Author(s):

K Valarmathi ◽

S Kanaga Suba Raja

Keyword(s):

Random Forest ◽

Resource Utilization ◽

Prediction Models ◽

Short Term Memory ◽

Resource Usage ◽

Data Set ◽

Cloud Data ◽

Knowledge Based ◽

Cpu Utilization ◽

Historical Observation

Forecasting cloud datacenter resource usage is a challenging task because workloads are dynamic and business-critical. Accurate prediction of cloud resource utilization from historical observation helps align tasks with resources, estimate the capacity of a cloud server, apply intensive auto-scaling, and control resource usage, whereas imprecise prediction leads to under- or over-provisioning of resources in the cloud. This paper focuses on solving this problem in a more proactive way. Most existing prediction models are based on a single workload pattern, which is not suitable for handling peculiar workloads. The researchers address this problem by using a contemporary model to dynamically analyze CPU utilization and so precisely estimate data center CPU utilization. The proposed design uses an Ensemble Random Forest-Long Short Term Memory based deep architectural model for resource estimation. This design preprocesses and trains on data based on historical observation. The approach is evaluated using a real cloud data set. The empirical results indicate that the proposed design outperforms previous approaches, with 30%–60% higher accuracy in resource utilization prediction.
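
The abstract says the design preprocesses and trains on historical observations but does not give the preprocessing details. As a minimal sketch of one common way to frame a utilization history for a sequence model, the function below cuts a series into fixed-length input windows with a next-step target. The function name `make_windows`, the window length, and the sample values are illustrative assumptions, not taken from the paper.

```python
def make_windows(series, window, horizon=1):
    """Turn a utilization history into (input window, target) pairs.

    series  : list of past CPU utilization readings (assumed evenly spaced)
    window  : number of past readings fed to the model per sample
    horizon : how many steps ahead the target lies (1 = next reading)
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs, targets = [], []
    # Stop early enough that every window still has a target after it.
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Hypothetical CPU utilization readings in percent (not from the paper).
cpu_history = [10, 20, 30, 40, 50]
X, y = make_windows(cpu_history, window=3)
```

Each row of `X` could then be passed to any sequence or tree-based regressor; which features the authors' ensemble actually consumes is not stated in the abstract.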


Rock Strength Prediction in Real-Time while Drilling Employing Random Forest and Functional Network Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4050843 ◽

2021 ◽

pp. 1-21

Author(s):

Hany Gamal ◽

Ahmed Alsaihati ◽

Salaheldin Elkatatny ◽

Saleh Haidary ◽

Abdulazeez Abdulraheem

Keyword(s):

Random Forest ◽

Real Time ◽

Rock Strength ◽

Prediction Models ◽

Functional Network ◽

Percentage Error ◽

Data Set ◽

Unseen Data ◽

Drilling Data ◽

Data Points

Abstract: The rock unconfined compressive strength (UCS) is one of the key parameters for geomechanical and reservoir modeling in the petroleum industry. Obtaining the UCS by conventional methods such as experimental work or empirical correlation from logging data is time-consuming and costly. To overcome these drawbacks, this paper used artificial intelligence (AI) to predict the rock strength in real time from the drilling parameters using two AI tools. Random forest (RF) based on principal component analysis (PCA), and functional network (FN) techniques were employed to build two UCS prediction models based on drilling data such as weight on bit (WOB), drill string rotating-speed (RS), drilling torque (T), stand-pipe pressure (SPP), mud pumping rate (Q), and the rate of penetration (ROP). The models were built using 2,333 data points from Well (A) with a 70:30 training-to-testing ratio. The models were validated using an unseen data set (1,300 data points) from Well (B), which is located in the same field and drilled across the same complex lithology. The PCA-based RF model outperformed the FN model in terms of correlation coefficient (R) and average absolute percentage error (AAPE). The overall accuracy for PCA-based RF was an R of 0.99 and an AAPE of 4.3%, while FN yielded an R of 0.97 and an AAPE of 8.5%. The validation results showed that R was 0.99 for RF and 0.96 for FN, while the AAPE was 4% and 7.9% for the RF and FN models, respectively. The developed PCA-based RF and FN models provide an accurate UCS estimation in real time from the drilling data, saving time and cost and enhancing well stability by generating a UCS log from the rig drilling data.
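
The abstract reports model quality as a correlation coefficient (R) and an average absolute percentage error (AAPE) without spelling out the formulas. The sketch below uses the usual definitions (Pearson correlation, and the mean of |actual − predicted| / |actual| expressed in percent), which is an assumption about what the authors computed. The function names and sample values are illustrative only.

```python
import math


def aape(actual, predicted):
    """Average absolute percentage error, in percent (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical measured vs. predicted strength values (not from the paper).
measured = [100.0, 200.0, 400.0]
estimated = [110.0, 190.0, 400.0]
print(f"AAPE = {aape(measured, estimated):.2f}%, R = {pearson_r(measured, estimated):.3f}")
```

Note that AAPE is undefined when a measured value is zero, so a real pipeline would need to filter or guard such points.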


Predicting employee attrition using tree-based models

International Journal of Organizational Analysis ◽

10.1108/ijoa-10-2019-1903 ◽

2020 ◽

Vol 28 (6) ◽

pp. 1273-1291

Author(s):

Nesreen El-Rayes ◽

Ming Fang ◽

Michael Smith ◽

Stephen M. Taylor

Keyword(s):

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Binary Classification ◽

Primary Role ◽

Classification Models ◽

Job Transition ◽

Data Set ◽

Content Type ◽

Employee Attrition

Purpose: The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes. Design/methodology/approach: A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition. Findings: Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm. Practical implications: This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models. Originality/value: This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.


Reliable Prediction Models Based on Enriched Data for Identifying the Mode of Childbirth by Using Machine Learning Methods: Development Study

Journal of Medical Internet Research ◽

10.2196/28856 ◽

2021 ◽

Vol 23 (6) ◽

pp. e28856

Author(s):

Zahid Ullah ◽

Farrukh Saleem ◽

Mona Jamjoom ◽

Bahjat Fakieh

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Random Forest ◽

Maternity Care ◽

Nearest Neighbor ◽

Prediction Models ◽

Mode Of Delivery ◽

K Nearest Neighbor ◽

Reliable Prediction ◽

Data Set

Background: The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise. Objective: The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth. Methods: This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets. Results: The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases. Conclusions: Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data.
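
The results above are reported as accuracy, sensitivity, specificity, and F-measure. As a minimal sketch using the standard definitions for a binary classifier (the abstract does not state which class is treated as positive, so that choice and the counts below are assumptions), these can be derived from the four confusion-matrix counts:

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn : positive cases predicted correctly / missed
    fp/tn : negative cases predicted wrongly / correctly
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        # F-measure as the harmonic mean of precision and sensitivity.
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only (not from the study).
for name, value in binary_metrics(tp=40, fn=10, fp=5, tn=45).items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the two delivery modes are not equally frequent, since accuracy alone can hide poor performance on the rarer class.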


Variant pathogenic prediction models VSRFM and VSRFM-s, the importance of splicing and allele frequency

10.1101/430975 ◽

2018 ◽

Author(s):

JL Cabrera-Alarcon ◽

J Garcia-Martinez

Keyword(s):

Random Forest ◽

Allele Frequency ◽

Prediction Models ◽

Specific Model ◽

Random Forest Model ◽

Independent Data ◽

Data Set ◽

New Model ◽

Forest Model

Abstract: Currently, several tools are available to predict the effect of variants, with the aim of classifying variants as neutral or pathogenic. In this study, we propose a new model trained over ensemble scores with two particularities: first, we consider minor allele frequency from gnomAD, and second, we split variants based on their splicing for training each specific model. Variants Stacked Random Forest Model (VSRFM) was constructed for variants not involved in splicing and Variants Stacked Random Forest Model for splicing (VSRFM-s) was trained for variants affected by splicing. Compared with the constituent scores used as features, our models showed the best outcomes. These results were confirmed using an independent data set from the ClinVar database, with similar results.


Predicting the Rock Sonic Logs While Drilling by Random Forest and Decision Tree-Based Algorithms

Journal of Energy Resources Technology ◽

10.1115/1.4051670 ◽

2021 ◽

pp. 1-13

Author(s):

Hany Gamal ◽

Ahmed Alsaihati ◽

Salaheldin Elkatatny

Keyword(s):

Random Forest ◽

Decision Tree ◽

Real Time ◽

Prediction Models ◽

Percentage Error ◽

Data Set ◽

Drilling Parameters ◽

Drilling Data ◽

Conventional Methods ◽

Sonic Logs

Abstract: The sonic data provides significant rock properties that are commonly used for designing the operational programs for drilling, rock fracturing, and development operations. The conventional methods for acquiring the rock sonic data in terms of compressional and shear slowness (ΔTc and ΔTs) are considered costly and time-consuming operations. The target of this paper is to propose machine learning models for predicting the sonic logs from the drilling data in real time. Decision tree (DT) and random forest (RF) were employed as train-based algorithms for building the sonic prediction models for drilling complex lithology rocks that have limestone, sandstone, shale, and carbonate formations. The input data for the models include the surface drilling parameters to predict the shear and compressional slowness. The study employed a data set of 2888 data points for building and testing the model, while another collected data set of 2863 points was utilized for further validation of the sonic models. Sensitivity investigations were performed for the DT and RF models to confirm optimal accuracy. The correlation coefficient (R) and average absolute percentage error (AAPE) were used to check the models' accuracy between the actual values and the models' outputs, in addition to the sonic log profiles. The results indicated that the developed sonic models have a high capability for sonic prediction from the drilling data, as the DT model recorded R higher than 0.967 and AAPE less than 2.76% for the ΔTc and ΔTs models, while RF showed R higher than 0.991 with AAPE less than 1.07%. The further validation process for the developed models indicated strong results for the sonic prediction, and the RF model outperformed the DT models, as RF showed R higher than 0.986 with AAPE less than 1.12% while DT prediction recorded R greater than 0.93 with AAPE less than 1.95%. The sonic prediction through the developed models will save the cost and time of acquiring the sonic data through the conventional methods and will provide real-time estimation from the drilling parameters.


AI-Based Stroke Disease Prediction System Using Real-Time Electromyography Signals

Applied Sciences ◽

10.3390/app10196791 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6791

Author(s):

Jaehak Yu ◽

Sejin Park ◽

Soon-Hyun Kwon ◽

Chee Meng Benjamin Ho ◽

Cheol-Sig Pyo ◽

...

Keyword(s):

Random Forest ◽

Real Time ◽

Prediction Models ◽

Short Term Memory ◽

Low Cost ◽

The Elderly ◽

Prediction System ◽

Appropriate Treatment ◽

Prediction Systems ◽

Time Diagnosis

Stroke is a leading cause of disabilities in adults and the elderly which can result in numerous social or economic difficulties. If left untreated, stroke can lead to death. In most cases, patients with stroke have been observed to have abnormal bio-signals (i.e., ECG). Therefore, if individuals are monitored and have their bio-signals measured and accurately assessed in real-time, they can receive appropriate treatment quickly. However, most diagnosis and prediction systems for stroke are image analysis tools such as CT or MRI, which are expensive and difficult to use for real-time diagnosis. In this paper, we developed a stroke prediction system that detects stroke using real-time bio-signals with artificial intelligence (AI). Both machine learning (Random Forest) and deep learning (Long Short-Term Memory) algorithms were used in our system. EMG (Electromyography) bio-signals were collected in real time from thighs and calves, after which the important features were extracted, and prediction models were developed based on everyday activities. Prediction accuracies of 90.38% for Random Forest and of 98.958% for LSTM were obtained for our proposed system. This system can be considered an alternative, low-cost, real-time diagnosis system that can obtain accurate stroke prediction and can potentially be used for other diseases such as heart disease.


Computer-Aided Diagnosis in Jaundice: Comparison of Knowledge-based and Probabilistic Approaches

Methods of Information in Medicine ◽

10.1055/s-0038-1634634 ◽

1996 ◽

Vol 35 (01) ◽

pp. 41-51 ◽

Cited By ~ 3

Author(s):

F. Molino ◽

D. Furia ◽

F. Bar ◽

S. Battista ◽

N. Cappello ◽

...

Keyword(s):

Clinical Presentation ◽

Clinical Information ◽

Diagnostic Value ◽

Clinical Findings ◽

Clinical Documentation ◽

Data Set ◽

Knowledge Based ◽

Number Of Patients ◽

Support Tools ◽

Aided Diagnosis

Abstract: The study reported in this paper is aimed at evaluating the effectiveness of a knowledge-based expert system (ICTERUS) in diagnosing jaundiced patients, compared with a statistical system based on probabilistic concepts (TRIAL). The performances of both systems have been evaluated using the same set of data in the same number of patients. Both systems are spin-off products of the European project Euricterus, an EC-COMACBME Project designed to document the occurrence and diagnostic value of clinical findings in the clinical presentation of jaundice in Europe, and have been developed as decision-making tools for the identification of the cause of jaundice based only on clinical information and routine investigations. Two groups of jaundiced patients were studied, including 500 (retrospective sample) and 100 (prospective sample) subjects, respectively. All patients were independently submitted to both decision-support tools. The input of both systems was the data set agreed within the Euricterus Project. The performances of both systems were evaluated with respect to the reference diagnoses provided by experts on the basis of the full clinical documentation. Results indicate that both systems are clinically reliable, although the diagnostic prediction provided by the knowledge-based approach is slightly better.


Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.


Random Forest (RF) and Artificial Neural Network (ANN) Algorithms for LULC Mapping

Engineering and Technology Journal ◽

10.30684/etj.v38i4a.399 ◽

2020 ◽

Vol 38 (4A) ◽

pp. 510-514

Author(s):

Tay H. Shihab ◽

Amjed N. Al-Hameedawi ◽

Ammar M. Hamza

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Artificial Neural Network ◽

Random Forest ◽

Satellite Image ◽

Landsat 8 ◽

Optical Remote Sensing ◽

Data Set ◽

Artificial Neural ◽

Artificial Neural Network Ann

In this paper, to make use of complementary potential in land use/land cover (LULC) mapping, spatial data were acquired from Landsat 8 OLI sensor images taken in 2019. The images were rectified, enhanced, and then classified using the random forest (RF) and artificial neural network (ANN) methods. Optical remote sensing images were used to obtain information on the status of LULC classification and to extract details. The classification of both satellite image types is used to extract features and to analyse the LULC of the study area. The classification results showed that the artificial neural network method outperforms the random forest method. The image processing required for the optical remote sensing data to be used in LULC mapping included geometric correction and image enhancement. With the ANN method, the overall accuracy was 0.91 and the kappa accuracy was 0.89 for the training data set, while the overall accuracy and the kappa accuracy for the test data set were 0.89 and 0.87, respectively.
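
The abstract reports an overall accuracy and a "kappa accuracy" for each classifier. Assuming the latter means Cohen's kappa computed from a class confusion matrix (the usual convention in land-cover mapping, though not stated in the abstract), both figures can be sketched as follows. The function name and the 2×2 matrix are illustrative and not taken from the paper.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i
                      that the classifier assigned to class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (not from the study).
overall, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(f"overall accuracy = {overall:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than overall accuracy because it discounts the agreement a classifier would reach by chance given the class proportions, which is consistent with the pattern in the reported figures.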


A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/3 ◽

2020 ◽

Vol 17 (3) ◽

pp. 299-305 ◽

Cited By ~ 1

Author(s):

Riaz Ahmad ◽

Saeeda Naz ◽

Muhammad Afzal ◽

Sheikh Rashid ◽

Marcus Liwicki ◽

...

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Data Augmentation ◽

Short Term Memory ◽

Recognition System ◽

Learning Approach ◽

Arabic Text ◽

Data Set ◽

Processing Step ◽

Handwritten Arabic

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT data-set consists of complex patterns of handwritten Arabic text-lines. This paper contributes in three main aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data-augmentation. The pre-processing step includes pruning of extra white space and de-skewing of the skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflammation. Data-augmentation combined with the deep learning approach yields a promising improvement, achieving 80.02% Character Recognition (CR) against a 75.08% baseline.
