Workload Prediction on Google Cluster Trace

Author(s):  
Md. Rasheduzzaman ◽  
Md. Amirul Islam ◽  
Rashedur M. Rahman

Workload prediction in cloud systems is an important task for ensuring maximum resource utilization. A cloud system therefore requires efficient resource allocation to minimize resource cost while maximizing profit. One effective strategy for efficient resource utilization is to allocate resources in a timely manner according to application demand, and the key precondition of this strategy is obtaining information about the future workload in advance. The main focus of this analysis is to design and compare different forecasting models for predicting future workload. This paper develops models using the Adaptive Neuro Fuzzy Inference System (ANFIS), the Non-linear Autoregressive Network with Exogenous inputs (NARX), Autoregressive Integrated Moving Average (ARIMA), and Support Vector Regression (SVR). Public trace data released by Google (workload trace version II) are used to verify the accuracy, stability, and adaptability of the different models. Finally, this paper compares these prediction models to identify the one that gives the best predictions. Forecasting performance is measured with several popular statistical metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Sum of Squared Error (SSE), and Normalized Mean Squared Error (NMSE). The experimental results indicate that the NARX model outperforms the other models (ANFIS, ARIMA, and SVR).
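The five error metrics listed above can be computed directly from a sequence of actual workloads and a sequence of forecasts. The sketch below is a minimal illustration, not the authors' code. It assumes NMSE is SSE divided by the total sum of squares of the actual series (NMSE definitions vary between papers) and that no actual value is zero, since MAPE divides by it. The function name `forecast_metrics` and the toy numbers are hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """Compute RMSE, MAE, MAPE (in percent), SSE and NMSE.

    Assumption: NMSE = SSE / sum((a - mean(a))^2); other definitions exist.
    MAPE is undefined if any actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "SSE": sse,
        "NMSE": sse / sst,
    }


# Hypothetical toy workload values and forecasts, for illustration only.
m = forecast_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the units of the workload, MAPE is scale-free, and NMSE compares the model with a constant forecast equal to the series mean, which is why it fails when the actual series is constant.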

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Humera Batool ◽  
Lixin Tian

Infectious diseases like COVID-19 spread rapidly and have led to substantial economic loss worldwide, including in Pakistan. The effect of weather on the spread of COVID-19 needs more detailed examination, as some studies have claimed that weather conditions can mitigate its spread. COVID-19 was declared a pandemic by the WHO and has been reported in about 210 countries worldwide, across Asia, Europe, and North America, including the USA. Besides natural factors, person-to-person contact and international air travel between nations were the leading causes of the spread of SARS-CoV-2 from its point of origin. Further spread and infection within a community or country, however, can be aided by natural factors such as the weather. The correlation between COVID-19 and temperature can therefore be better elucidated in countries like Pakistan, where SARS-CoV-2 has affected at least 0.37 million people. This study collected Pakistan's COVID-19 infection and mortality data for ten months (March–December 2020). Related weather parameters, including temperature and humidity, were also obtained for the same period. The collected data were processed and used to compare the performance of various time series prediction models in terms of mean squared error (MSE), root-mean-squared error (RMSE), and mean absolute percentage error (MAPE). Using time series models, this paper estimates the effect of humidity, temperature, and other weather parameters on COVID-19 transmission by computing the correlation between the weather variables and the total numbers of infected cases and deaths in a particular region. The results show that weather parameters have more influence on the total numbers of cases and deaths than other factors such as community, age, and total population. Temperature and humidity are therefore salient parameters for predicting COVID-19 cases. Moreover, it is concluded that the higher the temperature, the lower the mortality due to COVID-19 infection.
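The correlation step can be illustrated with the Pearson coefficient between a weather variable and a case or death count. This is a minimal sketch that assumes a plain Pearson correlation; the abstract does not state which correlation measure or which time series models were used, and the temperature and death values below are invented toy numbers, not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: mean daily temperature (deg C) and daily deaths.
temperature = [20.0, 25.0, 30.0, 35.0]
deaths = [9.0, 7.0, 6.0, 3.0]
r = pearson_r(temperature, deaths)
print(f"r = {r:.3f}")  # negative: higher temperature pairs with fewer deaths here
```

A negative coefficient is what the "higher temperature, lower mortality" conclusion would look like in such a calculation, although correlation alone does not establish that temperature causes the change.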


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Gurmanik Kaur ◽  
Ajat Shatru Arora ◽  
Vijender Kumar Jain

Crossing the legs at the knees during BP measurement is one of several physiological stimuli that considerably influence the accuracy of BP measurements. It is therefore important to develop an appropriate prediction model for interpreting the influence of crossed legs on BP. This work describes the use of principal component analysis (PCA) fused with forward stepwise regression (FSWR), artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), and least squares support vector machine (LS-SVM) models for predicting BP reactivity to crossed legs among normotensive and hypertensive participants. Evaluation of the proposed prediction models with appropriate statistical indices showed that the PCA-based LS-SVM (PCA-LS-SVM) model has the highest prediction accuracy, with coefficient of determination (R2) = 93.16%, root mean square error (RMSE) = 0.27, and mean absolute percentage error (MAPE) = 5.71 for SBP prediction in normotensive subjects. For hypertensive subjects, the PCA-LS-SVM model achieved R2 = 96.46%, RMSE = 0.19, and MAPE = 1.76 for SBP prediction and R2 = 95.44%, RMSE = 0.21, and MAPE = 2.78 for DBP prediction. This assessment highlights the importance and advantages of hybrid computing models for predicting variables in biomedical research studies.
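LS-SVM regression replaces the quadratic program of a standard SVM with a single linear system. The sketch below solves the standard Suykens-style LS-SVM system, assuming an RBF kernel and illustrative values for the regularization constant `gamma` and kernel width `sigma` (neither is given in the abstract). It omits the PCA step and is not the authors' implementation; the helper names are hypothetical.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Fit LS-SVM regression by solving
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]

    def predict(x):
        return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X))

    return predict


# Hypothetical toy data: a 1-D input and a smooth target.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [math.sin(v[0]) for v in X]
model = lssvm_fit(X, y, gamma=10.0, sigma=1.0)
print(model([2.5]))
```

With a very large `gamma` the model nearly interpolates the training targets; smaller values trade training fit for smoothness, which is why `gamma` and `sigma` are normally tuned on held-out data.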


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Gurmanik Kaur ◽  
Ajat Shatru Arora ◽  
Vijender Kumar Jain

High blood pressure (BP) is associated with an increased risk of cardiovascular diseases, so precise BP measurement is important in clinical and research studies. In this work, anthropometric characteristics, including age, height, weight, body mass index (BMI), and arm circumference (AC), were used as independent predictor variables for predicting BP reactivity to talking. Principal component analysis (PCA) was fused with artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), and least squares support vector machine (LS-SVM) models to remove the multicollinearity among the anthropometric predictor variables. Statistical evaluation in terms of coefficient of determination (R2), root mean square error (RMSE), and mean absolute percentage error (MAPE) revealed that the PCA-based LS-SVM (PCA-LS-SVM) model predicted BP reactivity more efficiently than the other models. This assessment highlights the importance and advantages of PCA-fused prediction models for predicting biological variables.
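PCA removes multicollinearity by projecting correlated predictors onto orthogonal directions, so the resulting component scores are mutually uncorrelated and can be fed to ANN, ANFIS, or LS-SVM models. The pure-Python sketch below finds components by power iteration with deflation. It centres but does not standardize the data (standardizing is common when predictors have different units), the number of retained components is an arbitrary choice because the abstract does not state it, and the height, weight, and arm-circumference rows are invented toy values.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix and the centred data for observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def pca(rows, n_components, iters=1000, seed=0):
    """Principal components via power iteration with deflation.

    Returns (components, scores); scores are the centred data projected onto
    the unit-length components and are mutually uncorrelated.
    """
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    rng = random.Random(seed)
    work = [row[:] for row in cov]
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                raise ValueError("no variance left for another component")
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate so the next power iteration finds the next component.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return components, scores


# Hypothetical toy rows: height (cm), weight (kg), arm circumference (cm).
rows = [[160.0, 55.0, 26.0], [165.0, 60.0, 27.0], [170.0, 66.0, 29.0],
        [175.0, 72.0, 30.0], [180.0, 80.0, 32.0], [168.0, 63.0, 28.0]]
comps, scores = pca(rows, 2)
print(comps[0])
```

The scores, rather than the raw correlated measurements, would then serve as inputs to the downstream regression model.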


Author(s):  
Mohammad Hossein Ahmadi ◽  
Alireza Baghban ◽  
Ely Salwana ◽  
Milad Sadeghzadeh ◽  
Mohammad Zamen ◽  
...  

Solar energy is a renewable energy resource that is widely used and has the lowest pollution impact among the available alternatives to fossil fuels. In this investigation, the machine learning approaches of neural networks (NN), neuro-fuzzy systems, and least squares support vector machine (LSSVM) are used to build models that predict the thermal performance of a photovoltaic-thermal solar collector (PV/T), with the collector's efficiency as the model output and inlet temperature, flow rate, heat, solar radiation, and heat of the sun as the inputs. Experimental measurements were obtained from a solar collector system designed for this study, from which 100 data points were extracted. Several analyses were also performed to examine the credibility of the introduced approaches, and they showed strong performance. The proposed LSSVM model gave the best performance, with a mean squared error (MSE) of 0.003 and a correlation coefficient (R2) of 0.99.


2021 ◽  
Author(s):  
Emilly Pereira Alves ◽  
Joao Fausto Lorenzato Oliveira ◽  
Manoel Henrique da Nóbrega Marinho ◽  
Francisco Madeiro

In the field of time series forecasting, combining techniques to help predict different patterns has been the subject of several studies. Hybrid models have been widely applied in this setting, where the vast majority of series contain both linear and nonlinear patterns. The Autoregressive Integrated Moving Average (ARIMA) model gives satisfactory results for linear patterns but cannot capture nonlinear ones, whereas Support Vector Regression (SVR) has shown promising results for nonlinear patterns. To map both kinds of pattern, an optimized nonlinear combination model based on SVR and ARIMA is proposed. The main difference from other works is the use of an interactive Particle Swarm Optimization (PSO) to increase prediction performance. Six well-known datasets from the literature are used in the experiments. Performance is assessed with the Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE). The results show that the proposed system achieves better results than the other tested techniques on most of the datasets.
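The abstract does not detail its interactive PSO variant, so the sketch below shows only a plain global-best PSO over a bounded search box, using the commonly cited inertia and acceleration constants (0.7298 and 1.49618). In a hybrid ARIMA-SVR setting, the objective `f` would be a validation error as a function of the parameters being tuned; the sphere function in the usage line is a hypothetical stand-in, and `pso_minimize` is an illustrative name.

```python
import random


def pso_minimize(f, bounds, n_particles=30, iters=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimal global-best particle swarm optimization over a box.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]           # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical objective: sphere function with its minimum at the origin.
best, val = pso_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])
print(best, val)
```

Because each evaluation of a real forecasting objective means refitting a model, the swarm size and iteration count are usually kept much smaller than in this toy call.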


Metal trading plays a vital role in the international market, and metal prices can affect a nation's economy. Many metals are traded worldwide for construction and the manufacturing of goods; among them, gold, silver, platinum, and palladium are treated as precious metals because of their economic value. Researchers have therefore focused on predicting metal prices with diverse algorithms such as Auto Regressive Integrated Moving Average (ARIMA), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM). The foremost objective of this paper is to predict the price of gold. We employ a rough-set-based affinity propagation algorithm to predict future gold prices and compare the proposed model with rough set and ARIMA models using root mean square error (RMSE) and mean absolute percentage error (MAPE). The experimental results show that the proposed model outperforms the rough set and ARIMA models.
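Affinity propagation clusters data by passing "responsibility" and "availability" messages between points until a set of exemplars emerges. The abstract does not explain how rough sets are combined with it or how clusters are turned into price forecasts, so the sketch below covers only standard affinity propagation (Frey and Dueck, 2007). It assumes negative squared Euclidean distance as the similarity, the median off-diagonal similarity as the default preference, and a damping factor of 0.5; the toy points are hypothetical.

```python
import statistics


def affinity_propagation(points, damping=0.5, iters=200, preference=None):
    """Standard affinity propagation on n-D points (lists of floats).

    Returns a list giving, for each point, the index of its exemplar.
    """
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    if preference is None:
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference  # self-similarity controls how many exemplars appear
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibilities: how well suited k is to be the exemplar of i.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availabilities: accumulated evidence that k should be an exemplar.
        for k in range(n):
            pos_sum = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try more iterations or a higher preference")
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Hypothetical toy data: two well-separated groups of 1-D values.
labels = affinity_propagation([[1.0], [1.1], [1.2], [10.0], [10.1], [10.2]])
print(labels)
```

Unlike k-means, the number of clusters is not fixed in advance; raising the preference toward zero yields more exemplars and lowering it yields fewer.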


2019 ◽  
Vol 6 (1) ◽  
pp. 41
Author(s):  
Jaka Darma Jaya

Beef production in Indonesia has generally tended to increase over the past 30 years. Domestic supply still cannot meet Indonesia's demand for beef, so beef has to be imported from abroad. A study projecting the future availability of the beef cattle population is needed so that appropriate policies can be adopted to maintain the stability and adequacy of the national meat supply. This study aims to forecast the beef cattle population using three forecasting methods: moving average, exponential smoothing, and trend analysis. The accuracy of these forecasts was then measured using MAD (Mean Absolute Deviation), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error). The projected beef cattle populations for 2019 (the next period) from the three methods are 195,100 (moving average), 218,225 (exponential smoothing), and 262,899 (trend analysis). The MAD, MSE, and MAPE measurements show that the most accurate method for forecasting the beef cattle population is polynomial trend analysis (MAD 14,716.12; MSE 327,282,084.17; MAPE 0.09), because it has smaller errors than the moving average and exponential smoothing forecasts.
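The three forecasting methods and the MAD, MSE, and MAPE accuracy measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: the window length and smoothing constant are arbitrary, exponential smoothing is initialised with the first observation, and the trend model is linear (a degree-1 polynomial), whereas the study used a polynomial trend of unstated degree. MAPE is returned as a fraction rather than a percentage. The function names and toy series are hypothetical.

```python
def moving_average_forecast(series, window):
    """Next-period forecast: mean of the last `window` observations."""
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Next-period forecast from simple exponential smoothing,
    with the level initialised to the first observation."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def linear_trend_forecast(series, steps_ahead=1):
    """Fit y = a + b*t by least squares (t = 1..n) and extrapolate."""
    n = len(series)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(series) / n
    b = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series))
         / sum((ti - t_mean) ** 2 for ti in t))
    a = y_mean - b * t_mean
    return a + b * (n + steps_ahead)


def mad_mse_mape(actual, forecast):
    """MAD, MSE and MAPE (MAPE as a fraction, not a percentage)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mad, mse, mape


# Hypothetical toy series (not the cattle data).
s = [10.0, 12.0, 14.0]
print(moving_average_forecast(s, 3), exponential_smoothing_forecast(s, 0.5),
      linear_trend_forecast(s))
```

On a steadily rising series like this one, the moving average and smoothed level lag behind the trend, which is one reason a trend model can produce smaller errors on growing populations.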


Author(s):  
Qiuyu Meng ◽  
Xun Liu ◽  
Jiajia Xie ◽  
Dayong Xiao ◽  
Yi Wang ◽  
...  

Abstract
Background: This study aimed to analyse the epidemiological characteristics of bacillary dysentery (BD) caused by Shigella in Chongqing, China, and to establish incidence prediction models based on the correlation between meteorological factors and BD, thus providing a scientific basis for the prevention and control of BD.
Methods: Descriptive methods were employed to investigate the epidemiological distribution of BD. The Boruta algorithm was used to estimate the correlation between meteorological factors and BD incidence, and a genetic algorithm (GA) combined with support vector regression (SVR) was used to establish prediction models for BD incidence.
Results: In total, 68,855 cases of BD were included. The incidence declined from 36.312/100,000 to 23.613/100,000, with an obvious seasonal peak from May to October. Males were more susceptible to infection than females (ratio 1.118:1). Children under 5 years old had the highest incidence (295.892/100,000) of all age categories, and pre-education children made up the largest proportion (34,658 cases, 50.335%) of all occupational categories. Eight important meteorological factors, including the highest temperature, average temperature, average air pressure, precipitation, and sunshine, were correlated with the monthly incidence of BD. The mean absolute percentage error (MAPE), mean squared error (MSE), and squared correlation coefficient (R2) obtained for GA_SVR_MONTH were 0.087, 0.101, and 0.922, respectively.
Conclusion: From 2009 to 2016, BD incidence in Chongqing remained high, especially in the main urban areas and among males and pre-education children. Eight meteorological factors, including temperature, air pressure, precipitation, and sunshine, formed the most important correlated feature set for BD incidence. Moreover, BD incidence prediction models based on meteorological factors had better prediction accuracy. The findings of this study could provide a comprehensive picture of BD in Chongqing and offer a useful approach for predicting the incidence of infectious disease. This information could also be used to improve current interventions and public health planning.
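The abstract does not say how the genetic algorithm is configured in GA_SVR. As an illustration only, the sketch below is a small real-coded GA with elitism, tournament selection, arithmetic crossover, and Gaussian mutation. In a GA-SVR pipeline the objective `f` would typically be a cross-validated SVR error as a function of hyperparameters such as the penalty `C` and tube width `epsilon` (an assumption, not stated in the source); the sphere function in the usage line is a hypothetical stand-in.

```python
import random


def ga_minimize(f, bounds, pop_size=40, generations=100, mutation_rate=0.2, seed=0):
    """Minimal real-coded genetic algorithm minimizing f over a box.

    Returns (best_individual, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x, d):
        lo, hi = bounds[d]
        return min(max(x, lo), hi)

    def tournament(pop, fit):
        # Binary tournament: the fitter of two random individuals wins.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if fit[i] < fit[j] else pop[j]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        fit = [f(ind) for ind in pop]
        best = min(range(pop_size), key=lambda i: fit[i])
        new_pop = [pop[best][:]]  # elitism: keep the best individual unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, fit), tournament(pop, fit)
            child = []
            for d in range(dim):
                u = rng.random()
                gene = u * p1[d] + (1 - u) * p2[d]  # arithmetic crossover
                if rng.random() < mutation_rate:
                    lo, hi = bounds[d]
                    gene += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
                child.append(clip(gene, d))
            new_pop.append(child)
        pop = new_pop
    fit = [f(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


# Hypothetical objective standing in for an SVR validation error.
best, val = ga_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])
print(best, val)
```

For hyperparameters such as `C` that span several orders of magnitude, searching over their logarithm is a common choice.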


2020 ◽  
Vol 2020 ◽  
pp. 1-12 ◽  
Author(s):  
Hye-Jin Kim ◽  
Sung Min Park ◽  
Byung Jin Choi ◽  
Seung-Hyun Moon ◽  
Yong-Hyuk Kim

We propose three machine learning-based quality control (QC) techniques that differ in the type of input data used for training: QC based on the time series of a single weather element, QC based on that time series in conjunction with other weather elements, and QC using spatiotemporal characteristics. We performed machine learning-based QC on each weather element of atmospheric data, such as temperature, acquired from seven types of IoT sensors, and applied machine learning algorithms such as support vector regression to data containing errors in order to make meaningful estimates from them. We evaluated the performance of the proposed techniques using the root mean squared error (RMSE). QC performed in conjunction with other weather elements achieved, on average, 0.14% lower RMSE than QC performed with a single weather element only. For QC with spatiotemporal characteristics, training with AWS data gave 17% lower RMSE than QC performed with raw data only.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 429
Author(s):  
Jose Emmanuel Chacón ◽  
Oldemar Rodríguez

This paper presents new approaches to fitting regression models for symbolic interval-valued variables, which are shown to improve and extend the center method suggested by Billard and Diday and the center and range method proposed by Lima-Neto, E.A. and De Carvalho, F.A.T. Like these methods, the proposed regression models use the midpoints and half-lengths of the intervals as additional variables. We considered various methods to fit the regression models, including tree-based models, K-nearest neighbors, support vector machines, and neural networks. The proposed approaches were applied to a real dataset and to synthetic datasets generated with linear and nonlinear relations. The methods were evaluated with the root-mean-squared error and the correlation coefficient. The methods presented here are available in the RSDA package, written in R, which can be installed from CRAN.
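The center-and-range idea that these methods build on can be shown with one interval-valued predictor: fit one regression on interval midpoints and another on half-lengths, then rebuild the predicted interval as midpoint ± half-length. The sketch below is the linear baseline only, assuming ordinary least squares; the paper's contribution replaces the linear fits with tree-based models, K-nearest neighbors, SVMs, and neural networks, and its implementation lives in the RSDA R package, not in this code. The function names and toy intervals are hypothetical.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def fit_center_range(x_intervals, y_intervals):
    """Center-and-range style fit for one interval-valued predictor:
    one regression on midpoints, another on half-lengths."""
    xc = [(lo + hi) / 2 for lo, hi in x_intervals]
    xr = [(hi - lo) / 2 for lo, hi in x_intervals]
    yc = [(lo + hi) / 2 for lo, hi in y_intervals]
    yr = [(hi - lo) / 2 for lo, hi in y_intervals]
    ac, bc = simple_ols(xc, yc)  # midpoint model
    ar, br = simple_ols(xr, yr)  # half-length model

    def predict(interval):
        lo, hi = interval
        c = ac + bc * (lo + hi) / 2
        r = ar + br * (hi - lo) / 2
        return c - r, c + r  # predicted (lower, upper) bounds

    return predict


# Hypothetical toy intervals.
predict = fit_center_range([(1.0, 3.0), (2.0, 6.0), (4.0, 8.0)],
                           [(4.0, 6.0), (7.5, 10.5), (11.5, 14.5)])
print(predict((3.0, 5.0)))
```

Nothing in this baseline prevents a negative predicted half-length, which would produce an inverted interval; constrained variants of the center and range method exist to address this.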

