Linear Attribute Projection and Performance Assessment for Signifying the Absenteeism at Work using Machine Learning

2019 · Vol 8 (3) · pp. 1262-1267

In recent times, with technological advancement, industries and organizations are transforming all their inflow and outflow operations into digital form. The reputation of an organization also rests largely in the hands of its employees. One of the major needs of employees in the working environment is to avail leave or vacation based on their family circumstances, and, depending on the health condition and needs of the employee, the organization must extend leave for the employee's satisfaction. Employee performance can also be predicted from the number of working days in the organization. With this view, this paper analyzes employee performance and the number of absent hours using machine learning algorithms. The Absenteeism at work dataset from the UCI Machine Learning Repository is used for the prediction analysis. The prediction of absent hours is carried out in three steps. First, the correlation between each pair of dataset attributes is computed and depicted as a histogram. Second, the most highly correlated features are identified and fitted directly to regression models: Linear regression, SGD regression, RANSAC regression, Ridge regression, Huber regression, ARD regression, Passive Aggressive regression and Theil–Sen regression. Third, performance is assessed with the metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), R2 Score, Explained Variance Score (EVS) and Mean Squared Log Error (MSLE). The implementation is done in Python in the Anaconda Spyder IDE. Experimental results show that Passive Aggressive regression achieves the most effective prediction of the number of absent hours, with a minimum MSE of 0.04, MAE of 0.16, EVS of 0.03, MSLE of 0.32 and a reasonable R2 Score of 0.89.
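
A minimal sketch of the comparison step described above, fitting the same family of scikit-learn regressors and printing the five reported metrics. The local file path, the semicolon separator and the choice of eight top-correlated features are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import (LinearRegression, SGDRegressor, RANSACRegressor,
                                  Ridge, HuberRegressor, ARDRegression,
                                  PassiveAggressiveRegressor, TheilSenRegressor)
from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                             explained_variance_score, mean_squared_log_error)

df = pd.read_csv("Absenteeism_at_work.csv", sep=";")   # hypothetical local copy of the UCI file
y = df["Absenteeism time in hours"]
X = df.drop(columns=["Absenteeism time in hours"])

# Step two of the paper: keep only the features most correlated with the target.
top = X.corrwith(y).abs().sort_values(ascending=False).head(8).index
X_train, X_test, y_train, y_test = train_test_split(X[top], y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear": LinearRegression(),
    "SGD": SGDRegressor(),
    "RANSAC": RANSACRegressor(),
    "Ridge": Ridge(),
    "Huber": HuberRegressor(),
    "ARD": ARDRegression(),
    "PassiveAggressive": PassiveAggressiveRegressor(),
    "TheilSen": TheilSenRegressor(),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    pred_nn = np.clip(pred, 0, None)   # MSLE is only defined for non-negative values
    print(name,
          mean_squared_error(y_test, pred),
          mean_absolute_error(y_test, pred),
          r2_score(y_test, pred),
          explained_variance_score(y_test, pred),
          mean_squared_log_error(y_test, pred_nn))
```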

Water · 2021 · Vol 13 (23) · pp. 3369
Author(s): Jiyeong Hong, Seoro Lee, Gwanjae Lee, Dongseok Yang, Joo Hyun Bae, ...

For effective water management downstream of a dam, it is necessary to estimate the dam's discharge in order to quantify the downstream flow. In this study, machine learning models were constructed to predict the discharge from Soyang River Dam using precipitation and dam inflow/discharge data from 1980 to 2020. Decision tree, multilayer perceptron, random forest, gradient boosting, RNN-LSTM, and CNN-LSTM were used as algorithms. The RNN-LSTM model achieved a Nash–Sutcliffe efficiency (NSE) of 0.796, a root-mean-squared error (RMSE) of 48.996 m3/s, a mean absolute error (MAE) of 10.024 m3/s, an R of 0.898, and an R2 of 0.807, the best results for dam discharge prediction. Predicting dam discharge with machine learning algorithms thus proved feasible, sidestepping limitations of physical models such as the difficulty of representing human operating schedules and the need for extensive input data.
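
For reference, a short sketch of how the two main goodness-of-fit scores reported above can be computed from paired observed/simulated discharge series; `observed` and `simulated` are placeholder arrays, not the study's data.

```python
import numpy as np

def nse(observed, simulated):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

def rmse(observed, simulated):
    """Root-mean-squared error, in the units of the input series."""
    return np.sqrt(np.mean((observed - simulated) ** 2))

observed = np.array([120.0, 95.0, 300.0, 80.0])    # m3/s, illustrative values only
simulated = np.array([110.0, 100.0, 280.0, 90.0])
print(f"NSE = {nse(observed, simulated):.3f}, RMSE = {rmse(observed, simulated):.1f} m3/s")
```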


Author(s): Gaurav Singh, Shivam Rai, Himanshu Mishra, Manoj Kumar

The prime objective of this work is to predict and analyse the COVID-19 pandemic around the world using machine learning algorithms such as Polynomial Regression, Support Vector Machine and Ridge Regression, and furthermore to assess and compare the performance of these regression algorithms in terms of metrics such as R squared, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error. In this work, we used the dataset available in the COVID-19 Data Repository maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, analysing the COVID-19 cases from 22/1/2020 onwards. We applied a supervised machine learning prediction model to forecast the possible confirmed cases for the next ten days.
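
A minimal sketch of one of the models named above: polynomial regression on day-indexed cumulative case counts, extrapolated ten days ahead. The counts below are synthetic placeholders, not the CSSE data, and the polynomial degree is an assumption.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

days = np.arange(60).reshape(-1, 1)                  # days since 22/1/2020
cases = 5 * days.ravel() ** 2 + 40 * days.ravel()    # synthetic cumulative counts

# Fit a polynomial trend to the observed window, then extrapolate.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(days, cases)

future = np.arange(60, 70).reshape(-1, 1)            # the next ten days
print(model.predict(future).round())
```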


Predicting client behaviour and feedback remains a challenging task for manufacturing companies. Companies struggle to increase their profit and annual turnover for lack of an exact prediction of customer likes and dislikes, which motivates the use of machine learning algorithms to predict customer demand. This paper attempts to identify the important features of the wine dataset from the UCI Machine Learning Repository for predicting the customer segment. Feature importances are extracted with several ensemble methods: AdaBoost regressor, AdaBoost classifier, Random Forest regressor, Extra Trees regressor and Gradient Boosting regressor. The features selected by each ensemble method are then fitted with logistic regression to analyze performance, first without and then with feature scaling. Performance is analyzed with the metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), R2 Score, Explained Variance Score (EVS) and Mean Squared Log Error (MSLE). Experimental results show that, after feature scaling, the feature importances extracted from the Extra Trees regressor are the most effective, with an MSE of 0.04, MAE of 0.03, R2 Score of 94%, EVS of 0.9 and MSLE of 0.01, compared with the other ensemble methods.
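
A hedged sketch of the pipeline described above, using scikit-learn's bundled wine data as a stand-in for the UCI file: extract feature importances from an Extra Trees ensemble, keep the strongest features, scale them, and fit logistic regression. The choice of five features is illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Rank features by Extra Trees importance and keep the strongest ones.
importances = ExtraTreesRegressor(random_state=0).fit(X, y).feature_importances_
top = np.argsort(importances)[::-1][:5]

# Scale the selected features, then fit logistic regression on them.
X_train, X_test, y_train, y_test = train_test_split(X[:, top], y, random_state=0)
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
print("accuracy:", clf.score(scaler.transform(X_test), y_test))
```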


2020 · Vol 17 (9) · pp. 4703-4708
Author(s): K. Anitha Kumari, Avinash Sharma, S. Nivethitha, V. Dharini, V. Sanjith, ...

Electrical appliances most commonly consist of two kinds of electrical devices: motors and transformers. Electrical motors are used in all sorts of industrial applications, and failures of such motors cause serious problems, such as overheating, shutdown and even burnout, in their host systems; more attention therefore has to be paid to detecting the outliers. Similarly, to avoid unexpected power-reliability problems and system damage, failures in transformers should be predicted so that their impact can be quantified. By predicting failures, the lifetime of transformers is extended and unnecessary accidents are avoided. This paper therefore presents the detection of outliers in electrical motors and of failures in transformers using supervised machine learning algorithms. Classification techniques such as Support Vector Machine (SVM) and Random Forest (RF), and regression techniques such as Support Vector Regression (SVR) and Polynomial Regression (PR), are used to analyze use cases with different motor specifications. The findings for motors are evaluated using accuracy, precision, recall and F-measure, while Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and R-squared (R2) are considered as metrics for transformers. The proposed approach helps to identify anomalies such as vibration loss, copper loss and overheating in industrial motors, and to detect abnormal functioning of transformers and in turn estimate their lifetime. The proposed system analyses the behaviour of the electrical machines from energy-meter data and reports outliers to users; it also analyses abnormalities in the transformer from the parameters involved in the degradation of the paper-oil insulation system and the operating voltage, leading as a whole to a prediction of the lifetime.
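
A minimal sketch of the supervised outlier-detection step, assuming a table of labelled motor readings (vibration, temperature, current) with a binary fault label; the synthetic data below only illustrates the workflow and metric set, not the authors' measurements.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # vibration, temperature, current (synthetic)
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)     # synthetic fault label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for model in (SVC(), RandomForestClassifier(random_state=0)):
    pred = model.fit(X_train, y_train).predict(X_test)
    print(type(model).__name__,
          accuracy_score(y_test, pred), precision_score(y_test, pred),
          recall_score(y_test, pred), f1_score(y_test, pred))
```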


Energies · 2020 · Vol 13 (20) · pp. 5420
Author(s): Alexandre Lucas, Konstantinos Pegios, Evangelos Kotsakis, Dan Clarke

The importance of price forecasting has gained attention over the last few years, with the growth of aggregators and the general opening of the European electricity markets. Market participants manage a tradeoff between bidding in a lower-price but typically higher-volume market (day-ahead) and aiming for a lower-volume market with potentially higher returns (the balancing energy market). Companies try to forecast the extremes of revenues or prices in order to manage risk and opportunity, assigning their assets in an optimal way. Electricity markets are generally thought to follow quasi-deterministic principles rather than speculation, hence the desire to forecast the price from variables that can describe the outcome of the market. Many studies address this problem with a statistical approach or with multiple-variable regressions, but they very often focus only on time-series analysis. In 2019, the Loss of Load Probability (LOLP) was made available in the UK for the first time. Taking this opportunity, this study focuses on five LOLP variables (with different time-ahead estimations) and other quasi-deterministic variables to explain price behaviour in a multi-variable regression model. These include base production, system load, solar and wind generation, seasonality, day-ahead price and imbalance-volume contributions. Three machine learning algorithms were tested for performance: Gradient Boosting (GB), Random Forest (RF) and XGBoost. XGBoost presented the highest performance and so was chosen for the implementation of the real-time forecast step. The model returns a Mean Absolute Error (MAE) of 7.89 £/MWh, a coefficient of determination (R2 score) of 76.8% and a Mean Squared Error (MSE) of 124.74. The variables that contribute most to the model are the Net Imbalance Volume, the LOLP (aggregated), the month and the de-rated margins (aggregated), with feature-importance weights of 28.6%, 27.5%, 14.0% and 8.9% respectively.
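
A hedged sketch of the chosen XGBoost step: train a gradient-boosted regressor on market features and read back feature importances of the kind reported above. The feature names, hyperparameters and data are placeholders, not the UK market feed or the authors' configuration.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
features = ["net_imbalance_volume", "lolp_aggregated", "month",
            "derated_margin", "day_ahead_price", "system_load"]
X = pd.DataFrame(rng.normal(size=(1000, len(features))), columns=features)
# Synthetic price driven mostly by imbalance volume and LOLP, plus noise.
y = 3 * X["net_imbalance_volume"] + 2 * X["lolp_aggregated"] + rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, learning_rate=0.05).fit(X_train, y_train)
pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred),
      "MSE:", mean_squared_error(y_test, pred),
      "R2:", r2_score(y_test, pred))
print(dict(zip(features, model.feature_importances_.round(3))))
```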


2020 · Vol 5 (2) · pp. 183-186
Author(s): Ledisi Giok Kabari, Marcus B. Chigoziri, Joseph Eneotu

In this study, we discuss various machine learning algorithms and architectures suitable for forecasting the Nigerian Naira exchange rate. Our analyses focused on the exchange rates of the British Pound, the US Dollar and the Euro against the Naira, with exchange rate data sourced from the Central Bank of Nigeria. The performances of the algorithms were evaluated using Mean Squared Error, Root Mean Squared Error, Mean Absolute Error and the coefficient of determination (R-squared score). Finally, we compared the performances of these algorithms in forecasting the exchange rates.
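
For reference, a short sketch of the four evaluation metrics named above, computed with scikit-learn; the rate series are illustrative placeholders, not the Central Bank of Nigeria data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

actual = np.array([410.5, 412.0, 415.3, 411.8])       # e.g. NGN per USD, illustrative
predicted = np.array([409.9, 413.1, 414.7, 412.5])

mse = mean_squared_error(actual, predicted)
print("MSE:", mse)
print("RMSE:", np.sqrt(mse))                          # root of the MSE
print("MAE:", mean_absolute_error(actual, predicted))
print("R2:", r2_score(actual, predicted))
```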


2021 · pp. 1-15
Author(s): O. Basturk, C. Cetek

ABSTRACT In this study, prediction of aircraft Estimated Time of Arrival (ETA) using machine learning algorithms is proposed. Accurate prediction of ETA is important for the management of delay and air traffic flow, runway assignment, gate assignment, collaborative decision making (CDM), coordination of ground personnel and equipment, and optimisation of the arrival sequence. Machine learning is able to learn from experience and make predictions with weak assumptions or no assumptions at all. In the proposed approach, general flight information, trajectory data and weather data were obtained from different sources in various formats. Raw data were converted to tidy data and inserted into a relational database. To obtain the features for training the machine learning models, the data were explored, cleaned and transformed into convenient features, and new features were also derived from the available data. Random forests and deep neural networks were used to train the models. Both can predict the ETA with a mean absolute error (MAE) of less than 6 min after departure, and of less than 3 min after terminal manoeuvring area (TMA) entrance. Additionally, a web application was developed to dynamically predict the ETA using the proposed models.
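
A hedged sketch of the random-forest variant described above: predict remaining flight time from engineered features and report the MAE in minutes. The feature names and synthetic target are assumptions for illustration, not the authors' actual feature set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "great_circle_distance_km": rng.uniform(200, 2000, n),
    "ground_speed_kt": rng.uniform(350, 500, n),
    "headwind_kt": rng.normal(0, 20, n),
    "tma_congestion_index": rng.uniform(0, 1, n),
})
# Synthetic flight time in minutes: distance over speed (1 kt = 1.852 km/h), plus noise.
y = X["great_circle_distance_km"] / (X["ground_speed_kt"] * 1.852) * 60 + rng.normal(0, 3, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MAE (minutes):", mean_absolute_error(y_test, model.predict(X_test)))
```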


2021 · Vol 10 (4) · pp. 58-75
Author(s): Vivek Sen Saxena, Prashant Johri, Avneesh Kumar

Skin lesion melanoma is the deadliest type of cancer. Artificial intelligence provides the power to classify skin lesions as melanoma or non-melanoma. The proposed system for melanoma detection and classification involves four steps: pre-processing, which resizes all the images and removes noise and hair from the dermoscopic images; image segmentation, which identifies the lesion area; feature extraction from the segmented lesion; and classification, which categorizes a lesion as malignant (melanoma) or benign (non-melanoma). A modified GrabCut algorithm is employed to segment the skin lesion. Segmented lesions are classified using machine learning algorithms such as SVM, k-NN, ANN and logistic regression, and evaluated on performance metrics such as accuracy, sensitivity and specificity. The results are compared with existing systems and achieve a higher similarity index and accuracy.
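
A minimal sketch of the segmentation step using OpenCV's standard GrabCut (the paper uses a modified variant); the image path and the initial bounding box around the lesion are placeholders.

```python
import cv2
import numpy as np

image = cv2.imread("dermoscopic_image.jpg")           # hypothetical input image
mask = np.zeros(image.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (50, 50, image.shape[1] - 100, image.shape[0] - 100)  # rough box around the lesion

# Iteratively separate foreground (lesion) from background inside the box.
cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
lesion_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
lesion = image * lesion_mask[:, :, None]              # keep only the lesion area
cv2.imwrite("lesion_segmented.png", lesion)
```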


2021 · pp. 1-16
Author(s): Kevin Kloos

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and therefore it has received little attention in the academic literature. In earlier work, we have collected existing methods that are able to correct misclassification bias. We have compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics and we hope that our work will stimulate further methodological research in this area.
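
To make the underlying problem concrete, here is a hedged sketch of the classic calibration-style correction for a classified proportion (a Rogan–Gladen-type estimator), applied per time point; the paper's generic time-series method goes beyond this simple example.

```python
import numpy as np

def corrected_proportion(observed_share, sensitivity, false_positive_rate):
    """Invert E[observed] = sensitivity * true + fpr * (1 - true) for the true share."""
    return (observed_share - false_positive_rate) / (sensitivity - false_positive_rate)

# Monthly shares labelled "positive" by a classifier whose error rates were
# estimated on a test set; the numbers are illustrative only.
observed = np.array([0.32, 0.35, 0.31, 0.38])
print(corrected_proportion(observed, sensitivity=0.90, false_positive_rate=0.05))
```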


2021 · Vol 35 (1) · pp. 11-21
Author(s): Himani Tyagi, Rajendra Kumar

IoT is characterized by communication between things (devices) that constantly share data, analyze them, and make decisions while connected to the internet. This interconnected architecture is attracting cyber criminals who expose the IoT system to failure, so it becomes imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. Therefore, in this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately and automatically differentiate benign from malicious traffic. Instead of using available feature-reduction techniques like PCA, which can change the core meaning of the variables, a unique feature set consisting of only seven lightweight features is developed that is also IoT-specific and independent of the attack traffic. The results demonstrate the effectiveness of the seven fabricated features in detecting four varieties of attack, namely DDoS, DoS, Reconnaissance and Information Theft. Furthermore, this study also proves the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using metrics such as accuracy, precision, recall, F-score and ROC. Although the accuracies of the Decision Tree (99.9%) and Random Forest (99.9%) classifiers are the same, other measures such as training and testing time show Random Forest to be comparatively better.
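
A minimal sketch of the classifier comparison reported above, timing training and prediction alongside accuracy; the synthetic traffic features merely stand in for the seven BoT-IoT-derived features.

```python
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 7))                  # seven lightweight features (synthetic)
y = (X[:, :3].sum(axis=1) > 0).astype(int)      # synthetic benign/malicious label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (KNeighborsClassifier(), DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    pred = model.predict(X_test)
    t2 = time.perf_counter()
    print(type(model).__name__, f"acc={accuracy_score(y_test, pred):.4f}",
          f"train={t1 - t0:.3f}s predict={t2 - t1:.3f}s")
```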

