A new generic method to improve machine learning applications in official statistics

2021 ◽  
pp. 1-16
Author(s):  
Kevin Kloos

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and therefore it has received little attention in the academic literature. In earlier work, we have collected existing methods that are able to correct misclassification bias. We have compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics and we aspire that our work will stimulate further methodological research in this area.
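As a rough illustration of the bias described above (not the correction method proposed in the paper), the sketch below assumes a binary classifier with fixed sensitivity and specificity and computes the expected share of items it labels positive. The function names and the example values are hypothetical and chosen only for illustration.

```python
def expected_classify_and_count(alpha: float, sensitivity: float, specificity: float) -> float:
    """Expected share of items labelled positive by an imperfect binary classifier.

    alpha       -- true share of positives in the population (assumed known here)
    sensitivity -- P(predicted positive | truly positive)
    specificity -- P(predicted negative | truly negative)
    """
    true_positives = alpha * sensitivity
    false_positives = (1.0 - alpha) * (1.0 - specificity)
    return true_positives + false_positives


def misclassification_bias(alpha: float, sensitivity: float, specificity: float) -> float:
    """Expected estimate minus the true share: the bias of simply counting predictions."""
    return expected_classify_and_count(alpha, sensitivity, specificity) - alpha


if __name__ == "__main__":
    # Hypothetical classifier that is 90% accurate on both classes.
    for alpha in (0.1, 0.2, 0.4):
        bias = misclassification_bias(alpha, sensitivity=0.9, specificity=0.9)
        print(f"true share {alpha:.1f}: bias {bias:+.3f}")
```

Under these assumptions the bias depends on the true share itself, which is why a classifier with high accuracy can still give a distorted aggregate statistic.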

2020 ◽  
Vol 2020 ◽  
pp. 1-12 ◽  
Author(s):  
Hye-Jin Kim ◽  
Sung Min Park ◽  
Byung Jin Choi ◽  
Seung-Hyun Moon ◽  
Yong-Hyuk Kim

We propose three quality control (QC) techniques using machine learning that depend on the type of input data used for training. These include QC based on time series of a single weather element, QC based on time series in conjunction with other weather elements, and QC using spatiotemporal characteristics. We performed machine learning-based QC on each weather element of atmospheric data, such as temperature, acquired from seven types of IoT sensors and applied machine learning algorithms, such as support vector regression, on data with errors to make meaningful estimates from them. By using the root mean squared error (RMSE), we evaluated the performance of the proposed techniques. As a result, the QC done in conjunction with other weather elements had 0.14% lower RMSE on average than QC conducted with only a single weather element. In the case of QC with spatiotemporal characteristic considerations, the QC done via training with AWS data showed performance with 17% lower RMSE than QC done with only raw data.


2009 ◽  
Vol 5 (4) ◽  
pp. 58-76
Author(s):  
Zoran Bosnic ◽  
Igor Kononenko

In machine learning, the reliability estimates for individual predictions provide more information about individual prediction error than the average accuracy of the predictive model (e.g. relative mean squared error). Such reliability estimates may represent decisive information in the risk-sensitive applications of machine learning (e.g. medicine, engineering, and business), where they enable the users to distinguish between more and less reliable predictions. In their previous work, the authors proposed eight reliability estimates for individual examples in regression and evaluated their performance. The results showed that the performance of each estimate varies strongly depending on the domain and regression model properties. In this paper they empirically analyze the dependence of reliability estimates’ performance on the data set and model properties. They present results which show that the reliability estimates perform better when used with more accurate regression models, in domains with a greater number of examples and in domains with less noisy data.


Author(s):  
Gaurav Singh ◽  
Shivam Rai ◽  
Himanshu Mishra ◽  
Manoj Kumar

The prime objective of this work is to predict and analyse the Covid-19 pandemic around the world using machine learning algorithms such as Polynomial Regression, Support Vector Machine and Ridge Regression, and furthermore to assess and compare the performance of these regression algorithms in terms of R squared, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error. In this work, we have used the dataset available in the Covid-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We have analyzed the Covid-19 cases from 22/1/2020 till now. We applied a supervised machine learning prediction model to forecast the possible confirmed cases for the next ten days.


2020 ◽  
Vol 12 (5) ◽  
pp. 41-51
Author(s):  
Shaimaa Mahmoud ◽  
Mahmoud Hussein ◽  
Arabi Keshk

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting future product ratings from users’ comments. Researchers have approached this problem using machine learning algorithms (e.g. Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. This is better than the accuracy of the existing approaches.


Author(s):  
Alessio Pagani ◽  
Abhinav Mehrotra ◽  
Mirco Musolesi

Understanding and learning the characteristics of network paths has been of particular interest for decades and has led to several successful applications. Such analysis becomes challenging for urban networks as their size and complexity are significantly higher compared to other networks. State-of-the-art machine learning techniques allow us to detect hidden patterns and, thus, infer the features associated with them. However, very little is known about how the use of different input representations affects the performance of such predictive models. In this paper, we design and evaluate six different graph input representations (i.e. representations of the network paths), considering the network’s topological and temporal characteristics, to be used as inputs for machine learning models that learn the behavior of urban network paths. The representations are validated and then tested on a real-world taxi journeys dataset by predicting tips, using a road network of New York. Our results demonstrate that the input representations that use temporal information help the model to achieve the highest accuracy (root mean-squared error of $1.42).


2020 ◽  
Vol 17 (9) ◽  
pp. 4703-4708
Author(s):  
K. Anitha Kumari ◽  
Avinash Sharma ◽  
S. Nivethitha ◽  
V. Dharini ◽  
V. Sanjith ◽  
...  

Electrical appliances most commonly consist of two electrical devices, namely electrical motors and transformers. Electrical motors are used for all sorts of industrial purposes. Failures of such motors result in serious problems in their host systems, such as overheating, shutdown and even burning. Thus, more attention has to be paid to detecting outliers. Similarly, to avoid unexpected power reliability problems and system damage, the prediction of failures in transformers is expected to quantify the impacts. By predicting failures, the lifetime of the transformers increases and unnecessary accidents are avoided. Therefore, this paper presents the detection of outliers in electrical motors and failures in transformers using supervised machine learning algorithms. Machine learning techniques such as Support Vector Machine (SVM) and Random Forest (RF), and regression techniques such as Support Vector Regression (SVR) and Polynomial Regression (PR), are used to analyze the use cases of different motor specifications. The findings are evaluated using accuracy, precision, F-measure, and recall for motors. Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and R-squared Error (R2) are considered as metrics for transformers. The proposed approach helps to identify anomalies such as vibration loss, copper loss and overheating in the industrial motor, and to determine abnormal functioning of the transformer, which in turn helps to ascertain its lifetime. The proposed system analyses the behaviour of the electrical machines using the energy meter data and reports the outliers to users. It also analyses the abnormalities occurring in the transformer using the parameters involved in the degradation of the paper-oil insulation system and the voltage of operation, which as a whole leads to a prediction of the lifetime.


Author(s):  
Gausiya Momin ◽  
Trupti Ingle ◽  
Vaishnavi Mirajkar ◽  
A. A. Magar

Bitcoin is the most profitable cryptocurrency on the market. However, Bitcoin prices fluctuate strongly, which makes them very difficult to predict. This research aims to discover the most efficient and accurate model for predicting Bitcoin prices among various machine learning algorithms. Using one-minute interval trading data from the exchange website Bitstamp from January 1, 2012, to January 8, 2018, several regression models were tested with the scikit-learn and Keras libraries. The best results showed a Mean Squared Error (MSE) as low as 0.00002 and an R-squared (R2) as high as 99.2%.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Thuy-Anh Nguyen ◽  
Hai-Bang Ly ◽  
Hai-Van Thi Mai ◽  
Van Quan Tran

Accurate prediction of the concrete compressive strength is an important task that helps to avoid costly and time-consuming experiments. Notably, the determination of the later-age concrete compressive strength is more difficult due to the time required to perform experiments. Therefore, predicting the compressive strength of later-age concrete is crucial in specific applications. In this investigation, an approach using a feedforward neural network (FNN) machine learning algorithm was proposed to predict the compressive strength of later-age concrete. The proposed model was fully evaluated in terms of performance and prediction capability over statistical results of 1000 simulations under a random sampling effect. The results showed that the proposed algorithm was an excellent predictor and might be useful for engineers to avoid time-consuming experiments, with the statistical performance indicators, namely the Pearson correlation coefficient (R), root-mean-squared error (RMSE), and mean absolute error (MAE), being 0.9861, 2.1501, 1.5650 for the training part and 0.9792, 2.8510, 2.1361 for the testing part, respectively. The results also indicated that the FNN model was superior to classical machine learning algorithms such as random forest and Gaussian process regression, as well as empirical formulations proposed in the literature.


2020 ◽  
Vol 5 (2) ◽  
pp. 183-186
Author(s):  
Ledisi Giok Kabari ◽  
Marcus B. Chigoziri ◽  
Joseph Eneotu

In this study, we discuss various machine learning algorithms and architectures suitable for the Nigerian Naira exchange rate forecast. Our analyses were focused on the exchange rates of the British Pounds, US Dollars and the Euro against the Naira. The exchange rate data was sourced from the Central Bank of Nigeria. The performances of the algorithms were evaluated using Mean Squared Error, Root Mean Squared Error, Mean Absolute Error and the coefficient of determination (R-Squared score). Finally, we compared the performances of these algorithms in forecasting the exchange rates.
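The four evaluation measures named above can be computed from a list of observed values and a list of forecasts. The following is a minimal sketch using their standard definitions; the function name `regression_metrics` and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired observations and forecasts."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R-squared is undefined when the observed values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    forecast = [1.5, 2.0, 2.5, 4.0]
    for name, value in regression_metrics(observed, forecast).items():
        print(f"{name}: {value:.4f}")
```

MSE and RMSE penalise large errors more heavily than MAE, while R-squared compares the forecast errors with the variation of the observed values around their mean.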

