Attacking Data Transforming Learners at Training Time

While machine learning systems are known to be vulnerable to data-manipulation attacks at both training and deployment time, little is known about how to adapt attacks when the defender transforms data prior to model estimation. We consider the setting where the defender, Bob, first transforms the data and then learns a model from the result; Alice, the attacker, perturbs Bob’s input data before he transforms it. We develop a general-purpose “plug and play” framework for gradient-based attacks based on matrix differentials, focusing on ordinary least-squares linear regression. This allows learning algorithms and data transformations to be paired and composed arbitrarily: attacks can be adapted to compositional learning maps through the chain rule, analogous to backpropagation on neural network parameters. Best-response attacks can be computed through matrix multiplications from a library of attack matrices for transformations and learners. Our treatment of linear regression extends state-of-the-art attacks at training time by permitting the attacker to affect both features and targets optimally and simultaneously. We explore several transformations broadly used across machine learning, with a driving motivation for our work being autoregressive modeling. There, Bob transforms a univariate time series into a matrix of observations and a vector of target values, which can then be fed into standard learners. Under this learning reduction, a perturbation from Alice to a single value of the time series affects features of several data points along with target values.
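
The autoregressive reduction described in this abstract can be illustrated with a minimal sketch. It is not the authors' attack framework: the helper names `lag_embed` and `changed_cells`, the lag order, and the toy series are all assumptions made for illustration. The sketch only shows that perturbing one time-series value changes several feature cells and one target.

```python
def lag_embed(series, order):
    """Turn a univariate series into (features, targets) for an AR(order) fit.

    Row t holds the `order` values that precede series[t + order];
    the matching target is series[t + order] itself.
    """
    if order < 1 or len(series) <= order:
        raise ValueError("need order >= 1 and len(series) > order")
    n_rows = len(series) - order
    features = [list(series[t:t + order]) for t in range(n_rows)]
    targets = [series[t + order] for t in range(n_rows)]
    return features, targets


def changed_cells(series, order, index, delta):
    """Report which feature cells and targets differ after perturbing one value."""
    perturbed = list(series)
    perturbed[index] += delta
    x0, y0 = lag_embed(series, order)
    x1, y1 = lag_embed(perturbed, order)
    feature_hits = [(r, c) for r in range(len(x0)) for c in range(order)
                    if x0[r][c] != x1[r][c]]
    target_hits = [r for r in range(len(y0)) if y0[r] != y1[r]]
    return feature_hits, target_hits


# Hypothetical toy series; the perturbation touches only series[3].
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
feature_hits, target_hits = changed_cells(series, order=3, index=3, delta=0.5)
print(feature_hits)  # (row, column) pairs of changed feature cells
print(target_hits)   # rows whose target changed
```

With a lag order of 3, the single perturbed value appears in three different feature rows and as the target of one further row, which is why an attacker in this setting cannot treat features and targets independently.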

Prediction and Forecasting of Air Quality Index in Chennai using Regression and ARIMA time series models

Journal of Engineering Research, DOI 10.36909/jer.10253, 2021, Vol 9

Author(s): Geetha Mani, Joshi Kumar Viswanadhapalli, Albert Alexander Stonie, ...

Keyword(s): Machine Learning, Time Series, Air Quality, Linear Regression, Quality Index, Air Quality Index, Model Parameters, Sensor Output, Model Accuracy, Life On Earth

Air is one of the most fundamental constituents for the sustenance of life on earth. Meteorological and traffic factors, consumption of non-renewable energy sources, and industrial parameters are steadily increasing air pollution. These factors affect the welfare and prosperity of life on earth; therefore, the air quality in our environment needs to be monitored continuously. The Air Quality Index (AQI), which indicates air quality, is influenced by several individual factors such as the accumulation of NO2, CO, O3, PM2.5, SO2, and PM10. This research paper aims to predict and forecast the AQI with Machine Learning (ML) techniques, namely linear regression and time series analysis. First, a Multiple Linear Regression (MLR) model, a supervised machine learning method, is developed to predict AQI. NO2, ozone (O3), PM2.5, and SO2 sensor outputs collected from the Central Pollution Control Board (CPCB) for the Chennai region, India, serve as input features, and the optimized AQI calculated from the sensor outputs is set as the target to train the regression model. The obtained model parameters are validated with new and unseen sensor output. Key Performance Indices (KPIs) such as the coefficient of determination, root mean square error, and mean absolute error were calculated to validate the model accuracy. The K-fold cross-validation score of the MLR model on the testing data was around 92%. Second, the Auto-Regressive Integrated Moving Average (ARIMA) time series model is applied to forecast the AQI. The obtained model parameters were validated with unseen, timestamped data. The forecasted AQI values for the next 15 days lie within the 95% confidence interval zone. The model accuracy on test data was more than 80%.
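
The three validation metrics named in this abstract (coefficient of determination, root mean square error, mean absolute error) can be computed as in the sketch below. The function name `regression_kpis` and the AQI values are hypothetical and are not taken from the paper; the formulas are the standard definitions.

```python
import math


def regression_kpis(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical AQI values for illustration only.
aqi_true = [50.0, 60.0, 70.0, 80.0]
aqi_pred = [52.0, 58.0, 71.0, 79.0]
kpis = regression_kpis(aqi_true, aqi_pred)
print(kpis)
```

For K-fold cross-validation, the same function would be applied to each held-out fold and the scores averaged.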

ANALISIS FAKTOR-FAKTOR YANG MEMPENGARUHI VOLUME EKSPOR BIJI KAKAO INDONESIA KE MALAYSIA (Analysis of the Factors Affecting the Volume of Indonesian Cocoa Bean Exports to Malaysia)

AGRIBUSINESS JOURNAL, DOI 10.15408/aj.v13i1.11871, 2019, Vol 13 (1), pp. 37-58

Author(s): Ilma Yuni Rosita, Lilis Imamah Ichdayati, Rizki Adi Puspita Sari

Keyword(s): Time Series, Linear Regression, Multiple Linear Regression, Least Squares, International Market, Ordinary Least Squares, Cocoa Beans, Significance Level, The Real, Analyze Time Series

This study aims to analyze the factors that affect the volume of Indonesian cocoa exports to Malaysia. Multiple linear regression and ordinary least squares (OLS) were employed to analyze time series data from 2005 until 2013. The analysis shows that the factors that significantly affect the volume of Indonesian cocoa exports to Malaysia, at a significance level (α) of five percent, are the real prices of Indonesian cocoa exports to Malaysia and the real prices of cocoa beans on the international market.

Analysis of psychometric data using statistical and machine learning methods

DOI 10.32920/ryerson.14665509, 2021

Author(s): Krishnapriya Subramanian

Keyword(s): Machine Learning, Time Series, Linear Regression, Well Being, Time Series Model, Learning Methods, Logistics Regression, Machine Learning Methods, Psychometric Data, Simulation Results

The objective of this thesis is to analyse psychometric data using statistical and machine learning methods. Psychological data are analysed to predict illness and injury of athletes. Regression, a statistical process for estimating the relationships among variables, is used as the basis of this thesis. We apply linear regression, time series, and logistic regression to predict illness and well-being. Our linear regression simulation results are mainly used to understand the data well. By reviewing the results of linear regression, a time series model is developed that predicts sickness one day ahead. The predicted values of this time series model are continuous. However, logistic regression can be used to provide a probabilistic approach that predicts future levels as a categorical value. Hence, we have developed a binomial logistic regression model for the case where the observed variable is dichotomous. Our simulation results show that this prediction model performs well. Our empirical studies also show that our method can act as an early warning system for athletes.

ANALISIS FAKTOR-FAKTOR YANG MEMPENGARUHI VOLUME EKSPOR BIJI KAKAO INDONESIA KE MALAYSIA (Analysis of the Factors Affecting the Volume of Indonesian Cocoa Bean Exports to Malaysia)

AGRIBUSINESS JOURNAL, DOI 10.15408/aj.v11i2.11842, 2019, Vol 11 (2), pp. 161-182

Author(s): Ilma Yuni Rosita, Lilis Imamah Ichdayati, Rizki Adi Puspita Sari

Keyword(s): Time Series, Linear Regression, Multiple Linear Regression, Least Squares, International Market, Ordinary Least Squares, Cocoa Beans, Significance Level, The Real, Analyze Time Series

Traditional vs. Machine-Learning Methods for Forecasting Sandy Shoreline Evolution Using Historic Satellite-Derived Shorelines

Remote Sensing, DOI 10.3390/rs13050934, 2021, Vol 13 (5), pp. 934

Author(s): Floris Calkoen, Arjen Luijendijk, Cristian Rodriguez Rivero, Etienne Kras, Fedor Baart

Keyword(s): Machine Learning, Time Series, Mean Squared Error, Computation Time, Ordinary Least Squares, Anthropogenic Pressures, Time Series Forecast, Shoreline Evolution, Shoreline Prediction, Probabilistic Machine Learning

Forecasting shoreline evolution for sandy coasts is important for sustainable coastal management, given the present-day increasing anthropogenic pressures and a changing future climate. Here, we evaluate eight different time-series forecasting methods for predicting future shorelines derived from historic satellite-derived shorelines. Analyzing more than 37,000 transects around the globe, we find that traditional forecast methods, together with some of the evaluated probabilistic Machine Learning (ML) time-series forecast algorithms, outperform Ordinary Least Squares (OLS) predictions for the majority of the sites. When forecasting seven years ahead, we find that these algorithms generate better predictions than OLS for 54% of the transect sites, producing forecasts with, on average, 29% smaller Mean Squared Error (MSE). Importantly, this advantage is shown to exist over all considered forecast horizons, i.e., from 1 up to 11 years. Although the ML algorithms do not produce significantly better predictions than traditional time-series forecast methods, some proved to be significantly more efficient in terms of computation time. We further provide insight into how these ML algorithms can be improved so that they can be expected to outperform not only OLS regression, but also the traditional time-series forecast methods. These forecasting algorithms can be used by coastal engineers, managers, and scientists to generate future shoreline predictions at a global level and draw conclusions from them.
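
The comparison in this abstract rests on an OLS trend baseline scored by MSE. The sketch below shows one way such a baseline and a relative MSE reduction could be computed. It is an illustration under stated assumptions: the shoreline positions are invented, the helper names are hypothetical, and the persistence forecast is only a stand-in for a competing method, not one of the eight methods evaluated in the paper.

```python
def fit_ols_trend(times, values):
    """Closed-form OLS fit of values = intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def mse(actual, forecast):
    """Mean squared error between observed and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical shoreline positions (metres) at yearly time steps.
train_years = [0, 1, 2, 3, 4]
train_pos = [10.0, 12.0, 14.0, 16.0, 18.0]
test_years = [5, 6]
test_pos = [19.0, 19.5]

slope, intercept = fit_ols_trend(train_years, train_pos)
ols_forecast = [intercept + slope * t for t in test_years]

# Stand-in competitor: repeat the last observed position (persistence).
persistence_forecast = [train_pos[-1]] * len(test_years)

ols_mse = mse(test_pos, ols_forecast)
other_mse = mse(test_pos, persistence_forecast)
relative_reduction = 1.0 - other_mse / ols_mse  # fraction by which MSE shrinks
print(ols_mse, other_mse, relative_reduction)
```

Repeating this per transect and averaging `relative_reduction` over the sites where the competitor wins gives a summary statistic of the same kind as the one reported in the abstract.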

Least Square Regression for Prediction Problems in Machine Learning using R

International Journal of Engineering & Technology, DOI 10.14419/ijet.v7i3.12.17612, 2018, Vol 7 (3.12), pp. 960

Author(s): Anila. M, G Pradeepini

Keyword(s): Machine Learning, Linear Regression, Ordinary Least Squares, Least Square, Least Squares Regression, Ols Regression, Least Square Regression, Psychology And Economics, Prediction Technique, Prediction Problems

The most commonly used prediction technique is Ordinary Least Squares Regression (OLS Regression). It has been applied in many fields such as statistics, finance, medicine, psychology, and economics. Many people, especially data scientists, use this technique without enough training to apply it, and should check why and when it can or cannot be applied. It is not an easy task to explain why least squares regression [1] has faced so much criticism when it is trained and applied. In this paper, we first review the fundamentals of linear regression and OLS regression, along with the popularity of the LS method; we then present our analysis of the difficulties and pitfalls that arise when the OLS method is applied; finally, we describe some techniques for overcoming these problems.

Multisensor crop yield estimation with machine learning

DOI 10.5194/egusphere-egu2020-21329, 2020

Author(s): Laura Martínez Ferrer, Maria Piles, Gustau Camps-Valls

Keyword(s): Machine Learning, Time Series, Linear Regression, Crop Yield, Gaussian Processes, Vegetation Index, Maximum Temperature, Yield Estimation, Study Results, Non Linear

Providing accurate and spatially resolved predictions of crop yield is of utmost importance due to the rapid increase in the demand for biofuels and food in the foreseeable future. Satellite-based remote sensing over agricultural areas allows monitoring of crop development through key bio-geophysical variables such as the Enhanced Vegetation Index (EVI), sensitive to canopy greenness, the Vegetation Optical Depth (VOD), sensitive to biomass water-uptake dynamics, and Soil Moisture (SM), which provides direct information on plant-available water. The aim of this work is to implement an automatic system for county-based crop yield estimation using time series from multisource satellite observations, meteorological data, and available in situ surveys as supporting information. The spatio-temporal resolutions of satellite and meteorological observations are fully exploited and synergistically combined for crop yield prediction using machine learning models. Linear and non-linear regression methods are used: least squares, LASSO, random forests, kernel machines, and Gaussian processes. Here we are interested not only in the prediction skill, but also in understanding the relative relevance of the covariates. For this, we first study the importance of each feature separately and then propose a global model for operational monitoring of crop status using the most relevant agro-ecological drivers. We selected the Continental U.S. and a four-year time series dataset to perform the research study. Results reveal that the three satellite variables are complementary and that their combination with maximum temperature and precipitation from meteorological stations provides the best estimations. Interestingly, adding information about crop planted area also improved the predictions.
A non-linear regression model based on Gaussian processes led to the best results for all considered crops (soybean, corn, and wheat), with high accuracy (low bias and correlation coefficients ranging from 0.75 to 0.92). The feature ranking allowed understanding of the main drivers for crop monitoring and the underlying factors behind a prediction loss or gain.

Analysis of psychometric data using statistical and machine learning methods

DOI 10.32920/ryerson.14665509.v1, 2021

Author(s): Krishnapriya Subramanian

Keyword(s): Machine Learning, Time Series, Linear Regression, Well Being, Time Series Model, Learning Methods, Logistics Regression, Machine Learning Methods, Psychometric Data, Simulation Results

Differentially Private Ordinary Least Squares

Journal of Privacy and Confidentiality, DOI 10.29012/jpc.654, 2019, Vol 9 (1), Cited By ~ 1

Author(s): Or Sheffet

Keyword(s): Machine Learning, Linear Regression, Least Squares, Confidence Intervals, Null Hypothesis, Ordinary Least Squares, Label Prediction, Real Value, T Values, True Correlation

Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives t-values, representing the likelihood of each real value to be the true correlation. Using t-values, OLS can release a confidence interval, which is an interval on the reals that is likely to contain the true correlation; and when this interval does not intersect the origin, we can reject the null hypothesis as it is likely that the true correlation is non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; secondly, when JLT approximates Ridge regression (linear regression with l2-regularization) we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm of Dwork et al. (STOC 2014).
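
The classical, non-private inference step that this abstract takes as its starting point can be sketched for a single feature as follows. This is not the paper's differentially private estimator: it is the textbook OLS slope, its standard error and t-value, and a confidence interval built with a normal quantile as a large-sample stand-in for the Student t quantile. The function name `slope_inference` and the data are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def slope_inference(x, y, confidence=0.95):
    """OLS slope, t-value and approximate confidence interval for one feature."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        raise ValueError("perfect fit: t-value is undefined")
    t_value = slope / std_err
    # Assumption: normal quantile approximates the t quantile for large n.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    interval = (slope - z * std_err, slope + z * std_err)
    reject_null = not (interval[0] <= 0.0 <= interval[1])
    return slope, t_value, interval, reject_null


# Hypothetical data with a clear positive relationship.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
slope, t_value, interval, reject_null = slope_inference(x, y)
print(slope, t_value, interval, reject_null)
```

The null hypothesis of a zero coefficient is rejected exactly when the interval excludes zero, which is the guarantee the abstract says the private estimators aim to reproduce.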

Short-Term Electricity Generation Forecasting Using Machine Learning Algorithms: A Case Study of the Benin Electricity Community (C.E.B)

TH Wildau Engineering and Natural Sciences Proceedings, DOI 10.52825/thwildauensp.v1i.25, 2021, Vol 1

Author(s): Agbassou Guenoupkati, Adekunlé Akim Salami, Mawugno Koffi Kodjo, Kossi Napo

Keyword(s): Machine Learning, Time Series, Linear Regression, Performance Metrics, Learning Algorithms, Machine Learning Algorithms, Support Vector, Learning Models, Short Term, Machine Learning Models

Time series forecasting in the energy sector is important to power utilities for decision making to ensure the sustainability and quality of electricity supply and the stability of the power grid. Unfortunately, the presence of certain exogenous factors, such as weather conditions and electricity price, complicates the task for linear regression models, which are becoming unsuitable. A robust predictor would be an invaluable asset for electricity companies. To overcome this difficulty, Artificial Intelligence departs from these prediction methods through Machine Learning algorithms, which have performed well over the last decades in predicting time series at several levels. This work proposes the deployment of three univariate Machine Learning models: Support Vector Regression, Multi-Layer Perceptron, and the Long Short-Term Memory Recurrent Neural Network to predict the electricity production of the Benin Electricity Community. Performance metrics were used to validate the performance of these methods against the Autoregressive Integrated Moving Average and Multiple Regression models. Overall, the results show that the Machine Learning models outperform the linear regression methods. Consequently, Machine Learning methods offer a perspective for short-term electric power generation forecasting for Benin Electricity Community sources.
