Revenue-based attribution modeling for online advertising

2018 ◽  
Vol 61 (2) ◽  
pp. 195-209 ◽  
Author(s):  
Kaifeng Zhao ◽  
Seyed Hanif Mahboobi ◽  
Saeed R Bagheri

This article examines and proposes several attribution models that quantify how revenue should be attributed to online advertising inputs. We adopt and further develop relative importance methods, which are based on regression models that have been extensively studied and used to investigate the relationship between advertising efforts and market reaction (revenue). The relative importance methods decompose the coefficient of determination (R²) of the regression models and allocate each input's marginal contribution to it as that input's attribution value. In particular, we adopt two alternative submethods to perform this decomposition: dominance analysis and relative weight analysis. Moreover, we extend the decomposition methods from standard linear models to additive models. We claim that our new approaches are more flexible and more accurate, both in modeling the underlying relationship and in quantifying the attribution values. We use simulation examples to demonstrate the superior performance of our new approaches over traditional methods. We further illustrate the value of our proposed approaches using a real advertising campaign data set.
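Dominance analysis allocates R² by averaging each input's incremental contribution over submodels. The sketch below computes general dominance weights for ordinary least squares with NumPy. It is a minimal illustration under our own assumptions (synthetic data; the helper names `r_squared` and `general_dominance` are hypothetical), not the authors' implementation, and it covers neither relative weight analysis nor the additive-model extension. Because it enumerates every subset of inputs, it is only practical for a handful of predictors.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the predictor columns in `cols`."""
    if not cols:
        return 0.0  # the intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def general_dominance(X, y):
    """General dominance weight of each predictor.

    For every subset size k, average the R^2 gain from adding predictor j to each
    size-k subset of the other predictors; then average those means over k.
    """
    p = X.shape[1]
    weights = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        mean_gain_by_size = []
        for k in range(p):
            gains = [r_squared(X, y, s + (j,)) - r_squared(X, y, s)
                     for s in combinations(others, k)]
            mean_gain_by_size.append(np.mean(gains))
        weights[j] = np.mean(mean_gain_by_size)
    return weights


# Hypothetical example: three advertising inputs, the third irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=200)
w = general_dominance(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print(w, w.sum(), full_r2)
```

The weights sum to the full-model R², which is what allows them to be read as attribution values for the inputs.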

Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 123
Author(s):  
María Jaenada ◽  
Leandro Pardo

Minimum Renyi’s pseudodistance estimators (MRPEs) enjoy good robustness properties without a significant loss of efficiency in general statistical models and, in particular, in linear regression models (LRMs). Along these lines, Castilla et al. considered robust Wald-type test statistics in LRMs based on these MRPEs. In this paper, we extend the theory of MRPEs to Generalized Linear Models (GLMs) using independent and nonidentically distributed observations (INIDO). We derive asymptotic properties of the proposed estimators and analyze their influence function to assess their robustness properties. Additionally, we define robust Wald-type test statistics for testing linear hypotheses and theoretically study their asymptotic distribution, as well as their influence function. The performance of the proposed MRPEs and Wald-type test statistics is empirically examined for Poisson regression models through a simulation study, focusing on their robustness properties. We finally test the proposed methods on a real dataset related to the treatment of epilepsy, illustrating the superior performance of the robust MRPEs as well as the Wald-type tests.
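As a rough illustration of how an MRPE differs from maximum likelihood, the sketch below fits a Poisson regression by maximizing (1/n) Σ_i f_i(y_i)^α / (Σ_y f_i(y)^(α+1))^(α/(α+1)), which is the form we assume the Rényi pseudodistance criterion takes for independent, nonidentically distributed observations. The truncation of the Poisson support at 300, the tuning value α = 0.5, the Nelder-Mead optimizer and the contaminated synthetic data are all assumptions of this sketch; the influence-function analysis and the Wald-type tests are not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, logsumexp

Y_GRID = np.arange(301)  # Poisson support truncated at 300 (assumption)


def poisson_logpmf(y, log_mean):
    """log P(Y = y) for a Poisson variable with log-mean `log_mean` (broadcasts)."""
    return y * log_mean - np.exp(log_mean) - gammaln(y + 1)


def neg_loglik(beta, X, y):
    return -np.sum(poisson_logpmf(y, X @ beta))


def neg_mrpe_objective(beta, X, y, alpha):
    """Negative of (1/n) sum_i f_i(y_i)^a / (sum_y f_i(y)^(a+1))^(a/(a+1)), in log space."""
    eta = X @ beta
    log_f_obs = poisson_logpmf(y, eta)
    # log sum_y f_i(y)^(alpha+1) for each observation i, over the truncated support
    log_norm = logsumexp((alpha + 1) * poisson_logpmf(Y_GRID[None, :], eta[:, None]), axis=1)
    log_terms = alpha * log_f_obs - alpha / (alpha + 1) * log_norm
    return -np.mean(np.exp(log_terms))


def fit_mle(X, y):
    start = np.zeros(X.shape[1])
    start[0] = np.log(y.mean())  # assumes the first column of X is the intercept
    return minimize(neg_loglik, start, args=(X, y), method="BFGS").x


def fit_mrpe(X, y, alpha=0.5):
    # Start from the (non-robust) MLE, then maximise the Renyi-pseudodistance criterion
    return minimize(neg_mrpe_objective, fit_mle(X, y), args=(X, y, alpha),
                    method="Nelder-Mead").x


# Synthetic example: 10% of the responses replaced by a gross outlier
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 1.0])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)
y[:40] = 40.0
beta_mle = fit_mle(X, y)
beta_mrpe = fit_mrpe(X, y, alpha=0.5)
print("MLE:", beta_mle, "MRPE:", beta_mrpe)
```

Observations that are very unlikely under the current fit contribute almost nothing to the criterion, which is the intuition behind the robustness; as α approaches 0 the criterion approaches maximum likelihood.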


2018 ◽  
Vol 34 (3) ◽  
pp. 323-334
Author(s):  
Nadya Mincheva ◽  
Mitko Lalev ◽  
Magdalena Oblakova ◽  
Pavlina Hristakieva

The prediction of chicks' weight before hatching is an important element of selection aimed at improving the uniformity rate and productivity of birds. In this regard, our goal was to develop and evaluate optimal models for such prediction in two White Plymouth Rock chicken lines, line L and line K, on the basis of incubation egg weight and egg geometry characteristics: egg maximum breadth (B), egg length (L), geometric mean diameter (Dg), egg volume (V), and egg surface area (S). A total of 280 eggs (140 from each line) laid by 40-week-old hens were randomly selected. Arithmetic means, standard deviations and coefficients of variation of the studied parameters were determined for each line. Correlation coefficients between hatchling weight and the predictors were highest for egg weight, geometric mean diameter, volume and surface area (r=0.731-0.779 for line L; r=0.802-0.819 for line K). Nine linear regression models were developed and their accuracy evaluated. The regression equations of hatchling weight vs. egg length had the lowest coefficient of determination (0.175 for line K and 0.291 for line L), but when egg length and breadth entered the model together, its value increased significantly, up to 0.541 and 0.665 for lines L and K, respectively. The weight of day-old chicks from line L could be predicted with higher accuracy by a model including egg surface area in addition to egg weight (ChW=0.513EW+0.282S-10.345; R²=0.620). In line K, a more accurate prediction was obtained by adding egg breadth to egg weight as a predictor (ChW=0.587EW+0.566B-19.853; R²=0.692). The study demonstrated that multiple linear regression models were more precise than simple linear models.
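The two best models reported above can be applied directly. The sketch below encodes them together with the geometric mean diameter Dg = (L·B²)^(1/3) and the sphere-equivalent surface area S = π·Dg². These two geometry formulas are commonly used for eggs but are our assumption here, since the abstract gives neither the formulas nor the measurement units used in the study. The garbled symbol in the line K equation is read as egg breadth B, as the text describes; the input values in the example are hypothetical.

```python
import math


def geometric_mean_diameter(length, breadth):
    """Dg = (L * B^2)^(1/3), a common egg-geometry formula (assumed; units follow the inputs)."""
    return (length * breadth ** 2) ** (1.0 / 3.0)


def surface_area(length, breadth):
    """Sphere-equivalent surface area S = pi * Dg^2 (assumed formula)."""
    return math.pi * geometric_mean_diameter(length, breadth) ** 2


def chick_weight_line_l(egg_weight, s):
    """Line L model from the abstract: ChW = 0.513 EW + 0.282 S - 10.345 (R^2 = 0.620)."""
    return 0.513 * egg_weight + 0.282 * s - 10.345


def chick_weight_line_k(egg_weight, b):
    """Line K model from the abstract: ChW = 0.587 EW + 0.566 B - 19.853 (R^2 = 0.692)."""
    return 0.587 * egg_weight + 0.566 * b - 19.853


# Hypothetical inputs; the study's measurement units are not stated in the abstract
print(chick_weight_line_l(60.0, 70.0))
```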


2010 ◽  
Vol 62 (4) ◽  
pp. 875-882 ◽  
Author(s):  
A. Dembélé ◽  
J.-L. Bertrand-Krajewski ◽  
B. Barillon

Regression models are among the most frequently used models for estimating pollutant event mean concentrations (EMC) in wet-weather discharges from urban catchments. Two main questions concerning the calibration of EMC regression models are investigated: i) the sensitivity of models to the size and content of the data sets used for their calibration, and ii) how modelling results change when models are re-calibrated as data sets grow and change over time with newly collected experimental data. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS (total suspended solids) EMC regression models (two log-linear and two linear models) with two or three explanatory variables have been derived and analysed. Model calibration with the iteratively re-weighted least squares method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options have been investigated: two accounting for the chronological order of the observations, and one using random samples of events from the whole available data set. Results obtained with the best-performing nonlinear model clearly indicate that the model is highly sensitive to the size and content of the data set used for its calibration.
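The abstract contrasts ordinary least squares with iteratively re-weighted least squares (IRLS) calibration. The sketch below shows one common IRLS variant, a Huber M-estimator with a MAD-based residual scale. The weighting function, the tuning constant c = 1.345 and the toy data with a single gross outlier are assumptions, since the abstract does not specify the weighting scheme used for the EMC models.

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def irls_huber(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Robust regression by iteratively re-weighted least squares with Huber weights."""
    beta = ols(X, y)  # start from the ordinary least squares fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale (MAD-type), floored so it never reaches zero
        scale = max(np.median(np.abs(r)) / 0.6745, 1e-8)
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)  # Huber weight: 1 if u <= c, else c / u
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Toy data: an exact line y = 1 + 2x with one gross outlier at the last point
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[9] = 100.0
beta_ols = ols(X, y)
beta_irls = irls_huber(X, y)
print("OLS:", beta_ols, "IRLS:", beta_irls)
```

The single outlier drags the OLS slope far from 2, while the re-weighting progressively shrinks its influence, which mirrors the lower sensitivity the study reports for IRLS calibration.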


2020 ◽  
pp. 1-25
Author(s):  
F. O. de Franca ◽  
G. S. I. Aldeia

Interaction-Transformation (IT) is a new representation for Symbolic Regression that reduces the space of solutions to a set of expressions following a specific structure. The potential of this representation was illustrated in prior work with an algorithm called SymTree, which starts with a simple linear model and incrementally introduces new transformed features until a stopping criterion is met. While the results obtained by this algorithm were competitive with the literature, it did not scale well with the problem dimension. This paper introduces a mutation-only Evolutionary Algorithm, called ITEA, capable of evolving a population of IT expressions. One advantage of this algorithm is that it lets the user specify the maximum number of terms in an expression. To verify the competitiveness of this approach, ITEA is compared with linear, nonlinear and Symbolic Regression models from the literature. The results indicate that ITEA finds equal or better approximations than other Symbolic Regression models while being competitive with state-of-the-art nonlinear models. Additionally, since this representation follows a specific structure, it is possible to extract the importance of each original feature of a data set as an analytical function, enabling us to automate the explanation of any prediction. In conclusion, ITEA is competitive with other regression models, with the additional benefit of automatically extracting additional information from the generated models.
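An IT expression has the form f(x) = w0 + Σ_i w_i · t_i(Π_j x_j^(k_ij)), where each term applies a transformation t_i to an interaction of the inputs with integer exponents k_ij. The sketch below evaluates such an expression for a fixed set of terms and fits the linear weights w by least squares, which we assume is how the coefficients of a candidate expression are set. The evolutionary search of ITEA itself (mutation operators, selection, the maximum-terms limit) is not shown, and the target function is hypothetical.

```python
import numpy as np


def interaction(X, exponents):
    """prod_j x_j ** k_j for every row of X (the 'interaction' part of an IT term)."""
    return np.prod(X ** np.asarray(exponents, dtype=float), axis=1)


def design_matrix(X, terms):
    """Intercept column plus one column t(interaction) per (transformation, exponents) term."""
    cols = [np.ones(len(X))] + [t(interaction(X, k)) for t, k in terms]
    return np.column_stack(cols)


def fit_weights(X, y, terms):
    """Least-squares weights (w0, w1, ..., wm) for a fixed set of IT terms."""
    return np.linalg.lstsq(design_matrix(X, terms), y, rcond=None)[0]


# Hypothetical target built from two IT terms: 3*sin(x1*x2) + 2*log(x1^2)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(50, 2))
y = 3.0 * np.sin(X[:, 0] * X[:, 1]) + 2.0 * np.log(X[:, 0] ** 2)
terms = [(np.sin, [1, 1]), (np.log, [2, 0])]
w_hat = fit_weights(X, y, terms)
print(w_hat)  # close to [0, 3, 2] for this noise-free data
```

Because every term is an explicit function of the original inputs, each feature's contribution can be read off analytically, which is the property the abstract exploits for explaining predictions.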


Plants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 697
Author(s):  
Juan Villacrés ◽  
Andrés Fuentes ◽  
Pedro Reszka ◽  
Fernando Auat Cheein

The vegetation indices derived from spectral reflectance have served as indicators of vegetation’s biophysical and biochemical parameters. Some of these indices can characterize more than one parameter at a time. This study examines the feasibility of retrieving several spectral vegetation indices from a single index, under the assumption that all these indices are correlated with water content. The models used are based on linear regression fitted by least squares. The spectral signatures of Eucalyptus globulus and Pinus radiata, which constitute 97.5% of the forest plantations in the Valparaiso region of Chile, have been used to test and validate the proposed approach. The performance of the fitted linear models was assessed on an independent data set. The results suggest that other spectral indices can be recovered from the Leaf Water Index with a root mean square error of up to 0.02, a bias of 1.12%, and a coefficient of determination of 0.77. These results encourage using a sensor with discrete wavelengths instead of a continuous spectrum to estimate essential forestry parameters.
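The single-index approach amounts to a simple least-squares line from one index to another, evaluated on held-out data. The sketch below fits such a line and reports RMSE, percentage bias and R². The bias definition (100 · Σ(predicted - observed) / Σ observed) and the noise-free toy values are assumptions, not the study's data or exact metric definitions.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest-degree coefficient first
    return a, b


def evaluate(y_obs, y_pred):
    """RMSE, percentage bias and R^2 of predictions on held-out data."""
    resid = y_obs - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    bias_pct = float(100.0 * np.sum(y_pred - y_obs) / np.sum(y_obs))  # assumed definition
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2))
    return rmse, bias_pct, r2


# Hypothetical, noise-free values: a target index that is exactly linear in LWI
lwi_train = np.linspace(0.8, 1.2, 20)
idx_train = 0.3 + 0.5 * lwi_train
lwi_test = np.linspace(0.85, 1.15, 7)
idx_test = 0.3 + 0.5 * lwi_test

a, b = fit_line(lwi_train, idx_train)
rmse, bias_pct, r2 = evaluate(idx_test, a + b * lwi_test)
print(a, b, rmse, bias_pct, r2)
```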


Author(s):  
Meghna Chakraborty ◽  
Md Shakir Mahmud ◽  
Timothy J. Gates ◽  
Subhrajit Sinha

Since the United States started grappling with the COVID-19 pandemic, with the highest number of confirmed cases and deaths in the world as of August 2020, most states have enforced travel restrictions, resulting in drastic reductions in mobility and travel. However, the long-term implications of this crisis for mobility remain uncertain. To this end, this study proposes an analytical framework that determines the most significant factors affecting human mobility in the United States during the early days of the pandemic. In particular, the study uses least absolute shrinkage and selection operator (LASSO) regularization to identify the most significant variables influencing human mobility, and uses linear regularization algorithms, including ridge, LASSO, and elastic net, to predict human mobility. State-level data were obtained from various sources for January 1, 2020 to June 13, 2020. The entire data set was divided into training and test sets, and the variables selected by LASSO were used to train the regularized linear models on the training set. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that several factors, including the number of new cases, social distancing, stay-at-home orders, domestic travel restrictions, mask-wearing policy, socioeconomic status, unemployment rate, transit mode share, percentage of the population working from home, and percentages of older (60+ years), African American and Hispanic American populations, among others, significantly influence daily trips. Moreover, among all models, ridge regression performs best, with the lowest error, whereas both LASSO and elastic net performed better than the ordinary linear model.
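To make the regularized models concrete, the sketch below implements LASSO by cyclic coordinate descent with soft-thresholding, and ridge regression in closed form, using the loss (1/(2n))‖y - Xβ‖² with penalties λ‖β‖₁ and (λ/2)‖β‖² respectively. The scaling conventions, helper names and synthetic data are assumptions; in practice predictors would usually be standardized first and λ chosen by cross-validation, and elastic net is omitted.

```python
import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimise (1/(2n))||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual that excludes predictor j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                resid -= step * X[:, j]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def ridge(X, y, lam):
    """Minimise (1/(2n))||y - X b||^2 + (lam/2)||b||^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)


# Hypothetical, noise-free data in which only the first predictor matters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0]
beta_lasso = lasso_cd(X, y, lam=0.01)
beta_ridge = ridge(X, y, lam=0.0)
print(beta_lasso, beta_ridge)
```

The soft-thresholding step sets the coefficients of irrelevant predictors exactly to zero, which is why LASSO can double as the variable-selection step described above.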


2021 ◽  
Author(s):  
Mzwakhe Magagula ◽  
Shaun Ramroop ◽  
Faustin Habyarimana

Abstract Background: Child malnutrition is perhaps one of the main medical conditions affecting general human well-being, mainly in non-industrialized countries. Developing reliable assessments of malnutrition is one of the challenges faced by policymakers in many countries worldwide. The current study was therefore undertaken with the primary goal of assessing and identifying potential determinants of childhood malnutrition in Malawi, using the 2015/16 Demographic and Health Survey (DHS) data. The study seeks to reveal some of the significant factors that perpetuate the incidence of malnutrition among children in Malawi. It is also designed to offer deeper insight into how the probability of being diagnosed with malnutrition changes across the levels of the significant factors identified. Methods: The proportional odds (PO) model was selected as the most suitable model, motivated by the design of the study's data set. The PO model treats an ordinal response as a sequence of dichotomous splits (at or below versus above each category) without losing its ordinal nature. It extends logistic regression for two outcomes and is one of the best models for an ordinal response variable with more than two categories. Both the PO model and logistic regression models belong to the class of generalised linear models (GLMs), commonly used to model the association between a dependent variable and independent variables.
Results: Fitting the PO model to the Malawi DHS data to investigate risk factors associated with malnutrition (stunting) suggested that the age of the child, birth type (singleton/multiple births), parents' level of education, household type of residence, mother's age at the time of birth, mother's BMI, and incidence of diarrhoea in the two weeks before the survey are the most significant independent risk factors for malnutrition (stunting). Conclusions: All of the aforementioned risk factors are controllable and can be improved through intervention strategies. Country-wide policies are required to counteract this condition, as most of the risk factors need coherent action by several governing authorities.
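In a proportional odds model, each cumulative probability P(Y ≤ j | x) has its own intercept (threshold) θ_j but all cut-points share a single coefficient vector β, so the effect of x on the cumulative log-odds is the same for every cut-point. The sketch below turns thresholds and coefficients into category probabilities. The sign convention logit P(Y ≤ j | x) = θ_j - xᵀβ is one common choice (software packages differ), and the numbers are hypothetical, not estimates from the Malawi DHS data.

```python
import numpy as np


def po_probabilities(x, beta, thresholds):
    """Category probabilities P(Y = 1), ..., P(Y = J) under a proportional odds model.

    Assumes logit P(Y <= j | x) = theta_j - x @ beta for j = 1..J-1, with
    strictly increasing thresholds theta_j and one coefficient vector shared by all j.
    """
    eta = np.asarray(thresholds, dtype=float) - x @ beta
    cum = 1.0 / (1.0 + np.exp(-eta))              # P(Y <= j), increasing in j
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)                            # successive differences give P(Y = j)


# Hypothetical example with four ordered categories (e.g. severity levels)
x = np.array([1.0, 0.0])
beta = np.array([0.5, -0.3])
probs = po_probabilities(x, beta, [-1.0, 0.5, 2.0])
print(probs, probs.sum())
```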


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Hai Nguyen ◽  
Patrick Y Jay ◽  
William Scott ◽  
Janos Molnar ◽  
Henry Huang ◽  
...  

Introduction: Various mathematical equations have been proposed to correct the QT interval for heart rate (QTc). However, with most formulas, QTc remains dependent on heart rate (HR), especially at low and high HR values. Hypothesis: A spline correction function would perform better than standard mathematical formulas by allowing the data to determine the form of the relationship. Methods: A series of regression models using the generalized additive models for location, scale, and shape (GAMLSS) framework was applied to 10,000 completely normal electrocardiograms from the National Health and Nutrition Examination Surveys II and III. Model performance was evaluated using the coefficient of determination (R²) and the root mean square of the errors between the predicted and observed QTc. The new regression models were compared with Bazett's and Fridericia's formulas, the two most widely used formulas in clinical practice. Results: When boxplots of QTc for each formula were plotted, grouped by HR in intervals of 5 beats/minute, QTc determined by penalized spline regression was almost independent of heart rate for both females and males, as the slope of the regression line was almost zero (-0.003 for females and -0.001 for males) (Figure 1A-B). By contrast, QTc by Bazett's formula was positively correlated with HR (regression slope of 0.86 for females and 0.89 for males), while QTc by Fridericia's formula was negatively correlated with HR (regression slope of -0.14 for females and -0.13 for males) (Figure 1C-F). For all three formulas, there was no significant difference between males and females. Conclusions: A new QTc formula was developed that is almost independent of HR, thereby providing a more accurate estimate of QTc for clinical management. Automatic QTc calculation and percentile estimation could easily be incorporated into an online calculator or app for integration into everyday clinical practice.
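For reference, the two standard corrections divide the measured QT by a power of the RR interval in seconds (RR = 60 / HR): Bazett uses QTc = QT / RR^(1/2) and Fridericia uses QTc = QT / RR^(1/3). The sketch below implements both, together with the kind of slope-against-HR check described in the abstract. The synthetic QT values are constructed for illustration only, and the penalized spline (GAMLSS) model itself is not reproduced.

```python
import numpy as np


def rr_seconds(hr_bpm):
    """RR interval in seconds from heart rate in beats per minute."""
    return 60.0 / np.asarray(hr_bpm, dtype=float)


def qtc_bazett(qt_ms, hr_bpm):
    return np.asarray(qt_ms, dtype=float) / np.sqrt(rr_seconds(hr_bpm))


def qtc_fridericia(qt_ms, hr_bpm):
    return np.asarray(qt_ms, dtype=float) / np.cbrt(rr_seconds(hr_bpm))


def slope_vs_hr(qtc, hr_bpm):
    """Slope of the least-squares line of QTc on HR; near zero means little residual HR dependence."""
    return np.polyfit(hr_bpm, qtc, 1)[0]


# Synthetic QT values that follow QT = 400 * RR^(1/3) exactly (illustration only)
hr = np.arange(50.0, 121.0, 5.0)
qt = 400.0 * np.cbrt(rr_seconds(hr))
slope_baz = slope_vs_hr(qtc_bazett(qt, hr), hr)
slope_fri = slope_vs_hr(qtc_fridericia(qt, hr), hr)
print(slope_baz, slope_fri)
```

A correction that removes the HR dependence of the data leaves a flat QTc-versus-HR line; for these synthetic values Fridericia does so and Bazett over-corrects at high HR, which is the same diagnostic the study applies to its spline model.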


1969 ◽  
Vol 92 (3-4) ◽  
pp. 171-182
Author(s):  
Víctor H. Ramírez-Builes ◽  
Timothy G. Porch ◽  
Eric W. Harmsen

Plant leaf area is an important physiological trait, and direct, non-destructive methods for estimating leaf area have been shown to be effective while allowing for repeated plant sampling. The objective of this study was to evaluate direct, non-destructive leaflet measurements as predictors of actual leaflet area (LA), to test previously developed models, and to develop genotype-specific linear models for leaflet area estimation in common bean (Phaseolus vulgaris L.). To develop appropriate regression models for leaflet area estimation, four common bean genotypes were evaluated under greenhouse conditions: BAT 477, 'Morales', SER 16, and SER 21. The greenhouse-derived models were then evaluated under field conditions. Previously developed models were tested and found to overestimate or underestimate leaflet area. Leaflet measurements included maximum leaflet width (W), maximum leaflet length (L), and the product L × W. The measurements with the highest coefficients of determination (R²) were W or L × W for BAT 477, SER 16, and 'Morales' (0.97, 0.95, and 0.95, respectively), and L × W for SER 21 (R² = 0.96). The linear models developed were shown to be effective and robust for predicting leaflet area under both greenhouse and field conditions, during both vegetative and reproductive stages of plant development.
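Choosing a genotype-specific model, as described above, amounts to fitting a simple linear regression of leaflet area on each candidate measurement (L, W, or L × W) and keeping the one with the highest R². The sketch below does this for a single genotype; the synthetic measurements and the helper names are hypothetical and only meant to show the selection step.

```python
import numpy as np


def r2_simple(x, y):
    """R^2 of a simple least-squares regression y ~ a + b*x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def best_predictor(length, width, area):
    """Pick the single measurement (L, W or L x W) whose linear model has the highest R^2."""
    candidates = {"L": length, "W": width, "LxW": length * width}
    scores = {name: r2_simple(x, area) for name, x in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical leaflet measurements for one genotype
rng = np.random.default_rng(0)
length = rng.uniform(4.0, 12.0, size=60)
width = 0.8 * length * rng.uniform(0.85, 1.15, size=60)
area = 0.65 * length * width * (1.0 + 0.02 * rng.normal(size=60))
best, scores = best_predictor(length, width, area)
print(best, scores)
```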


Web Ecology ◽  
2009 ◽  
Vol 9 (1) ◽  
pp. 58-67 ◽  
Author(s):  
D. Nogués-Bravo

Abstract. Multivariable regression models have been used extensively as spatial modelling tools. However, other regression approaches are emerging as more efficient techniques. This paper attempts to present a synthesis of Generalised Regression Models (Generalized Linear Models, GLMs; Generalized Additive Models, GAMs) and of Geographically Weighted Regression (GWR) implemented in a GAM, explaining their statistical formulations and assessing the improvements in predictive accuracy they offer over linear regression. The problems associated with these approaches are also discussed. A digital database developed with Geographic Information Systems (GIS), including environmental maps and the distribution of bird species richness in northern Spain, is used to compare the techniques. GWR using splines showed the largest improvement in explained deviance relative to the traditional linear regression approach, followed by GAM and GLM.
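Geographically weighted regression fits a separate weighted least-squares model at each location, with weights that decay with distance from that location, so coefficients can vary over space. The sketch below uses a Gaussian kernel with a fixed bandwidth. This plain kernel form is an assumption for illustration and differs from the spline-based GWR implemented within a GAM in the paper; the bandwidth would normally be chosen by cross-validation or an information criterion, and the data are hypothetical.

```python
import numpy as np


def gwr_at(coords, X, y, target, bandwidth):
    """Local coefficients at `target` by Gaussian-kernel weighted least squares."""
    d2 = np.sum((coords - target) ** 2, axis=1)     # squared distances to the target location
    w = np.exp(-0.5 * d2 / bandwidth ** 2)          # Gaussian kernel weights
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Hypothetical data: the slope on x1 increases from west (u = 0) to east (u = 1)
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(300, 2))
x1 = rng.normal(size=300)
X = np.column_stack([np.ones(300), x1])
y_const = 2.0 + 3.0 * x1
y_vary = 1.0 + coords[:, 0] * x1

beta_const = gwr_at(coords, X, y_const, np.array([0.5, 0.5]), 0.2)
slope_west = gwr_at(coords, X, y_vary, np.array([0.1, 0.5]), 0.1)[1]
slope_east = gwr_at(coords, X, y_vary, np.array([0.9, 0.5]), 0.1)[1]
print(beta_const, slope_west, slope_east)
```

When the true coefficients are constant, every local fit returns the global ones; when they drift over space, the local estimates follow that drift, which is the extra flexibility behind the deviance gains reported above.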

