Used Car Price Estimation: Moving from Linear Regression towards a New S-Curve Model

2021 ◽  
Vol 22 (3) ◽  
pp. 1174-1187
Author(s):  
Fadzilah Salim ◽  
Nur Azman Abu

Simple linear regression is commonly used as a practical predictive model for used car prices. It is a useful model which carries smaller prediction errors around its central mean. In practice, however, real data rarely follow a linear relationship. A non-linear model has been observed to better forecast any price appreciation and manage prediction errors in real-life phenomena. In this paper, an S-curve model is proposed as an alternative non-linear model for estimating the price of used cars. A dynamic S-shaped Membership Function (SMF) is used as the basis for building the S-curve pricing model. Real used car price data were collected from a popular website. Comparisons against linear regression and cubic regression are made. The S-curve model produced a smaller error than linear regression, while its residual is closer to that of a cubic regression. Overall, the S-curve model is anticipated to provide a better and more practical estimate of used car prices in Malaysia.

2018 ◽  
Vol 7 (2.29) ◽  
pp. 912
Author(s):  
Fadzilah Salim ◽  
Nur Azman Abu

A simple linear regression model is useful as a prediction model. General linear regression beyond a single independent variable is still not popular. A nonlinear regression can easily produce a better predictive model, but it is difficult to construct. The objective of this paper is to propose a technique for predicting the price of used cars in Malaysia using an S-shaped curve model. In this paper, the S-shaped Membership Function (SMF) is used as the basis for developing a novel S-Regression model. Linear regression, cubic regression and S-Regression are compared on used car prices. The mean squared error of the S-Regression model is found to be closer to that of cubic regression than to that of linear regression. The S-Regression model is found to be quite suitable for representing the relationship between the price of a used car and its make year. The result demonstrates that the S-Regression model gives a better and more practical estimate of the price of a used car in Malaysia.
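
To make the S-shaped idea concrete, the following is a minimal sketch of the standard textbook S-shaped membership function, scaled between a floor and a ceiling price. It is not the authors' S-Regression model: the abstract does not give their formula, and the function names, the parameters `a` and `b`, and the pricing rule `s_curve_price` are all assumptions made for illustration.

```python
def smf(x, a, b):
    """Standard S-shaped membership function: 0 at or below a, 1 at or above b."""
    if a >= b:
        raise ValueError("expected a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2.0
    if x <= mid:
        # Lower half: convex quadratic rise from 0 to 0.5
        return 2.0 * ((x - a) / (b - a)) ** 2
    # Upper half: concave quadratic rise from 0.5 to 1
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2


def s_curve_price(make_year, year_lo, year_hi, price_lo, price_hi):
    """Hypothetical pricing rule (an assumption, not from the paper):
    scale the SMF between a floor price and a ceiling price."""
    return price_lo + (price_hi - price_lo) * smf(make_year, year_lo, year_hi)
```

Under this sketch, a car whose make year falls halfway between `year_lo` and `year_hi` is priced halfway between the floor and the ceiling, with prices flattening out toward both ends of the range.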


Agronomy ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 885
Author(s):  
Magdalena Piekutowska ◽  
Gniewko Niedbała ◽  
Tomasz Piskier ◽  
Tomasz Lenartowicz ◽  
Krzysztof Pilarski ◽  
...  

Yield forecasting is a rational and scientific way of predicting future occurrences in agriculture: the level of production effects. Its main purpose is reducing the risk in the decision-making process affecting the yield in terms of quantity and quality. The aim of the following study was to generate a linear and non-linear model to forecast the tuber yield of three very early potato cultivars: Arielle, Riviera, and Viviana. In order to achieve the set goal of the study, data from the period 2010–2017 were collected, coming from official varietal experiments carried out in northern and northwestern Poland. The linear model has been created based on multiple linear regression analysis (MLR), while the non-linear model has been built using artificial neural networks (ANN). The created models can predict the yield of very early potato varieties on 20th June. Agronomic, phytophenological, and meteorological data were used to prepare the models, and the correctness of their operation was verified on the basis of separate sets of data not participating in the construction of the models. For the proper validation of the model, six forecast error metrics were used, including global relative approximation error (RAE), root mean square error (RMS), mean absolute error (MAE), and mean absolute percentage error (MAPE). As a result of the conducted analyses, the forecast error results for most models did not exceed 15% of MAPE. The predictive neural model NY1 was characterized by better values of quality measures and ex post forecast errors than the regression model RY1.
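
Three of the error metrics named above have widely used definitions, sketched below. The function names and the sample yield values are illustrative assumptions, not data from the study. RAE is left out because the abstract does not state its formula.

```python
import math


def rms_error(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Made-up yields (t/ha), for illustration only
observed = [30.0, 40.0, 50.0]
forecast = [33.0, 38.0, 50.0]
print(rms_error(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```

MAPE is scale-free, which is presumably why a threshold such as 15% can be quoted across models, whereas RMS and MAE are expressed in the units of the yield itself.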


Predicting the true value of used cars requires a lot of analysis. The prediction takes into account variables such as car model, fuel type, number of owners and so on. In this paper we apply machine learning algorithms to determine the true value of cars when selling them to dealers. We use a multiple linear regression model, dividing the data into training and test sets. Vehicle price forecasting is both a critical and significant task, particularly when the car is used and does not come directly from the factory.
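
The train/test workflow described above can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the two features (age and mileage), the synthetic data generator and its coefficients, and the 80/20 split. The abstract does not specify the authors' dataset or tooling.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coef, x):
    """Predicted target for one feature tuple x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Synthetic data (an assumption, not real listings): age in years, mileage in thousand km
random.seed(0)
X = [(random.uniform(1, 15), random.uniform(10, 200)) for _ in range(200)]
y = [60.0 - 2.5 * age - 0.1 * km + random.gauss(0, 1.0) for age, km in X]

split = int(0.8 * len(X))  # 80% training, 20% test
coef = fit_mlr(X[:split], y[:split])
test_mse = sum((predict(coef, x) - t) ** 2
               for x, t in zip(X[split:], y[split:])) / (len(X) - split)
print(coef, test_mse)
```

Evaluating on the held-out 20% rather than on the training rows is what makes the reported error an estimate of how the model would behave on cars it has not seen.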


2021 ◽  
pp. 139-180
Author(s):  
Justin C. Touchon

Chapter 6 continues exploring the statistics covered within the linear model, namely two-way and three-way ANOVA, linear regression and analysis of covariance (ANCOVA). For each type of model, the chapter describes in detail how to interpret the summary output, including how to interpret and plot interactions. Conducting post-hoc analyses and using the predict() function are also covered. The chapter ends by reinforcing earlier plotting skills in ggplot2 by walking through an example of making a professional-looking figure with multiple non-linear regression curves and confidence intervals.


2010 ◽  
Vol 159 ◽  
pp. 595-598
Author(s):  
Xiang Hu Liu

Fitting the forecast function is difficult and important in non-linear regression forecasting problems, because forecast accuracy is directly affected by the quality of the fit. The traditional approach of replacing a non-linear model with a linear one struggles when the non-linearity is strong, and the resulting fit and forecast are not ideal. The functional network is a recently introduced extension of neural networks that has certain advantages in solving non-linear problems. This article proposes a non-linear regression forecast model and learning algorithm based on functional networks, and provides an example of multi-variable non-linear regression forecasting. The simulation results demonstrate that the forecast model based on functional networks achieves higher fitting and forecasting accuracy than some traditional methods and has both theoretical and practical value.


2014 ◽  
pp. 287-296 ◽  
Author(s):  
T. WIBMER ◽  
K. DOERING ◽  
C. KROPF-SANCHEN ◽  
S. RÜDIGER ◽  
I. BLANTA ◽  
...  

Pulse transit time (PTT), the interval between ventricular electrical activity and peripheral pulse wave, is assumed to be a surrogate marker for blood pressure (BP) changes. The objective of this study was to analyze PTT and its relation to BP during cardiopulmonary exercise tests (CPET). In 20 patients (mean age 51±18.4 years), ECG and finger-photoplethysmography were continuously recorded during routine CPETs. PTT was calculated for each R-wave in the ECG and the steepest slope of the corresponding upstroke in the plethysmogram. For each subject, linear and non-linear regression models were used to assess the relation between PTT and upper-arm oscillometric BP in 9 predefined measuring points including measurements at rest, during exercise and during recovery. Mean systolic BP (sBP) and PTT at rest were 128 mm Hg and 366 ms respectively, 197 mm Hg and 289 ms under maximum exercise, and 128 mm Hg and 371 ms during recovery. Linear regression showed a significant, strong negative correlation between PTT and sBP. The correlation between PTT and diastolic BP was rather weak. Bland-Altman plots of sBP values estimated by the regression functions revealed slightly better limits of agreement for the non-linear model (−10.9 to 10.9 mm Hg) than for the linear model (−13.2 to 13.1 mm Hg). These results indicate that PTT is a good potential surrogate measure for sBP during exercise and could easily be implemented in CPET as an additional parameter of cardiovascular reactivity. A non-linear approach might be more effective in estimating BP than linear regression.
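
Bland-Altman limits of agreement, as quoted above, are conventionally computed as the mean difference between two measurements plus and minus 1.96 standard deviations of the differences. The sketch below follows that convention. The function name and the sample readings are illustrative assumptions, not data from the study.

```python
import statistics


def limits_of_agreement(measured, estimated):
    """Return (lower limit, bias, upper limit) for paired measurements.

    bias is the mean of (estimated - measured); the limits are
    bias -/+ 1.96 sample standard deviations of those differences.
    """
    diffs = [e - m for m, e in zip(measured, estimated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Made-up systolic BP readings (mm Hg), for illustration only
cuff_sbp = [120.0, 130.0, 140.0, 150.0]
ptt_sbp = [122.0, 128.0, 141.0, 149.0]
print(limits_of_agreement(cuff_sbp, ptt_sbp))
```

Narrower limits mean the estimated values track the reference measurement more closely, which is the sense in which the non-linear model's limits are described as slightly better.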


2010 ◽  
Vol 41 (1) ◽  
pp. 58-64
Author(s):  
Sudipa Sarker ◽  
Mahbub Hossain

Linear regression is often used to predict the initial parameters of forecasting models. But if the underlying demand model is not linear, linear regression does not produce optimal values of these parameters. Moreover, for a novice user, choosing the smoothing constants for a level-and-trend demand forecast is not easy, and the recommended values of these constants may result in larger forecast errors. In this paper, real-life data from a pharmaceutical company are used to show that forecasting accuracy greatly improves with non-linear optimization of the smoothing constants. This is done using the Excel Solver, which finds the values of the smoothing constants that minimize the mean square error (MSE). Key Words: Non-Linear Optimization; Smoothing Constant; Trend Demand. DOI: 10.3329/jme.v41i1.5363. Journal of Mechanical Engineering, Vol. ME 41, No. 1, June 2010, pp. 58-64
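
The idea of choosing smoothing constants by minimizing MSE can be sketched as follows, assuming Holt's linear (level and trend) exponential smoothing. Three things here are assumptions rather than the paper's method: a plain grid search stands in for the Excel Solver, the initial level and trend are taken from the first two observations instead of from a regression, and the demand figures are made up.

```python
def holt_mse(series, alpha, beta):
    """Mean squared one-step-ahead forecast error of Holt's linear smoothing."""
    level, trend = series[0], series[1] - series[0]  # simple initialization (assumption)
    sq_err = 0.0
    for y in series[1:]:
        forecast = level + trend
        sq_err += (y - forecast) ** 2
        new_level = alpha * y + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sq_err / (len(series) - 1)


def best_constants(series, steps=20):
    """Grid search over alpha and beta strictly between 0 and 1.

    Returns the tuple (mse, alpha, beta) with the smallest MSE.
    """
    grid = [i / steps for i in range(1, steps)]
    return min((holt_mse(series, a, b), a, b) for a in grid for b in grid)


# Made-up monthly demand, for illustration only
demand = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
print(best_constants(demand))
```

A grid search is coarser than a gradient-based solver, but it makes the objective explicit: every candidate pair of constants is scored by the same MSE, and the pair with the lowest score wins.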


Author(s):  
Matthew Kerin ◽  
Jonathan Marchini

Motivation: Gene-environment (GxE) interactions are one of the least studied aspects of the genetic architecture of human traits and diseases. The environment of an individual is inherently high dimensional, evolves through time and can be expensive and time consuming to measure. The UK Biobank study, with all 500,000 participants having undergone an extensive baseline questionnaire, represents a unique opportunity to assess GxE heritability for many traits and diseases in a well powered setting. Results: We have developed a randomized Haseman-Elston non-linear regression method applicable when many environmental variables have been measured on each individual. The method (GPLEMMA) simultaneously estimates a linear environmental score (ES) and its GxE heritability. We compare the method via simulation to a whole-genome regression approach (LEMMA) for estimating GxE heritability. We show that GPLEMMA is more computationally efficient than LEMMA on large datasets, and produces results highly correlated with those from LEMMA when applied to simulated data and real data from the UK Biobank. Availability: Software implementing the GPLEMMA method is available from https://jmarchini.org/gplemma/. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Chao-Yu Guo ◽  
Xing-Yi Huang ◽  
Pei-Cheng Kuo ◽  
Yi-Hau Chen

The effects of meteorological factors on health outcomes have attracted growing interest due to climate change, which has resulted in a general rise in temperature and abnormal climatic extremes. Instead of the conventional cross-sectional analysis that focuses on the association between a predictor and a single dependent variable, the distributed lag non-linear model (DLNM) has been widely adopted to examine the effects of multiple lagged environmental factors on a health outcome. We propose several novel strategies to model mortality with the effects of distributed lag temperature measures and the delayed effect of mortality. The strategies are derived from various statistical concepts, such as summation, autoregression, principal component analysis, baseline adjustment, and modeling the offset in the DLNM. Five strategies are evaluated by simulation studies based on permutation techniques. Longitudinal climate and daily mortality data from Taipei, Taiwan, from 2012 to 2016 were used to generate the null distribution. According to the simulation results, only one strategy, named MVDLNM, yielded valid type I errors, while the other four strategies showed much more inflated type I errors. In a real-life application, the MVDLNM, which incorporates both the current and lag mortalities, revealed a more significant association than the conventional model that only fits the current mortality. The results suggest that, in public health or environmental research, not only may the exposure pose a delayed effect, but the outcome of interest could also provide lag association signals. The joint modeling of the lag exposure and the delayed outcome enhances the power to discover such a complex association structure. The new approach, MVDLNM, models lag outcomes within 10 days and lag exposures up to 1 month and provides valid results.

