Sales prediction on e-commerce platform, by using data mining model

2020 ◽  
Vol 5 (2) ◽  
pp. 60-76
Author(s):  
Stefana Janićijević ◽  
Đorđe Petrović ◽  
Miodrag Stefanović

In this paper we applied a twinning algorithm to products sold via an e-commerce platform. To establish relatively homogeneous groups of the products that were on sale on this platform during the last year, it was necessary to form a predictive mathematical model. We determined a set of relevant variables to represent group attributes, and we applied the K-means algorithm, a Market Basket model and a Vector Distance model. Based on an analysis of basic and derived variables, a fixed number of clusters was introduced. The Silhouette index was used to check whether these clusters are compact. Using these cluster separations, we created models that detect similar products and estimate the probability of sale for each product. The results can be used for planning future sales campaigns, optimizing marketing expenses, creating new loyalty programs, and better understanding customer behavior in general.
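The compactness check mentioned in this abstract can be illustrated with a minimal sketch of the silhouette coefficient. The abstract does not give the authors' implementation, so the function names, the Euclidean distance, and the toy data are assumptions made only for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_scores(points, labels):
    """Return the silhouette coefficient s(i) = (b - a) / max(a, b) per point.

    a = mean distance to the other points in the same cluster,
    b = smallest mean distance to the points of any other cluster.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(euclidean(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            # Common convention: score 0 for singleton clusters or one cluster.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical product features (two numeric attributes per product).
products = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(products, [0, 0, 1, 1])
bad = silhouette_scores(products, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

A mean score close to 1 suggests compact, well-separated clusters, while a score near or below 0 suggests that points sit closer to another cluster than to their own.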

2020 ◽  
Vol 7 (2) ◽  
pp. 51-57
Author(s):  
Willy Yunus ◽  
Ririn Ikana Desanti ◽  
Wella Wella

PD. Asia Agung Pontianak is the only official distributor of Ajinomoto in the West Kalimantan region. Every year the company needs to estimate the turnover it will obtain in the coming year. Unfortunately, the company only makes predictions using the average income from each year, which is not very accurate. This research was conducted to create visualizations and to predict the coming year's turnover using multiple linear regression. Multiple linear regression is a regression analysis method that can use more than two variables in the prediction process, divided into two kinds: the dependent variable and the independent variables. The results obtained are predictions for 2019 using data from 2010 to 2018 as a basis. They show that the longer the span of data used, the smaller the error rate obtained. The original company data is visualized in a dashboard built with Tableau so that the company can analyze it more easily.
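As a rough illustration of multiple linear regression, the sketch below fits ordinary least squares by solving the normal equations. The abstract does not name the predictors or give the data, so the two predictors and all numbers here are synthetic and purely hypothetical.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via (X'X) b = X'y.

    X is a list of predictor rows, y the list of observed responses.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(coeffs, x):
    """Apply fitted coefficients to one row of predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 (not the company's data).
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * a - b for a, b in X]
coeffs = fit_linear_regression(X, y)
print(coeffs, predict(coeffs, (6, 4)))
```

Solving the normal equations directly is fine for a handful of well-scaled predictors; with many or strongly correlated predictors, a QR-based or library solver would be the more robust choice.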


2021 ◽  
Author(s):  
Helen J Mayfield ◽  
Colleen L Lau ◽  
Jane E Sinclair ◽  
Samuel J Brown ◽  
Andrew Baird ◽  
...  

Uncertainty surrounding the risk of developing and dying from Thrombosis and Thrombocytopenia Syndrome (TTS) associated with the AstraZeneca (AZ) COVID-19 vaccine may contribute to vaccine hesitancy. A model is urgently needed to combine and effectively communicate the existing evidence on the risks versus benefits of the AZ vaccine. We developed a Bayesian network to consolidate the existing evidence on risks and benefits of the AZ vaccine, and parameterised the model using data from a range of empirical studies, government reports, and expert advisory groups. Expert judgement was used to interpret the available evidence and determine the structure of the model, relevant variables, data to be included, and how these data were used to inform the model. The model can be used as a decision support tool to generate scenarios based on age, sex, virus variant and community transmission rates, making it useful for individuals, clinicians, and researchers to assess the chances of different health outcomes. Model outputs include the risk of dying from TTS following the AZ COVID-19 vaccine, and the risk of dying from COVID-19 or COVID-19-associated atypical severe blood clots under different scenarios. Although the model is focused on Australia, it can be easily adapted to international settings by re-parameterising it with local data. This paper provides a detailed description of the model-building methodology, which can be used to expand the scope of the model to include other COVID-19 vaccines, booster doses, comorbidities and other health outcomes (e.g., long COVID) to ensure the model remains relevant in the face of constantly changing discussion on risks versus benefits of COVID-19 vaccination.


2020 ◽  
Vol 14 (2) ◽  
pp. 181-192
Author(s):  
Bo Wu ◽  
Yi Sun ◽  
Katsutoshi Yada

Abstract: Studies based on the analysis of a new design of loyalty program, item-based loyalty programs (IBLPs), indicate that customers are more interested in item-based reward points than in traditional price discounts. However, we are still unaware of customer responses to the different point settings on IBLP items. This study uses a Tobit II analysis to explore IBLPs’ short-term (4 months) impact on customers’ purchase behaviors using data from two newly opened Japanese supermarket chains that have implemented this new IBLP program from the beginning. The results showed that different types of customers are affected differently by IBLPs, and that heavy customers are more inclined than others to purchase more items and spend more money. The results also indicated that customers’ purchase behaviors are affected by IBLPs’ different point levels. Moreover, different types of customers respond differently to an IBLP with different point levels. The findings of this study offer important guidance for IBLP design and marketing management.


2001 ◽  
Vol 32 (3) ◽  
pp. 189-200 ◽  
Author(s):  
Detlef Fetchenhauer ◽  
Gerben van der Vegt

Summary: This article investigates cross-country differences in economic growth rates from a psychological perspective. Based on social capital theory it is argued that 1) financial honesty and trust are positively correlated with each other when they are aggregated on a country level and that 2) a high level of financial honesty and trust in a given country reduces transaction costs and thus stimulates economic growth. Using data from the World-Value-Surveys in 1981 and 1990 these hypotheses are empirically confirmed. The influence of social capital (i.e., financial honesty and trust) on economic growth was robust and substantial even if a number of relevant variables like gross national product (GNP), urbanization, economic inequality or the proportion of agriculture in gross domestic product were controlled. Thus, it seems worthwhile for economic psychology to further explore the influence of psychological determinants (like trust and honesty) on macroeconomic variables like economic growth or wealth.


2017 ◽  
Vol 2605 (1) ◽  
pp. 99-108 ◽  
Author(s):  
Long Cheng ◽  
Xuewu Chen ◽  
William H. K. Lam ◽  
Shuo Yang ◽  
Pengfei Wang

Low-income residents can depend on fewer travel options and have restricted mobility. This paper analyzes low-income commuters’ mode choice behavior by using data from an activity-based travel survey in Fushun, China. An integrated choice and latent variable model is presented. The model uses the following latent attitudes: comfort, convenience, reliability, flexibility, safety, and environmental preferences. The inclusion of attitudes captures unobserved heterogeneity of the choice process and gives a better understanding of travel demands. Postestimation of the integrated model is applied to assess the responsiveness of preferences for various transportation modes to changes in policy-relevant variables. This assessment is done by calculating the elasticity and marginal effects of choice probabilities for the relevant attributes of travel preferences. The analysis indicates that individuals with high comfort preferences care more about the walking environment, and they need solutions to enhance their walking experience. Travelers preferring reliability, however, are more likely to travel by public transit, and measures to inform commuters of real-time bus operation information are proposed. Commuters who emphasize environmental preference are more apt to cycle; therefore, probike strategies are recommended. Results of the analysis indicate that different actions should be taken to serve different preferences. The findings should be useful information for policy makers and transportation planners wanting to improve low-income commuters’ travel quality.
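To make the elasticity and marginal-effect calculation concrete, the sketch below uses the textbook formulas for a plain multinomial logit model. This is a deliberate simplification: the paper's integrated choice and latent variable model is richer, and the coefficient, attribute value, and utilities here are hypothetical.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit choice probabilities (numerically stable softmax)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def direct_elasticity(beta, x, prob):
    """Percent change in an alternative's choice probability per one percent
    change in its own attribute x, with linear-in-parameters utility."""
    return beta * x * (1.0 - prob)

def direct_marginal_effect(beta, prob):
    """Change in an alternative's choice probability per unit change in its
    own attribute."""
    return beta * prob * (1.0 - prob)

# Hypothetical two-mode example with equal utilities (e.g. bus vs. bicycle).
probs = logit_probabilities([0.0, 0.0])
# Assumed travel-time coefficient of -0.1 per minute and a 10-minute trip.
print(direct_elasticity(-0.1, 10.0, probs[0]))
print(direct_marginal_effect(-0.1, probs[0]))
```

A negative elasticity here means that increasing the attribute (travel time in this illustration) lowers the probability of choosing that mode.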


2020 ◽  
Vol 5 (11) ◽  
Author(s):  
Ummul Hairah ◽  

A data mining clustering technique, the K-Means method, is used to classify the level of beginner voters. The fixed voter clusters support stakeholders' decision making by providing information on beginner voters in each district and sub-district. Three distance calculations are compared, i.e., Euclidean, Manhattan, and Minkowski distance, and the Mean Squared Error (MSE) approach is used to measure the error of each. The results show that the lowest error occurs with the Minkowski distance model with 3 clusters, where the error rate is 11%, while the highest error rate occurs with the Manhattan distance model with 5 clusters, at 38%.
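The three distance calculations compared in this abstract are closely related: Manhattan and Euclidean distance are the Minkowski distance with order p = 1 and p = 2. A minimal sketch follows; the function names and sample points are illustrative, and the abstract does not say which Minkowski order the study used.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between equal-length points."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b))     # 7.0
print(euclidean(a, b))     # 5.0
print(minkowski(a, b, 3))  # order 3 is an arbitrary illustrative choice
```

Because K-Means assigns each point to its nearest centroid, swapping the distance function can change cluster membership, which is why the error differs between the variants compared.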


1981 ◽  
Vol 20 (04) ◽  
pp. 213-216 ◽  
Author(s):  
K. Ulm ◽  
E. Sauer ◽  
H. Sebening

A method is presented which allows a stepwise selection of relevant variables for a diagnosis and also a sequential allocation bearing in mind the time-sequence and the expense in recording the variables. The method is based on an information-theoretical approach and is suitable for the application of qualitative variables. The method is presented using data concerning patients with suspected coronary artery disease, taking into consideration the fact that the variables are observed at different times.
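One common information-theoretical criterion for ranking qualitative variables is information gain, the reduction in entropy of the diagnosis once a variable is known. The sketch below shows that criterion only; the abstract does not specify the authors' exact measure or how recording time and expense enter the selection, and the variable names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(variable, diagnosis):
    """Entropy of the diagnosis minus its expected entropy given the variable."""
    n = len(diagnosis)
    groups = {}
    for v, d in zip(variable, diagnosis):
        groups.setdefault(v, []).append(d)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(diagnosis) - conditional

# Hypothetical qualitative findings for four patients.
diagnosis = ["disease", "disease", "healthy", "healthy"]
candidates = {
    "finding_a": ["yes", "yes", "no", "no"],  # separates the classes
    "finding_b": ["yes", "no", "yes", "no"],  # carries no information
}
ranked = sorted(candidates,
                key=lambda k: information_gain(candidates[k], diagnosis),
                reverse=True)
print(ranked)
```

A stepwise procedure would pick the top-ranked variable, then repeat the ranking within the groups it defines, stopping when the remaining gain is small.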


2019 ◽  
Vol 2019 ◽  
pp. 1-15 ◽  
Author(s):  
Shouwen Ji ◽  
Xiaojing Wang ◽  
Wenpeng Zhao ◽  
Dong Guo

Sales forecasting is even more vital for supply chain management in e-commerce, where a huge amount of transaction data is generated every minute. In order to enhance the logistics service experience of customers and optimize inventory management, e-commerce enterprises focus more on improving the accuracy of sales prediction with machine learning algorithms. In this study, a C-A-XGBoost forecasting model based on the XGBoost model is proposed, taking into account the sales features of commodities and the tendency of the data series. A C-XGBoost model is first established to forecast each of the clusters produced by a two-step clustering algorithm, incorporating sales features as influencing factors of forecasting. Second, an A-XGBoost model is used to forecast the tendency, with the ARIMA model for the linear part and the XGBoost model for the nonlinear part. The final results are obtained by assigning weights to the forecasts of the C-XGBoost and A-XGBoost models and summing them. In a comparison with the ARIMA, XGBoost, C-XGBoost, and A-XGBoost models using data from the Jollychic cross-border e-commerce platform, the C-A-XGBoost model is shown to outperform the other four models.
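The final weighting step can be sketched as a weighted sum of two forecast series. The abstract does not state how the weights are chosen, so the sketch assumes a single weight in [0, 1] with the two weights summing to one; the function name and the numbers are hypothetical.

```python
def combine_forecasts(cluster_forecast, trend_forecast, weight_cluster):
    """Weighted sum of two forecast series of equal length.

    weight_cluster is applied to the first series and
    (1 - weight_cluster) to the second (an assumed convention).
    """
    if not 0.0 <= weight_cluster <= 1.0:
        raise ValueError("weight_cluster must lie in [0, 1]")
    if len(cluster_forecast) != len(trend_forecast):
        raise ValueError("forecast series must have the same length")
    w = weight_cluster
    return [w * c + (1.0 - w) * t
            for c, t in zip(cluster_forecast, trend_forecast)]

# Hypothetical forecasts for two periods from the two component models.
c_xgb = [10.0, 20.0]
a_xgb = [20.0, 40.0]
print(combine_forecasts(c_xgb, a_xgb, 0.25))  # [17.5, 35.0]
```

In practice the weight would typically be tuned on a held-out validation period, for example by picking the value that minimizes the forecast error there.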


Heart ◽  
2018 ◽  
Vol 104 (18) ◽  
pp. 1492-1499 ◽  
Author(s):  
David T Linker ◽  
Tasha B Murphy ◽  
Ali H Mokdad

Objective: Atrial fibrillation can lead to stroke if untreated, and identifying those at higher risk is necessary for cost-effective screening for asymptomatic, paroxysmal atrial fibrillation. Age has been proposed to identify those at risk, but risk models may provide better discrimination. This study compares atrial fibrillation risk models with age for screening for atrial fibrillation. Methods: Nine atrial fibrillation risk models were compared using the Atherosclerosis Risk in Communities study (11 373 subjects, 60.0±5.7 years old). A new risk model (Screening for Asymptomatic Atrial Fibrillation Events, SAAFE) was created using data collected in the Monitoring Disparities in Chronic Conditions study (3790 subjects, 58.9±15.3 years old). The primary measure was the fraction of incident atrial fibrillation subjects who should receive treatment due to a high CHA2DS2-VASc score identified when screening a fixed number equivalent to the age criterion. Secondary measures were the C statistic and net benefit. Results: Five risk models were significantly better than age. Age identified 71 (61%) of the subjects at risk for stroke who subsequently developed atrial fibrillation, while the best risk model identified 96 (82%). The newly developed SAAFE model identified 95 (81%), primarily based on age, congestive heart failure and coronary artery disease. Conclusions: Use of a risk model increases identification of subjects at risk for atrial fibrillation. One of the best performing models (SAAFE) does not require an ECG for its application, so that it could be used instead of age as a screening criterion without adding to the cost.
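The C statistic used as a secondary measure is the probability that a randomly chosen subject who developed the outcome received a higher risk score than a randomly chosen subject who did not. A minimal pairwise sketch follows; the scores and outcomes are invented for illustration and are not taken from the study.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for binary outcomes coded 1 and 0.

    Counts each (event, non-event) pair as concordant when the event
    subject has the higher score; ties count as half.
    """
    events = [s for s, o in zip(scores, outcomes) if o == 1]
    non_events = [s for s, o in zip(scores, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical risk scores and observed outcomes for four subjects.
scores = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(c_statistic(scores, outcomes))  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking; the pairwise loop is quadratic, which is fine for a sketch but slow for cohorts of many thousands.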

