Exploring energy-saving refrigerators through online e-commerce reviews: an augmented mining model based on machine learning methods

Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yuyan Luo ◽  
Zheng Yang ◽  
Yuan Liang ◽  
Xiaoxu Zhang ◽  
Hong Xiao

Purpose Based on climate issues and carbon emissions, this study aims to promote low-carbon consumption and compel consumers to actively shift to energy-saving appliances. In this big data era, online reviews in social and electronic commerce (e-commerce) websites contain valuable product information, which can facilitate firm business strategies and consumer comparison shopping. This study is designed to advance existing research on energy-saving refrigerators by incorporating machine learning models in the analysis of online reviews to provide valuable suggestions to e-commerce platform managers and manufacturers to effectively understand the psychological cognition of consumers. Design/methodology/approach This study proposes an online e-commerce review mining and management strategy model based on “data acquisition and cleaning, data mining and analysis and strategy formation” through multiple machine learning methods, namely, Bayes networks, support vector machine (SVM), latent Dirichlet allocation (LDA) and importance–performance analysis (IPA), to help managers. Findings Based on a case study of one of the largest e-commerce platforms in China, this study linguistically analyzes 29,216 online reviews of energy-saving refrigerators. Results indicate that the energy-saving refrigerator features that consumers are generally satisfied with are, in sequential order, logistics, function, price, outlook, after-sales service, brand, quality and space. This study also identifies ten topics with 100 keywords by analyzing 18 different refrigerator models. Finally, based on the IPA, this study allocates different priorities to the features and provides suggestions from the perspective of consumers, the government and manufacturers. Research limitations/implications In terms of limitations, future research may focus on the following points. First, the topics identified in this study derive from specific points in time and reviews; thus, the topics may change with the text data. 
A machine learning-based online review analysis platform could be developed in the future to dynamically improve consumer satisfaction. Moreover, given that consumers' needs may change over time, e-commerce platform types and consumer characteristics, such as user profiles, can be incorporated into the model to effectively analyze trends in consumers' perceived dimensions. Originality/value This study fills the gap in previous research in this field, which uses small-sample data for qualitative analysis, while integrating management ideas and proposes an online e-commerce review mining and management strategy model based on machine learning methods. Moreover, this study considers how consumers' emotional and thematic preferences for products affect their purchase decision-making from the perspective of their psychological perception and linguistically analyzes online reviews of energy-saving refrigerators using the proposed mining model. Through the improved IPA model, this study provides optimizing strategies to help e-commerce platform managers and manufacturers.
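
The abstract above uses IPA to assign priorities to product features. As a rough illustration only, the sketch below implements the classic four-quadrant IPA with the crosshair placed at the mean of each axis; it is not the authors' improved IPA model, and the function name and all scores are hypothetical rather than taken from the study.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Assign each feature to a classic IPA quadrant.

    scores maps feature name -> (importance, performance).
    Assumption: the quadrant boundaries are the mean of each axis.
    """
    imp_cut = mean(imp for imp, _ in scores.values())
    perf_cut = mean(perf for _, perf in scores.values())
    labels = {}
    for name, (imp, perf) in scores.items():
        if imp >= imp_cut and perf >= perf_cut:
            labels[name] = "keep up the good work"
        elif imp >= imp_cut:
            labels[name] = "concentrate here"
        elif perf >= perf_cut:
            labels[name] = "possible overkill"
        else:
            labels[name] = "low priority"
    return labels


# Hypothetical (importance, performance) scores, NOT results from the study.
example = {
    "logistics": (0.80, 0.90),
    "quality": (0.90, 0.40),
    "brand": (0.30, 0.80),
    "space": (0.20, 0.30),
}
quadrants = ipa_quadrants(example)
```

In this made-up example, a feature that is important but poorly rated lands in "concentrate here", which is the kind of feature an IPA-based strategy would prioritize first.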

2022 ◽  
Vol 14 (1) ◽  
pp. 226
Author(s):  
Qianyi Gu ◽  
Yang Han ◽  
Yaping Xu ◽  
Haiyan Yao ◽  
Haofang Niu ◽  
...  

Currently, soil salinization is a serious problem affecting agricultural production and human settlements. Remote sensing techniques have the advantages of a large monitoring range, rapid acquisition of information, implementation of dynamic monitoring, and low impact on the ground surface. Over the past two decades, many semi-empirical bidirectional polarized distribution function (BPDF) models have been proposed to accurately calculate the polarized reflectance (Rp) on the soil surface. Although there have been some studies on the BPDF model based on traditional machine learning methods, there is a lack of research on the BPDF model based on deep learning, especially using laboratory measurement spectrum data as the processing object, with limited research results. In this paper, we collected saline-alkaline soil in the field as the observation object and measured the Rp at multiple angles in the laboratory environment. We used semi-empirical models (the Nadal–Bréon model, Litvinov model, and Xie–Cheng model) and machine learning methods (support vector regression, random forest, and deep neural networks regression) to simulate and predict the surface Rp of saline-alkaline soils and compare them with experimental results. The measured values of the laboratory are compared and fitted, and the root mean squared error, R-squared, and correlation coefficient are calculated to express the prediction effect. The results show that the predictions of the BPDF model based on machine learning methods are generally better than those of the semi-empirical BPDF model, which is improved by 3.06% at 670 nm and 19.75% at 865 nm. The results of this study also provide new ideas and methods based on deep learning for the prediction of Rp on the surface of saline-alkaline soils.
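
The abstract above reports root mean squared error, R-squared and the correlation coefficient as its measures of fit. A minimal sketch of the standard definitions follows, assuming plain Python lists of measured and predicted values; the sample numbers are invented and are not polarized reflectance data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measurements and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented values for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
fit_rmse = rmse(measured, predicted)
fit_r2 = r_squared(measured, predicted)
fit_r = pearson_r(measured, predicted)
```

A lower RMSE and an R-squared closer to 1 indicate that a model tracks the laboratory measurements more closely.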


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1113
Author(s):  
Ming Zhong ◽  
Yajin Zhou ◽  
Gang Chen

IoT plays an important role in daily life; commands and data transfer rapidly between the servers and objects to provide services. However, cyber threats have become a critical factor, especially for IoT servers. There should be a vigorous way to protect the network infrastructures from various attacks. An IDS (Intrusion Detection System) is the invisible guardian for IoT servers. Many machine learning methods have been applied in IDS. However, there is a need to improve IDS in both accuracy and performance. Deep learning is a promising technique that has been used in many areas, including pattern recognition, natural language processing, etc. Deep learning reveals more potential than traditional machine learning methods. In this paper, the sequential model is the key point, and new methods are proposed based on the features of the model. The model can collect features from the network layer via tcpdump packets and from the application layer via system routines. Text-CNN and GRU methods are chosen because they can treat sequential data as a language model. Their advantage over the traditional methods is that they can extract more features from the data, and the experiments show that the deep learning methods achieve a higher F1-score. We conclude that the sequential model-based intrusion detection system using deep learning methods can contribute to the security of IoT servers.
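
The abstract above compares methods by F1-score. As a hedged illustration, the sketch below computes the standard binary F1-score (the harmonic mean of precision and recall) from label lists; the labels are invented, with 1 standing for a hypothetical "attack" class, and are not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented labels: 1 = attack, 0 = normal traffic.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
guess = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(truth, guess)
```

F1 is often preferred over plain accuracy for intrusion detection because attack traffic is usually much rarer than normal traffic.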


Computation ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 139
Author(s):  
Olga Kochueva ◽  
Kirill Nikolskii

Predictive emission monitoring systems (PEMS) are software solutions for the validation and supplementation of costly continuous emission monitoring systems for natural gas electrical generation turbines. The basis of PEMS is that of predictive models trained on past data to estimate emission components. The gas turbine process dataset from the University of California at Irvine open data repository has initiated a challenge of sorts to investigate the quality of models of various machine learning methods to build a model for predicting CO and NOx emissions depending on ambient variables and the parameters of the technological process. The novelty and features of this paper are: (i) a contribution to the study of the features of the open dataset on CO and NOx emissions for gas turbines, which will enable one to more objectively compare different machine learning methods for further research; (ii) for the first time for the CO and NOx emissions, a model based on symbolic regression and a genetic algorithm is presented, the advantage of this being the transparency of the influence of factors and the interpretability of the model; (iii) a new classification model based on the symbolic regression model and fuzzy inference system is proposed. The coefficients of determination of the developed models are: R² = 0.83 for NOx emissions, R² = 0.89 for CO emissions.


2017 ◽  
Vol 19 (1/2) ◽  
pp. 65-93 ◽  
Author(s):  
Samira Khodabandehlou ◽  
Mahmoud Zivari Rahman

Purpose This paper aims to provide a predictive framework of customer churn through six stages for accurate prediction and preventing customer churn in the field of business. Design/methodology/approach The six stages are as follows: first, collection of customer behavioral data and preparation of the data; second, the formation of derived variables and selection of influential variables, using a method of discriminant analysis; third, selection of training and testing data and reviewing their proportion; fourth, the development of prediction models using simple, bagging and boosting versions of supervised machine learning; fifth, comparison of churn prediction models based on different versions of machine-learning methods and selected variables; and sixth, providing appropriate strategies based on the proposed model. Findings According to the results, five variables, the number of items, reception of returned items, the discount, the distribution time and the prize beside the recency, frequency and monetary (RFM) variables (RFMITSDP), were chosen as the best predictor variables. The proposed model, with an accuracy of 97.92 per cent, performed much better in churn prediction than RFM; among the supervised machine learning methods, artificial neural network (ANN) had the highest accuracy, and decision trees (DT) was the least accurate one. The results show the substantial superiority of boosting versions in prediction compared with simple and bagging models. Research limitations/implications The period of the available data was limited to two years. The research data were limited to only one grocery store, so the findings may not be applicable to other industries; therefore, generalizing the results to other business centers should be done with caution. Practical implications Business owners must try to enforce a clear rule to provide a prize for a certain number of purchased items. Of course, the prize can be something other than the purchased item. 
Business owners must accept the items returned by the customers for any reason, and the conditions for accepting returned items and the deadline for accepting the returned items must be clearly communicated to the customers. Store owners must consider a discount for a certain amount of purchase from the store. They have to use an exponential rule to increase the discount when the amount of purchase is increased, to encourage customers to purchase more. The managers of large stores must try to quickly deliver the ordered items, and they should use new, well-equipped transport vehicles and a skilled and friendly workforce for delivering the items. It is recommended that the types of services, the rules for prizes, the discount, the rules for accepting the returned items and the method of distributing the items be prepared and shown in the store for all the customers to see. The special services and reward rules of the store must be communicated to the customers using new media such as social networks. To predict customer behaviors based on the data, future researchers should use the boosting method because it increases the efficiency and accuracy of prediction. It is recommended that for predicting customer behaviors, particularly their churning status, the ANN method be used. To extract and select the important and effective variables influencing customer behaviors, the discriminant analysis method can be used, which is a very accurate and powerful method for predicting the classes of the customers. Originality/value The current study tries to fill this gap by considering five basic and important variables besides RFM in stores, i.e. prize, discount, accepting returns, delay in distribution and the number of items, so that the business owners can understand the role services such as prizes, discount, distribution and accepting returns play in retaining the customers and preventing them from churning. 
Another innovation of the current study is the comparison of machine-learning methods with their boosting and bagging versions, especially considering the fact that previous studies do not consider the bagging method. The other reason for the study is the conflicting results regarding the superiority of machine-learning methods in a more accurate prediction of customer behaviors, including churning. For example, some studies introduce ANN (Huang et al., 2010; Hung and Wang, 2004; Keramati et al., 2014; Runge et al., 2014), some introduce support vector machine (Guo-en and Wei-dong, 2008; Vafeiadis et al., 2015; Yu et al., 2011) and some introduce DT (Freund and Schapire, 1996; Qureshi et al., 2013; Umayaparvathi and Iyakutti, 2012) as the best predictor, confusing the users of the results of these studies regarding the best prediction method. The current study identifies the best prediction method specifically in the field of store businesses for researchers and the owners. Moreover, another innovation of the current study is using discriminant analysis for selecting and filtering variables which are important and effective in predicting churners and non-churners, which is not used in previous studies. Therefore, the current study is unique considering the used variables, the method of comparing their accuracy and the method of selecting effective variables.
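
The abstract above builds on recency, frequency and monetary (RFM) variables. As an illustrative sketch only, the code below derives the three base RFM values per customer from a transaction log; the record layout, function name and sample purchases are assumptions, not the study's data, and the five additional variables the paper proposes are not modeled here.

```python
from datetime import date


def rfm(transactions, today):
    """Compute recency, frequency and monetary values per customer.

    transactions: iterable of (customer_id, purchase_date, amount).
    Recency is the number of days since the customer's latest purchase.
    """
    table = {}
    for customer, day, amount in transactions:
        rec = table.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency_days": (today - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in table.items()
    }


# Invented purchase log for illustration only.
log = [
    ("a", date(2024, 1, 5), 20.0),
    ("a", date(2024, 1, 20), 30.0),
    ("b", date(2024, 1, 10), 15.0),
]
features = rfm(log, today=date(2024, 1, 31))
```

The resulting per-customer rows could then serve as inputs to whichever churn classifier is being evaluated.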


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Bo Qiu ◽  
Wei Fan

Purpose Metropolitan areas suffer from frequent road traffic congestion not only during peak hours but also during off-peak periods. Different machine learning methods have been used in travel time prediction; however, such methods face the problem of overfitting in practice. Tree-based ensembles have been applied in various prediction fields, and such approaches usually produce high prediction accuracy by aggregating and averaging individual decision trees. These approaches not only yield better prediction results but also offer a good bias-variance trade-off, which can help to avoid overfitting. However, the application of tree-based ensemble algorithms in traffic prediction is still limited. This study aims to improve the accuracy and interpretability of the models by using random forest (RF) to analyze and model the travel time on freeways. Design/methodology/approach As traffic conditions often change greatly, prediction results are often unsatisfactory. To improve the accuracy of short-term travel time prediction in the freeway network, a practically feasible and computationally efficient RF prediction method for real-world freeways was developed using probe traffic data. In addition, the variables’ relative importance was ranked, which provides an investigation platform to gain a better understanding of how different contributing factors might affect travel time on freeways. Findings The parameters of the RF model were estimated by using the training sample set. After the parameter tuning process was completed, the proposed RF model was developed. The features’ relative importance showed that the variables (travel time 15 min before) and time of day (TOD) contribute the most to the predicted travel time result. 
The model performance was also evaluated and compared against the extreme gradient boosting method, and the results indicated that the RF always produces more accurate travel time predictions. Originality/value This research developed an RF method to predict the freeway travel time by using the probe vehicle-based traffic data and weather data. Detailed information about the input variables and data pre-processing was presented. To measure the effectiveness of the proposed travel time prediction algorithms, the mean absolute percentage errors were computed for different observation segments combined with different prediction horizons ranging from 15 to 60 min.
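
The abstract above evaluates predictions with the mean absolute percentage error (MAPE). A minimal sketch of the standard definition follows; the travel times are invented and do not come from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented travel times in seconds for three freeway segments.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
error_pct = mape(observed, forecast)
```

Because MAPE is scale-free, it allows errors to be compared across segments with different typical travel times.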

