Identification of critical factors for assessing the quality of restaurants using data mining approaches

2019 ◽  
Vol 37 (6) ◽  
pp. 952-969
Author(s):  
Ahsan Mahmood ◽  
Hikmat Ullah Khan

Purpose The purpose of this paper is to apply state-of-the-art machine learning techniques for assessing the quality of the restaurants using restaurant inspection data. The machine learning techniques are applied to solve the real-world problems in all sphere of life. Health and food departments pay regular visits to restaurants for inspection and mark the condition of the restaurant on the basis of the inspection. These inspections consider many factors that determine the condition of the restaurants and make it possible for the authorities to classify the restaurants. Design/methodology/approach In this paper, standard machine learning techniques, support vector machines, naïve Bayes and random forest classifiers are applied to classify the critical level of the restaurants on the basis of features identified during the inspection. The importance of different factors of inspection is determined by using feature selection through the help of the minimum-redundancy-maximum-relevance and linear vector quantization feature importance methods. Findings The experiments are accomplished on the real-world New York City restaurant inspection data set that contains diverse inspection features. The results show that the nonlinear support vector machine achieves better accuracy than other techniques. Moreover, this research study investigates the importance of different factors of restaurant inspection and finds that inspection score and grade are significant features. The performance of the classifiers is measured by using the standard performance evaluation measures of accuracy, sensitivity and specificity. Originality/value This research uses a real-world data set of restaurant inspection that has, to the best of the authors’ knowledge, never been used previously by researchers. The findings are helpful in identifying the best restaurants and help finding the factors that are considered important in restaurant inspection. The results are also important in identifying possible biases in restaurant inspections by the authorities.

2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several applications in cybersecurity; including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent advances in research utilized machine learning methods for anomaly detection over graphs. This chapter will concentrate on static graphs (both labeled and unlabeled), and the chapter summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter will reflect the success and challenges of using these methods in the context of graph-based anomaly detection.


2019 ◽  
Vol 11 (16) ◽  
pp. 1943 ◽  
Author(s):  
Omid Rahmati ◽  
Saleh Yousefi ◽  
Zahra Kalantari ◽  
Evelyn Uuemaa ◽  
Teimur Teimurian ◽  
...  

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models—support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided to two groups: Training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.


The prediction of price for a vehicle has been more popular in research area, and it needs predominant effort and information about the experts of this particular field. The number of different attributes is measured and also it has been considerable to predict the result in more reliable and accurate. To find the price of used vehicles a well defined model has been developed with the help of three machine learning techniques such as Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were used not on the individual items but for the whole group of data items. This data group has been taken from some web portal and that same has been used for the prediction. The data must be collected using web scraper that was written in PHP programming language. Distinct machine learning algorithms of varying performances had been compared to get the best result of the given data set. The final prediction model was integrated into Java application


2021 ◽  
Vol 8 (1) ◽  
pp. 28
Author(s):  
S. L. Ávila ◽  
H. M. Schaberle ◽  
S. Youssef ◽  
F. S. Pacheco ◽  
C. A. Penz

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. As more information is available, it easier can become the diagnosis of the machine operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using the electric stator current harmonic analysis, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifies, and One-vs-One strategy to identify rotor unbalance kind and severity problem – a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier for dealing better with changes produced by environmental variations and natural machinery usage. The adaptative update means to update the training data set with an amount of recent data, saving the entire original historical data. It is relevant for engineering maintenance. Our results show that the current signature analysis is appropriate to identify the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Apostolos Ampountolas ◽  
Mark P. Legg

Purpose This study aims to predict hotel demand through text analysis by investigating keyword series to increase demand predictions’ precision. To do so, this paper presents a framework for modeling hotel demand that incorporates machine learning techniques. Design/methodology/approach The empirical forecasting is conducted by introducing a segmented machine learning approach of leveraging hierarchical clustering tied to machine learning and deep learning techniques. These features allow the model to yield more precise estimates. This study evaluates an extensive range of social media–derived words with the most significant probability of gradually establishing an understanding of an optimal outcome. Analyzes were performed on a major hotel chain in an urban market setting within the USA. Findings The findings indicate that while traditional methods, being the naïve approach and ARIMA models, struggled with forecasting accuracy, segmented boosting methods (XGBoost) leveraging social media predict hotel occupancy with greater precision for all examined time horizons. Additionally, the segmented learning approach improved the forecasts’ stability and robustness while mitigating common overfitting issues within a highly dimensional data set. Research limitations/implications Incorporating social media into a segmented learning framework can augment the current generation of forecasting methods’ accuracy. Moreover, the segmented learning approach mitigates the negative effects of market shifts (e.g. COVID-19) that can reduce in-production forecasts’ life-cycles. The ability to be more robust to market deviations will allow hospitality firms to minimize development time. Originality/value The results are expected to generate insights by providing revenue managers with an instrument for predicting demand.


2019 ◽  
Vol 16 (9) ◽  
pp. 4019-4027 ◽  
Author(s):  
Salahadin Seid ◽  
Pooja

In this study we focused on the relationship between preprocessing and model accuracy. The performance of the Machine learning techniques depends on the quality of the data set. Preprocessing is not only advantageous but it is very necessary and a preliminary work in predicting model. As a result, experiments discovered that preprocessed techniques increased performance for model building. To see the performance preprocessing support vector machine is applied before preprocessing and after preprocessing. Its model accuracy increased from 68.7% to 88.5%.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Demeke Endalie ◽  
Getamesay Haile

For decades, machine learning techniques have been used to process Amharic texts. The potential application of deep learning on Amharic document classification has not been exploited due to a lack of language resources. In this paper, we present a deep learning model for Amharic news document classification. The proposed model uses fastText to generate text vectors to represent semantic meaning of texts and solve the problem of traditional methods. The text vectors matrix is then fed into the embedding layer of a convolutional neural network (CNN), which automatically extracts features. We conduct experiments on a data set with six news categories, and our approach produced a classification accuracy of 93.79%. We compared our method to well-known machine learning algorithms such as support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), XGBoost (XGB), and random forest (RF) and achieved good results.


2021 ◽  
Vol 11 (6) ◽  
pp. 2546
Author(s):  
Milena Nacchia ◽  
Fabio Fruggiero ◽  
Alfredo Lambiase ◽  
Ken Bruton

The increasing availability of data, gathered by sensors and intelligent machines, is changing the way decisions are made in the manufacturing sector. In particular, based on predictive approach and facilitated by the nowadays growing capabilities of hardware, cloud-based solutions, and new learning approaches, maintenance can be scheduled—over cell engagement and resource monitoring—when required, for minimizing (or managing) unexpected equipment failures, improving uptime through less aggressive maintenance schedules, shortening unplanned downtime, reducing excess (direct and indirect) cost, reducing long-term damage to machines and processes, and improve safety plans. With access to increased levels of data (and over learning mechanisms), companies have the capability to conduct statistical tests using machine learning algorithms, in order to uncover root causes of problems previously unknown. This study analyses the maturity level and contributions of machine learning methods for predictive maintenance. An upward trend in publications for predictive maintenance using machine learning techniques was identified with the USA and China leading. A mapping study—steady set until early 2019 data—was employed as a formal and well-structured method to synthesize material and to report on pervasive areas of research. Type of equipment, sensors, and data are mapped to properly assist new researchers in positioning new research activities in the domain of smart maintenance. Hence, in this paper, we focus on data-driven methods for predictive maintenance (PdM) with a comprehensive survey on applications and methods until, for the sake of commenting on stable proposal, 2019 (early included). An equal repartition between evaluation and validation studies was identified, this being a symptom of an immature but growing research area. In addition, the type of contribution is mainly in the form of models and methodologies. Vibrational signal was marked as the most used data set for diagnosis in manufacturing machinery monitoring; furthermore, supervised learning is reported as the most used predictive approach (ensemble learning is growing fast). Neural networks, followed by random forests and support vector machines, were identified as the most applied methods encompassing 40% of publications, of which 67% related to deep neural network with long short-term memory predominance. Notwithstanding, there is no robust approach (no one reported optimal performance over different case tests) that works best for every problem. We finally conclude the research in this area is moving fast to gather a separate focused analysis over the last two years (whenever stable implementations will appear).


Author(s):  
Guo-Zheng Li

This chapter introduces great challenges and the novel machine learning techniques employed in clinical data processing. It argues that the novel machine learning techniques including support vector machines, ensemble learning, feature selection, feature reuse by using multi-task learning, and multi-label learning provide potentially more substantive solutions for decision support and clinical data analysis. The authors demonstrate the generalization performance of the novel machine learning techniques on real world data sets including one data set of brain glioma, one data set of coronary heart disease in Chinese Medicine and some tumor data sets of microarray. More and more machine learning techniques will be developed to improve analysis precision of clinical data sets.


Sign in / Sign up

Export Citation Format

Share Document