Identification of critical factors for assessing the quality of restaurants using data mining approaches

Purpose: This paper applies state-of-the-art machine learning techniques to assess the quality of restaurants using restaurant inspection data. Machine learning techniques are applied to solve real-world problems in all spheres of life. Health and food departments pay regular visits to restaurants for inspection and rate the condition of each restaurant on the basis of the inspection. These inspections consider many factors that determine the condition of the restaurants and make it possible for the authorities to classify them.
Design/methodology/approach: Standard machine learning techniques (support vector machines, naïve Bayes and random forest classifiers) are applied to classify the critical level of the restaurants on the basis of features identified during the inspection. The importance of different inspection factors is determined through feature selection using the minimum-redundancy-maximum-relevance and linear vector quantization feature importance methods.
Findings: The experiments are conducted on the real-world New York City restaurant inspection data set, which contains diverse inspection features. The results show that the nonlinear support vector machine achieves better accuracy than the other techniques. Moreover, the study investigates the importance of different factors of restaurant inspection and finds that inspection score and grade are significant features. Classifier performance is measured using the standard evaluation measures of accuracy, sensitivity and specificity.
Originality/value: This research uses a real-world restaurant inspection data set that has, to the best of the authors’ knowledge, never been used previously by researchers. The findings help identify the best restaurants and the factors that are considered important in restaurant inspection. The results are also important in identifying possible biases in restaurant inspections by the authorities.
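The three evaluation measures named in this abstract can be computed from a binary confusion matrix. The following is a minimal sketch in plain Python, assuming a two-class labeling (1 = critical, 0 = not critical); the function names and the example labels are illustrative and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for illustration only (not inspection data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because accuracy alone can look high even if one class is rarely predicted correctly.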

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽ 10.1093/llc/fqy037 ◽ 2018 ◽ Vol 34 (3) ◽ pp. 569-581 ◽ Cited By ~ 1

Author(s): Sujata Rani ◽ Parteek Kumar

Keyword(s): Machine Learning ◽ Social Media ◽ Sentiment Analysis ◽ Media Analysis ◽ Training Data ◽ Machine Learning Techniques ◽ Support Vector ◽ Analysis Tool ◽ Data Set ◽ Learning Techniques

This article presents an innovative approach to sentiment analysis (SA). The proposed system handles Romanized or abbreviated text and spelling variations in the text. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results are promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system could serve cultural and social purposes such as predicting or countering riots.
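Converting string data into a numerical matrix is commonly done with a bag-of-words count representation. The sketch below shows that idea in plain Python; it is an assumption about the general technique, not a reproduction of the WEKA filter or settings the authors used, and the sample reviews are invented.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct lowercase token to a column index (sorted for stability)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Turn one text into a row of token counts; unknown tokens are ignored."""
    row = [0] * len(vocab)
    for tok, count in Counter(text.lower().split()).items():
        if tok in vocab:
            row[vocab[tok]] = count
    return row


# Invented example texts; a real system would also normalize spelling variants.
reviews = ["good movie good story", "bad movie", "story was fine"]
vocab = build_vocabulary(reviews)
matrix = [vectorize(review, vocab) for review in reviews]
```

Each row of `matrix` can then be passed, together with its label, to a classifier such as Naive Bayes or an SVM.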

A Review of Machine Learning Techniques for Anomaly Detection in Static Graphs

Implementing Computational Intelligence Techniques for Security Systems Design - Advances in Computational Intelligence and Robotics ◽ 10.4018/978-1-7998-2418-3.ch007 ◽ 2020 ◽ pp. 146-162

Author(s): Hesham M. Al-Ammal

Keyword(s): Machine Learning ◽ Neural Networks ◽ Anomaly Detection ◽ Real Life ◽ Machine Learning Techniques ◽ Support Vector ◽ Learning Methods ◽ Data Set ◽ Learning Techniques ◽ Vector Machines

Detection of anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, and anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.

Multi-Hazard Exposure Mapping Using Machine Learning Techniques: A Case Study from Iran

Remote Sensing ◽ 10.3390/rs11161943 ◽ 2019 ◽ Vol 11 (16) ◽ pp. 1943 ◽ Cited By ~ 15

Author(s): Omid Rahmati ◽ Saleh Yousefi ◽ Zahra Kalantari ◽ Evelyn Uuemaa ◽ Teimur Teimurian ◽ ...

Keyword(s): Machine Learning ◽ State Of The Art ◽ Characteristic Curve ◽ Machine Learning Techniques ◽ Support Vector ◽ Mountainous Area ◽ Data Set ◽ Boosted Regression Tree ◽ Hazard Exposure ◽ Learning Techniques

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood data set (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under the receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can help support decision making on mountain social-ecological systems facing multiple hazards.
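The random 70/30 split and the TSS criterion can be illustrated with a short plain-Python sketch. It assumes the standard definition TSS = sensitivity + specificity - 1 (the abstract does not spell out the formula), and the seed, helper names, and confusion counts below are hypothetical.

```python
import random


def split_train_validation(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def true_skill_statistic(tp, fp, tn, fn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# 133 placeholder event ids, matching the size of the flood data set.
flood_events = list(range(133))
train, validation = split_train_validation(flood_events)

# Invented confusion counts, used only to show the calculation.
tss = true_skill_statistic(tp=36, fp=6, tn=34, fn=4)
```

With 133 events, a 70% cut yields 93 training and 40 validation events. A TSS of 0 corresponds to no skill beyond chance, and 1 to perfect discrimination.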

Vehicle Price Prediction using SVM Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.g5915.069820 ◽ 2020 ◽ Vol 9 (8) ◽ pp. 398-401

Keyword(s): Machine Learning ◽ Research Area ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Support Vector ◽ Data Set ◽ Network Support ◽ Java Application ◽ Learning Techniques ◽ The Individual

Vehicle price prediction has become a popular research topic, and it requires considerable effort and expert knowledge of the field. A number of different attributes must be considered for the prediction to be reliable and accurate. To estimate the price of used vehicles, a model was developed with the help of three machine learning techniques: Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were applied not to individual items but to the whole group of data items. The data were taken from a web portal using a web scraper written in the PHP programming language, and the same data were used for the prediction. Machine learning algorithms of varying performance were compared to get the best result on the given data set. The final prediction model was integrated into a Java application.

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽ 10.11114/set.v8i1.5213 ◽ 2021 ◽ Vol 8 (1) ◽ pp. 28

Author(s): S. L. Ávila ◽ H. M. Schaberle ◽ S. Youssef ◽ F. S. Pacheco ◽ C. A. Penz

Keyword(s): Machine Learning ◽ Machine Learning Algorithms ◽ Training Data ◽ Machine Learning Techniques ◽ Support Vector ◽ Signature Analysis ◽ Data Set ◽ Learning Techniques ◽ Environmental Variations ◽ Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifiers, and the One-vs-One strategy to identify the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier so that it deals better with changes produced by environmental variations and natural machinery usage. The adaptative update means updating the training data set with an amount of recent data while keeping the entire original historical data. This is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.
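One way to read the update scheme described here is as a training set made of an untouched historical part plus a bounded window of recent samples. The sketch below follows that reading; the class name, the window capacity, the feature values, and the class labels are all hypothetical, and the abstract does not say how much recent data the authors keep.

```python
from collections import deque


class AdaptiveTrainingSet:
    """Keeps the original historical samples intact and appends recent ones."""

    def __init__(self, historical, recent_capacity):
        self._historical = tuple(historical)          # never modified after creation
        self._recent = deque(maxlen=recent_capacity)  # oldest recent sample drops out

    def add_recent(self, sample):
        """Record a newly labeled (features, label) sample from the machine."""
        self._recent.append(sample)

    def training_data(self):
        """Return historical samples followed by the current recent window."""
        return list(self._historical) + list(self._recent)


# Invented (harmonic features, label) pairs for illustration only.
historical = [([0.10, 0.02], "balanced"), ([0.45, 0.30], "static_unbalance")]
store = AdaptiveTrainingSet(historical, recent_capacity=2)
for sample in [
    ([0.12, 0.03], "balanced"),
    ([0.50, 0.28], "static_unbalance"),
    ([0.11, 0.02], "balanced"),
]:
    store.add_recent(sample)

data = store.training_data()  # retrain the classifier on this list
```

Retraining on `data` after each batch of recent samples lets the classifier follow gradual drift without forgetting the conditions captured in the original measurements.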

A segmented machine learning modeling approach of social media for predicting occupancy

International Journal of Contemporary Hospitality Management ◽ 10.1108/ijchm-06-2020-0611 ◽ 2021 ◽ Vol ahead-of-print (ahead-of-print) ◽ Cited By ~ 1

Author(s): Apostolos Ampountolas ◽ Mark P. Legg

Keyword(s): Machine Learning ◽ Social Media ◽ Life Cycles ◽ Machine Learning Techniques ◽ Learning Approach ◽ Negative Effects ◽ Data Set ◽ Content Type ◽ Learning Techniques ◽ The Usa

Purpose: This study aims to predict hotel demand through text analysis by investigating keyword series to increase the precision of demand predictions. To do so, it presents a framework for modeling hotel demand that incorporates machine learning techniques.
Design/methodology/approach: The empirical forecasting introduces a segmented machine learning approach that ties hierarchical clustering to machine learning and deep learning techniques. These features allow the model to yield more precise estimates. The study evaluates an extensive range of social media-derived words with the most significant probability of gradually establishing an understanding of an optimal outcome. Analyses were performed on a major hotel chain in an urban market setting within the USA.
Findings: The findings indicate that while traditional methods, namely the naïve approach and ARIMA models, struggled with forecasting accuracy, segmented boosting methods (XGBoost) leveraging social media predict hotel occupancy with greater precision for all examined time horizons. Additionally, the segmented learning approach improved the stability and robustness of the forecasts while mitigating common overfitting issues within a high-dimensional data set.
Research limitations/implications: Incorporating social media into a segmented learning framework can improve the accuracy of the current generation of forecasting methods. Moreover, the segmented learning approach mitigates the negative effects of market shifts (e.g. COVID-19) that can shorten the life cycles of in-production forecasts. Greater robustness to market deviations will allow hospitality firms to minimize development time.
Originality/value: The results are expected to generate insights by providing revenue managers with an instrument for predicting demand.
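The naïve approach mentioned as a baseline is usually taken to mean repeating the last observed value over the forecast horizon. The sketch below assumes that variant (the abstract does not specify which one was used) and scores it with mean absolute error; the occupancy figures are invented.

```python
def naive_forecast(history, horizon):
    """Repeat the most recent observation for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented daily occupancy percentages for illustration only.
occupancy = [71, 74, 78, 80]
actual_next = [82, 79, 75]

forecast = naive_forecast(occupancy, horizon=len(actual_next))
mae = mean_absolute_error(actual_next, forecast)
```

Any richer model, such as a boosted one using social media features, is worth deploying only if its error on held-out periods is clearly lower than that of a baseline like this.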

Road Accident Data Analysis: Data Preprocessing for Better Model Building

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2019.8288 ◽ 2019 ◽ Vol 16 (9) ◽ pp. 4019-4027 ◽ Cited By ~ 1

Author(s): Salahadin Seid ◽ Pooja

Keyword(s): Model Building ◽ Analysis Data ◽ Road Accident ◽ Machine Learning Techniques ◽ Support Vector ◽ Model Accuracy ◽ Data Set ◽ Learning Techniques ◽ Accident Data

In this study we focused on the relationship between preprocessing and model accuracy. The performance of machine learning techniques depends on the quality of the data set. Preprocessing is not merely advantageous; it is a necessary preliminary step in building a predictive model. Our experiments found that preprocessing techniques increased model performance. To measure the effect, a support vector machine was applied before and after preprocessing, and its accuracy increased from 68.7% to 88.5%.

Automated Amharic News Categorization Using Deep Learning Models

Computational Intelligence and Neuroscience ◽ 10.1155/2021/3774607 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Demeke Endalie ◽ Getamesay Haile

Keyword(s): Machine Learning ◽ Deep Learning ◽ Document Classification ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Support Vector ◽ Language Resources ◽ Data Set ◽ Learning Techniques ◽ Proposed Model

For decades, machine learning techniques have been used to process Amharic texts. The potential of deep learning for Amharic document classification has not been exploited because of a lack of language resources. In this paper, we present a deep learning model for Amharic news document classification. The proposed model uses fastText to generate text vectors that represent the semantic meaning of texts and address the limitations of traditional methods. The text vector matrix is then fed into the embedding layer of a convolutional neural network (CNN), which automatically extracts features. We conduct experiments on a data set with six news categories, and our approach produced a classification accuracy of 93.79%. We compared our method to well-known machine learning algorithms such as support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), XGBoost (XGB), and random forest (RF) and achieved good results.

A Systematic Mapping of the Advancing Use of Machine Learning Techniques for Predictive Maintenance in the Manufacturing Sector

Applied Sciences ◽ 10.3390/app11062546 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 2546

Author(s): Milena Nacchia ◽ Fabio Fruggiero ◽ Alfredo Lambiase ◽ Ken Bruton

Keyword(s): Machine Learning ◽ Manufacturing Sector ◽ Predictive Maintenance ◽ Machine Learning Techniques ◽ Support Vector ◽ Data Set ◽ Vibrational Signal ◽ Research Activities ◽ Learning Techniques ◽ Predictive Approach

The increasing availability of data, gathered by sensors and intelligent machines, is changing the way decisions are made in the manufacturing sector. In particular, with a predictive approach, facilitated by the growing capabilities of hardware, cloud-based solutions, and new learning approaches, maintenance can be scheduled when required (through cell engagement and resource monitoring), minimizing (or managing) unexpected equipment failures, improving uptime through less aggressive maintenance schedules, shortening unplanned downtime, reducing excess (direct and indirect) cost, reducing long-term damage to machines and processes, and improving safety plans. With access to increased levels of data (and through learning mechanisms), companies can conduct statistical tests using machine learning algorithms to uncover previously unknown root causes of problems. This study analyses the maturity level and contributions of machine learning methods for predictive maintenance. An upward trend in publications on predictive maintenance using machine learning techniques was identified, with the USA and China leading. A mapping study, restricted to data up to early 2019, was employed as a formal and well-structured method to synthesize material and to report on pervasive areas of research. Types of equipment, sensors, and data are mapped to assist new researchers in positioning new research activities in the domain of smart maintenance. Hence, in this paper, we focus on data-driven methods for predictive maintenance (PdM), with a comprehensive survey of applications and methods up to and including early 2019, a cutoff chosen so that the commentary covers stable proposals. An equal split between evaluation and validation studies was identified, which is a symptom of an immature but growing research area. In addition, contributions are mainly in the form of models and methodologies. Vibrational signal was marked as the most used data set for diagnosis in manufacturing machinery monitoring; furthermore, supervised learning is reported as the most used predictive approach (ensemble learning is growing fast). Neural networks, followed by random forests and support vector machines, were identified as the most applied methods, encompassing 40% of publications, of which 67% related to deep neural networks with a predominance of long short-term memory. Nevertheless, no approach is robust enough to work best for every problem (none reported optimal performance across different test cases). We finally conclude that research in this area is moving fast enough to warrant a separate focused analysis of the last two years (once stable implementations appear).

Machine Learning for Clinical Data Processing

Advances in Digital Crime, Forensics, and Cyber Terrorism - Digital Forensics for the Health Sciences ◽ 10.4018/978-1-60960-483-7.ch009 ◽ 2011 ◽ pp. 193-215

Author(s): Guo-Zheng Li

Keyword(s): Machine Learning ◽ Data Processing ◽ Clinical Data ◽ Machine Learning Techniques ◽ Support Vector ◽ Data Sets ◽ The Novel ◽ Real World Data ◽ Data Set ◽ Learning Techniques

This chapter introduces the major challenges of clinical data processing and the novel machine learning techniques employed to address them. It argues that novel machine learning techniques, including support vector machines, ensemble learning, feature selection, feature reuse through multi-task learning, and multi-label learning, provide potentially more substantive solutions for decision support and clinical data analysis. The authors demonstrate the generalization performance of these techniques on real-world data sets, including one brain glioma data set, one data set of coronary heart disease in Chinese Medicine, and several microarray tumor data sets. More and more machine learning techniques will be developed to improve the precision of clinical data analysis.
