CLASSIFICATION COMPLEX QUERY SQL FOR DATA LAKE MANAGEMENT USING MACHINE LEARNING

A query is a request for data or information from a database table or a combination of tables, and it allows a database to be searched more precisely. SQL queries fall into two types: simple queries and complex queries. Complex SQL refers to queries that go beyond a standard statement built from the SELECT and WHERE commands; such queries often involve complex joins and subqueries nested in a WHERE clause. Complex SQL queries can be grouped into two types: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) queries. Implementing complex SQL queries in a NoSQL database requires a classification process because the data come in varying formats: structured, semi-structured, and unstructured. The classification process makes it easier to organize the query data by query type. This research uses two classification methods, the Naive Bayes Classifier (NBC), which is commonly used on text data, and the Support Vector Machine (SVM), which is known to work very well on high-dimensional data, and compares them to determine which gives the best classification result. SVM achieved a classification accuracy of 84.61%, compared with 76.92% for NBC.
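
Both classifiers need each SQL query turned into features first. The sketch below is a minimal illustration, assuming a hand-picked set of structural features (joins, subqueries nested in parentheses, aggregate functions, GROUP BY, write statements) that tend to separate OLAP-style from OLTP-style queries. The feature set and the names `tokenize`, `subquery_depth`, and `extract_features` are hypothetical and not taken from the paper; the resulting feature dictionaries could then be passed to an NBC or SVM implementation.

```python
import re
from collections import Counter

# Hypothetical feature vocabulary; not the feature set used in the paper.
AGGREGATES = {"sum", "avg", "count", "min", "max"}
WRITE_KEYWORDS = {"insert", "update", "delete"}


def tokenize(query: str) -> list[str]:
    """Lower-case the query and keep identifiers/keywords plus parentheses."""
    return re.findall(r"[a-z_][a-z0-9_]*|\(|\)", query.lower())


def subquery_depth(tokens: list[str]) -> int:
    """Maximum nesting depth of parenthesised '(SELECT ...)' subqueries."""
    depth = max_depth = 0
    opened_subquery = []  # one flag per open parenthesis: did it start a subquery?
    for i, tok in enumerate(tokens):
        if tok == "(":
            is_sub = i + 1 < len(tokens) and tokens[i + 1] == "select"
            opened_subquery.append(is_sub)
            if is_sub:
                depth += 1
                max_depth = max(max_depth, depth)
        elif tok == ")" and opened_subquery:
            if opened_subquery.pop():
                depth -= 1
    return max_depth


def extract_features(query: str) -> dict[str, int]:
    """Map one SQL query to a small set of structural counts."""
    tokens = tokenize(query)
    counts = Counter(tokens)
    return {
        "joins": counts["join"],
        "subquery_depth": subquery_depth(tokens),
        "aggregates": sum(counts[a] for a in AGGREGATES),
        "group_by": counts["group"],
        "writes": sum(counts[w] for w in WRITE_KEYWORDS),
    }
```

For example, an OLTP-style `UPDATE accounts SET balance = balance - 100 WHERE id = 7` yields one write statement and zeros elsewhere, while an analytical query with a JOIN, a nested SELECT in its WHERE clause, and GROUP BY scores on the other features. A production system would likely need a real SQL parser, since a regular expression cannot tell keywords apart from the same words inside string literals.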

Bipartite Network of Interest (BNOI): Extending Co-Word Network with Interest of Researchers Using Sensor Data and Corresponding Applications as an Example

Sensors ◽

10.3390/s21051668 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1668

Author(s):

Zongming Dai ◽

Kai Hu ◽

Jie Xie ◽

Shengyu Shen ◽

Jie Zheng ◽

...

Keyword(s):

Feature Fusion ◽

Extraction Methods ◽

Knowledge Network ◽

Sensor Data ◽

Support Vector ◽

Bipartite Network ◽

Classification Models ◽

Text Data ◽

Domain Experts ◽

Problems And Solutions

Traditional co-word networks do not distinguish keywords of interest to researchers from general keywords, so they are often too general to provide knowledge of interest to domain experts. Inspired by recent work that automatically identifies questions of interest to researchers, such as “problems” and “solutions”, we try to answer a similar question, “what sensors can be used for what kind of applications”, which is of great interest in sensor-related fields. By generalizing such specific questions as “questions of interest”, we built a knowledge network that takes researcher interest into account, called the bipartite network of interest (BNOI). Unlike co-word approaches, which match exact keywords from a list, BNOI uses classification models to find possible entities of interest. A total of nine feature extraction methods, including N-grams, Word2Vec, and BERT, were used to extract features for training the classification models, including naïve Bayes (NB), support vector machines (SVM), and logistic regression (LR). In addition, a multi-feature fusion strategy and a voting principle (VP) method are applied to combine the strengths of the features and the classification models. Features were extracted from the abstracts of 350 remote sensing articles and used to train the models. After biased words were removed and the models were evaluated with ten-fold cross-validation, the F-measures for “sensors” and “applications” were 93.2% and 85.5%, respectively. These results show that the BNOI built from the classification results answers researchers' questions of interest better than the traditional co-word network approach.
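
The voting principle (VP) combines the outputs of several classifiers, and the results are reported as F-measures. The sketch below assumes plain hard majority voting over per-item labels and the standard binary F-measure (F1); the abstract does not give the exact VP rule, so `majority_vote`, `f_measure`, and the toy "sensor"/"other" predictions are hypothetical illustrations, not the paper's data.

```python
from collections import Counter


def majority_vote(predictions_per_model: list[list[str]]) -> list[str]:
    """Hard majority vote across models, item by item.

    Ties are broken in favour of the label predicted by the earliest model,
    because Counter.most_common orders equal counts by first occurrence.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions_per_model)]


def f_measure(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Binary F-measure (F1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-item predictions from three models (NB, SVM, LR).
y_true = ["sensor", "other", "sensor", "other", "sensor"]
nb = ["sensor", "sensor", "sensor", "other", "other"]
svm = ["sensor", "other", "other", "other", "sensor"]
lr = ["other", "other", "sensor", "sensor", "sensor"]

voted = majority_vote([nb, svm, lr])
```

On this toy data every single model makes at least one mistake, but the vote recovers all five labels, so `f_measure(y_true, voted, "sensor")` is 1.0 while `f_measure(y_true, nb, "sensor")` is about 0.67. Voting helps only when the models' errors fall on different items, as they do here.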

Different Machine Learning Classifiers for Music Emotion Recognition

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7833.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 2187-2191

Keyword(s):

Machine Learning ◽

Emotion Recognition ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Bayes Classifier ◽

Promising Alternative ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Statistical Metrics

Music is an essential part of life, and the emotion it carries is key to how it is perceived and used. Music Emotion Recognition (MER) is the task of identifying the emotion in musical tracks and classifying them accordingly. This paper evaluates how effective popular machine learning classifiers, such as XGBoost, Random Forest, Decision Trees, Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and Gaussian Naive Bayes, are at MER. The classifiers were tested on the MIREX-like dataset [17], and the effects of the oversampling algorithms Synthetic Minority Oversampling Technique (SMOTE) [22] and Random Oversampling (ROS) were also examined. Overall, the Gaussian Naive Bayes classifier gave the highest accuracy, 40.33%, while the other classifiers gave accuracies between 20.44% and 38.67%. This suggests that these classifiers, fed with traditional musical or statistical metrics derived from the music as input features, have reached a limit on classification accuracy. Deep learning approaches that apply Convolutional Neural Networks (CNNs) [13] to spectrograms of the music clips are therefore a promising alternative for MER.
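
ROS and SMOTE both rebalance a training set by adding minority-class samples, but they differ in how: ROS repeats existing samples, whereas SMOTE synthesizes new points on the line segments between a minority sample and one of its nearest minority neighbours. The sketch below is a minimal, dependency-free illustration of both ideas on plain lists of feature vectors; the function names, `k=2`, and the toy "calm"/"angry" data are hypothetical, and a real experiment would more likely use an established library implementation.

```python
import random


def random_oversample(X, y, seed=0):
    """Random Oversampling (ROS): append randomly chosen copies of samples from
    every smaller class until all classes match the size of the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        for _ in range(target - len(samples)):
            X_out.append(rng.choice(samples))
            y_out.append(label)
    return X_out, y_out


def smote_sample(minority, k=2, seed=0):
    """One SMOTE-style synthetic point: pick a minority sample, pick one of its
    k nearest minority neighbours, and move a random fraction of the way to it."""
    rng = random.Random(seed)
    i = rng.randrange(len(minority))
    x = minority[i]
    others = [m for j, m in enumerate(minority) if j != i]
    # Sort the other minority samples by squared Euclidean distance to x.
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]
```

With three "calm" samples and one "angry" sample, `random_oversample` appends two more copies of the single "angry" vector, giving three of each class. A SMOTE point, by contrast, is generally a new vector that did not exist in the training data, which reduces the exact-duplicate overfitting that ROS can cause.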

IMPLEMENTATION OF THE NAÏVE BAYES CLASSIFIER AND SUPPORT VECTOR MACHINE ALGORITHMS FOR SENTIMENT CLASSIFICATION OF HALODOC TELEMEDICINE SERVICE REVIEWS

Jambura Journal of Probability and Statistics ◽

10.34312/jjps.v2i2.11364 ◽

2021 ◽

Vol 2 (2) ◽

pp. 96-104

Author(s):

REYNALDA NABILA CIKANIA

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Telemedicine Service ◽

Application Service ◽

Auc Value

Halodoc is a telemedicine-based healthcare application that connects patients with health practitioners such as doctors, pharmacies, and laboratories. Halodoc users post both positive and negative comments, which reflects public concern about the application, so analyzing the sentiment of these comments, especially during the COVID-19 pandemic, can help improve Halodoc's services. The Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) algorithms are used to analyze the public sentiment of users of Halodoc's telemedicine application. Of 5,687 reviews, 12.33% were classified as negative and 87.67% as positive, so positive reviews clearly outnumber negative ones. NBC achieved an accuracy of 87.77%, an AUC of 57.11%, and a G-Mean of 40.08%, while SVM with an RBF kernel achieved an accuracy of 86.1%, an AUC of 60.149%, and a G-Mean of 49.311%. Although NBC's accuracy is slightly higher, the SVM with an RBF kernel has higher AUC and G-Mean values, which indicates that it classifies the sentiment of Halodoc telemedicine service reviews better than NBC.
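
Accuracy and AUC/G-Mean can rank the two models differently because the data are imbalanced (roughly 88% positive reviews): accuracy is dominated by the majority class, whereas G-Mean, the geometric mean of sensitivity and specificity, collapses when the minority class is poorly recognized. The sketch below uses hypothetical confusion-matrix counts, not the paper's data. The AUC it returns assumes hard labels, in which case the ROC curve has a single operating point and its area reduces to (sensitivity + specificity) / 2; whether the paper computed AUC this way is not stated.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Accuracy, G-Mean and single-threshold AUC from a binary confusion matrix.

    'Positive' is the majority class (positive reviews) and 'negative' the
    minority class (negative reviews).
    """
    sensitivity = tp / (tp + fn)  # recall on positive reviews
    specificity = tn / (tn + fp)  # recall on negative reviews
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    # ROC curve through (0,0), (1 - specificity, sensitivity), (1,1).
    auc = (sensitivity + specificity) / 2
    return accuracy, g_mean, auc


# Hypothetical: 900 positive and 100 negative reviews.
accuracy, g_mean, auc = imbalance_metrics(tp=880, fn=20, tn=20, fp=80)
# A degenerate model that labels every review positive.
acc_all_pos, g_all_pos, auc_all_pos = imbalance_metrics(tp=900, fn=0, tn=100 - 100, fp=100)
```

Both hypothetical models reach 90% accuracy, yet the first has a G-Mean of about 0.44 and an AUC of about 0.59, while the "always positive" model has a G-Mean of 0 and an AUC of 0.5. This is why a slightly less accurate model can still be the better sentiment classifier.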

Classification of Idioms and Literals Using Support Vector Machine and Naïve Bayes Classifier

10.1007/978-981-16-5078-9_42 ◽

2021 ◽

pp. 515-524

Author(s):

J. Briskilal ◽

C. N. Subalalitha

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Support Vector Regression Prediction Method of Text Data Based on Correlation Analysis

2020 International Conference on Data Processing Techniques and Applications for Cyber-Physical Systems - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-16-1726-3_74 ◽

2021 ◽

pp. 595-601

Author(s):

Jianfeng Ma ◽

Tongfei Shang ◽

Jingwei Yang ◽

Jiaqing Zhao

Keyword(s):

Correlation Analysis ◽

Support Vector Regression ◽

Prediction Method ◽

Support Vector ◽

Text Data

Nonlinear Methodologies for Identifying Seismic Event and Nuclear Explosion Using Random Forest, Support Vector Machine, and Naive Bayes Classification

Abstract and Applied Analysis ◽

10.1155/2014/459137 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 35

Author(s):

Longjun Dong ◽

Xibing Li ◽

Gongnan Xie

Keyword(s):

Predictive Power ◽

Seismic Event ◽

Naive Bayes ◽

Nuclear Explosion ◽

Naïve Bayes ◽

Support Vector ◽

Bayes Classifier ◽

Vector Machines ◽

The One ◽

Calculated Distance

Discriminating seismic events from nuclear explosions is a complex, nonlinear problem. Three nonlinear methods, Random Forests (RF), Support Vector Machines (SVM), and the Naïve Bayes Classifier (NBC), were applied to discriminate seismic events. Twenty earthquakes and twenty-seven explosions, each described by nine ratios of the energies contained within predetermined “velocity windows” and by the calculated distance, were used as inputs to the discriminators. The discriminating performances of RF, SVM, and NBC were compared using leave-one-out cross-validation, ROC curves, and the calculated accuracies on training and test samples. RF clearly shows the best predictive power, with the largest area under the ROC curve (0.975) among the three methods. The discriminant accuracies of RF, SVM, and NBC on the test samples are 92.86%, 85.71%, and 92.86%, respectively. The presented RF model can not only identify seismic events automatically with high accuracy but can also rank the discriminant indicators by their calculated weights.
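
With only 47 events, leave-one-out cross-validation (LOOCV) trains on all but one sample and scores the held-out one, repeating for every sample; the held-out scores can then be summarized by the area under the ROC curve. The sketch below shows that evaluation loop with a nearest-centroid scorer as a hypothetical stand-in for RF, SVM, or NBC, and toy 2-D features in place of the nine energy ratios; the data, the function names, and the label convention (1 = explosion, 0 = earthquake) are assumptions. AUC is computed as the probability that a random positive outscores a random negative, which equals the normalized Mann-Whitney U statistic.

```python
import math


def auc_from_scores(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties counting half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nearest_centroid_score(train_X, train_y, x):
    """Hypothetical stand-in classifier: positive scores mean 'closer to class 1'."""
    def centroid(label):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    return math.dist(x, centroid(0)) - math.dist(x, centroid(1))


def leave_one_out_scores(X, y):
    """Score each sample with a model trained on all the other samples."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(nearest_centroid_score(train_X, train_y, X[i]))
    return scores


# Hypothetical toy data: 0 = earthquake, 1 = explosion.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]
scores = leave_one_out_scores(X, y)
predictions = [1 if s > 0 else 0 for s in scores]
```

On these well-separated toy points every held-out sample is classified correctly and the LOOCV AUC is 1.0. Swapping in a different model only requires replacing `nearest_centroid_score`; the LOOCV loop and the AUC computation stay the same.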

Comparison Performance of Naive Bayes Classifier and Support Vector Machine Algorithm for Twitter’s Classification of Tokopedia Services

Journal of Physics Conference Series ◽

10.1088/1742-6596/1320/1/012016 ◽

2019 ◽

Vol 1320 ◽

pp. 012016

Author(s):

R Kusumawati ◽

A D’arofah ◽

P A Pramana

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Comparison of Support Vector Machine Classifier and Naïve Bayes Classifier on Road Surface Type Classification

2018 International Conference on Sustainable Information Engineering and Technology (SIET) ◽

10.1109/siet.2018.8693113 ◽

2018 ◽

Cited By ~ 2

Author(s):

Susi Marianingsih ◽

Fitri Utaminingrum

Keyword(s):

Support Vector Machine ◽

Support Vector Machine Classifier ◽

Naive Bayes ◽

Naïve Bayes ◽

Road Surface ◽

Support Vector ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Surface Type ◽

Type Classification

N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2014-002694 ◽

2014 ◽

Vol 21 (5) ◽

pp. 871-875 ◽

Cited By ~ 31

Author(s):

Ben J Marafino ◽

Jason M Davies ◽

Naomi S Bardach ◽

Mitzi L Dean ◽

R Adams Dudley

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Support Vector Machines ◽

Support Vector ◽

Free Text ◽

Text Data ◽

Vector Machines ◽

N Gram ◽

Diagnosis Classification

NAIVE BAYES CLASSIFIER AND SUPPORT VECTOR MACHINE AS ALTERNATIVE SOLUTIONS FOR TEXT MINING

Jurnal Teknologi Informasi dan Pendidikan ◽

10.24036/tip.v12i2.219 ◽

2019 ◽

Vol 12 (2) ◽

pp. 32-38

Author(s):

Iin Ernawati

Keyword(s):

Support Vector Machine ◽

Text Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

The Relationship

This study applies text-based data mining, often called text mining, using two commonly used classification methods: the Naïve Bayes classifier (NBC) and the support vector machine (SVM). The classification focuses on Indonesian-language documents, and the relationship between documents is measured by probabilities that can be verified with other classification algorithms. This is evident from the NBC probability results, which show that the word “party” appears in at least the economic and political documents. The SVM results, in turn, show that the words “price” and “kpk” occur in both the economic and political documents.
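
The word probabilities that NBC reports are the class-conditional estimates P(word | class). The sketch below is a minimal multinomial Naive Bayes with Laplace (add-one) smoothing, assuming whitespace tokenization; the four toy "economy"/"politics" documents and the names `train_nb` and `predict_nb` are hypothetical, and the paper's preprocessing of Indonesian text is not described.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes: class priors and add-alpha smoothed P(word | class)."""
    vocab = {w for d in docs for w in d.split()}
    priors, word_probs = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d.split())
        total = sum(counts.values())
        word_probs[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return priors, word_probs


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    priors, word_probs = model

    def log_score(c):
        return math.log(priors[c]) + sum(
            math.log(word_probs[c][w]) for w in doc.split() if w in word_probs[c]
        )

    return max(priors, key=log_score)


# Hypothetical toy corpus.
docs = ["price market inflation", "price stock market", "party election kpk", "kpk party parliament"]
labels = ["economy", "economy", "politics", "politics"]
model = train_nb(docs, labels)
```

Here each class has 6 tokens over an 8-word vocabulary, so P("price" | economy) = (2 + 1) / (6 + 8) = 3/14, while P("party" | economy) = 1/14 comes entirely from smoothing. Smoothing is what lets a word appear with non-zero probability in a class where it was never observed.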