Integration of synthetic minority oversampling technique for imbalanced class

In the data mining, a class imbalance is a problematic issue to look for the solutions. It probably because machine learning is constructed by using algorithms with assuming the number of instances in each balanced class, so when using a class imbalance, it is possible that the prediction results are not appropriate. They are solutions offered to solve class imbalance issues, including oversampling, undersampling, and synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have its disadvantages, so SMOTE is an alternative to overcome it. By integrating SMOTE in the data mining classification method such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve the performance of accuracy. In this research, it was found that the data of SMOTE gave better accuracy than the original data. In addition to the three classification methods used, RF gives the highest average AUC, F-measure, and G-means score.

Download Full-text

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment Analysis intends to get the basic perspective of the content, which may be anything that holds a subjective supposition, for example, an online audit, Comments on Blog posts, film rating and so forth. These surveys and websites might be characterized into various extremity gatherings, for example, negative, positive, and unbiased keeping in mind the end goal to concentrate data from the info dataset. Supervised machine learning strategies group these reviews. In this paper, three distinctive machine learning calculations, for example, Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB), have been considered for the arrangement of human conclusions. The exactness of various strategies is basically inspected keeping in mind the end goal to get to their execution on the premise of parameters, e.g. accuracy, review, f-measure, and precision.

Download Full-text

Assessment of Machine Learning Models to Identify Port Jackson Shark Behaviours Using Tri-Axial Accelerometers

Sensors ◽

10.3390/s20247096 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7096

Author(s):

Julianna P. Kadar ◽

Monique A. Ladds ◽

Joanna Day ◽

Brianne Lyall ◽

Culum Brown

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Classification Tree ◽

Support Vector ◽

Fine Scale ◽

Learning Models ◽

Port Jackson ◽

F Measure ◽

Machine Learning Models ◽

Broad Scale

Movement ecology has traditionally focused on the movements of animals over large time scales, but, with advancements in sensor technology, the focus can become increasingly fine scale. Accelerometers are commonly applied to quantify animal behaviours and can elucidate fine-scale (<2 s) behaviours. Machine learning methods are commonly applied to animal accelerometry data; however, they require the trial of multiple methods to find an ideal solution. We used tri-axial accelerometers (10 Hz) to quantify four behaviours in Port Jackson sharks (Heterodontus portusjacksoni): two fine-scale behaviours (<2 s)—(1) vertical swimming and (2) chewing as proxy for foraging, and two broad-scale behaviours (>2 s–mins)—(3) resting and (4) swimming. We used validated data to calculate 66 summary statistics from tri-axial accelerometry and assessed the most important features that allowed for differentiation between the behaviours. One and two second epoch testing sets were created consisting of 10 and 20 samples from each behaviour event, respectively. We developed eight machine learning models to assess their overall accuracy and behaviour-specific accuracy (one classification tree, five ensemble learners and two neural networks). The support vector machine model classified the four behaviours better when using the longer 2 s time epoch (F-measure 89%; macro-averaged F-measure: 90%). Here, we show that this support vector machine (SVM) model can reliably classify both fine- and broad-scale behaviours in Port Jackson sharks.

Download Full-text

Design and analysis of an efficient machine learning based hybrid recommendation system with enhanced density-based spatial clustering for digital e-learning applications

Complex & Intelligent Systems ◽

10.1007/s40747-021-00509-4 ◽

2021 ◽

Author(s):

S. Bhaskaran ◽

Raja Marappan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Making ◽

Support Vector Machine ◽

Absolute Error ◽

Support Vector ◽

E Learning ◽

Public Datasets ◽

Hybrid Recommender ◽

New Strategies

AbstractA decision-making system is one of the most important tools in data mining. The data mining field has become a forum where it is necessary to utilize users' interactions, decision-making processes and overall experience. Nowadays, e-learning is indeed a progressive method to provide online education in long-lasting terms, contrasting to the customary head-to-head process of educating with culture. Through e-learning, an ever-increasing number of learners have profited from different programs. Notwithstanding, the highly assorted variety of the students on the internet presents new difficulties to the conservative one-estimate fit-all learning systems, in which a solitary arrangement of learning assets is specified to the learners. The problems and limitations in well-known recommender systems are much variations in the expected absolute error, consuming more query processing time, and providing less accuracy in the final recommendation. The main objectives of this research are the design and analysis of a new transductive support vector machine-based hybrid personalized hybrid recommender for the machine learning public data sets. The learning experience has been achieved through the habits of the learners. This research designs some of the new strategies that are experimented with to improve the performance of a hybrid recommender. The modified one-source denoising approach is designed to preprocess the learner dataset. The modified anarchic society optimization strategy is designed to improve the performance measurements. The enhanced and generalized sequential pattern strategy is proposed to mine the sequential pattern of learners. The enhanced transductive support vector machine is developed to evaluate the extracted habits and interests. These new strategies analyze the confidential rate of learners and provide the best recommendation to the learners. The proposed generalized model is simulated on public datasets for machine learning such as movies, music, books, food, merchandise, healthcare, dating, scholarly paper, and open university learning recommendation. The experimental analysis concludes that the enhanced clustering strategy discovers clusters that are based on random size. The proposed recommendation strategies achieve better significant performance over the methods in terms of expected absolute error, accuracy, ranking score, recall, and precision measurements. The accuracy of the proposed datasets lies between 82 and 98%. The MAE metric lies between 5 and 19.2% for the simulated public datasets. The simulation results prove the proposed generalized recommender has a great strength to improve the quality and performance.

Download Full-text

Biased support vector machine and weighted-smote in handling class imbalance problem

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v4i1.146 ◽

2018 ◽

Vol 4 (1) ◽

pp. 21 ◽

Cited By ~ 21

Author(s):

Hartono Hartono ◽

Opim Salim Sitompul ◽

Tulus Tulus ◽

Erna Budhiarti Nababan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Data Distribution ◽

Class Imbalance ◽

Support Vector ◽

Class Imbalance Problem ◽

Precise Method ◽

Imbalance Problem

Class imbalance occurs when instances in a class are much higher than in other classes. This machine learning major problem can affect the predicted accuracy. Support Vector Machine (SVM) is robust and precise method in handling class imbalance problem but weak in the bias data distribution, Biased Support Vector Machine (BSVM) became popular choice to solve the problem. BSVM provide better control sensitivity yet lack accuracy compared to general SVM. This study proposes the integration of BSVM and SMOTEBoost to handle class imbalance problem. Non Support Vector (NSV) sets from negative samples and Support Vector (SV) sets from positive samples will undergo a Weighted-SMOTE process. The results indicate that implementation of Biased Support Vector Machine and Weighted-SMOTE achieve better accuracy and sensitivity.

Download Full-text

Classification of all-rounders in limited over cricket - a machine learning approach

Journal of Sports Analytics ◽

10.3233/jsa-200467 ◽

2021 ◽

Vol 6 (4) ◽

pp. 295-306

Author(s):

Ananda B. W. Manage ◽

Ram C. Kafle ◽

Danush K. Wijekularathna

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Discriminant Function ◽

Linear Discriminant Function ◽

Classification Rule ◽

Support Vector ◽

Classification Methods ◽

Linear Discriminant ◽

Quadratic Discriminant Function

In cricket, all-rounders play an important role. A good all-rounder should be able to contribute to the team by both bat and ball as needed. However, these players still have their dominant role by which we categorize them as batting all-rounders or bowling all-rounders. Current practice is to do so by mostly subjective methods. In this study, the authors have explored different machine learning techniques to classify all-rounders into bowling all-rounders or batting all-rounders based on their observed performance statistics. In particular, logistic regression, linear discriminant function, quadratic discriminant function, naïve Bayes, support vector machine, and random forest classification methods were explored. Evaluation of the performance of the classification methods was done using the metrics accuracy and area under the ROC curve. While all the six methods performed well, logistic regression, linear discriminant function, quadratic discriminant function, and support vector machine showed outstanding performance suggesting that these methods can be used to develop an automated classification rule to classify all-rounders in cricket. Given the rising popularity of cricket, and the increasing revenue generated by the sport, the use of such a prediction tool could be of tremendous benefit to decision-makers in cricket.

Download Full-text

Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Asuransi kendaraan bermotor merupakan jenis usaha pertanggungan terhadap kerugian atau risiko kerusakan yang dapat timbul dari berbagai macam potensi kejadian yang menimpa kendaraan. Persaingan dalam bisnis asuransi khususnya untuk kendaraan bermotor menuntut inovasi dan strategi agar keberlangsungan bisnis tetap terjamin. Salah satu upaya yang dapat dilakukan perusahaan adalah memprediksi status keberlanjutan polis asuransi kendaraan dengan menganalisis data-data profil dan transaksi nasabah. Prediksi terhadap keputusan pemegang polis menjadi sangat penting bagi perusahaan, karena dapat menentukan strategi pemasaran yang mempengaruhi keputusan pelanggan untuk pembaharuan polis asuransi. Penelitian ini telah mengusulkan suatu model prediksi status keberlanjutan polis asuransi kendaraan dengan teknik pemilihan mayoritas dari hasil klasifikasi menggunakan algoritma- algoritma data mining seperti Naive Bayes, Support Vector Machine dan Decision Tree. Hasil pengujian menggunakan confusion matrix menunjukkan nilai akurasi terbaik diperoleh sebesar 93,57%, apapun untuk nilai precision mencapai 97,20%, dan nilai recall sebesar 95,20% serta nilai F-Measure sebesar 95,30%. Nilai evaluasi model terbaik dihasilkan menggunakan pendekatan pemilihan mayoritas (majority voting), mengungguli kinerja model prediksi berbasis pengklasifikasi tunggal.

Download Full-text

Machine Learning in Apache Spark Environment for Diagnosis of Diabetes

10.20944/preprints202111.0200.v1 ◽

2021 ◽

Author(s):

Farshid Bagheri Saravi ◽

Shadi Moghanian ◽

Giti Javidi ◽

Ehsan O Sheybani

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Big Data ◽

Random Forest ◽

Apache Spark ◽

Support Vector ◽

Computing Environment ◽

Large Dataset ◽

Related Data

Disease-related data and information collected by physicians, patients, and researchers seem insignificant at first glance. Still, the same unorganized data contain valuable information that is often hidden. The task of data mining techniques is to extract patterns to classify the data accurately. One of the various Data mining and its methods have been used often to diagnose various diseases. In this study, a machine learning (ML) technique based on distributed computing in the Apache Spark computing space is used to diagnose diabetics or hidden pattern of the illness to detect the disease using a large dataset in real-time. Implementation results of three ML techniques of Decision Tree (DT) technique or Random Forest (RF) or Support Vector Machine (SVM) in the Apache Spark computing environment using the Scala programming language and WEKA show that RF is more efficient and faster to diagnose diabetes in big data.

Download Full-text

Klasifikasi Pengembalian Radar dari Ionosfer Menggunakan SVM, NaÃ¯ve Bayes dan Random Forest

Komputika : Jurnal Sistem Komputer ◽

10.34010/komputika.v10i2.4347 ◽

2021 ◽

Vol 10 (2) ◽

pp. 111-117

Author(s):

Yulia Aryani ◽

Arie Wahyu Wijayanto

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Support Vector ◽

Ve Bayes

ABSTRAK â€“ Klasifikasi merupakan salah satu topik utama dalam data mining atau machine learning. Klasifikasi adalah suatu pengelompokan data dimana data yang digunakan tersebut mempunyai kelas label atau target. Klasifikasi digunakan untuk mengambil data dan ditempatkan kedalam kelompok tertentu. Studi tentang ionosfer penting untuk penelitian di berbagai domain, khususnya dalam sistem komunikasi. Dalam penelitian ionosfer, perlu dilakukan klasifikasi radar yang berguna dan tidak berguna dari ionosfer. Pada makalah ini, akan dilakukan klasifikasi terhadap data inosphere yang diambil dari UCI machine learning repository. Klasifikasi dilakukan dengan menggunakan tiga metode klasifikasi, yakni SVM ( Support Vector Machine ) , NaÃ¯ve Bayes, dan Random Forest. Hasil dari percobaan ini bisa menunjukkan prediksi dari setiap percobaan dengan tingkat akurasi dan prediksi yang berbeda-beda di setiap metode yang digunakan. Hasil akurasi, presisi, dan recall terbaik didapatkan pada metode Random Forest dengan rasio data latih dan data uji sebesar 85% didapat akurasi dari data uji sebesar 90,57% dengan presisi sebesar 94,12%. Kata Kunci â€“ Ionosfer; Klasifikasi; SVM; NaÃ¯ve Bayes; Random Forest.

Download Full-text

A Business Classifier to Detect Readability Metrics on Software Games and Their Types

International Journal of E-Entrepreneurship and Innovation ◽

10.4018/ijeei.2013100104 ◽

2013 ◽

Vol 4 (4) ◽

pp. 47-57

Author(s):

Yahya M. Tashtoush ◽

Derar Darwish ◽

Motasim Albdarneh ◽

Izzat M. Alsmadi ◽

Khalid Alkhatib

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Software Metrics ◽

Machine Learning Algorithms ◽

Support Vector ◽

Mobile Environments ◽

New Approach ◽

Data Mining Tool ◽

Mining Tool

Readability metric is considered to be one of the most important factors that may affect games business in terms of evaluating games' quality in general and usability in particular. As games may go through many evolutions and developed by many developers, code readability can significantly impact the time and resources required to build, update or maintain such games. This paper introduces a new approach to detect readability for games built in Java or C++ for desktop and mobile environments. Based on data mining techniques, an approach for predicting the type of the game is proposed based on readability and some other software metrics or attributes. Another classifier is built to predict software readability in games applications based on several collected features. These classifiers are built using machine learning algorithms (J48 decision tree, support vector machine, SVM and Naive Bayes, NB) that are available in WEKA data mining tool.

Download Full-text

Simple quantum circuit for pattern recognition based on nearest mean classifier

International Journal on Perceptive and Cognitive Computing ◽

10.31436/ijpcc.v2i2.38 ◽

2016 ◽

Vol 2 (2) ◽

Author(s):

Gharib M Subhi ◽

Azeddine Messikh

Keyword(s):

Machine Learning ◽

Data Mining ◽

Pattern Recognition ◽

Support Vector Machine ◽

Quantum Circuit ◽

Support Vector ◽

Bloch Sphere ◽

Optimal Hyperplane ◽

Simple Quantum ◽

Quantum Matrix

Machine learning plays a key role in many applications such as data mining and image recognition.Classification is one subcategory under machine learning. In this paper we propose a simple quantum circuitbased on the nearest mean classifier to classified handwriting characters. Our circuit is a simplified circuit fromthe quantum support vector machine [Phys. Rev. Lett. 114, 140504 (2015)] which uses quantum matrix inversealgorithm to find optimal hyperplane that separated two different classes. In our case the hyperplane is foundusing projections and rotations on the Bloch sphere.

Download Full-text