Data mining classification algorithms: An overview

2021 ◽  
Vol 8 (2) ◽  
pp. 1-5
Author(s):  
Bardab et al. ◽  

Data mining is also defined as the process of analyzing a quantity of data (usually a large amount) to find a logical relationship that summarizes the data in a new way that is understandable and useful to the owner of the data. This paper examines the various types of classification algorithms in data mining and their applications, and states the strengths and limitations of each type. The weaknesses found in each algorithm show that tasks cannot be performed well when only one type of algorithm is applied. For this reason, the writer holds that further research is needed to explore the potential of combining several of these algorithms to solve machine learning problems.

2020 ◽  
Vol 5 (3) ◽  
pp. 138-146 ◽  

Most online customers use cards to pay for their purchases. As cards become the most popular method of payment, cases of fraud associated with them also increase. The primary goal of this project is to distinguish fraudulent transactions from non-fraudulent ones. To do so, data mining methods are first used to examine the patterns and characteristics of fraudulent and non-fraudulent transactions. Machine learning techniques are then used to predict fraudulent and non-fraudulent transactions automatically. The Logistic Regression (LR) algorithm is used. The combination of machine learning and data mining techniques thus identifies fraudulent and non-fraudulent transactions by learning the patterns in the data. Models are built using these algorithms, and precision, accuracy, and recall are then computed and compared.
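The three evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `classification_metrics` and the label vectors are hypothetical, and the predictions could come from any binary classifier, such as the logistic regression model the abstract mentions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical labels: 1 = fraudulent, 0 = non-fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On fraud data, where fraudulent transactions are typically rare, precision and recall on the fraud class are usually more informative than accuracy alone.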


2020 ◽  
Vol 14 (1) ◽  
pp. 34
Author(s):  
Nina Sulistiyowati ◽  
Mohamad Jajuli

Classification of data with unbalanced classes is a major problem in the field of machine learning and data mining. When working on unbalanced data, almost all classification algorithms produce much higher accuracy for majority classes than for minority classes. This research implements the Synthetic Minority Over-sampling Technique (SMOTE) to overcome unbalanced data in the credit customer data of the Rawamerta Teachers Cooperative. The research methodology uses SEMMA, with the stages Sample, Explore, Modify, Model, and Assess. The Sample phase selects the data of the Rawamerta Teachers Cooperative credit customers for 2015-2017, a total of 878 records, with the attributes income, total deposits, loan amount, duration of installments, services, installments, and credit status. The Explore phase finds that the "current" class is the majority class, with 813 records, while the "traffic" class is the minority class, with 65 records. The data therefore show an imbalance between the two classes. The Modify phase performs the 500% SMOTE process. The Model phase classifies using Naïve Bayes. With SMOTE, the Naïve Bayes model classified 1131 records correctly and 72 incorrectly; without SMOTE, it classified 818 records correctly and 60 incorrectly.

Keywords: Naïve Bayes, SMOTE, unbalanced data
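The oversampling step and the reported counts can be illustrated with a short sketch. The `smote` function below is a simplified version of the technique (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the study; the two-dimensional toy points, `k=5`, and the seeds are assumptions. Only the class counts and classification results are taken from the abstract.

```python
import random


def smote(minority, percent, k=5, seed=0):
    """Create (percent / 100) synthetic points per minority sample."""
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (squared Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbours = others[:k]
        for _ in range(per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment x -> neighbour
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
            )
    return synthetic


# Hypothetical 2-D stand-ins for the 65 minority-class records.
toy_rng = random.Random(42)
minority = [(toy_rng.random(), toy_rng.random()) for _ in range(65)]
synthetic = smote(minority, percent=500)

majority_count = 813                              # from the abstract
minority_after = len(minority) + len(synthetic)   # 65 + 325 = 390
total_after = majority_count + minority_after     # 1203

# Overall accuracy implied by the counts reported in the abstract.
accuracy_with_smote = 1131 / (1131 + 72)
accuracy_without_smote = 818 / (818 + 60)
print(total_after, round(accuracy_with_smote, 4), round(accuracy_without_smote, 4))
```

The counts are internally consistent: 500% SMOTE adds 325 synthetic minority records, giving 813 + 390 = 1203 records, which equals the 1131 + 72 records classified in the SMOTE run.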


Author(s):  
Meenu Gupta ◽  
Vijender Kumar Solanki ◽  
Vijay Kumar Singh ◽  
Vicente García-Díaz

Data mining is used in various research domains to identify new causes of effects observed in societies around the globe. This article applies data mining for the same purpose: to identify accident occurrences in different regions and the most valid reasons for accidents around the globe. Data mining and advanced machine learning algorithms are used in this research, and the article discusses hyperline, classification, pre-processing of the data, and training the machine with sample datasets collected from different regions, which contain structured and semi-structured data. We dive deep into machine learning and data mining classification algorithms to find or predict something novel about accident occurrences around the globe. To narrow the scope of the research, we concentrate mainly on two basic and important classification algorithms: SVM (support vector machine) and the CNB classifier. The discussion uses the WEKA tool for the CNB classifier, bag-of-words identification, word count, and frequency calculation.
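The bag-of-words, word count, and frequency steps mentioned in the abstract can be sketched in a few lines. This is an illustration in plain Python rather than the WEKA workflow the article uses; the function names, the tokenization rule, and the sample accident report are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def relative_frequencies(counts):
    """Convert raw word counts into fractions of the total token count."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical accident report text.
report = "Wet road, speeding car. Car skidded on wet road."
counts = bag_of_words(report)
freqs = relative_frequencies(counts)

print(counts.most_common(3))
```

Word-count vectors of this kind are a common input representation for text classifiers, including Naive Bayes variants.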


1997 ◽  
Vol 06 (04) ◽  
pp. 537-566 ◽  
Author(s):  
Ron Kohavi ◽  
Dan Sommerfield ◽  
James Dougherty

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses, which are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called [Formula: see text], which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. [Formula: see text] not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.
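The idea of comparing several classifiers on one dataset of interest can be sketched as below. This is not the C++ library described in the abstract: it is a hypothetical Python harness with two deliberately simple classifiers and a made-up toy dataset, intended only to show the shape of such a comparison.

```python
from collections import Counter


class MajorityClassifier:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighbourClassifier:
    """Predicts the label of the closest training point (1-NN)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def nearest_label(x):
            dists = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in self.X]
            return self.y[dists.index(min(dists))]

        return [nearest_label(x) for x in X]


def holdout_accuracy(model, X_train, y_train, X_test, y_test):
    """Train on one split and report accuracy on the held-out split."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Hypothetical toy dataset with two classes.
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
y_train = ["a", "a", "a", "b", "b"]
X_test = [(0.05, 0.1), (1.1, 0.9), (0.95, 1.0), (0.15, 0.15)]
y_test = ["a", "b", "b", "a"]

results = {
    type(model).__name__: holdout_accuracy(model, X_train, y_train, X_test, y_test)
    for model in (MajorityClassifier(), NearestNeighbourClassifier())
}
print(results)
```

Giving every classifier the same `fit`/`predict` interface is what makes the comparison loop trivial to extend with further algorithms.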


The absence of labels and the poor quality of data are prevailing challenges in numerous data mining and machine learning problems. The performance of a model is limited by the available data samples, which have few labels for training. These problems are especially critical in multi-label classification, which usually needs clean data. Multi-label classification is a challenging research problem that emerges in several applications such as multi-object recognition, text categorization, music categorization, and image classification. This paper presents a literature review on multi-label classification, the various evaluation metrics used for analyzing performance, and research challenges.
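Two commonly used multi-label evaluation metrics, Hamming loss and subset accuracy, are sketched below as an illustration. The abstract does not say which metrics the review covers, so this choice is an assumption, and the label matrices are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / (len(y_true) * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


# Hypothetical data: 4 samples, 3 binary labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(hamming_loss(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

The two metrics differ in strictness: a single wrong label makes the whole sample count as an error under subset accuracy, while Hamming loss penalizes only that one label.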


2020 ◽  
Vol 3 (1) ◽  
pp. 56-67
Author(s):  
Farid Ablayev ◽  
Marat Ablayev ◽  
Joshua Zhexue Huang ◽  
Kamil Khadiev ◽  
Nailya Salikhova ◽  
...  

Breast cancer is the second leading cause of death among women, and among men as well, worldwide. In this paper, we use data mining classification algorithms to detect breast cancer and determine whether it is benign or malignant, and the analysis is done on the basis of accuracy and the time taken to build the model. The data is collected from the Wisconsin dataset of the UCI Machine Learning Repository, which includes patient samples. The dataset is processed by different algorithms with and without feature selection.


Decision tree algorithms, being accurate and comprehensible classifiers, are among the most widely used classifiers in data mining and machine learning. However, like many other classification algorithms, decision tree algorithms focus on extracting patterns with high generality and, in the process, ignore some rare but useful and interesting patterns that may exist in small disjuncts of data. Such extraordinary patterns, with low support and high confidence, capture very specific but exceptional behavior present in data. This paper proposes a novel Enhanced Decision Tree Algorithm for Discovering Intra and Inter-class Exceptions (EDTADE). Intra-class exceptions cover objects of unique interest within a class, whereas inter-class exceptions capture rare conditions that force us to shift the class of a few unusual objects. For instance, whales and bats are examples of intra-class exceptions, since they have unique characteristics within the class of mammals. Further, most birds are flying creatures, but rare birds such as the penguin and the ostrich fall into the category of non-flying birds. Here, the penguin and the ostrich are inter-class exceptions. Without knowledge of such exceptional patterns, our knowledge of a domain is incomplete. We have enhanced the decision tree algorithm by defining a framework for capturing intra and inter-class exceptions at the leaf nodes of a decision tree. The proposed algorithm (EDTADE) is applied to many datasets from the UCI Machine Learning Repository. The results show that EDTADE successfully discovers many intra and inter-class exceptions. Decision trees augmented with intra and inter-class exceptions are more accurate, more comprehensible, and more interesting, since they provide additional knowledge in the form of exceptional patterns that deviate from the general rules discovered for classification.
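The notion of a pattern with low support and high confidence can be made concrete with a small sketch. The definitions used here are the usual rule-mining ones (support as the fraction of all records satisfying both the condition and the outcome, confidence as the fraction of condition-matching records that satisfy the outcome); the abstract does not give EDTADE's exact definitions, so these are assumptions, and the bird records are a hypothetical dataset echoing the abstract's example.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    matches_if = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_if if consequent(r)]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence


# Hypothetical data: 18 flying sparrows and 2 non-flying penguins.
records = (
    [{"species": "sparrow", "flies": True} for _ in range(18)]
    + [{"species": "penguin", "flies": False} for _ in range(2)]
)

# General rule: a bird flies.
general = support_and_confidence(
    records, lambda r: True, lambda r: r["flies"]
)

# Exceptional rule: a penguin does not fly.
exception = support_and_confidence(
    records, lambda r: r["species"] == "penguin", lambda r: not r["flies"]
)

print(general, exception)
```

The exceptional rule covers only 10% of the records but holds every time it applies, which is the kind of pattern a generality-driven tree learner tends to discard.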


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr