Classification Methods

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 196-201
DOI: 10.4018/978-1-60566-010-3.ch032
Cited By ~ 2

Author(s): Aijun An

Keyword(s): Unsupervised Learning, Supervised Learning, Credit Card, A Priori, Training Data, Classification Model, Highly Active, Data Object, Patient Database, Data Objects

Generally speaking, classification is the action of assigning an object to a category according to the characteristics of the object. In data mining, classification refers to the task of analyzing a set of pre-classified data objects to learn a model (or a function) that can be used to classify an unseen data object into one of several predefined classes. A data object, referred to as an example, is described by a set of attributes or variables. One of the attributes describes the class that an example belongs to and is thus called the class attribute or class variable. Other attributes are often called independent or predictor attributes (or variables). The set of examples used to learn the classification model is called the training data set. Tasks related to classification include regression, which builds a model from training data to predict numerical values, and clustering, which groups examples to form categories. Classification belongs to the category of supervised learning, distinguished from unsupervised learning. In supervised learning, the training data consists of pairs of input data (typically vectors), and desired outputs, while in unsupervised learning there is no a priori output. Classification has various applications, such as learning from a patient database to diagnose a disease based on the symptoms of a patient, analyzing credit card transactions to identify fraudulent transactions, automatic recognition of letters or digits based on handwriting samples, and distinguishing highly active compounds from inactive ones based on the structures of compounds for drug discovery.
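The terms defined above (training data set, predictor attributes, class attribute, unseen example) can be made concrete with a minimal sketch. The entry does not name a specific algorithm; k-nearest neighbours is used here purely as an illustration, and the toy credit card data, the two numeric attributes, and the function name `knn_classify` are all hypothetical.

```python
import math
from collections import Counter

def knn_classify(training_data, example, k=3):
    """Assign `example` the majority class among its k nearest training examples."""
    # Each training item is (predictor_attributes, class_attribute).
    neighbors = sorted(training_data, key=lambda item: math.dist(item[0], example))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical pre-classified examples: two scaled predictor attributes per transaction.
training_data = [
    ((1.0, 1.0), "legitimate"),
    ((1.2, 0.8), "legitimate"),
    ((0.9, 1.1), "legitimate"),
    ((5.0, 5.2), "fraudulent"),
    ((5.1, 4.9), "fraudulent"),
    ((4.8, 5.0), "fraudulent"),
]

# Classify an unseen example into one of the predefined classes.
prediction = knn_classify(training_data, (4.9, 5.1))
```

Removing the class labels from `training_data` and grouping the points by proximity instead would turn the same data into a clustering (unsupervised) task.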

Classification Methods

Encyclopedia of Data Warehousing and Mining, 2011, pp. 144-149
DOI: 10.4018/978-1-59140-557-3.ch028
Cited By ~ 1

Author(s): Aijun An

Keyword(s): Unsupervised Learning, Supervised Learning, A Priori, Training Data, Classification Model, Data Set, Data Object, Unseen Data, Data Objects, Class Variable

DATA MINING FOR THE MANAGEMENT OF SOFTWARE DEVELOPMENT PROCESS

International Journal of Software Engineering and Knowledge Engineering, 2004, Vol 14 (06), pp. 665-695
DOI: 10.1142/s0218194004001841
Cited By ~ 6

Author(s): J. L. ÁLVAREZ-MACÍAS, J. MATA-VÁZQUEZ, J. C. RIQUELME-SANTOS

Keyword(s): Data Mining, Software Development, Unsupervised Learning, Supervised Learning, Development Process, A Priori, Post Mortem, Software Development Process, A Priori Analysis, Mining Tools

In this paper we present a new method for applying data mining tools to the management phase of the software development process. Specifically, we describe two tools, the first based on supervised learning and the second on unsupervised learning. The goal of this method is to induce a set of management rules that ease the development process for managers. Depending on how and to what this method is applied, it permits an a priori analysis, monitoring of the project, or a post-mortem analysis.

Semi-Supervised Classification and its Application to Filtering IDS False Positives

Applied Mechanics and Materials, 2013, Vol 427-429, pp. 2309-2312
DOI: 10.4028/www.scientific.net/amm.427-429.2309

Author(s): Hai Bin Mei, Ming Hua Zhang

Keyword(s): Supervised Learning, Supervised Classification, Classification Performance, False Positives, Training Data, Classification Model, Classification Technique

Alert classifiers built with supervised classification techniques require large amounts of labeled training alerts. Preparing such training data is difficult and expensive, which greatly restricts the accuracy and feasibility of current classifiers. This paper employs semi-supervised learning to build an alert classification model that reduces the number of labeled training alerts needed. Alert context properties are also introduced to improve classification performance. Experiments have demonstrated the accuracy and feasibility of our approach.

Classification Based on Unsupervised Learning

Statistical Techniques for Network Security, 2011, pp. 348-395
DOI: 10.4018/978-1-59904-708-9.ch010

Author(s): Yu Wang

Keyword(s): Network Security, Unsupervised Learning, Supervised Learning, Network Traffic, High Speed, Ad Hoc, Training Data, Traffic Data, Response Variable, Learning Techniques

The requirement of supervised learning techniques for a labeled response variable in the training data may not be satisfied in some situations, particularly in dynamic, short-term, and ad-hoc wireless network access environments. Being able to conduct classification without a labeled response variable is an essential challenge for modern network security and intrusion detection. In this chapter we discuss some unsupervised learning techniques, including probability, similarity, and multidimensional models, that can be applied in network security. These methods also provide a different angle from which to analyze network traffic data. For comprehensive knowledge of unsupervised learning techniques, please refer to the machine learning references listed in the previous chapter; for their applications in network security, see Carmines, Edward & McIver (1981), Lane & Brodley (1997), Herrero, Corchado, Gastaldo, Leoncini, Picasso & Zunino (2007), and Dhanalakshmi & Babu (2008). In supervised learning, each vector $X = (x_1, x_2, \ldots, x_n)$ has a corresponding observed response, $Y$. In unsupervised learning we have only $X$; $Y$ is not available, either because we could not observe it or because its frequency is too low to be fitted with a supervised learning approach. Unsupervised learning is of great practical significance because in many circumstances the available network traffic data may not include any anomalous events or known anomalous events (e.g., traffic collected from a newly constructed network system). As high-speed mobile wireless and ad-hoc network systems have become popular, the need to develop new unsupervised learning methods that can model network traffic using anomaly-free training data has increased significantly.
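As a rough illustration of modeling traffic from anomaly-free training data, the sketch below fits a simple probability-style baseline (mean and standard deviation of one traffic feature) and flags observations that deviate strongly from it. This is not the chapter's own method; the z-score rule, the threshold of 3, the feature, and the sample values are assumptions chosen for illustration.

```python
import statistics

def fit_baseline(normal_values):
    """Estimate mean and sample standard deviation from anomaly-free observations."""
    return statistics.fmean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds `threshold`."""
    mean, std = baseline
    return abs(value - mean) / std > threshold

# Hypothetical anomaly-free measurements of a single feature X (no response Y is needed).
normal_packets_per_second = [98, 102, 101, 97, 100, 103, 99, 100]
baseline = fit_baseline(normal_packets_per_second)

# New traffic is scored against the baseline only.
flag_burst = is_anomalous(130, baseline)
flag_typical = is_anomalous(101, baseline)
```

A real deployment would use many features and a multivariate model; the point here is only that no labeled response variable enters the fitting step.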

A Density-Based Method for the Identification of Non-Disjoint Clusters With Arbitrary and Non-Spherical Shapes

Computer Science, 2021, Vol 22 (2)
DOI: 10.7494/csci.2021.22.2.4002

Author(s): Chiheb Eddine Ben Ncir

Keyword(s): Unsupervised Learning, Clustering Methods, Clustering Method, Overlapping Clustering, Overlapping Clusters, Data Object, Complex Shapes, Important Challenge, Data Objects

Overlapping clustering is an important challenge in unsupervised learning applications because it allows each data object to belong to more than one group. Several clustering methods have been proposed to meet this requirement by adapting conventional clustering approaches. Although these methods can detect non-disjoint partitionings, they fail when the data contain groups with arbitrary and non-spherical shapes. We propose in this work a new density-based overlapping clustering method, referred to as OC-DD, which is able to detect overlapping clusters even when they have non-spherical and complex shapes. The proposed method uses density and distances to detect dense regions in the data while allowing some data objects to belong to more than one group. Experiments performed on artificial and real multi-labeled datasets have shown the effectiveness of the proposed method compared to existing ones.

Learning-Based Dissimilarity for Clustering Categorical Data

Applied Sciences, 2021, Vol 11 (8), pp. 3509
DOI: 10.3390/app11083509

Author(s): Edgar Jacob Rivera Rios, Miguel Angel Medina-Pérez, Manuel S. Lazo-Cortés, Raúl Monroy

Keyword(s): Machine Learning, Categorical Data, Confusion Matrix, Dissimilarity Measure, Classification Model, Data Object, Object Distance, Data Objects, The University, Attribute Space

Comparing data objects is at the heart of machine learning. For continuous data, object dissimilarity is usually taken to be object distance; however, for categorical data, there is no universal agreement, for categories can be ordered in several different ways. Most existing category dissimilarity measures characterize the distance among the values an attribute may take using precisely the number of different values the attribute takes (the attribute space) and the frequency at which they occur. These kinds of measures overlook attribute interdependence, which may provide valuable information when capturing per-attribute object dissimilarity. In this paper, we introduce a novel object dissimilarity measure that we call Learning-Based Dissimilarity, for comparing categorical data. Our measure characterizes the distance between two categorical values of a given attribute in terms of how likely it is that such values are confused or not when all the dataset objects with the remaining attributes are used to predict them. To that end, we provide an algorithm that, given a target attribute, first learns a classification model in order to compute a confusion matrix for the attribute. Then, our method transforms the confusion matrix into a per-attribute dissimilarity measure. We have successfully tested our measure against 55 datasets gathered from the University of California, Irvine (UCI) Machine Learning Repository. Our results show that it surpasses, in terms of various performance indicators for data clustering, the most prominent distance relations put forward in the literature.
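The second step described above, turning a confusion matrix for one attribute into a dissimilarity between its values, might look roughly like the sketch below. The abstract does not give the actual transformation, so the formula used here (one minus the average of the two row-normalised confusion rates) is an assumption, as are the function name and the toy matrix.

```python
def confusion_to_dissimilarity(confusion):
    """Turn a confusion matrix for one attribute into pairwise value dissimilarities."""
    values = list(confusion)
    # Row-normalise: rate[a][b] = fraction of objects with true value a predicted as b.
    rate = {}
    for a in values:
        total = sum(confusion[a].values())
        rate[a] = {b: (confusion[a].get(b, 0) / total if total else 0.0) for b in values}
    dissimilarity = {}
    for a in values:
        for b in values:
            if a == b:
                dissimilarity[(a, b)] = 0.0
            else:
                # Assumed transform: values that are often confused are treated as close.
                dissimilarity[(a, b)] = 1.0 - (rate[a][b] + rate[b][a]) / 2.0
    return dissimilarity

# Hypothetical confusion matrix for a "colour" attribute (rows: true value, columns: predicted).
confusion = {
    "red":    {"red": 8, "orange": 2, "blue": 0},
    "orange": {"red": 3, "orange": 6, "blue": 1},
    "blue":   {"red": 0, "orange": 1, "blue": 9},
}
d = confusion_to_dissimilarity(confusion)
```

In this toy matrix, "red" and "orange" are confused with each other more often than either is with "blue", so they end up closer together under the assumed transform.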

A Classification Model of Legal Consulting Questions Based on Multi-Attention Prototypical Networks

International Journal of Computational Intelligence Systems, 2021, Vol 14 (1)
DOI: 10.1007/s44196-021-00053-6

Author(s): Jianzhou Feng, Jinman Cui, Qikai Wei, Zhengji Zhou, Yuxiong Wang

Keyword(s): Supervised Learning, Language Processing, Text Classification, Question Answering, Training Data, Classification Model, Great Progress, Public Datasets, The Cost

Text classification is a research hotspot in the field of natural language processing. Existing text classification models based on supervised learning, especially deep learning models, have made great progress on public datasets. However, most of these methods rely on a large amount of training data, and the coverage of these datasets is limited. In a legal intelligent question-answering system, accurate classification of legal consulting questions is a necessary prerequisite for intelligent question answering. Annotated data are scarce and labeling is costly, so traditional supervised learning methods perform poorly under sparse labeling. In response to these problems, we construct a few-shot legal consulting questions dataset and propose a prototypical networks model based on multi-attention. For instances of the same category, the model first highlights the key features in the instances as much as possible through instance-dimension level attention. It then classifies legal consulting questions using prototypical networks. Experimental results show that our model achieves state-of-the-art results compared with baseline models. The code and dataset are released on https://github.com/cjm0824/MAPN.
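The classification step of a prototypical network can be sketched as follows: each class prototype is the mean of that class's embedded support examples, and a query is assigned to the class with the nearest prototype. This is a minimal illustration of the general technique only; it omits the paper's multi-attention mechanism and the learned encoder, and the two-dimensional embeddings and class names are hypothetical.

```python
import math

def class_prototypes(support_set):
    """Average the embedded support examples of each class into one prototype."""
    sums, counts = {}, {}
    for embedding, label in support_set:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], embedding)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def classify(query_embedding, prototypes):
    """Return the class whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query_embedding))

# Hypothetical few-shot support set: (embedding, class) pairs, two shots per class.
support_set = [
    ([0.0, 1.0], "contract"),
    ([0.2, 0.8], "contract"),
    ([1.0, 0.0], "labor"),
    ([0.8, 0.2], "labor"),
]
prototypes = class_prototypes(support_set)
predicted = classify([0.7, 0.3], prototypes)
```

Because only a handful of support examples per class are needed to form a prototype, the approach suits the sparse-labeling setting the abstract describes.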

Hierarchical Classification of Urban ALS Data by Using Geometry and Intensity Information

Sensors, 2019, Vol 19 (20), pp. 4583
DOI: 10.3390/s19204583
Cited By ~ 1

Author(s): Xiaoqiang Liu, Yanming Chen, Shuyi Li, Liang Cheng, Manchun Li

Keyword(s): Supervised Learning, Laser Scanning, Large Scale, Three Dimensional, Hierarchical Classification, Training Data, Classification Model, Learning Method, Intensity Information

Airborne laser scanning (ALS) can acquire both geometry and intensity information of geo-objects, which is important in mapping a large-scale three-dimensional (3D) urban environment. However, the intensity information recorded by ALS varies with flight height and atmospheric attenuation, which decreases the robustness of a trained supervised classifier. This paper proposes a hierarchical classification method that uses the geometry and intensity information of urban ALS data separately. The method uses supervised learning for the stable geometry information and unsupervised learning for the fluctuating intensity information. The experimental results show that the proposed method can utilize the intensity information effectively, in three respects. (1) The proposed method improves the accuracy of the classification result by using intensity. (2) When the ALS data to be classified are acquired under the same conditions as the training data, the performance of the proposed method is as good as that of the supervised learning method. (3) When the ALS data to be classified are acquired under different conditions from the training data, the performance of the proposed method is better than that of the supervised learning method. Therefore, the classification model derived from the proposed method can be transferred to other ALS data whose intensity is inconsistent with the training data. Furthermore, the proposed method can contribute to the hierarchical use of other ALS information, such as multi-spectral information.

Technology of the Surround

Catalyst Feminism Theory Technoscience, 2021, Vol 7 (2)
DOI: 10.28968/cftt.v7i2.35973

Author(s): Beth Coleman

Keyword(s): Machine Learning, Machine Vision, Unsupervised Learning, Supervised Learning, Status Quo, Training Data, Reproductive Condition, Game Play, Key Concepts, In The Wild

In addressing the issue of harmful bias in AI systems, this paper asks for a consideration of a generatively wild AI that exceeds the framework of predictive machine learning. The argument places supervised learning, with its labeled training data, as primarily a form of reproduction of a status quo. Based on this framework, the paper moves through an analysis of two AI modalities, supervised learning (e.g., machine vision) and unsupervised learning (e.g., game play), to demonstrate the potential of AI as a mechanism that creates patterns of association outside of a purely reproductive condition. This analysis is followed by an introduction to the concept of the technology of the surround, where the paper turns toward theoretical positions that unbind categorical logics, moving toward other possible positionalities: the surround (Harney and Moten), alien intelligence (Parisi), and intra-actions of subject/object resolution (Barad). The paper frames two key concepts in relation to an AI in the wild: the colonial sublime and black techné. The paper concludes with a summation of what AI in the wild can contribute to the subversion of technologies of oppression toward a liberatory potential of AI.

Unsupervised Learning Architecture for Classifying the Transient Noise of Interferometric Gravitational-Wave Detectors

2021
DOI: 10.21203/rs.3.rs-1094374/v1

Author(s): Yusuke Sakai, Yousuke Itoh, Piljong Jung, Keiko Kokeyama, Chihiro Kozakai, ...

Keyword(s): Gravitational Wave, Unsupervised Learning, Supervised Learning, Training Data, High Rate, Time Frequency, Frequency Representation, Gravitational Wave Detectors, Non Gaussian

In the data of laser interferometric gravitational-wave detectors, transient noise with non-stationary and non-Gaussian features occurs at a high rate. It often causes problems such as detector instability and the hiding and/or imitating of gravitational-wave signals. This transient noise has various characteristics in the time-frequency representation, which are considered to be associated with environmental and instrumental origins. Classification of transient noise can offer clues for exploring its origin and improving the performance of the detector. One approach to classifying these noises is supervised learning. However, supervised learning generally requires annotation of the training data, and there are issues with ensuring objectivity in the classification and its corresponding new classes. By contrast, unsupervised learning can reduce the annotation work for the training data and ensure objectivity in the classification and its corresponding new classes. In this study, we propose an architecture for the classification of transient noise using unsupervised learning, which combines a variational autoencoder and invariant information clustering. To evaluate the effectiveness of the proposed architecture, we used the dataset (time-frequency two-dimensional spectrogram images and labels) of the LIGO first observation run prepared by the Gravity Spy project. We find consistency between the labels annotated by the Gravity Spy project and the classes produced by our proposed unsupervised learning architecture, and we indicate the potential existence of unrevealed classes.
