Numeric Attributes
Recently Published Documents


TOTAL DOCUMENTS: 46 (five years: 1)

H-INDEX: 12 (five years: 0)

Author(s): Kartik Mehta, Ioana Oprea, Nikhil Rasiwasia


Author(s): Marvin Meeng, Arno Knobbe

Abstract: Subgroup discovery (SD) is an exploratory pattern mining paradigm that comes into its own when dealing with large real-world data, which typically involve many attributes of a mixture of data types. Essential is the ability to deal with numeric attributes, whether they concern the target (a regression setting) or the description attributes (by which subgroups are identified). Various specific algorithms have been proposed in the literature for both cases, but a systematic review of the available options has been missing. This paper presents a generic framework that can be instantiated in various ways to create different strategies for dealing with numeric data. The bulk of the paper describes an experimental comparison of a considerable range of numeric strategies in SD, organised along four central dimensions. The experiments are repeated for both the classification task (nominal target) and the regression task (numeric target), and the strategies are compared on the quality of the top subgroup, and on the quality and redundancy of the top-k result set. Results of three search strategies are compared: traditional beam search, complete search, and a variant of diverse subgroup set discovery called cover-based subgroup selection. Although there are various subtleties in the outcomes of the experiments, the following general conclusion can be drawn: it is often best to determine numeric thresholds dynamically (locally), in a fine-grained manner, with binary splits, while considering multiple candidate thresholds per attribute.
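The recommended strategy above (dynamically chosen, fine-grained binary splits with multiple candidate thresholds per attribute) can be illustrated with a minimal sketch. This is not the paper's framework; it simply scans every observed value of one numeric description attribute as a candidate binary-split threshold and scores each resulting subgroup with WRAcc (weighted relative accuracy), a standard subgroup quality measure for a binary target. All function names are hypothetical.

```python
def wracc(n_sub, pos_sub, n_total, pos_total):
    """WRAcc = coverage * (subgroup positive rate - overall positive rate)."""
    if n_sub == 0:
        return 0.0
    return (n_sub / n_total) * (pos_sub / n_sub - pos_total / n_total)

def best_binary_split(values, labels):
    """Dynamically pick the best threshold on one numeric attribute.

    Every distinct observed value is a candidate threshold; both
    directions ('<=' and '>') are scored. Returns (quality, threshold,
    direction) for the highest-WRAcc subgroup."""
    n_total = len(values)
    pos_total = sum(labels)
    best = (0.0, None, None)
    for t in sorted(set(values)):
        # Counts for the 'value <= t' subgroup; the '>' subgroup is its complement.
        n_le = sum(1 for v in values if v <= t)
        pos_le = sum(y for v, y in zip(values, labels) if v <= t)
        q_le = wracc(n_le, pos_le, n_total, pos_total)
        q_gt = wracc(n_total - n_le, pos_total - pos_le, n_total, pos_total)
        if q_le > best[0]:
            best = (q_le, t, '<=')
        if q_gt > best[0]:
            best = (q_gt, t, '>')
    return best
```

In a beam or complete search, this per-attribute scan would be repeated locally at every refinement step, which is what "dynamic" threshold determination means in contrast to discretising each attribute once, globally, before the search.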



2020, Vol 7 (5), pp. 4304-4316
Author(s): Yanhong Li, Rongbo Zhu, Shiwen Mao, Ashiq Anjum




2019, Vol 8 (3), pp. 1555-1561

In machine learning, clustering methods are the main unsupervised methods. Their objective is to partition a set of objects into homogeneous groups. Clustering methods in general, and Hierarchical Ascending Clustering (HAC) techniques in particular, are based on metrics and ultrametrics. Metrics are used to evaluate the similarity between two objects; ultrametrics are used to estimate the similarity between two groups, or between an element and a group. The main characteristic of these metrics and ultrametrics is that they are only suited to numerical variables, or to variables that can be reduced to them. With the advent of data mining and data science, most datasets to be analyzed contain different types of variables: in the same dataset, numeric attributes, qualitative variables, and free-text fields very often appear together. Despite this diversity of variables in the same dataset, existing clustering methods are generally built to use only a single kind of attribute. In this paper, we propose an approach that takes different types of attributes into account within the same clustering method. The proposed method is a variant of HAC that can handle numerical, qualitative, and textual data together. Our approach is based on a metric called Phi-Similarity, which we developed to estimate the proximity of two objects, each described by a vector of attributes of different types. The method is implemented in the scientific computing language R and applied to real survey data. The results are compared with HAC techniques based on classical metrics, using the Ward criterion for aggregation; for the classical algorithms, we restrict ourselves to the variables of the database compatible with them. This comparison highlights the gain in classification precision brought by our method over the classic versions of HAC.
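The abstract does not define Phi-Similarity, so the following is only an illustrative sketch of the general idea it describes: a single per-pair dissimilarity that combines numeric, categorical, and textual attributes. The sketch uses a generic Gower-style average (range-normalised difference for numerics, simple mismatch for categoricals, Jaccard distance on word sets for text); this is a stand-in, not the paper's metric, and all names are hypothetical.

```python
def mixed_dissimilarity(a, b, types, ranges):
    """Average per-attribute dissimilarity between two records.

    types[i] is 'num', 'cat', or 'text'; ranges[i] is the attribute's
    range (max - min over the dataset) for 'num' attributes, else None.
    NOTE: illustrative Gower-style combination, not the paper's
    Phi-Similarity, whose definition is not given in the abstract."""
    parts = []
    for x, y, t, r in zip(a, b, types, ranges):
        if t == 'num':
            # Range-normalised absolute difference, in [0, 1].
            parts.append(abs(x - y) / r if r else 0.0)
        elif t == 'cat':
            # Simple mismatch: 0 if equal, 1 otherwise.
            parts.append(0.0 if x == y else 1.0)
        else:
            # Jaccard distance on the sets of lowercased words.
            sx, sy = set(x.lower().split()), set(y.lower().split())
            union = sx | sy
            parts.append(1.0 - len(sx & sy) / len(union) if union else 0.0)
    return sum(parts) / len(parts)
```

A dissimilarity matrix built this way can be fed directly to a standard agglomerative clustering routine (e.g. `hclust` on a precomputed distance matrix in R, which is the language the paper uses), so that the HAC machinery itself stays unchanged while the metric handles the mixed attribute types.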








