categorical attribute Latest Research Papers

Differentially Private Distance Learning in Categorical Data

Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records and census and survey data. Distance-based methods, in particular, have limited applicability to categorical data, since they do not capture the complexity of the relationships among the different values of a categorical attribute. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a secret dataset. To address this problem, we introduce a differentially private family of algorithms for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes, which form the so-called context. We define different variants of our algorithm and show empirically that our approach consumes little privacy budget while providing accurate distances, making it suitable for distance-based applications such as clustering and classification.
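The idea of context-based distances under differential privacy can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the Laplace mechanism applied to a single contingency table, publicly known value domains, add/remove-one-record neighbouring datasets, and Euclidean distance between the resulting conditional distributions. All function names and the toy records are hypothetical.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_contingency(records, target, context, t_values, c_values, epsilon, rng):
    """Noisy co-occurrence counts of target values with context values.

    Assumptions: t_values and c_values are public domains, and neighbouring
    datasets differ by adding or removing one record, so each record touches
    exactly one cell and the L1 sensitivity of the table is 1.
    """
    counts = Counter((r[target], r[context]) for r in records)
    scale = 1.0 / epsilon
    return {
        t: [max(0.0, counts[(t, c)] + laplace_noise(scale, rng)) for c in c_values]
        for t in t_values
    }


def context_distance(table, a, b):
    """Euclidean distance between the noisy context distributions of a and b."""
    def normalise(row):
        total = sum(row)
        if total <= 0:
            return [1.0 / len(row)] * len(row)  # fall back to a uniform row
        return [x / total for x in row]

    pa, pb = normalise(table[a]), normalise(table[b])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Hypothetical toy records: "diagnosis" is the target, "ward" is the context.
records = (
    [{"diagnosis": "flu", "ward": "A"}] * 3 + [{"diagnosis": "flu", "ward": "B"}]
    + [{"diagnosis": "cold", "ward": "A"}] * 3 + [{"diagnosis": "cold", "ward": "B"}]
    + [{"diagnosis": "fracture", "ward": "B"}] * 4
)
table = noisy_contingency(
    records, "diagnosis", "ward",
    ["cold", "flu", "fracture"], ["A", "B"],
    epsilon=1.0, rng=random.Random(0),
)
d_close = context_distance(table, "flu", "cold")
d_far = context_distance(table, "flu", "fracture")
```

Clamping, normalising, and computing distances are post-processing steps, so under the stated assumptions they consume no additional privacy budget beyond the epsilon spent on the noisy table.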

ICAI-SR: Item Categorical Attribute Integrated Sequential Recommendation

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3463060 ◽

2021 ◽

Author(s):

Xu Yuan ◽

Dongsheng Duan ◽

Lingling Tong ◽

Lei Shi ◽

Cheng Zhang

Keyword(s):

Categorical Attribute

Prediction of House Price Using XGBoost Regression Algorithm

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1870 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

J. Avanijaa et al.

Keyword(s):

House Price ◽

Regression Technique ◽

Land Value ◽

Processing Data ◽

Categorical Attribute ◽

Null Values ◽

To Come

House prices fluctuate every year due to changes in land value and in the infrastructure in and around the area. A centralised system that predicts house prices in relation to neighbourhood and infrastructure would help customers estimate the price of a house. It would also help them decide where and when to buy. Different factors are taken into consideration when predicting the worth of a house, such as location, neighbourhood, and amenities like garage space. Developing a model starts with pre-processing the data to remove discrepancies, fill null values or remove outliers, and make the data ready to be processed. Categorical attributes can be converted into the required attributes using one-hot encoding. The house price is then predicted using the XGBoost regression technique.
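The one-hot encoding step mentioned in this abstract can be sketched in plain Python. The column names and rows below are hypothetical, and the regression step is not shown; the encoded rows would simply be passed on to whatever regressor is used.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded, categories


# Hypothetical housing rows with one numeric and one categorical attribute.
rows = [
    {"area": 120, "neighbourhood": "north"},
    {"area": 80, "neighbourhood": "south"},
    {"area": 95, "neighbourhood": "north"},
]
encoded, categories = one_hot_encode(rows, "neighbourhood")
```

Each row ends up with exactly one indicator set to 1, so no artificial ordering is imposed on the categories.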

A study on Two-Stage Mixed Attribute Data Clustering Based on Density Peaks

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/2 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Shihua Liu ◽

Hao Zhang ◽

Xianghua Liu

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Two Stage ◽

One Dimensional ◽

Attribute Data ◽

Numerical Attributes ◽

Density Peaks ◽

Density Peaks Clustering ◽

Categorical Attribute ◽

Attribute Clustering

A two-stage clustering framework and a clustering algorithm for mixed attribute data, based on density peaks and the Goodall distance, are proposed. First, the subset of numerical attributes of the dataset is clustered, and the result is mapped into a one-dimensional categorical attribute and added to the subset of categorical attribute data. Finally, the new dataset is clustered by the density peaks clustering algorithm to obtain the final result. Experiments on three commonly used UCI datasets show that this algorithm can effectively realize mixed attribute clustering and produces better clustering results than the traditional K-prototypes algorithm does. The clustering accuracy on the Acute, Heart and Credit datasets is 17%, 24%, and 21% higher on average than that of K-prototypes, respectively.
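The two stages described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the stage-one cluster labels are taken as given, simple matching distance is substituted for the Goodall distance, and only the two standard density peaks quantities (local density rho and distance delta to the nearest denser point) are computed. All names and data are hypothetical.

```python
def append_stage_one_labels(categorical_rows, numeric_labels):
    """Map stage-one cluster labels to a new categorical attribute and append it."""
    return [
        tuple(row) + (f"c{label}",)
        for row, label in zip(categorical_rows, numeric_labels)
    ]


def matching_distance(a, b):
    # Fraction of attributes on which two records disagree
    # (a stand-in for the Goodall distance used in the paper).
    return sum(x != y for x, y in zip(a, b)) / len(a)


def density_peaks_scores(rows, cutoff):
    """Return (rho, delta) for each record, the quantities used to pick centres."""
    n = len(rows)
    dist = [[matching_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    # rho: number of other records closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        # delta: distance to the nearest record of strictly higher density;
        # the densest records get their maximum distance instead.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical data: categorical attributes plus stage-one labels
# obtained by clustering the numerical attributes separately.
categorical_rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "x"), ("a", "x")]
numeric_labels = [0, 0, 1, 1, 0]
merged = append_stage_one_labels(categorical_rows, numeric_labels)
rho, delta = density_peaks_scores(merged, cutoff=0.5)
```

Records with both a high rho and a high delta are the usual candidates for cluster centres in density peaks clustering.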

Graphs from Features: Tree-Based Graph Layout for Feature Analysis

Algorithms ◽

10.3390/a13110302 ◽

2020 ◽

Vol 13 (11) ◽

pp. 302

Author(s):

Rosane Minghim ◽

Liz Huancapaza ◽

Erasmo Artur ◽

Guilherme P. Telles ◽

Ivar V. Belizario

Keyword(s):

Data Analysis ◽

Feature Analysis ◽

Compact Representation ◽

Graph Layout ◽

Tree Graph ◽

Node Placement ◽

On Demand ◽

Feature Similarity ◽

Graph Layouts ◽

Categorical Attribute

Feature analysis has become a critical task in data analysis and visualization. Graph structures are very flexible in terms of representation and may encode important information on features, but producing layouts that are adequate for analysis tasks is challenging. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further grouping and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results show that the tree-graph layout framework allows for a number of observations that are very important in user-centric feature selection and not easy to make with any other available tool. They provide a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.

A Novel Categorical Data Attribute Split Technique in Decision Tree Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2568.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1607-1612

Keyword(s):

Decision Tree ◽

Categorical Data ◽

Ease Of Use ◽

Decision Tree Learning ◽

Current Node ◽

Categorical Attributes ◽

Categorical Attribute ◽

A New Technique ◽

Class Labels ◽

Better Than

A new technique is proposed for splitting categorical data during decision tree learning. The technique is based on class probability representations and manipulations of the class labels corresponding to the distinct values of categorical attributes. For each categorical attribute, an aggregate similarity in terms of class probabilities is computed, and the attribute with the highest aggregated similarity measure is selected as the best attribute. The data in the current node of the decision tree is then divided into as many subsets as there are distinct values of the best categorical split attribute. Many experiments were conducted using the proposed method, and the results show that it is better than many other competitive methods in terms of efficiency, ease of use, understanding, and output results, and that it will be useful in many modern applications.
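Two of the building blocks named in this abstract, per-value class probabilities and the multi-way split on distinct values, can be sketched as below. The aggregate similarity measure itself is not specified in the abstract and is deliberately left out; the attribute names and rows are hypothetical.

```python
from collections import Counter, defaultdict


def class_probabilities(rows, attribute, label):
    """For each distinct value of `attribute`, estimate P(class | value)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    return {
        value: {cls: n / sum(class_counts.values()) for cls, n in class_counts.items()}
        for value, class_counts in counts.items()
    }


def multiway_split(rows, attribute):
    """Divide the rows of a node into one subset per distinct attribute value."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row)
    return dict(subsets)


# Hypothetical training rows at the current node.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
probabilities = class_probabilities(rows, "outlook", "play")
children = multiway_split(rows, "outlook")
```

A selection criterion would compare the `probabilities` dictionaries across candidate attributes before `multiway_split` is applied to the chosen one.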

Clustering Behavioral Data for Advertising Purposes using K-Prototypes Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5229.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2329-2334

Keyword(s):

Social Media ◽

Text Messaging ◽

Clustering Algorithm ◽

Behavioral Data ◽

The Internet ◽

Telemetry Data ◽

Categorical Attribute ◽

Cluster A ◽

A Company ◽

Potential Customers

Understanding customer sentiment is very important in advertising. To appeal to its current and potential customers, a company must understand the interests of the market. Companies can segment their customers by using surveys and telemetry data to learn about their interests. One way of segmenting customers is to group or cluster them according to their interests and behaviors. In this study, the k-prototypes clustering algorithm, which combines and improves on the k-means and k-modes algorithms, is used to cluster behavioral data containing both numerical and categorical attributes, obtained from a survey conducted on teenagers, into 4, 5, and 6 clusters. Each cluster contains teenagers whose behavior differs from that of the other clusters. By analyzing the results, advertisers will be able to define a profile that indicates the teenagers' interests regarding the internet, social media and text messaging, effectively revealing the kind of ad that they would find relatable.
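The way k-prototypes combines k-means and k-modes can be illustrated with its dissimilarity measure: squared Euclidean distance on the numerical attributes plus a weight gamma times the number of categorical mismatches. The sketch below shows only the assignment step, with hypothetical survey features, prototypes, and gamma value.

```python
def mixed_dissimilarity(num, cat, proto_num, proto_cat, gamma):
    """Squared Euclidean distance on numeric parts plus weighted categorical mismatches."""
    numeric_part = sum((x - p) ** 2 for x, p in zip(num, proto_num))
    categorical_part = sum(1 for x, p in zip(cat, proto_cat) if x != p)
    return numeric_part + gamma * categorical_part


def assign(points, prototypes, gamma):
    """Assign each (numeric, categorical) point to its closest prototype."""
    labels = []
    for num, cat in points:
        costs = [mixed_dissimilarity(num, cat, pn, pc, gamma) for pn, pc in prototypes]
        labels.append(costs.index(min(costs)))
    return labels


# Hypothetical respondents: (hours online, messages per hour), (usage, device).
points = [
    ((1.0, 2.0), ("daily", "phone")),
    ((5.0, 1.0), ("weekly", "laptop")),
    ((1.5, 2.5), ("daily", "laptop")),
]
prototypes = [
    ((1.0, 2.0), ("daily", "phone")),
    ((5.0, 1.0), ("weekly", "laptop")),
]
labels = assign(points, prototypes, gamma=0.5)
```

In a full run, the numeric part of each prototype would then be updated to the cluster mean and the categorical part to the cluster mode, and the two steps repeated until assignments stop changing.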

Bayesian network model for quality control with categorical attribute data

Applied Soft Computing ◽

10.1016/j.asoc.2019.105746 ◽

2019 ◽

Vol 84 ◽

pp. 105746 ◽

Cited By ~ 2

Author(s):

Barry R. Cobb ◽

Linda Li

Keyword(s):

Quality Control ◽

Bayesian Network ◽

Network Model ◽

Bayesian Network Model ◽

Attribute Data ◽

Categorical Attribute

Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm

Intelligent Data Engineering and Automated Learning – IDEAL 2019 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-33607-3_3 ◽

2019 ◽

pp. 20-27 ◽

Cited By ~ 2

Author(s):

Nádia Junqueira Martarelli ◽

Marcelo Seido Nagano

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Mixed Data ◽

Attribute Weights ◽

Categorical Attribute

A Multi-Level Privacy-Preserving Approach to Hierarchical Data Based on Fuzzy Set Theory

Symmetry ◽

10.3390/sym10080333 ◽

2018 ◽

Vol 10 (8) ◽

pp. 333

Author(s):

Jinyan Wang ◽

Guoqing Cai ◽

Chen Liu ◽

Jingli Wu ◽

Xianxian Li

Keyword(s):

Data Privacy ◽

Clinical Decision ◽

Privacy Preserving ◽

Computer Assisted ◽

Hierarchical Data ◽

Knowledge Based ◽

Privacy Model ◽

Multi Level ◽

Categorical Attribute ◽

Attribute Value

Nowadays, more and more applications depend on the storage and management of semi-structured information. For scientific research and knowledge-based decision-making, such data often needs to be published; for example, medical data is released to implement a computer-assisted clinical decision support system. Since this data contains private information about individuals, it must be appropriately anonymized before being released. However, the existing anonymization method based on l-diversity for hierarchical data may be vulnerable to a serious similarity attack and cannot protect data privacy very well. In this paper, we utilize fuzzy sets to divide levels for sensitive numerical and categorical attribute values uniformly (a categorical attribute value can be converted into a numerical attribute value according to its frequency of occurrence), and then transform the value levels into sensitivity levels. The privacy model (αlevh, k)-anonymity for hierarchical data with multi-level sensitivity is proposed. Furthermore, we design a privacy-preserving approach to achieve this privacy model. Experimental results demonstrate that our approach is clearly superior to the existing anonymization approach for hierarchical data in terms of utility and security.
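The conversion of a categorical attribute value into a numerical one according to its frequency of occurrence can be sketched as below. Using the relative frequency rather than the raw count is an assumption made here for illustration, and the fuzzy-set level assignment that follows in the paper is not shown. The sample values are hypothetical.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each categorical value with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[value] / total for value in values]


# Hypothetical sensitive categorical column.
diagnoses = ["flu", "flu", "asthma", "flu"]
encoded = frequency_encode(diagnoses)
```

Equal values always map to the same number, so numerical and categorical sensitive attributes can afterwards be handled by the same level-division procedure.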

categorical attribute
Recently Published Documents

Differentially Private Distance Learning in Categorical Data

ICAI-SR: Item Categorical Attribute Integrated Sequential Recommendation

Prediction of House Price Using XGBoost Regression Algorithm

A study on Two-Stage Mixed Attribute Data Clustering Based on Density Peaks

Graphs from Features: Tree-Based Graph Layout for Feature Analysis

A Novel Categorical Data Attribute Split Technique in Decision Tree Learning

Clustering Behavioral Data for Advertising Purposes using K-Prototypes Algorithm

Bayesian network model for quality control with categorical attribute data

Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm

A Multi-Level Privacy-Preserving Approach to Hierarchical Data Based on Fuzzy Set Theory
