Feature grouping-based parallel outlier mining of categorical data using Spark

Information Sciences ◽

10.1016/j.ins.2019.07.045 ◽

2019 ◽

Vol 504 ◽

pp. 1-19

Author(s):

Junli Li ◽

Jifu Zhang ◽

Xiao Qin ◽

Yaling Xun

Keyword(s):

Categorical Data ◽

Outlier Mining ◽

Feature Grouping

Computing Mutual Information of Big Categorical Data and Its Application to Feature Grouping

2020 IEEE 36th International Conference on Data Engineering (ICDE) ◽

10.1109/icde48307.2020.00210 ◽

2020 ◽

Author(s):

Junli Li ◽

Chaowei Zhang ◽

Jifu Zhang ◽

Xiao Qin

Keyword(s):

Mutual Information ◽

Categorical Data ◽

Feature Grouping

Weighted Outlier Detection of High-Dimensional Categorical Data Using Feature Grouping

IEEE Transactions on Systems, Man, and Cybernetics: Systems ◽

10.1109/tsmc.2018.2847625 ◽

2020 ◽

Vol 50 (11) ◽

pp. 4295-4308 ◽

Author(s):

Junli Li ◽

Jifu Zhang ◽

Ning Pang ◽

Xiao Qin

Keyword(s):

Outlier Detection ◽

Categorical Data ◽

High Dimensional ◽

Feature Grouping

Analyzing sequential categorical data on dyadic interaction: A comment on Gottman.

Psychological Bulletin ◽

10.1037/0033-2909.91.2.393 ◽

1982 ◽

Vol 91 (2) ◽

pp. 393-403 ◽

Author(s):

Paul D. Allison ◽

Jeffrey K. Liker

Keyword(s):

Categorical Data ◽

Dyadic Interaction

Maximum Likelihood Methods for Association Models in Ordered Categorical Data: Multi-Way Case

Behaviormetrika ◽

10.2333/bhmk.15.23_85 ◽

1988 ◽

Vol 15 (23) ◽

pp. 85-91 ◽

Author(s):

Masaaki Tsujitani

Keyword(s):

Maximum Likelihood ◽

Categorical Data ◽

Likelihood Methods ◽

Association Models ◽

Ordered Categorical Data ◽

Maximum Likelihood Methods ◽

Ordered Categorical

AN EMINENT WAY OF AN IMPROVING A DENCLUE ALGORITHM APPROACH FOR OUTLIER MINING IN LARGE DATABASE

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i10.536540 ◽

2018 ◽

Vol 6 (10) ◽

pp. 536-540 ◽

Author(s):

R. Prabahari ◽

M. Ramalingam

Keyword(s):

Large Database ◽

Non-Mode Clustering of Categorical Data with Attributes Weighting

Journal of Software ◽

10.3724/sp.j.1001.2013.04470 ◽

2014 ◽

Vol 24 (11) ◽

pp. 2628-2641 ◽

Author(s):

Li-Fei CHEN ◽

Gong-De GUO

Keyword(s):

Categorical Data

Machine Learning Based Predictive Action on Categorical Non-Sequential Data

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190417150421 ◽

2020 ◽

Vol 13 (5) ◽

pp. 1020-1030

Author(s):

Pradeep S. ◽

Jagadish S. Kallimani

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Categorical Data ◽

Numerical Data ◽

Processing Technique ◽

Machine Learning Algorithms ◽

Sequential Data ◽

Industry Standard ◽

Robust Model ◽

Background: With the advent of data analysis and machine learning, there is growing impetus to analyze historic data and build models on it. The data comes in numerous forms and shapes, each with its own challenges. The most sought-after form of data for analysis is numerical data, and with the plethora of available algorithms and tools it is quite manageable to deal with. Another form of data is categorical, which is subdivided into ordinal (ordered) and nominal (named, unordered) data. Categorical data can also be broadly classified as sequential or non-sequential; sequential data is easier to preprocess with existing algorithms. Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms directly to such data yields biased results, which makes it impossible to generate a reliable predictive model. We address this problem by walking through a handful of techniques that helped us during our research in dealing with a large categorical dataset of a non-sequential nature. In subsequent sections, we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best preprocessing technique we observed in our research is one-hot encoding, which breaks the categorical features down into binary features that are then fed into an algorithm to predict the outcome. The example we took is not abstract: it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
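The one-hot encoding step named in the conclusion above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `one_hot_encode`, the column names, and the sample rows are hypothetical, chosen only to show how each categorical value becomes its own binary feature.

```python
def one_hot_encode(rows, columns):
    """Turn categorical columns into binary indicator features.

    rows    -- list of dicts mapping column name to a categorical value
    columns -- names of the categorical columns to encode
    Returns (feature_names, encoded_rows).
    """
    # Collect the distinct values of each column, sorted for a stable order.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}

    # One output feature per (column, value) pair.
    feature_names = [f"{col}={val}" for col in columns for val in categories[col]]

    # A feature is 1 when the row holds that value in that column, else 0.
    encoded_rows = [
        [1 if row[col] == val else 0 for col in columns for val in categories[col]]
        for row in rows
    ]
    return feature_names, encoded_rows


# Hypothetical sample data (not taken from the paper's dataset).
sample_rows = [
    {"service": "web", "region": "eu"},
    {"service": "db", "region": "us"},
    {"service": "web", "region": "us"},
]
names, matrix = one_hot_encode(sample_rows, ["service", "region"])
```

Each encoded row contains exactly one 1 per original column, so the binary features carry no artificial ordering between category values, which is the usual reason for preferring this encoding over integer codes for nominal data.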

Low Dimensional Representation of Space Structure and Clustering of Categorical Data

2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) ◽

10.1109/bdcloud.2018.00161 ◽

2018 ◽

Author(s):

Jianjun Cao ◽

Qibin Zheng ◽

Nianfeng Weng ◽

Xingchun Diao

Keyword(s):

Categorical Data ◽

Space Structure ◽

Dimensional Representation ◽

Representation Of Space ◽

Low Dimensional

Bayesian Models for Categorical Data

10.1002/0470092394 ◽

2005 ◽

Author(s):

Peter Congdon

Keyword(s):

Categorical Data ◽

Bayesian Models

Categorical Data Analysis for Geographers and Environmental Scientists

Economic Geography ◽

10.2307/144098 ◽

1986 ◽

Vol 62 (2) ◽

pp. 192 ◽

Author(s):

Joel L. Horowitz ◽

Neil Wrigley

Keyword(s):

Data Analysis ◽

Categorical Data ◽

Categorical Data Analysis
