Research of Clustering Algorithms using Enhanced Feature Selection

At present, a huge quantity of data is recorded in a variety of forms such as text, image, video, and audio, and this volume is expected to grow in the future. The major text-related tasks, such as entity extraction, information extraction, entity relation modeling, and document summarization, are performed using text mining. This paper focuses on document clustering, a subtask of text mining, and measures the performance of different clustering techniques. We use an enhanced feature selection method for clustering text documents to show that it produces better results than traditional feature selection.
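The idea of filtering terms before clustering can be sketched with a simple document-frequency threshold. The paper's specific "enhanced" selection criterion is not reproduced here, so the thresholds and the tiny corpus below are purely illustrative:

```python
from collections import Counter

def select_features(docs, min_df=0.25, max_df=0.75):
    # Keep terms whose document frequency falls in (min_df, max_df]:
    # very rare terms are mostly noise, near-ubiquitous terms carry
    # little information for separating clusters.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term for term, count in df.items() if min_df < count / n <= max_df}

docs = [
    "text mining clusters documents",
    "clustering documents by text features",
    "audio and video data grow fast",
    "feature selection improves clustering",
]
features = select_features(docs)
print(sorted(features))  # terms kept for the clustering stage
```

Terms appearing in exactly half of the documents survive the filter here; singletons and (in larger corpora) near-universal terms are dropped before the clustering stage.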

Author(s):  
Harsha Patil ◽  
R. S. Thakur

As we know, use of the Internet flourishes at full velocity and in all dimensions. The enormous availability of text documents in digital form (email, web pages, blog posts, news articles, e-books, and other text files) on the Internet challenges technology to retrieve the appropriate documents in response to any search query. As a result, there has been an eruption of interest in mining these vast resources and classifying them properly, which invigorates researchers and developers to work on numerous approaches to document clustering. The aim of this chapter is to summarise the different document clustering algorithms used by researchers.


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Muhammad Shafiq ◽  
Xiangzhan Yu ◽  
Asif Ali Laghari ◽  
Dawei Wang

Recently, machine learning (ML) algorithms have been widely applied to Internet traffic classification. However, with inappropriate feature selection, ML-based classifiers are prone to misclassifying minority flows as the majority class, since that traffic occupies most of the flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric and then uses a wrapper method to select features for ML classifiers with an accuracy (ACC) metric. We evaluate our approach using five ML classifiers on two traces captured from different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to identify the robust features among the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision, achieving up to 99% flow accuracy.
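A rough illustration of the idea behind a weighted mutual information metric follows. The paper's exact WMI formula is not given here; the minority-class weighting below is an assumption made only to show how a weighting can keep features about rare flow classes from being drowned out:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Plain MI (in bits) between two discrete sequences of equal length.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def weighted_mi(feature_values, labels):
    # Hypothetical weighting: scale MI by the minority-class share.
    # (Illustrative only; not the formula from the paper.)
    minority_share = min(Counter(labels).values()) / len(labels)
    return mutual_information(feature_values, labels) * minority_share

# Toy flows: a discretized packet-size feature vs. the traffic class.
sizes = [0, 0, 1, 1]
classes = ["web", "web", "p2p", "p2p"]
score = weighted_mi(sizes, classes)  # MI = 1 bit, minority share = 0.5
```

In the filter stage, features would be ranked by such a score and the lowest-ranked ones discarded before the wrapper search.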



2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Heyong Wang ◽  
Ming Hong

With the rapid development of web applications such as social networks, a large amount of electronic text data has accumulated and become available on the Internet, which has caused increasing interest in text mining. Text classification is one of the most important subfields of text mining. In fact, text documents are often represented as a high-dimensional sparse document term matrix (DTM) before classification. Feature selection is essential and vital for text classification due to the high dimensionality and sparsity of the DTM. An efficient feature selection method is capable of both reducing the dimensions of the DTM and selecting discriminative features for text classification. Laplacian Score (LS) is an unsupervised feature selection method that has been successfully used in areas such as face recognition. However, LS is unable to select discriminative features for text classification or to effectively reduce the sparsity of the DTM. To improve on it, this paper proposes an unsupervised feature selection method named Distance Variance Score (DVS). DVS uses feature distance contribution (a ratio) to rank the importance of features for text documents so as to select discriminative features. Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of the DTM. Thus, it is much more efficient than LS.
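The "distance contribution" ratio can be sketched as each term's share of the total squared pairwise distance between documents. This is only an illustration of the idea; the exact DVS formula in the paper may differ:

```python
def distance_contribution(dtm):
    # Score each term by its share of the total squared pairwise distance
    # between documents; constant (non-discriminative) terms score zero.
    n, d = len(dtm), len(dtm[0])
    per_term = [
        sum((dtm[i][j] - dtm[k][j]) ** 2
            for i in range(n) for k in range(i + 1, n))
        for j in range(d)
    ]
    total = sum(per_term)
    return [s / total for s in per_term]

# Tiny DTM: 3 documents x 3 terms; the third term is constant everywhere.
dtm = [[3, 0, 1],
       [0, 2, 1],
       [4, 0, 1]]
scores = distance_contribution(dtm)
```

The constant third term receives a score of zero and would be dropped, which is how a distance-based ratio can shrink a sparse DTM while keeping discriminative terms.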


2019 ◽  
Vol 22 (3) ◽  
Author(s):  
Samuel Bruno da Silva Sousa ◽  
Ronaldo de Castro Del-Fiaco ◽  
Lilian Berton

Homicide is recognized as one of the most violent types of crime. In some countries, it is a hard problem to tackle because of its high occurrence and the lack of research on it. In Brazil, this problem is even harder, since this country is responsible for about 10% of the homicides in the world. Some Brazilian states suffer from rising homicide rates, like the state of Goiás, whose homicide rate increased from 24.5 per 100,000 in 2002 to 42.6 per 100,000 in 2014, becoming one of the five most violent states of Brazil despite having a relatively small population. This paper aims at applying clustering algorithms and feature selection models to criminal data concerning homicides and socio-economic variables in the state of Goiás. We employed three clustering algorithms: K-means, Density-based, and Hierarchical; as well as two feature selection models: Univariate Selection and Feature Importance. Our results indicate that homicide rates are more recurrent in large urban centers, although these cities have the best socio-economic indicators. Population and the educational level of the adult population were the variables which most influenced the results. K-means clustering produced the best outcomes, and Univariate Selection better selected attributes of the database.
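The clustering step can be sketched with a minimal k-means over two of the variables the study highlights (population and adult education level). The municipal values below are made up for illustration, and the study itself used standard library implementations rather than this toy:

```python
def kmeans(points, k, iters=20):
    # Minimal Lloyd's k-means with deterministic initialization
    # (the first k points), so the toy run is reproducible.
    centers = list(points[:k])

    def nearest(p):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [nearest(p) for p in points]

# (population in thousands, adult education index) -- hypothetical values.
towns = [(12.4, 6.1), (980.5, 7.9), (9.8, 5.7),
         (1020.0, 8.1), (15.2, 6.0), (870.3, 7.7)]
labels = kmeans(towns, 2)  # large urban centers separate from small towns
```

With well-separated groups like these, the large urban centers fall into one cluster and the small towns into the other, mirroring the paper's finding that population strongly drives the partition.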


2016 ◽  
Author(s):  
Tallulah S. Andrews ◽  
Martin Hemberg

Abstract: Feature selection is a key step in many single-cell RNASeq (scRNASeq) analyses. Feature selection is intended to preserve biologically relevant information while removing genes only subject to technical noise. As it is frequently performed prior to dimensionality reduction, clustering, and pseudotime analyses, feature selection can have a major impact on the results. Several different approaches have been proposed for unsupervised feature selection from unprocessed single-cell expression matrices, most based upon identifying highly variable genes in the dataset. We present two methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show that dropout-based feature selection outperforms variance-based feature selection for multiple applications of single-cell RNASeq.
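The core quantity, a gene's dropout rate, is just the fraction of cells with a zero count. The toy ranking below only sketches the idea; the paper's methods additionally compare each gene's observed dropout rate to the rate expected from its mean expression, a modelling step omitted here, and the counts and gene names are made up:

```python
def dropout_rate(counts):
    # Fraction of cells in which the gene was not detected (zero counts).
    return sum(1 for c in counts if c == 0) / len(counts)

def select_by_dropout(expr, genes, n_top=1):
    # Rank genes by raw dropout rate and keep the top n_top.
    rates = {g: dropout_rate(row) for g, row in zip(genes, expr)}
    return sorted(genes, key=lambda g: rates[g], reverse=True)[:n_top]

# Genes x cells count matrix (illustrative numbers and gene names).
expr = [[5, 4, 6, 5],   # GeneA: detected everywhere, moderate counts
        [0, 9, 0, 8],   # GeneB: high dropout, bimodal expression
        [1, 1, 2, 1]]   # GeneC: low but stable expression
genes = ["GeneA", "GeneB", "GeneC"]
top = select_by_dropout(expr, genes)
```

GeneB, undetected in half the cells despite high counts when detected, ranks first, which is the kind of gene a variance-only criterion can conflate with merely noisy high-expression genes.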

