Patent Text Classification Based on Naive Bayesian Method

Text classification has many applications in text processing and information retrieval. Instance-based learning (IBL) is among the top-performing text classification methods, but its effectiveness depends on the distance function it uses to find similar documents. In this study, we evaluate the performance of several popular distance measures and propose new ones that exploit word frequencies and the ordinal relationship between them. In particular, we propose new distance measures based on the value distance metric (VDM) and the inverted specific-class distance measure (ISCDM). The proposed measures suit documents represented as vectors of word frequencies. We evaluate them using the kNN algorithm on 18 benchmark text classification datasets, comparing them with their original counterparts and with powerful Naïve Bayesian-based text classification algorithms. Our empirical results show that distance metrics designed for nominal values yield better text classification results than the Euclidean distance measure for numeric values. Furthermore, ISCDM substantially outperforms VDM, and it is also more amenable than VDM to exploiting the ordinal nature of term frequencies. Thus, we were able to propose more ISCDM-based distance measures than VDM-based ones. We also compare the proposed distance measures with Naïve Bayesian-based text classifiers, namely multinomial Naïve Bayes (MNB), complement Naïve Bayes (CNB), and the one-versus-all-but-one (OVA) model. When kNN uses some of the proposed measures, it outperforms the NB-based text classifiers on most datasets.
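
To make the idea of a nominal-value distance inside kNN concrete, here is a minimal sketch of the classic value distance metric, in which two attribute values are close when they induce similar class distributions. It is not one of the measures proposed in the abstract; the function names, the toy term-frequency vectors, and the choice to give unseen values an all-zero class distribution are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_vdm(X, y):
    """Estimate P(class | attribute i == value v) from the training rows."""
    classes = sorted(set(y))
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            counts[(i, v)][label] += 1
    probs = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        probs[key] = [ctr[c] / total for c in classes]
    return classes, probs

def vdm_distance(a, b, classes, probs, q=1):
    """Sum over attributes and classes of |P(c | a_i) - P(c | b_i)| ** q."""
    unseen = [0.0] * len(classes)  # assumption: unseen values get zero probabilities
    dist = 0.0
    for i, (va, vb) in enumerate(zip(a, b)):
        pa = probs.get((i, va), unseen)
        pb = probs.get((i, vb), unseen)
        dist += sum(abs(x - z) ** q for x, z in zip(pa, pb))
    return dist

def knn_predict(query, X, y, classes, probs, k=3):
    """Majority vote among the k training rows nearest to the query under VDM."""
    ranked = sorted(range(len(X)),
                    key=lambda j: vdm_distance(query, X[j], classes, probs))
    votes = Counter(y[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy term-frequency vectors (hypothetical data): one column per vocabulary term.
X_train = [[2, 0], [1, 0], [0, 2], [0, 1]]
y_train = ["sports", "sports", "politics", "politics"]
classes, probs = fit_vdm(X_train, y_train)
prediction = knn_predict([2, 0], X_train, y_train, classes, probs, k=3)
```

Each term-frequency value is treated here as a nominal symbol, so the frequencies 1 and 2 are considered identical whenever they predict the same class; the ordinal relationship between frequencies is ignored by this baseline.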


Using Naïve Bayesian method for plant leaf classification based on shape and texture features

2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM) ◽

10.1109/hnicem.2015.7393179 ◽

2015 ◽

Cited By ~ 3

Author(s):

Francis Rey F. Padao ◽

Elmer A. Maravillas

Keyword(s):

Bayesian Method ◽

Texture Features ◽

Plant Leaf ◽

Naïve Bayesian


Application of the Naïve Bayesian Method with user current usage and hierarchy from website in Chinese Webpage Classification

2007 IEEE International Conference on Automation and Logistics ◽

10.1109/ical.2007.4338782 ◽

2007 ◽

Cited By ~ 1

Author(s):

Jinsong Li ◽

Weimin Xue ◽

Nanping Dong

Keyword(s):

Bayesian Method ◽

Naïve Bayesian ◽

Current Usage


The Classification of Documents in Malay and Indonesian Using the Naive Bayesian Method Uses Words and Phrases as a Training Set

MENDEL ◽

10.13164/mendel.2020.2.023 ◽

2020 ◽

Vol 26 (2) ◽

pp. 23-28

Author(s):

Marvin Chandra Wijaya

Keyword(s):

Classification Accuracy ◽

Bayesian Method ◽

New Method ◽

Classification Method ◽

Training Set ◽

Naïve Bayesian

Malay and Indonesian are two closely related languages that share much of their vocabulary and grammar. Automatically classifying documents written in the two languages is challenging precisely because they are so similar. The Naive Bayesian method is widely used for classification today, but it needs to be implemented in a particular way to increase classification accuracy. In this study, a new method was used: the training set consists of words and phrases rather than words only. With this method, the classification accuracy for the two languages increased.
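
As a rough illustration of training on words and phrases together, the sketch below fits a multinomial Naive Bayes classifier with add-one smoothing on single words plus adjacent word pairs. The abstract does not say how phrases are formed, so treating a phrase as a two-word sequence is an assumption, as are the function names and the toy sentences, which are not taken from the study.

```python
import math
from collections import Counter, defaultdict

def features(text):
    """Return single words plus adjacent word pairs (the assumed 'phrases')."""
    words = text.lower().split()
    phrases = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + phrases

def train(docs):
    """Count documents per class and feature occurrences per class."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for f in features(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab

def predict(text, model):
    """Pick the class with the highest log prior plus smoothed log likelihood."""
    class_docs, feat_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total = sum(feat_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for f in features(text):
            if f in vocab:  # features never seen in training are skipped
                score += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set: "ms" = Malay, "id" = Indonesian.
TRAIN = [
    ("saya mahu pergi ke pejabat", "ms"),
    ("kereta itu sangat laju", "ms"),
    ("saya mau pergi ke kantor", "id"),
    ("mobil itu sangat cepat", "id"),
]
model = train(TRAIN)
label = predict("dia mahu ke pejabat", model)
```

Because the two languages share many individual words, a phrase feature such as "ke pejabat" can carry evidence that the shared word "ke" alone cannot, which is the intuition behind adding phrases to the training set.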
