scholarly journals Automated News Classification using N-gram Model and Key Features of Nepali Language

SCITECH Nepal ◽  
2018 ◽  
Vol 13 (1) ◽  
pp. 64-69
Author(s):  
Dinesh Dangol ◽  
Rupesh Dahi Shrestha ◽  
Arun Timalsina

With an increasing trend of publishing news online on website, automatic text processing becomes more and more important. Automatic text classification has been a focus of many researchers in different languages for decades. There is a huge amount of research repository on features of English language and their uses on automated text processing. This research implements Nepali language key features for automatic text classification of Nepali news. In particular, the study on impact of Nepali language based features, which are extremely different than English language is more challenging because of the higher level of complexity to be resolved. The research experiment using vector space model, n-gram model and key feature based processing specific to Nepali language shows promising result compared to bag-of-words model for the task of automated Nepali news classification.

2021 ◽  
Author(s):  
V. S. Martins ◽  
C. D. Silva

Automatic Text Classification represents a great improvement in law area workflow, mainly in the migration of physical to electronic lawsuits. A systematic review of studies on text classification in law area from January 2017 up to February 2020 was conducted. The search strategy identified 20 studies, that were analyzed and compared. The review investigates from research questions: what are the state-of-art language models, its application of text classification in English and Brazilian Portuguese datasets from legal area, if there are available language models trained on Brazilian Portuguese, and datasets in Brazilian law area. It concludes that there are applications of automatic text classification in Brazil, although there is a gap on the use of language models when compared with English language dataset studies, also the importance of language model in domain pre-training to improve results, as well as there are two studies making available Brazilian Portuguese language models, and one introducing a dataset in Brazilian law area.


2013 ◽  
Vol 20 (3) ◽  
pp. 130 ◽  
Author(s):  
Celso Antonio Alves Kaestner

This work presents kernel functions that can be used in conjunction with the Support Vector Machine – SVM – learning algorithm to solve the automatic text classification task. Initially the Vector Space Model for text processing is presented. According to this model text is seen as a set of vectors in a high dimensional space; then extensions and alternative models are derived, and some preprocessing procedures are discussed. The SVM learning algorithm, largely employed for text classification, is outlined: its decision procedure is obtained as a solution of an optimization problem. The “kernel trick”, that allows the algorithm to be applied in non-linearly separable cases, is presented, as well as some kernel functions that are currently used in text applications. Finally some text classification experiments employing the SVM classifier are conducted, in order to illustrate some text preprocessing techniques and the presented kernel functions.


2020 ◽  
Vol 54 (3) ◽  
pp. 113-123
Author(s):  
V. S. Egorov ◽  
E. S. Kozlova ◽  
K. E. Lomotin ◽  
O. V. Fedorets ◽  
A. V. Filimonov ◽  
...  

2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.


2008 ◽  
Vol E91-D (4) ◽  
pp. 1101-1109 ◽  
Author(s):  
L. S.P. BUSAGALA ◽  
W. OHYAMA ◽  
T. WAKABAYASHI ◽  
F. KIMURA

Sign in / Sign up

Export Citation Format

Share Document