The Research on Tibetan Text Classification Based on N-Gram Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.1896 ◽

2014 ◽

Vol 543-547 ◽

pp. 1896-1900

Author(s):

Deng Zhou ◽

Wen Huang He ◽

Tao Tao Wu

Keyword(s):

Feature Selection ◽

Text Classification ◽

Word Segmentation ◽

Classification Model ◽

Treatment Processes ◽

Pre Treatment ◽

N Gram ◽

Selection Of

Compared with the traditional text classification model, the N-Gram-based Tibetan text classification model applies N-Gram modeling at the level of words. In other words, word segmentation is not required during text classification. Feature selection and extensive pre-treatment processes are also avoided. This paper studies N-Gram models in depth and discusses the selection of the parameter N in the model using a Naïve Bayes Multinomial classifier.
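
The idea in this abstract (n-gram features without word segmentation, a multinomial Naïve Bayes classifier, and a comparison across values of N) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of character-level n-grams, the Laplace smoothing constant `alpha`, the class name `NGramMultinomialNB`, and the placeholder training strings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramMultinomialNB:
    """Multinomial Naive Bayes over character n-gram counts (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length, the parameter N discussed above
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)   # documents per class
        self.counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.counts[c].values()) for c in self.class_docs}
        return self

    def predict(self, text):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * vocab_size
            for g in char_ngrams(text, self.n):
                if g in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Placeholder strings standing in for unsegmented text of two classes.
train_texts = ["abab abba", "baab abab", "xyxy yxxy", "xxyy yxyx"]
train_labels = ["A", "A", "B", "B"]

for n in (1, 2, 3):
    model = NGramMultinomialNB(n=n).fit(train_texts, train_labels)
    print(n, model.predict("abba"), model.predict("xyyx"))
```

In practice, N would be chosen by comparing held-out accuracy for each candidate value rather than by inspecting predictions on a toy set.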

An Ensemble Text Classification Model Combining Strong Rules and N-Gram

Third International Conference on Natural Computation (ICNC 2007) ◽

10.1109/icnc.2007.198 ◽

2007 ◽

Author(s):

Jinhong Liu ◽

Yuliang Lu

Keyword(s):

Text Classification ◽

Classification Model ◽

Model Combining ◽

N Gram

Efficient n-gram construction for text categorization using feature selection techniques

Intelligent Data Analysis ◽

10.3233/ida-205154 ◽

2021 ◽

Vol 25 (3) ◽

pp. 509-525

Author(s):

Maximiliano García ◽

Sebastián Maldonado ◽

Carla Vairetti

Keyword(s):

Feature Selection ◽

Text Classification ◽

Text Categorization ◽

A Priori ◽

Predictive Performance ◽

Online Reviews ◽

Additional Advantage ◽

Novel Approach ◽

N Gram ◽

Feature Selection Techniques

In this paper, we present a novel approach for n-gram generation in text classification. The a-priori algorithm is adapted to prune word sequences by combining three feature selection techniques. Unlike the traditional two-step approach for text classification in which feature selection is performed after the n-gram construction process, our proposal performs an embedded feature elimination during the application of the a-priori algorithm. The proposed strategy reduces the number of branches to be explored, speeding up the process and making the construction of all the word sequences tractable. Our proposal has the additional advantage of constructing a low-dimensional dataset with only the features that are relevant for classification, that can be used directly without the need for a feature selection step. Experiments on text classification datasets for sentiment analysis demonstrate that our approach yields the best predictive performance when compared with other feature selection approaches, while also facilitating a better understanding of the words and phrases that explain a given task; in our case online reviews and ratings in various domains.
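
The level-wise construction described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: the abstract does not name the three feature selection techniques, so the hypothetical `keep` filter below uses a minimum document frequency and a minimum class-count gap as stand-ins. Note that only the support condition is anti-monotone, so pruning on the class gap is a heuristic. The function names, thresholds, and toy reviews are assumptions.

```python
def doc_ngrams(tokens, n):
    """Set of word n-grams (as tuples) occurring in one tokenized document."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def keep(class_counts, min_df, min_gap):
    """Hypothetical filter: enough documents overall and a class imbalance."""
    pos, neg = class_counts
    return pos + neg >= min_df and abs(pos - neg) >= min_gap


def grow_ngrams(docs, labels, max_n=3, min_df=2, min_gap=1):
    """Build n-grams level by level, extending only surviving (n-1)-grams."""
    selected = {}
    survivors = None  # n-grams kept at the previous level
    for n in range(1, max_n + 1):
        df = {}  # n-gram -> [docs with label 1, docs with label 0]
        for tokens, y in zip(docs, labels):
            for g in doc_ngrams(tokens, n):
                # A-priori style pruning: both (n-1)-gram parts must have survived.
                if survivors is not None and (
                    g[:-1] not in survivors or g[1:] not in survivors
                ):
                    continue
                counts = df.setdefault(g, [0, 0])
                counts[0 if y == 1 else 1] += 1
        survivors = {g for g, c in df.items() if keep(c, min_df, min_gap)}
        if not survivors:
            break  # nothing left to extend
        for g in survivors:
            selected[g] = tuple(df[g])
    return selected


# Toy reviews: label 1 = positive, label 0 = negative.
docs = [
    ["not", "good", "at", "all"],
    ["not", "good", "service"],
    ["not", "good", "food"],
    ["very", "good", "food"],
    ["very", "good", "value"],
]
labels = [0, 0, 0, 1, 1]

selected = grow_ngrams(docs, labels)
for gram, (pos, neg) in sorted(selected.items()):
    print(" ".join(gram), pos, neg)
```

Because rejected n-grams are never extended, the number of candidates at each level stays small, and the returned dictionary is already a reduced feature set.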

The Accuracy Improvement of Text Mining Classification on Hospital Review through The Alteration in The Preprocessing Stage

International Journal of Computer and Information Technology (2279-0764) ◽

10.24203/ijcit.v10i4.138 ◽

2021 ◽

Vol 10 (4) ◽

Author(s):

Triyas Hevianto Saputro ◽

Arief Hermawan

Keyword(s):

Machine Learning ◽

Text Mining ◽

Sentiment Analysis ◽

Text Classification ◽

Classification Model ◽

Training Process ◽

Accuracy Improvement ◽

Spelling Correction ◽

Preprocessing Technique ◽

Selection Of

Sentiment analysis is a part of text mining used to extract information from a sentence or document. This study focuses on text classification for sentiment analysis of hospital reviews, using customers' criticism and suggestions posted on Google Maps Review. The collected texts still contain many nonstandard words, which cause problems in the preprocessing stage. The selection and combination of preprocessing techniques is therefore crucial for improving machine learning accuracy. However, not every preprocessing technique contributes to better classification accuracy. The objective of this study is to improve the accuracy of a classification model for sentiment analysis of customers' hospital reviews. A suitable combination of preprocessing techniques can produce a highly accurate classification model. This study experimented with several preprocessing techniques: (1) tokenization, (2) case folding, (3) stop word removal, (4) stemming, and (5) removal of punctuation and numbers. The experiment then added two further preprocessing methods: (1) spelling correction and (2) the Slang method. The results show that spelling correction and the Slang method help improve accuracy. Furthermore, selecting a suitable combination of preprocessing techniques can speed up the training process and produce a better text classification model.
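
The preprocessing steps listed in this abstract can be sketched as a small pipeline. This is an illustration under stated assumptions, not the study's code: the stop word list, slang map, reference vocabulary, the naive suffix-stripping `stem` function, the use of `difflib.get_close_matches` for spelling correction, the ordering of the steps, and the English sample review are all hypothetical.

```python
import re
import string
from difflib import get_close_matches

STOP_WORDS = {"the", "is", "a", "and", "was", "to"}      # hypothetical list
SLANG = {"gr8": "great", "thx": "thanks", "u": "you"}    # hypothetical map
VOCAB = ["doctor", "nurse", "service", "great", "thanks", "waiting", "friendly"]


def correct(word):
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    if word in VOCAB:
        return word
    matches = get_close_matches(word, VOCAB, n=1, cutoff=0.8)
    return matches[0] if matches else word


def stem(word):
    """Naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, use_slang=True, use_spelling=True):
    tokens = text.lower().split()                             # case folding, tokenization
    tokens = [t.strip(string.punctuation) for t in tokens]    # trim edge punctuation
    if use_slang:
        tokens = [SLANG.get(t, t) for t in tokens]            # slang replacement
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]       # drop digits, punctuation
    tokens = [t for t in tokens if t]                         # drop emptied tokens
    if use_spelling:
        tokens = [correct(t) for t in tokens]                 # spelling correction
    tokens = [t for t in tokens if t not in STOP_WORDS]       # stop word removal
    return [stem(t) for t in tokens]                          # stemming


review = "The servise was gr8, thx!!! Waited 30 minutes."
print(preprocess(review, use_slang=False, use_spelling=False))
print(preprocess(review))
```

Slang replacement runs before digits are removed so that tokens such as `gr8` are still recognizable. Toggling `use_slang` and `use_spelling` mirrors the kind of with and without comparison the abstract describes.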

The Research of Feature Selection of Text Classification Based on Integrated Learning Algorithm

2011 10th International Symposium on Distributed Computing and Applications to Business, Engineering and Science ◽

10.1109/dcabes.2011.95 ◽

2011 ◽

Cited By ~ 2

Author(s):

Xia Huosong ◽

Liu Jian

Keyword(s):

Feature Selection ◽

Text Classification ◽

Learning Algorithm ◽

Integrated Learning ◽

Selection Of

Text classification model for methamphetamine-related tweets in Southeast Asia using dual data preprocessing techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i4.pp3617-3628 ◽

2021 ◽

Vol 11 (4) ◽

pp. 3617

Author(s):

Narongsak Chayangkoon ◽

Anongnart Srivihok

Keyword(s):

Feature Selection ◽

Southeast Asia ◽

Text Classification ◽

Matthews Correlation Coefficient ◽

Feature Selection Method ◽

Classification Model ◽

Support Vector ◽

Drug Addicts ◽

High Area ◽

Illegal Activities

Methamphetamine addiction is a prominent problem in Southeast Asia. Drug addicts often discuss illegal activities on popular social networking services. These individuals spread messages on social media as a means of both buying and selling drugs online. This paper proposes a model, the “text classification model of methamphetamine tweets in Southeast Asia” (TMTA), to identify whether a tweet from Southeast Asia is related to methamphetamine abuse. The research addresses the weakness of bag of words (BoW) by introducing BoW and Word2Vec feature selection (BWF) techniques. A domain-based feature selection method was performed using the BoW dataset and Word2Vec. The BWF dataset provided a smaller number of features than the BoW and TF–IDF datasets. We experimented with three candidate classifiers: support vector machine (SVM), decision tree (J48) and naive Bayes (NB). We found that the J48 classifier with the BWF dataset provided the best performance for the TMTA in terms of accuracy (0.815), F-measure (0.818), Kappa (0.528), Matthews correlation coefficient (0.529) and area under the ROC curve (0.763). Moreover, TMTA provided the lowest runtime (3.480 seconds) using the J48 with the BWF dataset.

N-gram Feature Selection for Text Classification Based on Symmetrical Conditional Probability and TF-IDF

Journal of the Korean Institute of Industrial Engineers ◽

10.7232/jkiie.2015.41.4.381 ◽

2015 ◽

Vol 41 (4) ◽

pp. 381-388 ◽

Cited By ~ 1

Author(s):

Woo-Sik Choi ◽

Seoung Bum Kim

Keyword(s):

Feature Selection ◽

Conditional Probability ◽

Text Classification ◽

Selection For ◽

N Gram

Effect of Microwave Pretreatment on Leaching of Tetrahedrite

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/906/1/012111 ◽

2021 ◽

Vol 906 (1) ◽

pp. 012111

Author(s):

Ingrid Znamenácková ◽

Silvia Dolinská ◽

Slavomír Hredzák ◽

Vladimir Cablík

Keyword(s):

Microwave Radiation ◽

Extraction Process ◽

Copper Recovery ◽

Processing Scheme ◽

Hydrometallurgical Processing ◽

Treatment Processes ◽

Sulphide Ores ◽

Pre Treatment ◽

Positive Effect ◽

Selection Of

In mineral processing, the use of microwave radiation is important, especially in pre-treatment processes. At present, there is an acceleration of processes as well as an increase in the efficiency of metal recovery. One of the main problems in copper recovery from complex sulphide ores is the removal of impurities such as antimony, arsenic and mercury. In the hydrometallurgical processing scheme, the key step is leaching. The extraction process can be influenced by the selection of suitable leaching reagents or by suitable pre-treatment of the ore. The article describes the effect of microwave radiation on the leaching of Sb, As and Hg from tetrahedrite and tetrahedrite concentrate. The samples were irradiated at a power of 900 W for 30 seconds. The leaching of irradiated and non-irradiated samples was carried out in alkaline sodium sulphide. The positive effect of microwave radiation was confirmed by an increase in the recovery of Sb and As after only 15 min of extraction. After microwave leaching, the yield of Sb was 43.2 % for the irradiated tetrahedrite samples and 81.3 % for the irradiated tetrahedrite concentrate.

1671: Association of Race, Cancer Severity, Pre-Treatment Quality of Life, and Patient Optimism/Pessimism with Patient Selection of Primary Prostate Cancer Treatment

The Journal of Urology ◽

10.1016/s0022-5347(18)35793-8 ◽

2005 ◽

Vol 173 (4S) ◽

pp. 453-453

Author(s):

Martin G. Sanda ◽

Rodney L. Dunn ◽

Christopher S. Saigal ◽

Eric A. Klein ◽

Louis L. Pisters ◽

...

Keyword(s):

Quality Of Life ◽

Prostate Cancer ◽

Cancer Treatment ◽

Patient Selection ◽

Treatment Quality ◽

Prostate Cancer Treatment ◽

Primary Prostate Cancer ◽

Pre Treatment ◽

Selection Of

Feature selection of the armature winding broken coils in synchronous motor using genetic algorithm and mahalanobis distance

Archives of Metallurgy and Materials ◽

10.2478/v10172-012-0091-7 ◽

2012 ◽

Vol 57 (3) ◽

pp. 829-835 ◽

Cited By ~ 1

Author(s):

Z. Głowacz ◽

J. Kozik

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Mahalanobis Distance ◽

Distance Measure ◽

Synchronous Motor ◽

Medical Diagnostics ◽

Motor Current ◽

Feature Spaces ◽

Multidimensional Feature Spaces ◽

Selection Of

The paper describes a procedure for the automatic selection of symptoms accompanying a break in the armature winding coils of a synchronous motor. This procedure, called feature selection, chooses from the full set of features describing the problem a subset that best distinguishes between healthy and damaged states. The amplitudes of the spectral components of the motor current signals were used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Particular subspaces are chosen with the aid of a genetic algorithm, and their quality is tested using the Mahalanobis distance measure. The algorithm searches for the subspaces for which this distance is greatest. The algorithm is very efficient and, as confirmed by research, leads to good results. The proposed technique is successfully applied in many other fields of science and technology, including medical diagnostics.
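
The fitness criterion described in this abstract, the Mahalanobis distance between healthy and damaged states within a candidate feature subspace, can be sketched as below. This is not the authors' code. To keep the example short, exhaustive enumeration of small subsets is swapped in for the genetic algorithm, a pooled covariance matrix is assumed, and the three-feature toy measurements and all function names are hypothetical.

```python
from itertools import combinations


def mean(rows):
    """Column-wise mean of a list of equal-length row vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(a, b):
    """Pooled sample covariance matrix of two groups of row vectors."""
    d = len(a[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (a, b):
        m = mean(rows)
        for r in rows:
            for i in range(d):
                for j in range(d):
                    cov[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    dof = len(a) + len(b) - 2
    return [[v / dof for v in row] for row in cov]


def solve(mat, vec):
    """Solve mat * x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [row[:] + [v] for row, v in zip(mat, vec)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis(a, b, subset):
    """Mahalanobis distance between the class means, restricted to subset."""
    sa = [[r[i] for i in subset] for r in a]
    sb = [[r[i] for i in subset] for r in b]
    diff = [x - y for x, y in zip(mean(sa), mean(sb))]
    w = solve(pooled_cov(sa, sb), diff)
    q = sum(d * v for d, v in zip(diff, w))
    return max(q, 0.0) ** 0.5  # guard against tiny negative rounding error


def best_subset(a, b, k):
    """Exhaustive stand-in for the genetic search: best subset of size k."""
    n_features = len(a[0])
    return max(combinations(range(n_features), k),
               key=lambda s: mahalanobis(a, b, s))


# Toy data: rows are measurements, columns are spectral amplitudes (invented).
healthy = [[1.0, 5.0, 0.2], [2.0, 3.0, 0.1], [3.0, 4.0, 0.3]]
damaged = [[6.0, 4.0, 0.3], [7.0, 5.0, 0.2], [8.0, 3.0, 0.1]]

print(best_subset(healthy, damaged, 1), best_subset(healthy, damaged, 2))
```

Exhaustive search is only feasible for a handful of features. With full current spectra the number of subsets grows combinatorially, which is the situation where a genetic algorithm, as in the abstract, becomes the practical choice.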

Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.933937 ◽

2019 ◽

Vol 7 (4) ◽

pp. 933-937

Author(s):

Varun Saproo ◽

Rujuta Upadhyay ◽

Manisha Valera

Keyword(s):

Feature Selection ◽

Text Classification ◽

Genetic Mutation ◽

Classification Methods
