Incorporating External POS Tagger for Punctuation Restoration

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used based on different applications. Sentence clustering is one of best clustering technique. Hierarchical Clustering Algorithm is applied for multiple levels for accuracy. For tagging purpose POS tagger, porter stemmer is used. WordNet dictionary is utilized for determining the similarity by invoking the Jiang Conrath and Cosine similarity measure. Grouping is performed with respect to the highest similarity measure value with a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify the disambiguated words, the sense identification is performed for the adjectives and comparison is performed. semcor and machine learning datasets are employed. On comparing with previous results for WSD, our work has improvised a lot which gives a percentage of 91.2%

Download Full-text

POS-tagger for English-Vietnamese bilingual corpus

10.3115/1118905.1118921 ◽

2003 ◽

Cited By ~ 9

Author(s):

Dinh Dien ◽

Hoang Kiem

Keyword(s):

Pos Tagger

Download Full-text

Building a Social Media Adapted PoS Tagger Using FlexTag – A Case Study on Italian Tweets

EVALITA. Evaluation of NLP and Speech Tools for Italian ◽

10.4000/books.aaccademia.1960 ◽

2016 ◽

pp. 95-98

Author(s):

Tobias Horsmann ◽

Torsten Zesch

Keyword(s):

Social Media ◽

Pos Tagger

Download Full-text

Analisis Sentimen Twitter untuk Teks Berbahasa Indonesia dengan Maximum Entropy dan Support Vector Machine

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.3499 ◽

2014 ◽

Vol 8 (1) ◽

pp. 91 ◽

Cited By ~ 5

Author(s):

Noviah Dwi Putranti ◽

Edi Winarko

Keyword(s):

Support Vector Machine ◽

Maximum Entropy ◽

Social Networking Site ◽

Training Data ◽

Classification Model ◽

Support Vector ◽

Public Sentiment ◽

Pos Tagger ◽

Negative Sentiment ◽

Bahasa Indonesia

AbstrakAnalisis sentimen dalam penelitian ini merupakan proses klasifikasi dokumen tekstual ke dalam dua kelas, yaitu kelas sentimen positif dan negatif. Data opini diperoleh dari jejaring sosial Twitter berdasarkan query dalam Bahasa Indonesia. Penelitian ini bertujuan untuk menentukan sentimen publik terhadap objek tertentu yang disampaikan di Twitter dalam bahasa Indonesia, sehingga membantu usaha untuk melakukan riset pasar atas opini publik. Data yang sudah terkumpul dilakukan proses preprocessing dan POS tagger untuk menghasilkan model klasifikasi melalui proses pelatihan. Teknik pengumpulan kata yang memiliki sentimen dilakukan dengan pendekatan berdasarkan kamus, yang dihasilkan dalam penelitian ini berjumlah 18.069 kata. Algoritma Maximum Entropy digunakan untuk POS tagger dan algoritma yang digunakan untuk membangun model klasifikasi atas data pelatihan dalam penelitian ini adalah Support Vector Machine. Fitur yang digunakan adalah unigram dengan fitur pembobotan TFIDF. Implementasi klasifikasi diperoleh akurasi 86,81 % pada pengujian 7 fold cross validation untuk tipe kernel Sigmoid. Pelabelan kelas secara manual dengan POS tagger menghasilkan akurasi 81,67%. Kata kunci—analisis sentimen, klasifikasi, maximum entropy POS tagger, support vector machine, twitter. AbstractSentiment analysis in this research classified textual documents into two classes, positive and negative sentiment. Opinion data obtained a query from social networking site Twitter of Indonesian tweet. This research uses Indonesian tweets. This study aims to determine public sentiment toward a particular object presented in Twitter businesses conduct market. Collected data then prepocessed to help POS tagged to generate classification models through the training process. Sentiment word collection has done the dictionary based approach, which is generated in this study consists 18.069 words. Maximum Entropy algorithm is used for POS tagger and the algorithms used to build the classification model on the training data is Support Vector Machine. The unigram features used are the features of TFIDF weighting.Classification implementation 86,81 % accuration at examination of 7 validation cross fold for the type of kernel of Sigmoid. Class labeling manually with POS tagger yield accuration 81,67 %. Keywords—sentiment analysis, classification, maximum entropy POS tagger, support vector machine, twitter.

Download Full-text

bot.zen $@$ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)

10.18653/v1/w16-2614 ◽

2016 ◽

Author(s):

Egon Stemle

Keyword(s):

Deep Learning ◽

Web Data ◽

Pos Tagger

Download Full-text

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging—which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent development of language technologies, low-resourced languages (such as an East African Tigrinya language), have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM, and Convolutional Neural Network – CNN) on a top of neural word2vec word embeddings with a small training corpus known as Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture and hyper-parameter set both manual and automatic hyper-parameter tuning has been performed. BiLSTM method was proved to be the most suitable for our solving task: it achieved the highest accuracy equal to 92% that is 65% above the random baseline.

Download Full-text

On Building an Interpretable Topic Modeling Approach for the Urdu Language

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/740 ◽

2020 ◽

Author(s):

Zarmeen Nasim

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Topic Modeling ◽

Language Modeling ◽

Low Resource ◽

Modeling Approach ◽

Part Of Speech ◽

Modeling Techniques ◽

Pos Tagger

This research is an endeavor to combine deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low resource language. The existing topic modeling techniques produce a collection of words, often un-interpretable, as suggested topics without integrat-ing them into a semantically correct phrase/sentence. The proposed approach would first build an accurate Part of Speech (POS) tagger for the Urdu Language using a publicly available corpus of many million sentences. Using semanti-cally rich feature extraction approaches including Word2Vec and BERT, the proposed approach, in the next step, would experiment with different clus-tering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that will represent interpretable topics.

Download Full-text

Improved CCG Parsing with Semi-supervised Supertagging

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00186 ◽

2014 ◽

Vol 2 ◽

pp. 327-338 ◽

Cited By ~ 7

Author(s):

Mike Lewis ◽

Mark Steedman

Keyword(s):

Wall Street Journal ◽

State Of The Art ◽

Training Data ◽

Word Embeddings ◽

Important Goal ◽

Wall Street ◽

Feature Sets ◽

Lexical Categories ◽

Unlabelled Data ◽

Pos Tagger

Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced, by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.

Download Full-text

Malayalam POS Tagger—A Comparison Using SVM and HMM

Evolution in Computational Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-5788-0_40 ◽

2020 ◽

pp. 413-420

Author(s):

K. Usha ◽

S. Lakshmana Pandian

Keyword(s):

Pos Tagger

Download Full-text

Aspect-Based Sentiment Analysis of Online Product Reviews

Advances in Business Information Systems and Analytics - Handbook of Research on Advanced Data Mining Techniques and Applications for Business Intelligence ◽

10.4018/978-1-5225-2031-3.ch010 ◽

2017 ◽

pp. 175-191 ◽

Cited By ~ 1

Author(s):

Vinod Kumar Mishra ◽

Himanshu Tiruwa

Keyword(s):

Sentiment Analysis ◽

Computational Linguistics ◽

Language Processing ◽

Future Trend ◽

Product Reviews ◽

Customer Reviews ◽

Part Of Speech ◽

Lexical Approach ◽

Sentiment Score ◽

Pos Tagger

Sentiment analysis is a part of computational linguistics concerned with extracting sentiment and emotion from text. It is also considered as a task of natural language processing and data mining. Sentiment analysis mainly concentrate on identifying whether a given text is subjective or objective and if it is subjective, then whether it is negative, positive or neutral. This chapter provide an overview of aspect based sentiment analysis with current and future trend of research on aspect based sentiment analysis. This chapter also provide a aspect based sentiment analysis of online customer reviews of Nokia 6600. To perform aspect based classification we are using lexical approach on eclipse platform which classify the review as a positive, negative or neutral on the basis of features of product. The Sentiwordnet is used as a lexical resource to calculate the overall sentiment score of each sentence, pos tagger is used for part of speech tagging, frequency based method is used for extraction of the aspects/features and used negation handling for improving the accuracy of the system.

Download Full-text

Incorporating External POS Tagger for Punctuation Restoration

Handling WSD using Hierarchical Clustering Algorithm with sentences

POS-tagger for English-Vietnamese bilingual corpus

Building a Social Media Adapted PoS Tagger Using FlexTag – A Case Study on Italian Tweets

Analisis Sentimen Twitter untuk Teks Berbahasa Indonesia dengan Maximum Entropy dan Support Vector Machine

bot.zen $@$ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

On Building an Interpretable Topic Modeling Approach for the Urdu Language

Improved CCG Parsing with Semi-supervised Supertagging

Malayalam POS Tagger—A Comparison Using SVM and HMM

Aspect-Based Sentiment Analysis of Online Product Reviews

Export Citation Format