USING MACHINE LEARNING TECHNIQUES FOR PART-OF-SPEECH TAGGING IN THE GREEK LANGUAGE

Author(s):  
GEORGIOS PETASIS ◽  
GEORGIOS PALIOURAS ◽  
VANGELIS KARKALETSIS ◽  
CONSTANTINE D. SPYROPOULOS ◽  
ION ANDROUTSOPOULOS


Author(s):  
Dan Tufiș ◽  
Radu Ion

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.
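The tension described here, between rich morpho-lexical descriptors and data sparseness, is commonly resolved by tagging with a reduced tagset and then recovering the full description from a lexicon. The sketch below illustrates that recovery step with invented word forms and MULTEXT-East-style MSD codes; none of the data comes from the chapter itself.

```python
# Tiered-tagging sketch (hypothetical data): the statistical tagger emits a
# small, sparseness-safe tagset; full morpho-syntactic descriptors (MSDs)
# are recovered afterwards from a word-form lexicon.

# (word form, reduced tag) -> full MSD recovery table.
LEXICON = {
    ("casa", "NOUN"): "Ncfsrn",       # noun, common, feminine, singular, ...
    ("frumoasa", "ADJ"): "Afpfsrn",   # adjective, positive, feminine, ...
}

def recover_msd(word, reduced_tag):
    """Map a (word, reduced tag) pair back to its full MSD.

    Falls back to the coarse tag when the word form is not in the lexicon,
    so the output is always at least as informative as the tagger's tag.
    """
    return LEXICON.get((word, reduced_tag), reduced_tag)

tagged = [("casa", "NOUN"), ("frumoasa", "ADJ")]
full = [(w, recover_msd(w, t)) for w, t in tagged]
```

The reduced tagset keeps the statistical model's parameter space small, while the lexicon lookup restores the full information required in the output.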


2020 ◽  
Vol 20 (2) ◽  
pp. 179-196
Author(s):  
Alessandro Vatri ◽  
Barbara McGillivray

Abstract This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The results suggest that lemmatization methods using large lexica as well as part-of-speech tagging—such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer—are more reliable than methods that rely more heavily on machine learning and use smaller lexica.
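The accuracy tests described above amount to token-level agreement between a lemmatizer's output and expert-approved lemmas. The following is a generic sketch of such a measurement, not the authors' actual protocol; the Greek tokens and lemmas are simplified for illustration.

```python
def lemma_accuracy(system_lemmas, gold_lemmas):
    """Token-level accuracy of a lemmatizer against expert-approved lemmas.

    Both arguments are parallel lists of lemma strings for the same tokens.
    """
    if len(system_lemmas) != len(gold_lemmas):
        raise ValueError("token sequences must be aligned")
    correct = sum(s == g for s, g in zip(system_lemmas, gold_lemmas))
    return correct / len(gold_lemmas)

# Illustrative tokens from the opening of the Iliad (simplified lemmas).
gold = ["μῆνις", "ἀείδω", "θεά"]
system = ["μῆνις", "ἀείδω", "θεός"]   # hypothetical lemmatizer output
acc = lemma_accuracy(system, gold)    # two of three tokens correct
```

A blinded evaluation like the one in the article would have human readers produce or approve the gold column without knowing which system generated each output.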


2021 ◽  
pp. 1-20
Author(s):  
Kashif Ayyub ◽  
Saqib Iqbal ◽  
Muhammad Wasif Nisar ◽  
Saima Gulzar Ahmad ◽  
Ehsan Ullah Munir

Sentiment analysis is the field that analyzes the sentiments and opinions people express about entities such as products, businesses, and events. Because opinions influence people's behavior, it has numerous real-life applications in areas such as marketing, politics, and social media. Stance detection is a sub-field of sentiment analysis: stance classification aims to identify automatically, from a source text, whether its author is in favor of, neutral towards, or opposed to a given target. This study proposes a framework to explore the performance of conventional (NB, DT, SVM), ensemble-learning (RF, AdaBoost), and deep-learning (DBN, CNN-LSTM, and RNN) machine learning techniques. The proposed method is feature-centric and extracts sentiment, content, tweet-specific, and part-of-speech features from the SemEval-2016 and SemEval-2017 datasets. The study also explores the role of deep features such as GloVe and Word2Vec embeddings, which have not yet received attention for stance detection. Baseline features such as bag-of-words, n-grams, and TF-IDF are also extracted from both datasets for comparison with the proposed and deep features. The proposed features are ranked using feature-ranking methods (information gain, gain ratio, and Relief-F), and the results are evaluated against existing studies using standard performance measures for stance classification. The results show that the proposed feature sets (sentiment, part-of-speech, content, and tweet-specific) are helpful for stance classification when used with SVM, while the deep feature GloVe gives the best results when used with the deep-learning method RNN.
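Of the feature-ranking methods the study names, information gain is the simplest to state: it is the reduction in label entropy obtained by conditioning on a feature. A minimal sketch on toy stance data follows; the feature names and examples are invented, not taken from the SemEval datasets.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - sum over values v of p(v) * H(labels | v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy stance data with two invented binary features per tweet.
labels = ["favor", "against", "against", "favor"]
has_neg = [0, 1, 1, 0]   # perfectly predictive of the stance label
has_url = [1, 1, 0, 0]   # carries no information about the label
ranking = sorted(
    [("has_neg", information_gain(has_neg, labels)),
     ("has_url", information_gain(has_url, labels))],
    key=lambda item: -item[1],
)
```

Gain ratio and Relief-F refine this idea (normalizing for many-valued features and using nearest-neighbor distances, respectively), but the ranking principle is the same: keep the features that best separate the stance classes.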


Author(s):  
Raymond J. Mooney

This article introduces the type of symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. Machine learning is the study of computational systems that improve performance on some task with experience. Most machine learning methods concern the task of categorizing examples described by a set of features. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics, ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Finally, the article reviews applications to a range of such problems, including morphology, part-of-speech tagging, word-sense disambiguation, syntactic parsing, semantic parsing, information extraction, and anaphora resolution.
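Of the classifier families the article covers, case-based classification is the easiest to sketch: store the training examples as-is and label a new instance by its most similar stored case. The toy part-of-speech features and data below are invented for illustration and are not from the article.

```python
# Case-based (nearest-neighbour) classification sketch for a toy
# part-of-speech decision. Each case is a feature dict plus a tag.
TRAINING = [
    ({"prev_tag": "DET", "suffix": "ing"}, "NOUN"),   # e.g. "the meeting"
    ({"prev_tag": "AUX", "suffix": "ing"}, "VERB"),   # e.g. "is singing"
    ({"prev_tag": "DET", "suffix": "s"}, "NOUN"),     # e.g. "the cats"
]

def overlap(a, b):
    """Similarity: number of feature values the two cases share."""
    return sum(1 for k in a if k in b and a[k] == b[k])

def classify(case, training):
    """1-nearest-neighbour: return the tag of the most similar stored case."""
    return max(training, key=lambda ex: overlap(case, ex[0]))[1]

predicted = classify({"prev_tag": "AUX", "suffix": "ing"}, TRAINING)  # -> "VERB"
```

Decision-tree and rule induction differ in that they compress the training set into an explicit tested structure, whereas the case-based learner defers all generalization to classification time.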


Author(s):  
Raymond J. Mooney

This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. It also briefly reviews unsupervised learning, in which new concepts are formed from unannotated examples by clustering them into coherent groups. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.
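The chapter's brief mention of unsupervised learning, forming concepts by clustering unannotated examples, can be illustrated with a deliberately crude grouping of word forms by a surface feature. This is a stand-in for illustration only; real systems use richer features and proper clustering algorithms.

```python
from collections import defaultdict

def cluster_by_suffix(words, k=2):
    """Crude unsupervised grouping: cluster word forms by their last k letters.

    No labels are used; the "concepts" emerge purely from the shared feature,
    which is the essence of the clustering step described in the text.
    """
    clusters = defaultdict(list)
    for w in words:
        clusters[w[-k:]].append(w)
    return dict(clusters)

groups = cluster_by_suffix(["running", "singing", "quickly", "softly", "jumped"])
# "-ng" forms cluster together, as do the "-ly" adverbs and the "-ed" form.
```

Even this trivial feature recovers a rough part-of-speech-like grouping, which is why suffix features also appear in the supervised methods the chapter reviews.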


Author(s):  
Hyoil Han ◽  
Ramez Elmasri

Much work has been done on extracting data content from the Web, but less attention has been given to extracting the conceptual schemas or ontologies underlying Web pages. The goal of the WebOntEx (Web ontology extraction) project is to make progress toward semiautomatically extracting Web ontologies by analyzing a set of Web pages that belong to the same application domain. The ontology is considered a complete schema of the domain concepts. Our ontology metaconcepts are based on the extended entity-relationship (EER) model. The concepts are classified into entity types, relationships, attributes, and superclass/subclass hierarchies. WebOntEx attempts to extract ontology concepts by analyzing the use of HTML tags and by utilizing part-of-speech tagging. WebOntEx applies heuristic rules and machine learning techniques, in particular inductive logic programming (ILP).
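One kind of HTML-tag heuristic in the spirit of the abstract's description is to treat text emphasized by structural tags (headings, table headers, bold) as candidate ontology concepts. The tag list and example page below are hypothetical, and real WebOntEx combines such cues with part-of-speech tagging and ILP rather than using them alone.

```python
import re

# Hypothetical heuristic: terms set off by structural/emphasis tags are
# candidate domain concepts (entity types or attributes).
CANDIDATE_TAGS = ("h1", "h2", "h3", "b", "th")

def candidate_concepts(html):
    """Collect the text inside candidate tags as concept candidates."""
    pattern = r"<({tags})[^>]*>(.*?)</\1>".format(tags="|".join(CANDIDATE_TAGS))
    return [m.group(2).strip() for m in re.finditer(pattern, html, re.I | re.S)]

page = "<h2>Price</h2><p>per night</p><b>Hotel Name</b>"
concepts = candidate_concepts(page)   # ['Price', 'Hotel Name']
```

Candidates harvested this way would then be filtered, e.g. keeping noun phrases identified by the part-of-speech tagger, before ILP induces rules that classify them into EER metaconcepts.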


1994 ◽  
Vol 2 ◽  
pp. 131-158 ◽  
Author(s):  
S. Soderland ◽  
W. Lehnert

The vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text, producing a structured representation of selected information from the text. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher-level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among information extracted from the text. Previous corpus-based approaches were limited to lower-level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically. Performance equals that of a partially trainable discourse module requiring manual customization for each domain.

