USING MACHINE LEARNING TECHNIQUES FOR PART-OF-SPEECH TAGGING IN THE GREEK LANGUAGE

Author(s):  
GEORGIOS PETASIS ◽  
GEORGIOS PALIOURAS ◽  
VANGELIS KARKALETSIS ◽  
CONSTANTINE D. SPYROPOULOS ◽  
ION ANDROUTSOPOULOS


Author(s):  
Dan Tufiș ◽  
Radu Ion

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.
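The tension described here, between rich morpho-lexical descriptors and data sparseness, is commonly resolved by tagging with a reduced tagset and then recovering the full description from a lexicon. The sketch below illustrates that recovery step with invented word forms and MULTEXT-East-style MSD codes; none of the data comes from the chapter itself.

```python
# Tiered-tagging sketch (hypothetical data): the statistical tagger emits a
# small, sparseness-safe tagset; full morpho-syntactic descriptors (MSDs)
# are recovered afterwards from a word-form lexicon.

# (word form, reduced tag) -> full MSD recovery table.
LEXICON = {
    ("casa", "NOUN"): "Ncfsrn",       # noun, common, feminine, singular, ...
    ("frumoasa", "ADJ"): "Afpfsrn",   # adjective, positive, feminine, ...
}

def recover_msd(word, reduced_tag):
    """Map a (word, reduced tag) pair back to its full MSD.

    Falls back to the coarse tag when the word form is not in the lexicon,
    so the output is always at least as informative as the tagger's tag.
    """
    return LEXICON.get((word, reduced_tag), reduced_tag)

tagged = [("casa", "NOUN"), ("frumoasa", "ADJ")]
full = [(w, recover_msd(w, t)) for w, t in tagged]
```

The reduced tagset keeps the statistical model's parameter space small, while the lexicon lookup restores the full information required in the output.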


2020 ◽  
Vol 20 (2) ◽  
pp. 179-196
Author(s):  
Alessandro Vatri ◽  
Barbara McGillivray

Abstract This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The results suggest that lemmatization methods using large lexica as well as part-of-speech tagging—such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer—are more reliable than methods that rely more heavily on machine learning and use smaller lexica.
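The accuracy tests described above amount to token-level agreement between a lemmatizer's output and expert-approved lemmas. The following is a generic sketch of such a measurement, not the authors' actual protocol; the Greek tokens and lemmas are simplified for illustration.

```python
def lemma_accuracy(system_lemmas, gold_lemmas):
    """Token-level accuracy of a lemmatizer against expert-approved lemmas.

    Both arguments are parallel lists of lemma strings for the same tokens.
    """
    if len(system_lemmas) != len(gold_lemmas):
        raise ValueError("token sequences must be aligned")
    correct = sum(s == g for s, g in zip(system_lemmas, gold_lemmas))
    return correct / len(gold_lemmas)

# Illustrative tokens from the opening of the Iliad (simplified lemmas).
gold = ["μῆνις", "ἀείδω", "θεά"]
system = ["μῆνις", "ἀείδω", "θεός"]   # hypothetical lemmatizer output
acc = lemma_accuracy(system, gold)    # two of three tokens correct
```

A blinded evaluation like the one in the article would have human readers produce or approve the gold column without knowing which system generated each output.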


2021 ◽  
pp. 1-20
Author(s):  
Kashif Ayyub ◽  
Saqib Iqbal ◽  
Muhammad Wasif Nisar ◽  
Saima Gulzar Ahmad ◽  
Ehsan Ullah Munir

Sentiment analysis is the field that analyzes the sentiments and opinions people express about entities such as products, businesses, and events. Because opinions influence people's behavior, it has numerous real-life applications in areas such as marketing, politics, and social media. Stance detection is a sub-field of sentiment analysis: stance classification aims to identify automatically, from a source text, whether its author is in favor of, neutral towards, or opposed to a given target. This study proposes a framework to explore the performance of conventional (NB, DT, SVM), ensemble-learning (RF, AdaBoost), and deep-learning (DBN, CNN-LSTM, and RNN) machine learning techniques. The proposed method is feature-centric and extracts sentiment, content, tweet-specific, and part-of-speech features from the SemEval-2016 and SemEval-2017 datasets. The study also explores the role of deep features such as GloVe and Word2Vec embeddings, which have not yet received attention for stance detection. Baseline features such as bag-of-words, n-grams, and TF-IDF are also extracted from both datasets for comparison with the proposed and deep features. The proposed features are ranked using feature-ranking methods (information gain, gain ratio, and Relief-F), and the results are evaluated against existing studies using standard performance measures for stance classification. The results show that the proposed feature sets (sentiment, part-of-speech, content, and tweet-specific) are helpful for stance classification when used with SVM, while the deep feature GloVe gives the best results when used with the deep-learning method RNN.
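Of the feature-ranking methods the study names, information gain is the simplest to state: it is the reduction in label entropy obtained by conditioning on a feature. A minimal sketch on toy stance data follows; the feature names and examples are invented, not taken from the SemEval datasets.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - sum over values v of p(v) * H(labels | v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy stance data with two invented binary features per tweet.
labels = ["favor", "against", "against", "favor"]
has_neg = [0, 1, 1, 0]   # perfectly predictive of the stance label
has_url = [1, 1, 0, 0]   # carries no information about the label
ranking = sorted(
    [("has_neg", information_gain(has_neg, labels)),
     ("has_url", information_gain(has_url, labels))],
    key=lambda item: -item[1],
)
```

Gain ratio and Relief-F refine this idea (normalizing for many-valued features and using nearest-neighbor distances, respectively), but the ranking principle is the same: keep the features that best separate the stance classes.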


Author(s):  
Raymond J. Mooney

This article introduces the type of symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. Machine learning is the study of computational systems that improve performance on some task with experience. Most machine learning methods concern the task of categorizing examples described by a set of features. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics, ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Finally, the article reviews applications to a range of such problems, including morphology, part-of-speech tagging, word-sense disambiguation, syntactic parsing, semantic parsing, information extraction, and anaphora resolution.
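Of the classifier families the article covers, case-based classification is the easiest to sketch: store the training examples as-is and label a new instance by its most similar stored case. The toy part-of-speech features and data below are invented for illustration and are not from the article.

```python
# Case-based (nearest-neighbour) classification sketch for a toy
# part-of-speech decision. Each case is a feature dict plus a tag.
TRAINING = [
    ({"prev_tag": "DET", "suffix": "ing"}, "NOUN"),   # e.g. "the meeting"
    ({"prev_tag": "AUX", "suffix": "ing"}, "VERB"),   # e.g. "is singing"
    ({"prev_tag": "DET", "suffix": "s"}, "NOUN"),     # e.g. "the cats"
]

def overlap(a, b):
    """Similarity: number of feature values the two cases share."""
    return sum(1 for k in a if k in b and a[k] == b[k])

def classify(case, training):
    """1-nearest-neighbour: return the tag of the most similar stored case."""
    return max(training, key=lambda ex: overlap(case, ex[0]))[1]

predicted = classify({"prev_tag": "AUX", "suffix": "ing"}, TRAINING)  # -> "VERB"
```

Decision-tree and rule induction differ in that they compress the training set into an explicit tested structure, whereas the case-based learner defers all generalization to classification time.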


Author(s):  
Raymond J. Mooney

This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. It also briefly reviews unsupervised learning, in which new concepts are formed from unannotated examples by clustering them into coherent groups. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.
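The chapter's brief mention of unsupervised learning, forming concepts by clustering unannotated examples, can be illustrated with a deliberately crude grouping of word forms by a surface feature. This is a stand-in for illustration only; real systems use richer features and proper clustering algorithms.

```python
from collections import defaultdict

def cluster_by_suffix(words, k=2):
    """Crude unsupervised grouping: cluster word forms by their last k letters.

    No labels are used; the "concepts" emerge purely from the shared feature,
    which is the essence of the clustering step described in the text.
    """
    clusters = defaultdict(list)
    for w in words:
        clusters[w[-k:]].append(w)
    return dict(clusters)

groups = cluster_by_suffix(["running", "singing", "quickly", "softly", "jumped"])
# "-ng" forms cluster together, as do the "-ly" adverbs and the "-ed" form.
```

Even this trivial feature recovers a rough part-of-speech-like grouping, which is why suffix features also appear in the supervised methods the chapter reviews.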


Author(s):  
Hyoil Han ◽  
Ramez Elmasri

Much work has been done on extracting data content from the Web, but less attention has been given to extracting the conceptual schemas or ontologies underlying Web pages. The goal of the WebOntEx (Web ontology extraction) project is to make progress toward semiautomatically extracting Web ontologies by analyzing a set of Web pages that belong to the same application domain. The ontology is considered a complete schema of the domain concepts. Our ontology metaconcepts are based on the extended entity-relationship (EER) model. The concepts are classified into entity types, relationships, attributes, and superclass/subclass hierarchies. WebOntEx attempts to extract ontology concepts by analyzing the use of HTML tags and by utilizing part-of-speech tagging. WebOntEx applies heuristic rules and machine learning techniques, in particular inductive logic programming (ILP).
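One kind of HTML-tag heuristic in the spirit of the abstract's description is to treat text emphasized by structural tags (headings, table headers, bold) as candidate ontology concepts. The tag list and example page below are hypothetical, and real WebOntEx combines such cues with part-of-speech tagging and ILP rather than using them alone.

```python
import re

# Hypothetical heuristic: terms set off by structural/emphasis tags are
# candidate domain concepts (entity types or attributes).
CANDIDATE_TAGS = ("h1", "h2", "h3", "b", "th")

def candidate_concepts(html):
    """Collect the text inside candidate tags as concept candidates."""
    pattern = r"<({tags})[^>]*>(.*?)</\1>".format(tags="|".join(CANDIDATE_TAGS))
    return [m.group(2).strip() for m in re.finditer(pattern, html, re.I | re.S)]

page = "<h2>Price</h2><p>per night</p><b>Hotel Name</b>"
concepts = candidate_concepts(page)   # ['Price', 'Hotel Name']
```

Candidates harvested this way would then be filtered, e.g. keeping noun phrases identified by the part-of-speech tagger, before ILP induces rules that classify them into EER metaconcepts.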


1994 ◽  
Vol 2 ◽  
pp. 131-158 ◽  
Author(s):  
S. Soderland ◽  
W. Lehnert

The vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text, producing a structured representation of selected information from the text. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher-level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among information extracted from the text. Previous corpus-based approaches were limited to lower-level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically. Performance equals that of a partially trainable discourse module requiring manual customization for each domain.

