Advances in Linguistics and Communication Studies - Modern Computational Models of Semantic Discovery in Natural Language

Published by IGI Global
ISBN: 9781466686908, 9781466686915
Total documents: 11 · H-index: 2

Author(s):  
Jan Žižka ◽  
František Dařena

The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. With a sufficient number of labeled examples, a suitable supervised machine-learning classifier can be trained. When no labeling is available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a semi-supervised method in which a small set of manually labeled examples improves the categorization process in comparison with clustering, yielding results comparable to supervised learning. For illustration, a real-world dataset from the Internet is used as the input to supervised, unsupervised, and semi-supervised learning. Results are shown for different numbers of initial labeled samples used as "seeds" to automatically label the remaining unlabeled items.
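The seed-based idea behind this semi-supervised approach can be sketched as a minimal self-training loop. The toy dataset, the bag-of-words representation, and the nearest-centroid assignment below are assumptions made for illustration, not the chapter's actual method:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a token-count dictionary."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def self_train(seeds, unlabeled):
    """Grow each class from a few labeled 'seeds': repeatedly take the
    unlabeled document most similar to some class centroid, label it,
    and add it to that class before continuing."""
    pool = {label: [bow(d) for d in docs] for label, docs in seeds.items()}
    remaining = [(d, bow(d)) for d in unlabeled]
    assigned = {}
    while remaining:
        text, vec, label, _ = max(
            ((t, v, lab, cosine(v, centroid(pool[lab])))
             for t, v in remaining for lab in pool),
            key=lambda item: item[3])
        remaining = [(t, v) for t, v in remaining if t != text]
        pool[label].append(vec)
        assigned[text] = label
    return assigned

seeds = {"sport": ["goal match team"], "tech": ["software code bug"]}
unlabeled = ["team won the match", "bug in the software code"]
labels = self_train(seeds, unlabeled)
print(labels)
```

With more seeds per class and a real classifier in place of nearest-centroid, the loop stays the same; the chapter's point is that even a small labeled pool steers the otherwise unsupervised grouping.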


Author(s):  
Jalel Akaichi

In this work, we focus on the application of text mining and sentiment analysis techniques to Tunisian users' status updates on Facebook. We aim to extract useful information about their sentiment and behavior, especially during the "Arab Spring" era. To achieve this, we describe a method for sentiment analysis using Support Vector Machine and Naïve Bayes algorithms, applying a combination of more than two features. The output of this work consists, on the one hand, of the construction of a sentiment lexicon based on the emoticon and acronym lexicons we developed from the extracted status updates; on the other hand, it consists of detailed comparative experiments between the above algorithms, carried out by creating a training model for sentiment classification.
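The Naïve Bayes side of such an experiment can be illustrated with a minimal multinomial classifier over word and emoticon tokens. The tiny training set and the treatment of emoticons as ordinary tokens are assumptions for the sketch, not the authors' lexicon or feature combination:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over a shared vocabulary."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def train(self, samples):
        for text, label in samples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        n = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / n)  # log prior
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                # add-one smoothing so unseen tokens never zero out a class
                lp += math.log((self.word_counts[label][tok] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

train = [("love this :)", "pos"), ("great service :)", "pos"),
         ("hate this :(", "neg"), ("bad service :(", "neg")]
nb = NaiveBayes()
nb.train(train)
print(nb.predict("great day :)"))
```

Because emoticons are kept as tokens, ":)" and ":(" act as strong class indicators, which is the intuition behind building an emoticon lexicon in the first place.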


Author(s):  
Sergey Maruev ◽  
Dmitry Stefanovskyi ◽  
Alexander Troussov

Nowadays, most digital content is generated within techno-social systems such as Facebook or Twitter, where people are connected to other people and to artefacts such as documents and concepts. These networks provide rich context for understanding the role of particular nodes. It is widely agreed that one of the most important principles in the philosophy of language is Frege's context principle, which states that words have meaning only in the context of a sentence. This chapter puts forward the hypothesis that the semantics of the content of techno-social systems should likewise be analysed in the context of the whole system. The hypothesis is substantiated by the introduction of a method for formal modelling and mining of techno-social systems and is corroborated by a discussion of the nature of meaning in philosophy. In addition, we provide an overview of recent trends in knowledge production and management within the context of our hypothesis.
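One simple way to make "the context of the whole system" operational is to propagate activation from a node through the heterogeneous network of people, documents, and concepts. The toy graph and the decay parameter below are invented for illustration and do not reproduce the chapter's formal model:

```python
def spread_activation(graph, seeds, steps=2, decay=0.5):
    """Propagate activation from seed nodes over an adjacency-dict graph.
    Each step, every active node passes a decayed share of its activation
    to each neighbour; nodes reachable through shared artefacts light up."""
    activation = dict(seeds)
    for _ in range(steps):
        nxt = dict(activation)
        for node, level in activation.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = level * decay / len(neighbours)
            for n in neighbours:
                nxt[n] = nxt.get(n, 0.0) + share
        activation = nxt
    return activation

# People, a document, and a concept mixed in one heterogeneous graph.
graph = {
    "alice": ["doc1"],
    "bob": ["doc1"],
    "doc1": ["alice", "bob", "semantics"],
    "semantics": ["doc1"],
}
act = spread_activation(graph, {"alice": 1.0})
print(act)
```

Note that "bob", who has no direct link to the seed, still receives activation through the shared document: a node's role emerges from the whole system rather than from its local label, which is the intuition behind the chapter's hypothesis.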


Author(s):  
Goran Klepac ◽  
Marko Velić

This chapter covers natural language processing techniques and their application in the development of predictive models. Two case studies are presented. The first describes a project in which textual descriptions of various situations in the call center of a telecommunication company were processed in order to predict churn. The second describes sentiment analysis of business news and discusses practical and testing issues in text mining projects. The two case studies take different approaches and are implemented in different tools. The language of the processed texts is Croatian, which belongs to the Slavic group of languages, with more complex morphology and grammar rules than English. The chapter concludes with several points on possible future research in this domain.


Author(s):  
Georgios Alexandropoulos

This research focuses on the corpus-stylistic analysis of the treatises of Athanasius the Great. In this interdisciplinary study, classical texts are approached through linguistic tools; the main purpose is to describe the style of Athanasius in these treatises after extracting quantitative data with computational tools. The language Athanasius uses intensely expresses his reflections on achieving religious change and restructuring; it expresses his religious ideology. His speeches are persuasive, ideological, and represent the rhetorician's opinion. They are based on the speaker's intentionality, which directs him to a specific rhetorical framework, since he aims at a single inspirational result: persuasion.


Author(s):  
Tomáš Hudík

This chapter gives a short introduction to machine translation (MT) and its use within commercial companies, with special focus on the localization industry. Although MT is not a new field, many scientists and researchers remain active in it and frequently come up with challenges, discoveries, and novel approaches. Commercial companies need to keep pace with them, and their R&D departments are making good progress with the integration of MT into their complicated workflows, as well as with minor improvements in core MT, in order to gain a competitive advantage. The chapter describes the differences between research in university and commercial environments. Furthermore, it outlines the main obstacles to the deployment of new technologies and a typical way in which a new technology can be deployed in a corporate environment.


Author(s):  
Jasmina Milićević ◽  
Àngels Catena

The translation of sentences featuring clitics often poses a problem to machine translation systems. In this chapter, we illustrate, on material from a Serbian ~ Catalan parallel corpus, a rule-based approach to solving translational structural mismatches between the linguistic representations that underlie source- and target-language sentences containing clitics. Unlike most studies in this field, which make use of phrase-structure formalisms, ours has been conducted within the dependency framework of the Meaning-Text linguistic theory. We start by providing a brief description of the Catalan and Serbian clitic systems, then introduce the basics of our framework, and finally illustrate Serbian ~ Catalan translational mismatches involving the operations of clitic doubling, clitic climbing, and clitic possessor raising.


Author(s):  
František Dařena ◽  
Jan Žižka

The chapter introduces clustering as a family of algorithms that can successfully organize text documents into groups without prior knowledge of those groups. It also demonstrates the use of unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can later be used for further analysis. Attention is paid to the process of selecting clustering algorithms, their parameters, and methods of data preprocessing, as well as to methods of evaluating the results by a human expert with the assistance of a computer. Feasibility has been demonstrated by a number of experiments with external evaluation using known labels and computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
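A minimal version of such unsupervised grouping is k-means over bag-of-words vectors with cosine similarity. The deterministic farthest-first initialization and the four toy reviews are simplifications assumed for the sketch; the chapter's experiments involve real multilingual review data and algorithm/parameter selection:

```python
import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_text(docs, k, iters=10):
    """k-means on token-count vectors: assign each document to the most
    similar centroid, then rebuild each centroid from its members."""
    vecs = [bow(d) for d in docs]
    # deterministic farthest-first seeding instead of random initialization
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(
            vecs, key=lambda v: max(cosine(v, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            clusters[best].append(v)
        for i, members in enumerate(clusters):
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[i] = total
    return [max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vecs]

reviews = ["hotel room was clean", "clean hotel great room",
           "pasta food was tasty", "tasty food great pasta"]
labels = kmeans_text(reviews, k=2)
print(labels)
```

Because the token-count representation is language-agnostic, the same loop applies unchanged to reviews in any of the five languages; only the preprocessing in front of it would differ.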


Author(s):  
Abel Browarnik ◽  
Oded Maimon

In this chapter, we analyze Ontology Learning and its goals, as well as the input expected when learning ontologies - peer-reviewed scientific papers in English. After reviewing the shortcomings of the Ontology Learning Layer Cake model, we suggest an alternative model based on linguistic knowledge. The suggested model finds the meaning of simple components of text - statements. From these it is easy to derive cases and roles that map reality as a set of entities and relationships, or RDF triples, roughly equivalent to entity-relationship diagrams. The time complexity of the suggested ontology learning framework is constant (O(1)) per sentence, and O(n) for an ontology with n sentences. We conclude that the Ontology Learning Layer Cake is not adequate for Ontology Learning from text.
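The statement-to-triple step can be caricatured with a single pattern pass. The regex and the example sentence are invented for illustration; the chapter's model rests on linguistic knowledge, not on a surface pattern:

```python
import re

def extract_triple(statement):
    """Toy mapping from a simple English statement to an RDF-style
    (subject, relation, object) triple. A single regex pass keeps the
    per-sentence work bounded, echoing the constant-per-sentence claim;
    anything the pattern cannot handle is returned as None."""
    m = re.match(
        r"^(?:the\s+)?(\w+)\s+(\w+s)\s+(?:the\s+|a\s+|an\s+)?(\w+)\.?$",
        statement.strip(), re.IGNORECASE)
    if not m:
        return None
    subj, verb, obj = (g.lower() for g in m.groups())
    return (subj, verb, obj)

print(extract_triple("The cell contains a nucleus."))
```

Running the extractor over every statement of a document yields the entity-relationship set (the ontology) in one linear pass, which is where the O(n) figure for n sentences comes from.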


Author(s):  
Pavel Makagonov

The measure of perfection of the content and semantic value of an integrated text is connected with indicators of perfection in the distribution of content words. The criterion is the agreement of their rank-frequency distribution with the Zipf or Zipf-Mandelbrot law. The hypothesis verified in this chapter is that a perfect system should have not only a perfect distribution of its elements (objects) but also perfect connections between them. A model is suggested in which the degree of perfection of a text, from the point of view of the quality of connections between significant words, is determined by the quality of the distribution of syntactic and link words in the rank-frequency representation. As a simplified criterion, the ratio of significant to syntactic words in the analyzed text, and the closeness of this ratio to the "golden section," is considered.
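Both criteria can be sketched numerically: a least-squares slope of log-frequency against log-rank (an ideal Zipf distribution gives a slope near -1) and a content-to-function-word ratio compared against the golden section, (1 + √5)/2 ≈ 1.618. The tiny function-word list and the sample text are assumptions for the sketch:

```python
import math
from collections import Counter

# Illustrative English function-word list; a real study needs a full one.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
                  "that", "on", "for", "with", "as", "was", "at", "by"}

def zipf_slope(text):
    """Least-squares slope of log(frequency) vs log(rank)."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    pts = [(math.log(r), math.log(f)) for r, f in enumerate(freqs, 1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    return (sum((x - mx) * (y - my) for x, y in pts) /
            sum((x - mx) ** 2 for x, _ in pts))

def content_function_ratio(text):
    """Ratio of content (significant) words to syntactic (function) words,
    to be compared against the golden section (1 + 5 ** 0.5) / 2."""
    tokens = text.lower().split()
    func = sum(1 for t in tokens if t in FUNCTION_WORDS)
    return (len(tokens) - func) / func if func else float("inf")

sample = "the cat sat on the mat the cat"
print(zipf_slope(sample), content_function_ratio(sample))
```

On a real text, the closer the slope is to -1 and the ratio to 1.618, the more "perfect" the text under the chapter's simplified criterion.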

