Advances in Linguistics and Communication Studies - Modern Computational Models of Semantic Discovery in Natural Language

Published by IGI Global
ISBN: 9781466686908, 9781466686915
Total documents: 11 · H-index: 2

Author(s):  
Jan Žižka ◽  
František Dařena

The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. With a sufficient number of labeled examples, a suitable supervised machine-learning classifier can be trained. When no labeling is available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a semi-supervised method in which a small set of manually labeled examples improves the categorization process in comparison with clustering, yielding results comparable to supervised learning. For illustration, a real-world dataset from the Internet is used as the input to supervised, unsupervised, and semi-supervised learning. Results are shown for different numbers of initial labeled samples used as "seeds" to automatically label the remaining unlabeled items.
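The seed-based idea behind this semi-supervised approach can be sketched as a minimal self-training loop. The toy dataset, the bag-of-words representation, and the nearest-centroid assignment below are assumptions made for illustration, not the chapter's actual method:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a token-count dictionary."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def self_train(seeds, unlabeled):
    """Grow each class from a few labeled 'seeds': repeatedly take the
    unlabeled document most similar to some class centroid, label it,
    and add it to that class before continuing."""
    pool = {label: [bow(d) for d in docs] for label, docs in seeds.items()}
    remaining = [(d, bow(d)) for d in unlabeled]
    assigned = {}
    while remaining:
        text, vec, label, _ = max(
            ((t, v, lab, cosine(v, centroid(pool[lab])))
             for t, v in remaining for lab in pool),
            key=lambda item: item[3])
        remaining = [(t, v) for t, v in remaining if t != text]
        pool[label].append(vec)
        assigned[text] = label
    return assigned

seeds = {"sport": ["goal match team"], "tech": ["software code bug"]}
unlabeled = ["team won the match", "bug in the software code"]
labels = self_train(seeds, unlabeled)
print(labels)
```

With more seeds per class and a real classifier in place of nearest-centroid, the loop stays the same; the chapter's point is that even a small labeled pool steers the otherwise unsupervised grouping.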


Author(s):  
Jalel Akaichi

In this work, we focus on the application of text mining and sentiment analysis techniques to Tunisian users' status updates on Facebook. We aim to extract useful information about their sentiment and behavior, especially during the "Arab Spring" era. To achieve this, we describe a method for sentiment analysis using Support Vector Machine and Naïve Bayes algorithms, applying a combination of more than two features. The output of this work consists, on the one hand, of the construction of a sentiment lexicon based on the emoticon and acronym lexicons we developed from the extracted status updates; on the other hand, it consists of detailed comparative experiments between the above algorithms, carried out by creating a training model for sentiment classification.
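The Naïve Bayes side of such an experiment can be illustrated with a minimal multinomial classifier over word and emoticon tokens. The tiny training set and the treatment of emoticons as ordinary tokens are assumptions for the sketch, not the authors' lexicon or feature combination:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over a shared vocabulary."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def train(self, samples):
        for text, label in samples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        n = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / n)  # log prior
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                # add-one smoothing so unseen tokens never zero out a class
                lp += math.log((self.word_counts[label][tok] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

train = [("love this :)", "pos"), ("great service :)", "pos"),
         ("hate this :(", "neg"), ("bad service :(", "neg")]
nb = NaiveBayes()
nb.train(train)
print(nb.predict("great day :)"))
```

Because emoticons are kept as tokens, ":)" and ":(" act as strong class indicators, which is the intuition behind building an emoticon lexicon in the first place.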


Author(s):  
Sergey Maruev ◽  
Dmitry Stefanovskyi ◽  
Alexander Troussov

Nowadays, most digital content is generated within techno-social systems such as Facebook or Twitter, where people are connected to other people and to artefacts such as documents and concepts. These networks provide rich context for understanding the role of particular nodes. It is widely agreed that one of the most important principles in the philosophy of language is Frege's context principle, which states that words have meaning only in the context of a sentence. This chapter puts forward the hypothesis that the semantics of the content of techno-social systems should likewise be analysed in the context of the whole system. The hypothesis is substantiated by the introduction of a method for formal modelling and mining of techno-social systems and is corroborated by a discussion of the nature of meaning in philosophy. In addition, we provide an overview of recent trends in knowledge production and management within the context of our hypothesis.
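One simple way to make "the context of the whole system" operational is to propagate activation from a node through the heterogeneous network of people, documents, and concepts. The toy graph and the decay parameter below are invented for illustration and do not reproduce the chapter's formal model:

```python
def spread_activation(graph, seeds, steps=2, decay=0.5):
    """Propagate activation from seed nodes over an adjacency-dict graph.
    Each step, every active node passes a decayed share of its activation
    to each neighbour; nodes reachable through shared artefacts light up."""
    activation = dict(seeds)
    for _ in range(steps):
        nxt = dict(activation)
        for node, level in activation.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = level * decay / len(neighbours)
            for n in neighbours:
                nxt[n] = nxt.get(n, 0.0) + share
        activation = nxt
    return activation

# People, a document, and a concept mixed in one heterogeneous graph.
graph = {
    "alice": ["doc1"],
    "bob": ["doc1"],
    "doc1": ["alice", "bob", "semantics"],
    "semantics": ["doc1"],
}
act = spread_activation(graph, {"alice": 1.0})
print(act)
```

Note that "bob", who has no direct link to the seed, still receives activation through the shared document: a node's role emerges from the whole system rather than from its local label, which is the intuition behind the chapter's hypothesis.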


Author(s):  
Goran Klepac ◽  
Marko Velić

This chapter covers natural language processing techniques and their application in the development of predictive models. Two case studies are presented. The first describes a project in which textual descriptions of various situations in the call center of a telecommunication company were processed in order to predict churn. The second describes sentiment analysis of business news and discusses practical and testing issues in text mining projects. The two case studies take different approaches and are implemented in different tools. The language of the processed texts is Croatian, which belongs to the Slavic group of languages, with more complex morphology and grammar rules than English. The chapter concludes with several points on possible future research in this domain.


Author(s):  
Georgios Alexandropoulos

This research focuses on the corpus-stylistic analysis of the treatises of Athanasius the Great. In this interdisciplinary study, classical texts are approached through linguistic tools; the main purpose is to describe the style of Athanasius in these treatises after extracting quantitative data with computational tools. The language Athanasius uses intensely expresses his reflections on achieving religious change and restructuring; it expresses his religious ideology. His speeches are persuasive, ideological, and represent the rhetorician's opinion. They are based on the speaker's intentionality, which directs him to a specific rhetorical framework, since he aims at a single inspirational result: persuasion.


Author(s):  
Tomáš Hudík

This chapter gives a short introduction to machine translation (MT) and its use within commercial companies, with special focus on the localization industry. Although MT is not a new field, many scientists and researchers remain active in it and frequently come up with challenges, discoveries, and novel approaches. Commercial companies need to keep pace with them, and their R&D departments are making good progress with the integration of MT into their complicated workflows, as well as with minor improvements in core MT, in order to gain a competitive advantage. The chapter describes the differences between research in university and commercial environments. Furthermore, it outlines the main obstacles to the deployment of new technologies and a typical way in which a new technology can be deployed in a corporate environment.


Author(s):  
Jasmina Milićević ◽  
Àngels Catena

The translation of sentences featuring clitics often poses a problem to machine translation systems. In this chapter, we illustrate, on material from a Serbian ~ Catalan parallel corpus, a rule-based approach to solving translational structural mismatches between the linguistic representations that underlie source- and target-language sentences containing clitics. Unlike most studies in this field, which make use of phrase-structure formalisms, ours has been conducted within the dependency framework of the Meaning-Text linguistic theory. We start by providing a brief description of the Catalan and Serbian clitic systems, then introduce the basics of our framework, and finally illustrate Serbian ~ Catalan translational mismatches involving the operations of clitic doubling, clitic climbing, and clitic possessor raising.


Author(s):  
František Dařena ◽  
Jan Žižka

The chapter introduces clustering as a family of algorithms that can successfully organize text documents into groups without prior knowledge of those groups. It also demonstrates the use of unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can later be used for further analysis. Attention is paid to the process of selecting clustering algorithms, their parameters, and methods of data preprocessing, as well as to methods of evaluating the results by a human expert with the assistance of a computer. Feasibility has been demonstrated by a number of experiments with external evaluation using known labels and computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
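A minimal version of such unsupervised grouping is k-means over bag-of-words vectors with cosine similarity. The deterministic farthest-first initialization and the four toy reviews are simplifications assumed for the sketch; the chapter's experiments involve real multilingual review data and algorithm/parameter selection:

```python
import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_text(docs, k, iters=10):
    """k-means on token-count vectors: assign each document to the most
    similar centroid, then rebuild each centroid from its members."""
    vecs = [bow(d) for d in docs]
    # deterministic farthest-first seeding instead of random initialization
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(
            vecs, key=lambda v: max(cosine(v, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            clusters[best].append(v)
        for i, members in enumerate(clusters):
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[i] = total
    return [max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vecs]

reviews = ["hotel room was clean", "clean hotel great room",
           "pasta food was tasty", "tasty food great pasta"]
labels = kmeans_text(reviews, k=2)
print(labels)
```

Because the token-count representation is language-agnostic, the same loop applies unchanged to reviews in any of the five languages; only the preprocessing in front of it would differ.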


Author(s):  
Abel Browarnik ◽  
Oded Maimon

In this chapter, we analyze Ontology Learning and its goals, as well as the input expected when learning ontologies - peer-reviewed scientific papers in English. After reviewing the shortcomings of the Ontology Learning Layer Cake model, we suggest an alternative model based on linguistic knowledge. The suggested model finds the meaning of simple components of text - statements. From these it is easy to derive cases and roles that map reality as a set of entities and relationships, or RDF triples, roughly equivalent to entity-relationship diagrams. The time complexity of the suggested ontology learning framework is constant (O(1)) per sentence, and O(n) for an ontology with n sentences. We conclude that the Ontology Learning Layer Cake is not adequate for Ontology Learning from text.
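The statement-to-triple step can be caricatured with a single pattern pass. The regex and the example sentence are invented for illustration; the chapter's model rests on linguistic knowledge, not on a surface pattern:

```python
import re

def extract_triple(statement):
    """Toy mapping from a simple English statement to an RDF-style
    (subject, relation, object) triple. A single regex pass keeps the
    per-sentence work bounded, echoing the constant-per-sentence claim;
    anything the pattern cannot handle is returned as None."""
    m = re.match(
        r"^(?:the\s+)?(\w+)\s+(\w+s)\s+(?:the\s+|a\s+|an\s+)?(\w+)\.?$",
        statement.strip(), re.IGNORECASE)
    if not m:
        return None
    subj, verb, obj = (g.lower() for g in m.groups())
    return (subj, verb, obj)

print(extract_triple("The cell contains a nucleus."))
```

Running the extractor over every statement of a document yields the entity-relationship set (the ontology) in one linear pass, which is where the O(n) figure for n sentences comes from.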


Author(s):  
Pavel Makagonov

The measure of perfection of the content and semantic value of an integrated text is connected with indicators of perfection in the distribution of content words. The criterion is the agreement of their rank-frequency distribution with the Zipf or Zipf-Mandelbrot law. The hypothesis verified in this chapter is that a perfect system should have not only a perfect distribution of its elements (objects) but also perfect connections between them. A model is suggested in which the degree of perfection of a text, from the point of view of the quality of connections between significant words, is determined by the quality of the distribution of syntactic and link words in the rank-frequency representation. As a simplified criterion, the ratio of significant to syntactic words in the analyzed text, and the closeness of this ratio to the "golden section," is considered.
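Both criteria can be sketched numerically: a least-squares slope of log-frequency against log-rank (an ideal Zipf distribution gives a slope near -1) and a content-to-function-word ratio compared against the golden section, (1 + √5)/2 ≈ 1.618. The tiny function-word list and the sample text are assumptions for the sketch:

```python
import math
from collections import Counter

# Illustrative English function-word list; a real study needs a full one.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
                  "that", "on", "for", "with", "as", "was", "at", "by"}

def zipf_slope(text):
    """Least-squares slope of log(frequency) vs log(rank)."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    pts = [(math.log(r), math.log(f)) for r, f in enumerate(freqs, 1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    return (sum((x - mx) * (y - my) for x, y in pts) /
            sum((x - mx) ** 2 for x, _ in pts))

def content_function_ratio(text):
    """Ratio of content (significant) words to syntactic (function) words,
    to be compared against the golden section (1 + 5 ** 0.5) / 2."""
    tokens = text.lower().split()
    func = sum(1 for t in tokens if t in FUNCTION_WORDS)
    return (len(tokens) - func) / func if func else float("inf")

sample = "the cat sat on the mat the cat"
print(zipf_slope(sample), content_function_ratio(sample))
```

On a real text, the closer the slope is to -1 and the ratio to 1.618, the more "perfect" the text under the chapter's simplified criterion.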

