How to build a constructicon in five years

2021 ◽  
Vol 34 ◽  
Author(s):  
Laura A. Janda ◽  
Anna Endresen ◽  
Valentina Zhukova ◽  
Daria Mordashova ◽  
Ekaterina Rakhilina

Abstract We provide a practical step-by-step methodology of how to build a full-scale constructicon resource for a natural language, sharing our experience from the nearly completed project of the Russian Constructicon, an open-access searchable database of over 2,200 Russian constructions (https://site.uit.no/russian-constructicon/). The constructions are organized in families, clusters, and networks based on their semantic and syntactic properties, illustrated with corpus examples, and tagged for the CEFR level of language proficiency. The resource is designed for both researchers and L2 learners of Russian and offers the largest electronic database of constructions built for any language. We explain what makes the Russian Constructicon different from other constructicons, report on the major stages of our work, and share the methods used to systematically expand the inventory of constructions. Our objective is to encourage colleagues to build constructicon resources for additional natural languages, thus taking Construction Grammar to a new quantitative and qualitative level, facilitating cross-linguistic comparison.

Author(s):  
Stephen Neale

Syntax (more loosely, ‘grammar’) is the study of the properties of expressions that distinguish them as members of different linguistic categories, and ‘well-formedness’, that is, the ways in which expressions belonging to these categories may be combined to form larger units. Typical syntactic categories include noun, verb and sentence. Syntactic properties have played an important role not only in the study of ‘natural’ languages (such as English or Urdu) but also in the study of logic and computation. For example, in symbolic logic, classes of well-formed formulas are specified without mentioning what formulas (or their parts) mean, or whether they are true or false; similarly, the operations of a computer can be fruitfully specified using only syntactic properties, a fact that has a bearing on the viability of computational theories of mind. The study of the syntax of natural language has taken on significance for philosophy in the twentieth century, partly because of the suspicion, voiced by Russell, Wittgenstein and the logical positivists, that philosophical problems often turned on misunderstandings of syntax (or the closely related notion of ‘logical form’). Moreover, an idea that has been fruitfully developed since the pioneering work of Frege is that a proper understanding of syntax offers an important basis for any understanding of semantics, since the meaning of a complex expression is compositional, that is, built up from the meanings of its parts as determined by syntax. In the mid-twentieth century, philosophical interest in the systematic study of the syntax of natural language was heightened by Noam Chomsky’s work on the nature of syntactic rules and on the innateness of mental structures specific to the acquisition (or growth) of grammatical knowledge. 
This work formalized traditional work on grammatical categories within an approach to the theory of computability, and also revived proposals of traditional philosophical rationalists that many twentieth-century empiricists had regarded as bankrupt. Chomskian theories of grammar have become the focus of most contemporary work on syntax.


Discourse ◽  
2020 ◽  
Vol 6 (3) ◽  
pp. 109-117
Author(s):  
O. M. Polyakov

Introduction. The article continues the series of publications on the linguistics of relations (hereinafter R-linguistics) and is devoted to an introduction to the logic of natural language in relation to the approach considered in the series. The problem of natural language logic remains relevant, since this logic differs significantly from traditional mathematical logic. Moreover, with the appearance of artificial intelligence systems, the importance of this problem only increases. The article analyzes logical problems that prevent the application of classical logic methods to natural languages. This is possible because R-linguistics forms the semantics of a language in the form of world model structures in which language sentences are interpreted. Methodology and sources. The results obtained in the previous parts of the series are used as research tools. To develop the necessary mathematical representations in the field of logic and semantics, the formulated concept of the interpretation operator is used. Results and discussion. The problems that arise when studying the logic of natural language in the framework of R-linguistics are analyzed. These issues are discussed in three aspects: the logical aspect itself; the linguistic aspect; the aspect of correlation with reality. A very general approach to language semantics is considered and semantic axioms of the language are formulated. The problems of the language and its logic related to the most general view of semantics are shown. Conclusion. It is shown that the application of mathematical logic, regardless of its type, to the study of natural language logic faces significant problems. This is a consequence of the inconsistency of existing approaches with the world model. But it is the coherence with the world model that allows us to build a new logical approach. Matching with the model means a semantic approach to logic. 
Even the most general view of semantics allows one to formulate important results about the properties of languages that lack meaning. The simplest examples of semantic interpretation of traditional logic demonstrate its semantic problems (primarily related to negation).


Author(s):  
LI LI ◽  
HONGLAI LIU ◽  
QINGSHI GAO ◽  
PEIFENG WANG

Sentences in several different natural languages can be produced congruously and synchronously by the new generating system USGS = {↔, G_I | G_I = (T_I, N, B-RISU, C-tree_I, S, P_I, F_I), I = 0, 1, 2, …, n}, based on Semantic Language (SL) theory, and all of them are legitimate and reasonable. Here B-RISU is the set of basic-RISU, C-tree_I is the set of category-trees, and F_I is the set of functions in natural language I. The new generating system is unified, synchronous, and in one-to-one correspondence; it is based on semantic unit theory, and its rules number several million.


Traditional encryption systems and techniques have always been vulnerable to brute-force attacks. This is due to the byte encoding of characters, such as UTF-8 or ASCII. An opponent who intercepts a ciphertext and attempts to decrypt it by brute force with an incorrect key can therefore recognize failed decryptions, because they yield a mixture of symbols that is not uniformly distributed and carries no meaning. The honey encoding technique has been suggested to curb this classical weakness by producing ciphertexts that, when decrypted with a false key, yield plausible-looking and evenly distributed but untrue plaintexts. This technique is only suitable for passkeys and PINs. Adapting it to encode natural-language texts, such as e-mails and human-written records, remains an open problem. Existing schemes proposed to extend it to natural-language messages expose fragments of the plaintext embedded in the coded data, and are therefore more prone to ciphertext attacks. In this paper, an amended honey encoding system is proposed to support natural-language message encryption. The main aim is to create a framework that encrypts a message fully in binary form. As a result, most binary strings decode to semantically plausible texts, tricking an opponent who tries to decrypt the ciphertext with a wrong key. The security of the suggested system is assessed.
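The core idea for PINs can be illustrated with a toy sketch. It is not the scheme proposed in the paper: the function names, the SHA-256 key pad, and the assumption that all 4-digit PINs are equally likely are illustrative choices, and the code is not secure (no salt, no key stretching). It only shows that every key, right or wrong, decrypts to a well-formed PIN, so a failed guess cannot be recognized from the output alone.

```python
import hashlib

PIN_SPACE = 10_000  # all 4-digit PINs, assumed equally likely (toy assumption)


def _pad(key: str) -> int:
    """Derive a number in [0, PIN_SPACE) from the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % PIN_SPACE


def encrypt(pin: str, key: str) -> int:
    """Map the PIN to its index in the message space, then mask it with the key pad."""
    seed = int(pin)
    return (seed + _pad(key)) % PIN_SPACE


def decrypt(ciphertext: int, key: str) -> str:
    """Unmask with the key pad; any key yields some valid 4-digit PIN."""
    seed = (ciphertext - _pad(key)) % PIN_SPACE
    return f"{seed:04d}"


if __name__ == "__main__":
    c = encrypt("0427", "correct horse")
    print(decrypt(c, "correct horse"))  # the original PIN
    print(decrypt(c, "wrong guess"))    # a different but equally plausible PIN
```

For natural-language messages the hard part, as the abstract notes, is the encoder that maps messages to seeds so that random seeds decode to plausible text; the identity mapping used here works only because the PIN space is small and assumed uniform.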


Author(s):  
Irisa Berga

This paper addresses unresolved issues in the acquisition, processing and use of multi-word units, which account for the learner’s idiomatic, natural language. The aim of the study is to argue for an analytic instructional approach to developing the trainee teacher’s collocational and phonological competences through the medium of the native language, employing a set of didactic and linguistic techniques such as etymological, phonological, structural, lexical and semantic dissection of multi-word units. Research results imply that analytic processing of multi-word units relates moderately to the enhancement of the learner’s collocational and phonological competences, though relations between formal instruction and the language proficiency level may be partly obscured by the learner’s probable exposure to multi-word units in informal settings.


Author(s):  
Jan Žižka ◽  
František Dařena

Gaining new clients or customers and keeping existing ones can be well supported by collecting and monitoring feedback: “Are the customers satisfied? Can we improve our services?” One possible form of feedback is to allow customers to freely write their reviews using a simple textual form. The more reviews that are available, the more knowledge can be acquired and applied to improving the service. However, the very large volume of data generated by collecting the reviews has to be processed automatically, as humans usually cannot manage it within an acceptable time. The main question is “Can a computer reveal an opinion core hidden in text reviews?” It is a challenging task because the text is written in a natural language. This chapter presents a method based on the automatic extraction of expressions that are significant for specifying a review's attitude to a given topic. The significant expressions are composed using significant words revealed in the documents. The significant words are selected by a decision-tree generator based on entropy minimization. Words included in branches represent kernels of the significant expressions. The full expressions are composed of the significant words and the words surrounding them in the original documents. The results are demonstrated using large real-world multilingual data representing customers' opinions concerning hotel accommodation booked online and Internet shopping. Knowledge discovered in the reviews may subsequently serve for various marketing tasks.
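Entropy-based word selection can be sketched as follows. This is a minimal illustration rather than the chapter's decision-tree generator: it scores each word by the information gain of a single presence/absence split, which is the criterion an entropy-minimizing tree would apply at one node. The function names and the four toy reviews are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(reviews, labels, word):
    """Entropy reduction from splitting the reviews on presence of `word`."""
    with_word = [lab for text, lab in zip(reviews, labels) if word in text.lower().split()]
    without = [lab for text, lab in zip(reviews, labels) if word not in text.lower().split()]
    n = len(labels)
    remainder = (len(with_word) / n) * entropy(with_word) + (len(without) / n) * entropy(without)
    return entropy(labels) - remainder


def rank_words(reviews, labels):
    """Order the vocabulary by decreasing information gain (ties broken alphabetically)."""
    vocab = {w for text in reviews for w in text.lower().split()}
    return sorted(vocab, key=lambda w: (-information_gain(reviews, labels, w), w))


if __name__ == "__main__":
    # Invented toy data: two positive and two negative hotel reviews.
    reviews = [
        "clean room friendly staff",
        "dirty room rude staff",
        "friendly staff good location",
        "rude staff dirty bathroom",
    ]
    labels = ["pos", "neg", "pos", "neg"]
    for word in rank_words(reviews, labels)[:3]:
        print(word, information_gain(reviews, labels, word))
```

In this toy data, words such as "friendly" or "rude" separate the classes perfectly, while "staff" occurs in every review and yields no gain; a full tree would repeat the split recursively on each branch.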


2010 ◽  
Vol 1 (3) ◽  
pp. 1-19 ◽  
Author(s):  
Weisen Guo ◽  
Steven B. Kraines

To promote global knowledge sharing, one must solve the problem that representing knowledge in diverse natural languages restricts effective knowledge sharing. Traditional knowledge sharing models are based on natural language processing (NLP) technologies. The ambiguity of natural language is a problem for NLP; however, semantic web technologies can circumvent the problem by enabling human authors to specify meaning in a computer-interpretable form. In this paper, the authors propose a cross-language semantic model (SEMCL) for knowledge sharing, which uses semantic web technologies to provide a potential solution to the problem of ambiguity. This model can also match knowledge descriptions in diverse languages. First, the methods used to support searches at the semantic predicate level are given, and the authors present a cross-language approach. Finally, an implementation of the model for the general engineering domain is discussed, and a scenario describing how the model implementation handles semantic cross-language knowledge sharing is given.


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Jens Nevens ◽  
Paul Van Eecke ◽  
Katrien Beuls

Abstract In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.


2016 ◽  
Vol 4 (1) ◽  
pp. 42
Author(s):  
Anatoliy Vitryak ◽  
Boris Slipak ◽  
Kirpitnyov Serhii

The article deals with the still topical problem of plain aviation English. This problem has been highlighted by ICAO in its ‘Manual on the Implementation of ICAO Language Proficiency Requirements’ (Doc 9835). According to this ‘Manual’, every pilot and air traffic controller is required to have a good command not only of standardized radiotelephony phraseologies, which remain dominant, but also of plain English, intended for use in cases that are not covered by the phraseologies. As far as the authors are aware, the concept of plain aviation English has remained mainly declarative so far. The article aims to make up for this lack both qualitatively and quantitatively. To master plain aviation English, along with the phraseologies, means in fact to acquire natural language competency.

