Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

2017 ◽  
Vol 68 (2) ◽  
pp. 169-178
Author(s):  
Leonid Iomdin

Abstract Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.

Author(s):  
Bilous O ◽  
Mishchenko A ◽  
Datska T ◽  
Ivanenko N ◽  
...  

How often students use IT resources is a key factor in acquiring the skills associated with new technologies. Strategies aimed at increasing student autonomy need to be developed and should offer resources that encourage students to use computing tools during class hours. The paper analyses modern linguistic technologies for intelligent language processing, which are necessary for creating and operating highly effective knowledge-processing technologies. Computerization of the information sphere has triggered an extensive search for ways to use natural language mechanisms in automated systems of various types. One result was the creation of controlled languages, based on a set of features that make machine translation more refined. Driven by economic demand, they are not artificial languages like Esperanto but natural languages simplified in vocabulary, grammar and syntactic structure. More than ever, the tasks of modern computational linguistics include creating software for natural language processing, information retrieval in large data sets, and support for technical authors writing professional texts and for users of computer technology, and hence creating new translation tools. Such powerful linguistic resources as text corpora, terminology databases and ontologies may facilitate more efficient use of modern multilingual information technology. Creating and improving all the methods considered will help make the translator's job more efficient. One of the programs, CLAT, does not aim to produce machine translation but allows technical editors to create flawless, sequential professional texts through integrated punctuation and spelling modules. Other programs under consideration are to be implemented in Ukrainian translation departments.
Moreover, the databases considered in the paper make it possible to study the dynamics of the linguistic system and to develop areas of applied research such as terminography, terminology and automated data processing. Effective cooperation among developers, translators and declarative institutes in creating innovative linguistic technologies will promote the further development of translation and applied linguistics.


2020 ◽  
Vol 15 (1) ◽  
pp. 13
Author(s):  
Anne Ferger ◽  
Hanna Hedeland

This paper describes the development of a systematic approach to the creation, management and curation of linguistic resources, particularly spoken language corpora. It also presents first steps towards a framework for continuous quality control to be used within external research projects by non-technical users, and discusses various domain- and discipline-specific problems and individual solutions. The creation of spoken language corpora is not only a time-consuming and costly process; the resulting resources also often represent intangible cultural heritage, containing recordings of, for example, extinct languages or historical events. Since high-quality resources are needed to enable re-use in as many future contexts as possible, researchers need to be provided with the necessary means for quality control. We believe that this includes methods and tools adapted to Humanities researchers as non-technical users, and that these methods and tools need to be developed to support the existing tasks and goals of research projects.


2021 ◽  
Author(s):  
Dimitrios Ververidis ◽  
Panagiotis Migkotzidis ◽  
Efstathios Nikolaidis ◽  
Eleftherios Anastasovitis ◽  
Anastasios Papazoglou Chalikias ◽  
...  

Author(s):  
S. Blaser ◽  
J. Meyer ◽  
S. Nebiker

Abstract. With this contribution, we describe and publish two high-quality street-level datasets captured with a portable high-performance Mobile Mapping System (MMS). The datasets will be freely available for scientific use. Both datasets, one from a city centre and one from a forest, represent area-wide street-level reality captures which can be used, e.g., for establishing cloud-based frameworks for infrastructure management as well as for smart city and forestry applications. The quality of these datasets has been thoroughly evaluated and demonstrated. For example, georeferencing accuracies in the centimetre range have been achieved using these datasets in combination with image-based georeferencing. Both high-quality multi-sensor street-level datasets are suitable for evaluating and improving methods for multiple tasks related to high-precision 3D reality capture and the creation of digital twins. Potential applications range from localization and georeferencing, dense image matching and 3D reconstruction to combined methods such as simultaneous localization and mapping and structure-from-motion, as well as classification and scene interpretation. The datasets are available online at: https://www.fhnw.ch/habg/bimage-datasets


LEKSIKA ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 66
Author(s):  
Dewi Puspitasari

This article explores the use of DST and its explicit teaching at a university in Indonesia, focusing on how students use it to support the learning process. Building on the use of digital stories to teach English and on multimodal theory, scholars have increasingly used the term DST to describe various forms of learning support that help students learn successfully in the classroom. Despite being widely used in educational contexts in many countries, DST has received scant attention from teachers, especially in ESP classes. This article specifically describes our experience of using DST as a learning aid with students aged 18 to 19. In this project, they individually created and collected photographs, based on their interests and related to the specified theme, as multimodal text. In the process, they drew on two linguistic resources (Bahasa Indonesia and English) to help them understand the process of creation. Machine translation and machine pronunciation software provided additional support during the creation of the DST project. The results show that DST helps students compose narrative writing by analyzing visual prompts. This indicates that DST is a promising means of supporting the writing process, as students were engaged throughout.


Author(s):  
Margarita G. Bogatkina ◽  
Elena S. Doroschuk ◽  
Tatiana S. Staroverova

Today, convergence processes determine the most promising areas of modern science. In journalism, this methodological orientation is implemented in a variety of multimedia forms, which has led to the creation of a fundamentally new information and communication environment and the emergence of a variety of multimedia projects. The question of the criteria and methods for creating a high-quality multimedia product remains open. The multimedia method of perceiving and presenting materials requires special philological preparation and mastery of an interdisciplinary technique for interpreting source materials, which would help to create a high-quality multimedia version of them. To transfer literary discourse into a multimedia projection, it is necessary to identify its semantic channels, the contexts that can be implemented in cross-media content using various technical means. In this regard, it is important to substantiate the basic principles of the contextual method of interpreting literary discourse. It is also proposed to highlight the contexts that are present in the discourse and are actualized by the recipient, including historical, biographical, literary, linguistic, philosophical, mythological and literary-critical contexts, as well as those of various types of art (painting, music, etc.) and the scientific context. The structure-forming principle that allows these contexts to be understood as an integral system is the dialogic interaction of their intra- and extra-textual existence.
The functioning of this context system depends on the following factors: 1) the degree to which authorial/reader determinism manifests itself in the implementation of a specific context; 2) the degree to which the embodiment of the context in the work is conscious or unconscious; 3) the degree of relative stiffness/probability of the context functioning; and 4) the degree of certainty/uncertainty of the implementation of the external context in the literary discourse. Using Sholokhov's story “The Fate of Man” as an example, the paper considers the development of the context system as a substantive basis for the further transfer of this text into a multimedia projection. It is shown that literary discourse is born at the junction of information and communication approaches to the text. Revealing the multimedia nature of literary discourse helps to restore the very process of its functioning and its dialogical nature. The contextual methodology for interpreting literary discourse also makes it possible to determine the dialogue channels, i.e. the context system, that form the basis for creating high-quality multimedia content in the future. Since the multimedia method of perceiving and presenting material requires special preparation, it is advisable to develop multimedia thinking skills using examples from classical literature, based on the principles of contextual methodology. Philological literacy can also play an important role in training future journalists.


2019 ◽  
Vol 19 (3) ◽  
pp. 250-256
Author(s):  
Žaneta Juchnevičienė ◽  
Milda Jucienė ◽  
Vaida Dobilaitė ◽  
Virginija Sacevičienė ◽  
Svetlana Radavičienė

Abstract The embroidery process is one of the means of joining textile materials into a system and is widely applied in the creation of special-purpose products. The development of the functionality of embroidery systems is inseparable from high requirements for the accuracy of the form of the element. In the embroidery process, the system of textile materials experiences various dynamic loads, multiple stretching, and crushing; therefore, the geometrical parameters of the embroidery element change. The objective of this paper was to analyze the contour widths of different square-form closed-circuit embroidery elements in order to evaluate the embroidery accuracy of the embroidered elements. Test samples were prepared in the form of square-form closed-circuit embroidery elements of five different contour widths: 6 mm, 10 mm, 14 mm, 18 mm, and 22 mm. The investigation determined that, in most cases, the contour widths obtained for the five closed-circuit square-form embroidery elements were smaller than those of the digitally designed element.


2020 ◽  
Author(s):  
Guillaume Klein ◽  
Dakun Zhang ◽  
Clément Chouteau ◽  
Josep Crego ◽  
Jean Senellart
