specialized texts
Recently Published Documents

Abstract Nowadays, the use of language corpora for many purposes has increased significantly. General corpora exist for numerous languages, but research often needs more specialized corpora. The Web’s rapid growth has significantly improved access to thousands of online documents, highly specialized texts and comparable texts on the same subject covering several languages in electronic form. However, research has continued to concentrate on corpus annotation instead of corpus creation tools. Consequently, many researchers create their corpora, independently solve problems, and generate project-specific systems. The corpus construction is used for many NLP applications, including machine translation, information retrieval, and question-answering. This paper presents a new NLP Corpus and Services in the Cloud called HULTIG-C. HULTIG-C is characterized by various languages that include unique annotations such as keywords set, sentences set, named entity recognition set, and multiword set. Moreover, a framework incorporates the main components for license detection, language identification, boilerplate removal and document deduplication to process the HULTIG-C. Furthermore, this paper presents some potential issues related to constructing multilingual corpora from the Web.

Download Full-text

Multi-word term variation

Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics ◽

10.1075/resla.19012.cab ◽

2021 ◽

Vol 34 (2) ◽

pp. 402-434

Author(s):

Melania Cabezas-García ◽

Santiago Chambó

Keyword(s):

Language Processing ◽

Semantic Relations ◽

Conceptual Combination ◽

Term Variation ◽

Discourse Studies ◽

Specialized Texts ◽

Complex Nominals ◽

Germanic Languages ◽

Language Corpus ◽

Productive Method

Abstract Complex nominals (CNs) are frequently found in specialized discourse in all languages, since they are a productive method of creating terms by combining existing lexical units. In Spanish, a conceptual combination may often be rendered with a prepositional CN (PCN) or an equivalent adjectival CN (ACN), e.g., demanda de electricidad vs. demanda eléctrica [electricity demand]. Adjectives in ACNs – usually derived from nouns – are known as ‘relational adjectives’ because they encode semantic relations with other concepts. With recent exceptions, research has focused on the underlying semantic relations in CNs. In natural language processing, several works have dealt with the automatic detection of relation adjectives in Romance and Germanic languages. However, there is no discourse studies of these CNs, to our knowledge, for the goal of establishing writer recommendations. This study analyzed the co-text of equivalent PCNs and ACNs to identify factors governing the use of a certain form. EcoLexicon ES, a corpus of Spanish environmental specialized texts, was used to extract 6 relational adjectives and, subsequently, a set of 12 pairs of equivalent CNs. Their behavior in co-text was analyzed by querying EcoLexicon ES and a general language corpus with 20 expressions in CQP-syntax. Our results showed that immediate linguistic co-text determined the preference for a particular structure. Based on these findings, we provide writing guidelines to assist in the production of CNs.

Download Full-text

Post-Editing as the Means to Activate Students’ Thinking and Analytical Process: Psycholinguistic Aspects

PSYCHOLINGUISTICS ◽

10.31470/2309-1797-2021-30-2-221-239 ◽

2021 ◽

Vol 30 (2) ◽

pp. 221-239

Author(s):

Leonid Chernovaty ◽

Natalia Kovalchuk

Keyword(s):

Online Teaching ◽

English Language ◽

Text Structure ◽

Final Test ◽

Contact Hours ◽

Group A ◽

Specialized Texts ◽

Group B ◽

Independent Work ◽

Group D

The aim of the research is looking for the ways to intensify the future translators’ analytical and thinking activity during their independent work in online teaching. The author strives to achieve it through the combination of post-editing machine-translated texts and the Think-aloud protocol procedure. It is also assumed that this combination reduces the students’ dependence on the MT target text structure, as well as improves their competence in translating specialized texts. The methodology of research involved experimental post-editing-based online teaching (28 contact hours and 92 hours of independent work) of an elective university course ‘Specifics of translating English-language discourse in the domain of Psychology’ to the first-year MA students (majoring in English and Translation) whose command of English ranged from C1 to C2 levels in the CEFR classification. The parameters of analysis included the percentage of the students’ uploaded home tasks, the degree of the subjects’ post-editing intensity in their weekly homework, the students’ independence in the interim and final tests, as well as the marks in the Final test. The results of the analysis demonstrated a substantial difference between various groups of the subjects by all indicators. The amount of home tasks uploaded by the subjects in groups A and B (with more intensive analytical and thinking activity) exceeds the similar parameter in groups C and D (with less intensive activity) more than twofold. There is a considerable advantage of the groups A and B (and even C) subjects’ post-editing intensity in their weekly homework as compared to group D. The intensity of the students’ analytical and thinking activity decreased from the highest (group A) to moderately high (group B) to average (group C) and to low (group D). The degree of the students’ independence in the interim and final tests decreased from 85.0% (group A) to 35.0% in group D, with the remaining groups’ indicators in between – 59.0% (group B) and 46.0% (group C). These indicators clearly correlate with the average marks in the final test, which amounted to 93.80, 63 and 53 points (out of 100) in groups А, В, C and D respectively. Conclusions. Post-editing, in combination with the modified TAP procedure, contributes to the efficient development of the specialized texts translation competence due to the intensification of the students’ analytical and thinking activity, reduces their dependence on the MT target text structure and correlates with the improvement of the overall quality of their translation.

Download Full-text

Harmonization of TIQA standards for specialized texts

Linguistics and Culture Review ◽

10.21744/lingcure.v5ns2.1411 ◽

2021 ◽

Vol 5 (S2) ◽

pp. 678-696

Author(s):

Roksolana V. Povoroznyuk

Keyword(s):

Quality Assurance ◽

Quality Criteria ◽

Codes Of Ethics ◽

Professional Standards ◽

Mutual Trust ◽

End User ◽

Mediated Communication ◽

Translation Quality ◽

Assessment Standards ◽

Specialized Texts

This research explores translation and interpreting quality assessment standards (TIQA), selecting those fit for the purpose of specialized translation quality assurance, with the aim to systematize them into a step-by-step framework, referred to as “the TIQA pyramid”, a framework that provides valid and reproducible benchmarks that are endowed with universal features and reflected in codes of ethics and professional standards. The TIQA standards may be subdivided into two major groups: text-oriented and ethical-deontological ones. Such classification is based on the notion of translation quality which is the projection of a translator (interpreter)’s personality (inchoate quality assurance arising out of a system of ethical and deontological precepts), or of textual requirements (choate quality assurance arising out of a system of text-oriented criteria). The “pas-de-trois” in a translated interaction among the commissioner of a specialized translation, its performer and end-user is grounded in the presumably existing mediated communication contract (typically a translation brief). Its positive upshot is manifested in the confidence-imbued multi-party polypragmatic interlingual and intercultural behaviour; the negative, however, is underscored by its implicit nature which leads to the absence of a concerted system of quality criteria, resulting in a lack of satisfaction and mutual trust.

Download Full-text

Evidentiality in science from specialization to popularization: A case study of COVID-19 texts

Journal of World Languages ◽

10.1515/jwl-2021-0007 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Jinyi Huang ◽

Jinjun Wang

Keyword(s):

Information Source ◽

Original Research ◽

Discourse Communities ◽

Medical Texts ◽

Four Dimensions ◽

Linguistic System ◽

Key Issues ◽

Specialized Texts ◽

Popular Texts

Abstract Since the beginning of the COVID-19 pandemic outbreak, medical texts on the pandemic have enjoyed wide popularity, and one of the key issues has always been the accuracy and dependability of the information they contain. The use of evidentiality, a linguistic system which functions to indicate the source and credibility of information, is thus worth exploring in COVID-19 texts. Adopting a synthesized framework within the overall model of systemic functional linguistics, this paper sets out to investigate the lexicogrammar and semantics of evidentiality on the basis of data collected in the form of both specialized and popular texts on COVID-19. Evidentiality in these texts is explored along four dimensions: (i) evidential taxonomy, where specialized texts favor reporting, while popular texts favor belief and inferring; (ii) information source, where specialized texts highlight the voices of authorship, original research, and patients, whereas popular texts highlight the voices of scientists, institutions, countries, and laypeople; (iii) modalization, where specialized texts typically indicate a higher degree of modal responsibility than their popular counterparts; and (iv) engagement, where specialized texts favor dialogic expansion and popular texts favor contraction. It is hoped that these findings will shed light on linguistic variation according to different contextual configurations, as well as clarifying rhetorical conventions in discourse communities of science.

Download Full-text

HULTIG-C: NLP Corpus and Services in the Cloud

10.21203/rs.3.rs-696114/v1 ◽

2021 ◽

Author(s):

Sebastião Pais ◽

João Cordeiro ◽

Muhammad Jamil

Keyword(s):

Question Answering ◽

Named Entity Recognition ◽

Language Identification ◽

Entity Recognition ◽

Corpus Annotation ◽

Named Entity ◽

Corpus Construction ◽

Corpus Creation ◽

Specialized Texts ◽

Main Components

Abstract Nowadays, the use of language corpora for many purposes has increased significantly. General corpora exist for numerous languages, but research often needs more specialized corpora. The Web's rapid growth has significantly improved access to thousands of online documents, highly specialized texts and comparable texts on the same subject covering several languages in electronic form. However, research has continued to concentrate on corpus annotation instead of corpus creation tools. Consequently, many researchers create their own corpora, independently solve problems, and generate project-specific systems. The corpus construction is used for many NLP applications, including machine translation, information retrieval, and question-answering. This paper presents a new NLP Corpus and Services in the Cloud called HULTIG-C. HULTIG-C is characterized by various languages that include unique annotations such as keywords set, sentences set, named entity recognition set, and multiword set. Moreover, a framework incorporates the main components for license detection, language identification, boilerplate removal and document deduplication to process the HULTIG-C. Furthermore, this paper presents some potential issues related to constructing multilingual corpora from the Web.

Download Full-text

About Methods for Classifying Hidden Language Concepts in Specialized Texts Involving Pseudoinverse, Clustering and Data Grouping

Cybernetics and Computer Technologies ◽

10.34229/2707-451x.21.2.7 ◽

2021 ◽

pp. 68-75

Author(s):

Iurii Krak ◽

Anatoliy Kulias ◽

Valentina Petrovych ◽

Vladyslav Kuznetsov

Keyword(s):

Text Processing ◽

Original Data ◽

Feature Representation ◽

Clustering Methods ◽

Scientific Texts ◽

Linear Classifiers ◽

Document Frequency ◽

Specialized Texts ◽

High Recognition Accuracy ◽

The Stability

This paper discusses the problems of analysis of hidden language concepts in scientific texts in the Ukrainian language, using methods of text mining, dimensionality reduction, grouping of features and linear classifiers. A corpus of scientific texts and dictionaries, as well as stop words and affixes, has been formed for processing specialized texts. The resulting texts were analyzed and converted into text frequency-inverse document frequency (TF-IDF) feature representation. In order to process the feature vector, we propose to use methods of dimensionality rteduction of the data, in particular, the algorithm for the synthesis of linear systems and Karunen – Loeve transform and grouping of features: T-stochastic grouping of nearest neighbors (T-SNE). A series of experiments were performed on test examples, in particular, for the determination of informational density in the text and classification by keywords in specialized texts using the method of random samples consensus (RANSAC). A method of classification of hidden language concepts was proposed, making use of clustering methods (K-means). As a result of the experiment, the structure of the classifier of hidden language concepts was obtained in structured texts was obtained, which gained a relatively high recognition accuracy (97 – 99 %) using such linear classification algorithms: decision trees and extreme gradient boost machine. The stability of the proposed method is investigated by using the perturbation of the original data by a variational autoencoder, test runs shown that sparse autocoder reduces the mean square error, but the separation band decreases, which affects the convergence of the classification algorithm. In further research, we propose to apply other methods of analysis of structured texts and ways to improve the separability of specialized texts with similar authorial styles and different topic using a proposed set of parameters. Keywords: text processing, language concepts, pseudoinverse, clusterization, methods of data groupings.

Download Full-text

Machine Learning in Terminology Extraction from Czech and English Texts

Linguistic Frontiers ◽

10.2478/lf-2021-0001 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Dominika Kováříková

Keyword(s):

Machine Learning ◽

Low Frequency ◽

Single Word ◽

Terminology Extraction ◽

Academic Texts ◽

Quantitative Term ◽

Characteristic Features ◽

Specialized Texts ◽

Cross Lingual ◽

Automatic Term Recognition

Abstract The method of automatic term recognition based on machine learning is focused primarily on the most important quantitative term attributes. It is able to successfully identify terms and non-terms (with success rate of more than 95 %) and find characteristic features of a term as a terminological unit. A single-word term can be characterized as a word with a low frequency that occurs considerably more often in specialized texts than in non-academic texts, occurs in a small number of disciplines, its distribution in the corpus is uneven as is the distance between its two instances. A multi-word term is a collocation consisting of words with low frequency and contains at least one single-word term. The method is based on quantitative features and it makes it possible to utilize the algorithms in multiple disciplines as well as to create cross-lingual applications (verified on Czech and English).

Download Full-text

INTERSUBJECTIVE APPROACH TO PRE-TRANSLATION ANALYSIS OF ARCHITECTURAL CRITICISM: REACHING AFTER THE BALANCE

Journal of Teaching English for Specific and Academic Purposes ◽

10.22190/jtesap2102247l ◽

2021 ◽

pp. 247

Author(s):

Svetlana Latysheva

Keyword(s):

Original Text ◽

New Approach ◽

Personal Meanings ◽

Social Sphere ◽

Translation Analysis ◽

Cognitive Information ◽

Translation Methods ◽

Specialized Texts ◽

Expert Community ◽

High Degree

The study discusses a new approach to translation of architectural critical materials focusing on the stage of source texts analysis. Architecture synthesizes science, technology, art and social sphere resulting in the heterogeneity of translatological characteristics of texts representing architectural phenomena. They combine the parameters of institutional and personal discourse possessing the features of specialized texts, such as the predominance of cognitive information and a high degree of conventionality, and artistic texts with their emotional, aesthetic and axiological aspects. This functional ambivalence limits the use of traditional methods based on genre or stylistic analysis. This research is an attempt to develop a new approach to pre-translation analysis of architectural texts yielding adequate translation methods for architectural nominations and contributing to the retainment of original text identity. The study views intersubjectivity in pre-translation analysis as conceptual coordination within the discourse of the expert community carried out in the process of interaction between authors of source texts and addressees of translation texts through interlingual mediation. The developed method allows translators to reveal relevant cognitive and discursive parameters of nominations of architectural phenomena at various language levels. In addition, it reveals the translation dominants that assist both to preserve the sufficient level of translation text conventionality within the institutional discourse of architecture and to transfer personal meanings and values implied by source text authors within their personal discourse.

Download Full-text