original text
Recently Published Documents


2022 ◽  
Vol 3 (1) ◽  
pp. 1-27
Author(s):  
Md Momin Al Aziz ◽  
Tanbir Ahmed ◽  
Tasnia Faequa ◽  
Xiaoqian Jiang ◽  
Yiyu Yao ◽  
...  

Advances in data science have brought affordable storage and efficient algorithms for querying large volumes of data. Health records make up a significant part of this data; they are pivotal for healthcare providers and can be used to improve our well-being. Clinical notes in electronic health records are one such category: they record a patient’s complete medical information at different time steps of care in the form of free text. These unstructured notes therefore contain events from a patient’s admission to discharge, which can prove significant for future medical decisions. However, because the texts also contain sensitive information about the patient and the attending medical professionals, they cannot be shared publicly. This privacy issue has thwarted timely discoveries from this wealth of untapped information. In this work, we therefore generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and rigorously analyze their utility under different metrics and at different levels. Experimental results support the applicability of our generated data: it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.


2022 ◽  
Vol 40 (1) ◽  
pp. 1-24
Author(s):  
Seyed Ali Bahrainian ◽  
George Zerveas ◽  
Fabio Crestani ◽  
Carsten Eickhoff

Neural sequence-to-sequence models are the state-of-the-art approach to abstractive summarization of textual documents. They produce condensed versions of source text narratives without being restricted to words from the original text. Despite the advances in abstractive summarization, custom generation of summaries (e.g., towards a user’s preference) remains unexplored. In this article, we present CATS, an abstractive neural summarization model that summarizes content in a sequence-to-sequence fashion while also introducing a new mechanism to control the underlying latent topic distribution of the produced summaries. We empirically illustrate the efficacy of our model in producing customized summaries and present findings that facilitate the design of such systems. We use the well-known CNN/DailyMail dataset to evaluate our model. Furthermore, we present a transfer-learning method and demonstrate the effectiveness of our approach in a low-resource setting, i.e., abstractive summarization of meeting minutes, where combining the main available meeting transcript datasets, AMI and International Computer Science Institute (ICSI), results in merely a few hundred training documents.


Author(s):  
Hassan Najadat ◽  
Mohammad A. Alzubaidi ◽  
Islam Qarqaz

Reviews and comments that users leave on social media are of great importance to companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by those who post spam in the form of reviews and comments. Designing methods to automatically detect and block social media spam is difficult because spammers continuously develop new ways to leave their spam comments. Researchers have proposed several methods to detect English spam reviews, but few studies have addressed Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords, or features, are subsets of words from the original text that are labelled as important. Term weights, the Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) are used to extract keywords from Arabic text. The proposed method detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four machine learning algorithms are used in the detection process: C4.5, kNN, SVM, and Naïve Bayes classifiers. The results show that the Decision Tree classifier (C4.5) outperforms the other classification algorithms, with a detection accuracy of 92.63%.
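The TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy comments are hypothetical English stand-ins for the Arabic data, the unsmoothed formula tf × log(N/df) is one common variant chosen here as an assumption, and the helper names `tfidf` and `top_keywords` are invented for illustration. The filter methods and the classifiers are left out.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def top_keywords(doc_weights, k=2):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy comments, already tokenized by whitespace.
comments = [
    "win free prize click link now".split(),
    "great product fast delivery".split(),
    "click link free offer".split(),
    "product arrived fast great service".split(),
]
weights = tfidf(comments)
keywords = [top_keywords(w) for w in weights]
```

Terms that occur in only one comment (such as "prize") receive a higher weight than terms shared across comments (such as "free"), which is why they surface as keyword candidates; a real system would then pass the selected features to a classifier.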


Author(s):  
Б. В. Эльбикова

The study is a comparative analysis of the original and translated texts of the Kalmyk folk tale "Ayu Chikt Avkha Tsetsn khoir" ("Ayu Chikte and Avkha Tsetsen") from the repertoire of the storyteller M. Burinov. Comparing the original Kalmyk text of the tale (1960) with the Russian translation by M. G. Vatagin (1964), the study notes the nature of the discrepancies and inaccuracies found in the translated narrative in conveying the meaning of individual plot episodes, formulaic expressions, and word combinations that play an important role in fairy-tale narration. Studying a folklore text in its embodiments in different languages is relevant in light of the problems that arise when texts of distant cultures interact. To convey the national specifics of the fairy-tale tradition, maximum accuracy is required in translating the episodes, formulas, and individual words that are important for understanding the original meaning of the source text.


2022 ◽  
Vol 16 (2) ◽  
pp. 271-282
Author(s):  
Nur Rosyidah Syahbaniyah ◽  
Totok Suhardijanto

This study discusses class and semantic shifts of adverbs of modality in Korean short stories and their Bahasa Indonesia translations in the short story anthology ‘Langit dan Kupu-Kupu’. It aims to identify how adverbs of modality in the original text change into a different word class in the target text. The data sources were six Korean short stories, entitled ‘Dua Generasi yang Teraniaya’, ‘Seoul Musim Dingin 1964’, ‘Jalan ke Sampho’, ‘Bung Kim di Kampung Kami’, ‘Dinihari ke Garis Depan’, and ‘Betulkah? Saya Jerapah’, and their Indonesian translations. The study used a descriptive qualitative method, and a linguistic corpus design was used to collect the data for analysis. The analysis found that, of 46 adverbs of modality, four remained adverbs in translation, ten changed class into pronouns, nouns, particles, adjectives, or verbs, and the remaining 32 were rendered as a combination of adverbs and other word classes. Furthermore, of the 290 adverbs in the source text (ST), 143 were accurately translated, 100 were deleted, and 47 changed their meaning in the target text (TT). In Korean-Indonesian short story translation, the shifting technique is used to adjust for differences between the Korean and Indonesian grammar systems. Translators also shift word meanings, as long as they do not deviate from the context and message of the ST, to produce a natural translation that target-language (TL) readers can easily understand.


2022 ◽  
Vol 7 (5) ◽  
pp. 24-35
Author(s):  
E. S. Goncharenko

This article presents the results of an investigation of repetitions in modern Spanish. To understand the role of repetitions in a given text, it is first necessary to determine whether they are immanent in the language or culture, and therefore unmarked, or, on the contrary, carry some charge: stylistic, rhythmic or pragmatic. This differentiation is carried out through analysis and synthesis of the theoretical material (A. Alonso, E. A. Llorach, J. Nogeira, V. Iovenko, V. Vinogradov, etc.) and through contrastive and comparative analysis. The results show the redundancy of Spanish in comparison with Russian, which accounts for the numerous unmarked repetitions in Spanish. On the other hand, the frequent use of repetitions as a stylistic, semantic or rhythmic device becomes evident too. For the analysis, we chose some official documents, characterized by the absence of stylistic devices, and some appellative and literary texts (poetry by A. Carvajal, a novel by S. Puertolas, etc.), which are a priori aimed at form and pragmatic effect. This approach helps reach the most objective conclusions about the nature of the repetitions in a text. We considered lexical and grammar repetitions, grammar, semantic and concept repetitions. Phonetic and lexical repetitions, as the basic stylistic devices, were not analyzed, as their markedness is evident. The results may be useful both for people studying Spanish, who want to speak it correctly and understand the pragmatic function of repetition, and for translators, who must decide whether to follow the structure and rhythm of the text when repetitions are marked or to omit repetitions present in the original text when they are immanent in the language and the culture.


2022 ◽  
Vol 15 (1) ◽  
pp. 1-18
Author(s):  
Krishnaveni P. ◽  
Balasundaram S. R.

The day-to-day growth of online information necessitates intensive research in automatic text summarization (ATS). ATS software produces a summary by extracting important information from the original text. With the help of summaries, users can easily read and understand the documents of interest. Most ATS approaches use only local properties of the text. Moreover, the large number of properties makes sentence selection difficult and complicated. This article therefore uses graph-based summarization to exploit structural and global properties of the text. It introduces the maximal clique based sentence selection (MCBSS) algorithm, which selects important and non-redundant sentences that cover all concepts of the input text for the summary. The MCBSS algorithm finds novel information using maximal cliques (MCs). Experimental results measured with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) on the Timeline dataset show that the proposed work outperforms the existing graph algorithms Bushy Path (BP), Aggregate Similarity (AS), and TextRank (TR).
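The idea of selecting sentences through maximal cliques of a sentence graph can be sketched as follows. This is not the published MCBSS algorithm, whose details the abstract does not give. The sketch assumes a Jaccard word-overlap similarity, a threshold of 0.25, the basic Bron-Kerbosch recursion for clique enumeration, and a "most-connected member" rule for picking one sentence per clique; all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def build_graph(sentences, threshold=0.25):
    """Connect sentences whose Jaccard similarity reaches the threshold."""
    tokens = [set(s.lower().split()) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


def select_sentences(sentences, threshold=0.25):
    """Keep one representative sentence per maximal clique (assumed rule)."""
    adj = build_graph(sentences, threshold)
    chosen = []
    for clique in maximal_cliques(adj):
        # Representative: the member with the most neighbors, earliest on ties.
        best = max(clique, key=lambda i: (len(adj[i]), -i))
        if best not in chosen:
            chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical toy input: two near-duplicate pairs and one unrelated sentence.
toy = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stocks fell sharply on monday",
    "stocks fell on monday morning",
    "rain is expected tomorrow",
]
summary = select_sentences(toy)
```

Each maximal clique groups mutually similar sentences, so keeping one member per clique covers every group while avoiding redundancy; on the toy input the summary keeps one sentence from each near-duplicate pair plus the unrelated sentence.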

