text collections Latest Research Papers

Attention-based Unsupervised Keyphrase Extraction and Phrase Graph for COVID-19 Medical Literature Retrieval

ACM Transactions on Computing for Healthcare ◽

10.1145/3473939 ◽

2022 ◽

Vol 3 (1) ◽

pp. 1-16

Author(s):

Haoran Ding ◽

Xiao Luo

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Language Processing ◽

Medical Literature ◽

Graph Model ◽

The Self ◽

Keyphrase Extraction ◽

Text Data ◽

Text Collections ◽

Extraction Model

Searching, reading, and finding information in massive medical text collections is challenging. A typical biomedical search engine cannot feasibly navigate each article to find critical information or keyphrases. Moreover, few tools visualize the phrases relevant to a query. There is therefore a need to extract keyphrases from each document for indexing and efficient search. Transformer-based neural networks such as BERT have been used for various natural language processing tasks. Their built-in self-attention mechanism can capture the associations between words and phrases in a sentence. This research investigates whether self-attention can be used to extract keyphrases from a document in an unsupervised manner and to identify the relevancy between phrases, in order to construct a query relevancy phrase graph that visualizes the phrases of the search corpus by relevancy and importance. A comparison with six baseline methods shows that the self-attention-based unsupervised keyphrase extraction works well on a medical literature dataset. The unsupervised keyphrase extraction model can also be applied to other text data. The query relevancy graph model is applied to a COVID-19 literature dataset to demonstrate that the attention-based phrase graph can successfully identify the medical phrases relevant to the query terms.
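The idea of ranking phrases by self-attention and linking them by relevancy can be illustrated with a minimal sketch. This is not the authors' model: the attention matrix below is a hand-written toy rather than BERT output, and the function names `score_phrases` and `phrase_relevancy`, as well as the scoring rules themselves, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def score_phrases(attention: Matrix, phrases: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank phrases by the attention their tokens receive from tokens outside the phrase.

    attention[i][j] is the weight token i assigns to token j (toy values here).
    The score is averaged over the phrase length so long phrases are not favoured.
    """
    scores = {}
    for phrase, idxs in phrases.items():
        inside = set(idxs)
        received = sum(
            attention[i][j]
            for i in range(len(attention))
            if i not in inside
            for j in idxs
        )
        scores[phrase] = received / len(idxs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def phrase_relevancy(attention: Matrix, a: List[int], b: List[int]) -> float:
    """Hypothetical edge weight for a phrase graph: mean attention in both directions."""
    values = [attention[i][j] for i in a for j in b] + [attention[j][i] for i in a for j in b]
    return sum(values) / len(values)


# Toy sentence: "covid vaccine is effective" (token indices 0..3).
toy_attention = [
    [0.4, 0.4, 0.1, 0.1],
    [0.3, 0.5, 0.1, 0.1],
    [0.3, 0.3, 0.2, 0.2],
    [0.3, 0.4, 0.1, 0.2],
]
toy_phrases = {"covid vaccine": [0, 1], "effective": [3]}

ranked = score_phrases(toy_attention, toy_phrases)
edge = phrase_relevancy(toy_attention, toy_phrases["covid vaccine"], toy_phrases["effective"])
```

In a real setting the matrix would come from a trained transformer and the candidate phrases from a chunker; both are outside the scope of this sketch.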

17. Text as Data: Finding Stories in Text Collections

The Data Journalism Handbook ◽

10.1515/9789048542079-018 ◽

2021 ◽

pp. 116-123

Author(s):

Barbara Maseda

Keyword(s):

Text Collections

The South Estonian language islands in the context of the Central Baltic area

Eesti ja soome-ugri keeleteaduse ajakiri Journal of Estonian and Finno-Ugric Linguistics ◽

10.12697/jeful.2021.12.2.02 ◽

2021 ◽

Vol 12 (2) ◽

pp. 33-72

Author(s):

Miina Norvik ◽

Uldis Balodis ◽

Valts Ernštreits ◽

Gunta Kļava ◽

Helle Metslang ◽

...

Keyword(s):

Comparative Analysis ◽

Distribution Patterns ◽

The South ◽

Phonological Features ◽

Text Collections ◽

Linguistic Patterns ◽

Estonian Language ◽

Baltic Area ◽

Comparative Information ◽

Over Time

This article offers a comparative analysis of several morphosyntactic and phonological features in the South Estonian language islands: Leivu, Lutsi, and Kraasna. The objective is to give an overview of the distribution of selected features and their (in)stability over time, and to discuss their form and use in a broader areal context. To achieve this goal, comparative information was also included from the closest cognate varieties (Estonian and the South Estonian varieties, Courland Livonian and Salaca Livonian) and the main contact varieties (Latgalian, Latvian, and Russian). The data analysed in this study originated from various sources: text collections, dictionaries, and language corpora. The results reveal a multitude of linguistic patterns and distribution patterns: the studied varieties are similar to and different from one another in various ways, which points to multifaceted contact situations and outcomes in this area. Summary (translated from Estonian). Miina Norvik, Uldis Balodis, Valts Ernštreits, Gunta Kļava, Helle Metslang, Karl Pajusalu, Eva Saar: The South Estonian language islands in the Central Baltic sphere of influence. The article presents a comparative analysis of several morphosyntactic and phonological features of the South Estonian language islands (Leivu, Lutsi, and Kraasna). The aim of the study is to give an overview of the distribution of the selected features and their stability over time, and to discuss their forms and use in a broader areal context. To this end, occurrences in the closest cognate languages and dialects (Estonian and South Estonian, Courland and Salaca Livonian) and in the main contact languages (Latgalian, Latvian, Russian) are taken into account. The material analysed comes from various sources, including text collections, dictionaries, and language corpora. The results bring out a variety of formal relationships and patterns in the spread of changes, pointing to the diverse nature of the contacts among the languages and dialects studied and to the resulting different developments in the language systems.

Essays on the Linguistic Image of the RIGHTEOUS in Ukrainian Texts of the Psalter (Translation by Ivan Ohienko)

Studia Linguistica ◽

10.19195/0137-1169.40.1 ◽

2021 ◽

Vol 40 ◽

pp. 7-19

Author(s):

Olga Barabasz-Rewak

Keyword(s):

Old Testament ◽

Image Of God ◽

Literary Language ◽

Text Collections ◽

Scientific Nature ◽

Biblical Concepts

This article is part of a study on the integral linguistic image of God in Ivan Ohienko's Ukrainian translation of the Psalter. The importance of Ohienko's texts comes from the scientific nature of the translation and its influence on the formation of the literary language. The author of the study is interested in the ways and means by which the concept of the RIGHTEOUS, one of the most frequent elements with which God functions in text collections, is verbally expressed. Therefore, this study focuses on an ethnolinguistic analysis (based on J. Bartmiński's conception of profiling) of the Ukrainian linguistic realization of such biblical concepts as 'righteous person', 'the main signs of a righteous person associated with God', and 'the actions of a righteous person towards a) God, b) sinners'. As a result, it will be possible to trace the richness and diversity of the linguistic image of the 'righteous' created by Ivan Ohienko, bringing readers closer to understanding how Old Testament texts are rendered with the means of the Ukrainian language.

Paper2vec and Cite2vec Methods for Analyzing Collections of Scientific Publications

Vestnik NSU Series Information Technologies ◽

10.25205/1818-7900-2021-19-3-61-69 ◽

2021 ◽

Vol 19 (3) ◽

pp. 61-69

Author(s):

N. I. Tikhonov

Keyword(s):

Scientific Publications ◽

Text Collections ◽

Vector Representations

Visualizations are used to better understand collections of scientific publications. Various methods of analyzing text collections can be used to build these visualizations. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. To demonstrate how these techniques work and to give an example of their application, visualizations were developed and are described in this paper.

PAPER2VEC AND CITE2VEC METHODS FOR ANALYZING COLLECTIONS OF SCIENTIFIC PUBLICATIONS

Vestnik komp iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2021.10.pp.032-039 ◽

2021 ◽

pp. 32-39

Author(s):

N. I. Tikhonov

Keyword(s):

Language Processing ◽

Document Representation ◽

Labor Costs ◽

Scientific Contribution ◽

Scientific Publications ◽

New Methods ◽

Text Collections ◽

Vector Representations ◽

Embedding Methods ◽

Document Visualization

Collections of scientific publications are growing rapidly. Scientists have access to portals containing large numbers of documents, and such a volume of data is difficult to investigate. Document visualization methods are used to reduce labor costs, to search for relevant and similar documents, to evaluate the scientific contribution of particular publications, and to reveal hidden links between documents. These methods can be based on various models of document representation. In recent years, word embedding methods for natural language processing have become extremely popular. Following them, methods for analyzing text collections that obtain vector representations of documents began to appear. Although many document analysis systems exist, new methods can offer new insights into collections, perform better on large document collections, or find new relationships between documents. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. It briefly describes these methods for analyzing collections of scientific publications and reports experiments with them, including visualizations of their results and a description of the problems that arise.
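To make the notion of "vector representations of documents using citation information" concrete, here is a minimal sketch. It does not implement Paper2vec or Cite2vec; it swaps in a much simpler technique (binary citation vectors compared by cosine similarity, in the spirit of bibliographic coupling). The document identifiers, reference labels, and function names are all hypothetical.

```python
import math
from typing import Dict, List, Set


def citation_vectors(citations: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """Map each document to a binary vector over all cited references."""
    vocab = sorted({ref for refs in citations.values() for ref in refs})
    return {
        doc: [1.0 if ref in refs else 0.0 for ref in vocab]
        for doc, refs in citations.items()
    }


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: each document lists the references it cites.
toy_citations = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r5"},
}
vectors = citation_vectors(toy_citations)
sim_ab = cosine(vectors["A"], vectors["B"])  # A and B share two of three references
sim_ac = cosine(vectors["A"], vectors["C"])  # A and C share none
```

Vectors of this kind can then be projected to two dimensions for visualization; learned embeddings would replace the binary vectors in that pipeline.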

A Comparison of Search Functionalities in Several Tools Used for Searching within Digital Text Collections

Proceedings of the Association for Information Science and Technology ◽

10.1002/pra2.527 ◽

2021 ◽

Vol 58 (1) ◽

pp. 679-681

Author(s):

Liezl H. Ball ◽

Theo J.D. Bothma

Keyword(s):

Digital Text ◽

Text Collections

The Islamic Call to Prayer and Its Origin: A Story about Cultural Memory’s Permanence and Adaptability

Religions ◽

10.3390/rel12100817 ◽

2021 ◽

Vol 12 (10) ◽

pp. 817

Author(s):

Maroussia Bednarkiewicz

Keyword(s):

Cultural Memory ◽

Muslim Identity ◽

Written Text ◽

Text Collections ◽

Fertile Ground ◽

Common Identity ◽

Call To Prayer

For more than two centuries, Muslims have been retelling different stories about the origin of their call to prayer. While the converging details of these narratives offer a glimpse of Muslim cultural memory and its preservation, the diverging elements reflect different mechanisms that facilitate the adaptation of this cultural memory to new contexts and concerns. Based on the work of Jan Assmann, the present study explores how Muslims conserved and adapted their cultural memory to keep their common identity and expand their diversity following distinctive religious, political, or personal forms of belonging. The narratives concerned with the origin of the Islamic call to prayer, preserved in various written text collections, offer fertile ground for analyzing how this part of Muslim cultural memory became the vehicle of a permanent but adaptable Muslim identity.

Machine Translation Vs. Multilingual Dictionaries Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections

Communication Methods and Measures ◽

10.1080/19312458.2021.1955845 ◽

2021 ◽

pp. 1-20

Author(s):

Daniel Maier ◽

Christian Baden ◽

Daniela Stoltenberg ◽

Maya De Vries-Kedem ◽

Annie Waldherr

Keyword(s):

Machine Translation ◽

Topic Modeling ◽

Text Collections ◽

Multilingual Text

Child-directed speech is optimized for syntax-free semantic inference

Scientific Reports ◽

10.1038/s41598-021-95392-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Guanghao You ◽

Balthasar Bickel ◽

Moritz M. Daum ◽

Sabine Stoll

Keyword(s):

Structural Information ◽

Syntactic Structure ◽

Test Case ◽

Semantic Inference ◽

The Core ◽

Text Collections ◽

Different Types ◽

Complex Adaptive ◽

Core Meaning ◽

Extract Information

The way infants learn language is a highly complex adaptive behavior. This behavior chiefly relies on the ability to extract information from the speech they hear and combine it with information from the external environment. Most theories assume that this ability critically hinges on the recognition of at least some syntactic structure. Here, we show that child-directed speech allows for semantic inference without relying on explicit structural information. We simulate the process of semantic inference with machine learning applied to large text collections of two different types of speech, child-directed speech versus adult-directed speech. Taking the core meaning of causality as a test case, we find that in child-directed speech causal meaning can be successfully inferred from simple co-occurrences of neighboring words. By contrast, semantic inference in adult-directed speech fundamentally requires additional access to syntactic structure. These results suggest that child-directed speech is ideally shaped for a learner who has not yet mastered syntactic structure.
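The phrase "simple co-occurrences of neighboring words" can be illustrated with a short counting sketch. This is not the study's machine learning pipeline; it only shows how neighbor co-occurrence counts, the kind of syntax-free signal the abstract refers to, might be collected. The sentences, the window size, and the function name are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def neighbor_cooccurrences(sentences: Iterable[str], window: int = 1) -> Counter:
    """Count unordered word pairs that occur within `window` positions of each other."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look only to the right so that each pair is counted once per occurrence.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                pair: Tuple[str, str] = tuple(sorted((word, tokens[j])))
                counts[pair] += 1
    return counts


# Invented toy utterances, not data from the study.
toy_sentences = ["you break the cup", "break the toy", "the cup fell"]
pair_counts = neighbor_cooccurrences(toy_sentences, window=1)
```

No parse trees or word-order rules are involved: the counts depend only on which words appear next to each other.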
