Making and Using AI in the Library: Creating a BERT Model at the National Library of Sweden

2022 ◽  
Author(s):  
Chris Haffenden ◽  
Elena Fano ◽  
Martin Malmsten ◽  
Love Börjeson

How can novel AI techniques be made and put to use in the library? Combining methods from data and library science, this article focuses on Natural Language Processing technologies, especially in national libraries. It explains how the National Library of Sweden’s collections enabled the development of a new BERT language model for Swedish. It also outlines specific use cases for the model in the context of academic libraries, detailing strategies for how such a model could make digital collections available for new forms of research: from automated classification to enhanced searchability and improved OCR cohesion. Highlighting the potential for cross-fertilizing AI with libraries, the conclusion suggests that while AI may transform the workings of the library, libraries can also have a key role to play in the future development of AI.

2015 ◽  
Vol 1 (1) ◽  
Author(s):  
Keith W. Kintigh

Abstract: To address archaeology’s most pressing substantive challenges, researchers must discover, access, and extract information contained in the reports and articles that codify so much of archaeology’s knowledge. These efforts will require application of existing and emerging natural language processing technologies to extensive digital corpora. Automated classification can enable development of metadata needed for the discovery of relevant documents. Although it is even more technically challenging, automated extraction of and reasoning with information from texts can provide urgently needed access to contextualized information within documents. Effective automated translation is needed for scholars to benefit from research published in other languages.


2019 ◽  
Vol 8 (4) ◽  
pp. 10289-10293

Sentiment Analysis is a tool used for determining the polarity or emotion of a sentence. It is a field of Natural Language Processing that focuses on the study of opinions. In this study, the researchers addressed one key challenge in Sentiment Analysis: accounting for the ending punctuation marks present in a sentence. Ending punctuation marks play a significant role in Emotion Recognition and Intensity Level Recognition. The research made use of tweets expressing opinions about Philippine President Rodrigo Duterte. These downloaded tweets served as the inputs. They were first subjected to a pre-processing stage to prepare the sentences for processing. A Language Model was created to serve as the classifier for determining the scores of the tweets. The scores give the polarity of the sentence. Accuracy is very important in sentiment analysis. To increase the chance of correctly identifying the polarity of the tweets, the input underwent Intensity Level Recognition, which determines the intensifiers and negations within the sentences. The system was evaluated with an overall performance of 80.27%.
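
The idea of combining polarity scores with intensifiers, negations, and ending punctuation can be illustrated with a minimal sketch. This is not the study's method: the abstract describes a language-model classifier, whereas the sketch below swaps in a toy word lexicon to keep the example self-contained. The word lists, the weights, and the 25% boost per exclamation mark are all hypothetical values chosen for illustration.

```python
import re

# Hypothetical toy lexicons, for illustration only (not taken from the study).
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "so": 1.3}
NEGATIONS = {"not", "never", "no"}


def score_sentence(sentence: str) -> float:
    """Return a signed polarity score for a single sentence."""
    stripped = sentence.rstrip()
    # Separate the sentence body from its ending punctuation marks.
    body = stripped.rstrip("!?.")
    ending = stripped[len(body):]

    tokens = re.findall(r"[a-z']+", body.lower())
    score = 0.0
    weight = 1.0      # accumulated intensifier weight for the next polar word
    negate = False    # whether a negation precedes the next polar word
    for tok in tokens:
        if tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in NEGATIONS:
            negate = True
        elif tok in POLARITY:
            value = POLARITY[tok] * weight
            score += -value if negate else value
            weight, negate = 1.0, False

    # Assumed rule: each ending exclamation mark raises intensity by 25%.
    return score * (1.0 + 0.25 * ending.count("!"))


def polarity(sentence: str) -> str:
    """Map the signed score to a polarity label."""
    s = score_sentence(sentence)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For instance, `polarity("This is not good")` yields `"negative"`, and an ending exclamation mark scales the magnitude of the score without changing its sign.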


Author(s):  
Yinjun Hu ◽  
Mengmeng Chen ◽  
Qian Wang ◽  
Yue Zhu ◽  
Bei Wang ◽  
...  

Abstract [Background] On January 7, 2020, Chinese scientists identified the novel coronavirus named "COVID-19", which aroused worldwide concern. Many related research works were produced in response to the emerging, rapidly evolving epidemic. This study aimed to analyze the research literature on SARS, MERS and COVID-19 to retrieve important information for virologists, epidemiologists and policy decision makers. [Methods] We collected data from multiple sources and compared bibliometric indices among COVID-19, Severe Acute Respiratory Syndrome (SARS), and Middle East Respiratory Syndrome (MERS) up to March 25, 2020. To extract data of comparable quantity and scale, the volume of search results was balanced by limiting the publication years. For further analysis, we extracted 1,480 documents from 1,671 candidates with Natural Language Processing technologies. [Results] In total, 13,945 research publications from 7 datasets were selected for analysis. Unlike other topics, research interest in an epidemic may peak in the first year of the outbreak. The document type distributions of SARS, MERS and COVID-19 are nearly the same (less than 6 points' difference for each type); however, research quality grew notably across the three epidemics (Field-Weighted Citation Impact scores of 3.68, 6.63 and 11.35). Asian countries have less international collaboration (less than 35.1%) than the Occident (more than 49.5%), which deserves as much attention as the research itself. [Conclusions] We found that research interest in epidemics may always peak in the first year after the outbreak; however, the peak of research on MERS appeared in the third year because of its renewed outbreak in 2015. As for research quality, although we did better than before, especially on COVID-19, research on epidemics that did not originate in our own country should not be overlooked. Another important and effective strategy for enhancing epidemic prevention in China and other Asian countries is to continue strengthening international collaboration.


2018 ◽  
Vol 11 (3) ◽  
pp. 1-25
Author(s):  
Leonel Figueiredo de Alencar ◽  
Bruno Cuconato ◽  
Alexandre Rademaker

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis. KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.


2020 ◽  
Vol 10 (18) ◽  
pp. 6429
Author(s):  
SungMin Yang ◽  
SoYeop Yoo ◽  
OkRan Jeong

Along with studies on artificial intelligence technology, research is also being actively carried out in the field of natural language processing to understand and process people’s language, in other words, natural language. For computers to learn on their own, the skill of understanding natural language is very important. There are a wide variety of tasks involved in the field of natural language processing, but we would like to focus on the named entity recognition and relation extraction task, which is considered to be the most important in understanding sentences. We propose DeNERT-KG, a model that can extract subject, object, and relationships, to grasp the meaning inherent in a sentence. Based on the BERT language model and Deep Q-Network, the named entity recognition (NER) model for extracting subject and object is established, and a knowledge graph is applied for relation extraction. Using the DeNERT-KG model, it is possible to extract the subject, type of subject, object, type of object, and relationship from a sentence, and we verify this model through experiments.

