Making and Using AI in the Library: Creating a BERT Model at the National Library of Sweden

Mapping Intimacies ◽

10.31235/osf.io/k9duq ◽

2022 ◽

Author(s):

Chris Haffenden ◽

Elena Fano ◽

Martin Malmsten ◽

Love Börjeson

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Language Model ◽

Library Science ◽

Automated Classification ◽

National Library ◽

Digital Collections ◽

Processing Technologies ◽

National Libraries ◽

Combining Methods

How can novel AI techniques be made and put to use in the library? Combining methods from data and library science, this article focuses on Natural Language Processing technologies, especially in national libraries. It explains how the National Library of Sweden’s collections enabled the development of a new BERT language model for Swedish. It also outlines specific use cases for the model in the context of academic libraries, detailing strategies for how such a model could make digital collections available for new forms of research: from automated classification to enhanced searchability and improved OCR cohesion. Highlighting the potential for cross-fertilizing AI with libraries, the conclusion suggests that while AI may transform the workings of the library, libraries can also have a key role to play in the future development of AI.


Extracting Information from Archaeological Texts

Open Archaeology ◽

10.1515/opar-2015-0004 ◽

2015 ◽

Vol 1 (1) ◽

Cited By ~ 6

Author(s):

Keith W. Kintigh

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Automated Classification ◽

Automated Extraction ◽

Processing Technologies ◽

Automated Translation ◽

Extract Information

Abstract: To address archaeology’s most pressing substantive challenges, researchers must discover, access, and extract information contained in the reports and articles that codify so much of archaeology’s knowledge. These efforts will require application of existing and emerging natural language processing technologies to extensive digital corpora. Automated classification can enable development of metadata needed for the discovery of relevant documents. Although it is even more technically challenging, automated extraction of and reasoning with information from texts can provide urgently needed access to contextualized information within documents. Effective automated translation is needed for scholars to benefit from research published in other languages.


EMOSIS Sentiment Analysis on Tweets with Emotion and Intensity Level Recognition Considering Ending Punctuation Marks

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4518.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10289-10293

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Language Processing ◽

Significant Role ◽

Language Model ◽

Intensity Level ◽

Processing Stage ◽

Overall Performance

Sentiment Analysis is a tool used for determining the polarity or emotion of a sentence. It is a field of Natural Language Processing that focuses on the study of opinions. In this study, the researchers addressed one key challenge in Sentiment Analysis, which is to consider the ending punctuation marks present in a sentence. Ending punctuation marks play a significant role in Emotion Recognition and Intensity Level Recognition. The research made use of tweets expressing opinions about Philippine President Rodrigo Duterte. These downloaded tweets served as the inputs. They were first subjected to a pre-processing stage to prepare the sentences for processing. A Language Model was created to serve as the classifier for determining the scores of the tweets. The scores give the polarity of each sentence. Accuracy is very important in sentiment analysis. To increase the chance of correctly identifying the polarity of the tweets, the input underwent Intensity Level Recognition, which determines the intensifiers and negations within the sentences. The system was evaluated with an overall performance of 80.27%.
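The pipeline this abstract describes (polarity scoring, intensifier and negation handling, and attention to ending punctuation) can be illustrated with a small sketch. It is not the authors' EMOSIS system: the lexicons, the weights, and the rule that each trailing exclamation mark amplifies the score are all assumptions made here for illustration, and the function names `score_tweet` and `polarity` are hypothetical.

```python
import re

# Hypothetical toy lexicons; the abstract does not give the paper's resources.
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "so": 1.3}
NEGATIONS = {"not", "never", "no"}


def score_tweet(text: str) -> float:
    """Return a signed sentiment score for one tweet (toy rule-based scorer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    weight = 1.0      # accumulated intensifier weight for the next polar word
    negate = False    # whether a negation precedes the next polar word
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in POLARITY:
            value = POLARITY[tok] * weight
            score += -value if negate else value
            weight, negate = 1.0, False  # reset after each polar word

    # Assumed rule: each trailing "!" (capped at three) raises intensity by 25%.
    ending = re.search(r"[!?.]+\s*$", text)
    if ending:
        bangs = min(ending.group().count("!"), 3)
        score *= 1.0 + 0.25 * bangs
    return score


def polarity(text: str) -> str:
    """Map the numeric score to a polarity label."""
    s = score_tweet(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["This is good.", "This is very good!", "This is not good"]:
        print(tweet, "->", score_tweet(tweet), polarity(tweet))
```

In this sketch the ending punctuation only scales the magnitude of the score and never flips its sign, which is one simple way to separate intensity from polarity.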


Automated Classification of Computer-Based Medical Device Recalls: An Application of Natural Language Processing and Statistical Learning

2014 IEEE 27th International Symposium on Computer-Based Medical Systems ◽

10.1109/cbms.2014.134 ◽

2014 ◽

Author(s):

Homa Alemzadeh ◽

Raymond Hoagland ◽

Zbigniew Kalbarczyk ◽

Ravishankar K. Iyer

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Medical Device ◽

Statistical Learning ◽

Language Processing ◽

Automated Classification ◽

Computer Based


An Empirical Study of Writing Feedback Analysis of Non-English Majors in China with Natural Language Processing Technologies

International Journal of e-Education e-Business e-Management and e-Learning ◽

10.17706/ijeeee.2015.5.2.85-93 ◽

2015 ◽

Vol 5 (2) ◽

pp. 85-93 ◽

Cited By ~ 1

Author(s):

Ming Liu ◽

Weiwei Xu ◽

Qiuxia Ran

Keyword(s):

Natural Language Processing ◽

Empirical Study ◽

Natural Language ◽

Language Processing ◽

Processing Technologies ◽

Feedback Analysis ◽

English Majors ◽

Writing Feedback


From SARS to COVID-19: A Bibliometric study on Emerging Infectious Diseases with Natural Language Processing technologies

10.21203/rs.3.rs-25354/v1 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yinjun Hu ◽

Mengmeng Chen ◽

Qian Wang ◽

Yue Zhu ◽

Bei Wang ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

International Collaboration ◽

Citation Impact ◽

Policy Decision ◽

Research Quality ◽

First Year ◽

Asian Countries ◽

Processing Technologies

Abstract [Background] On January 7, 2020, the novel coronavirus named "COVID-19", which aroused worldwide concern, was identified by Chinese scientists. Many related research works were developed for the emerging, rapidly evolving situation of this epidemic. This study aimed to analyze the research literature on SARS, MERS and COVID-19 to retrieve important information for virologists, epidemiologists and policy decision makers. [Methods] In this study, we collected data from multiple data sources and compared bibliometric indices among COVID-19, Severe Acute Respiratory Syndrome (SARS), and Middle East Respiratory Syndrome (MERS) up to March 25, 2020. To extract data of comparable quantity and scale, the volume of search results was balanced by limiting the publication years. For further analysis, we extracted 1,480 documents from 1,671 candidates with Natural Language Processing technologies. [Results] In total, 13,945 research publications from 7 datasets were selected for analysis. Unlike other topics, research interest in an epidemic may reach its peak in the first year of the outbreak. The document type distributions of SARS, MERS and COVID-19 are nearly the same (less than 6 points difference for each type); however, there was notable growth in research quality across these three epidemics (3.68, 6.63 and 11.35 for Field-Weighted Citation Impact scores). Asian countries have less international collaboration (less than 35.1%) than the Occident (more than 49.5%), which deserves as much attention as the research itself. [Conclusions] We found that research interest in epidemics may reach its peak in the first year after the outbreak; however, the peak of research on MERS appeared in the third year because of its resurgence in 2015. As for research quality, although we achieved higher research quality than before, especially on COVID-19, research on epidemics that did not start in our own country should not be neglected. Another important and effective strategy for enhancing epidemic prevention in China and other Asian countries is to continue strengthening international collaboration.


MorphoBr: an open source large-coverage full-form lexicon for morphological analysis of Portuguese

Texto Livre Linguagem e Tecnologia ◽

10.17851/1983-3652.11.3.1-25 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1-25

Author(s):

Leonel Figueiredo de Alencar ◽

Bruno Cuconato ◽

Alexandre Rademaker

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Open Source ◽

Computational Linguistics ◽

Language Processing ◽

Morphological Analysis ◽

Computational Techniques ◽

Processing Technologies ◽

Finite State ◽

Full Form

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis.

KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.


DeNERT-KG: Named Entity and Relation Extraction Model Using DQN, Knowledge Graph, and BERT

Applied Sciences ◽

10.3390/app10186429 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6429

Author(s):

SungMin Yang ◽

SoYeop Yoo ◽

OkRan Jeong

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Model ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Knowledge Graph ◽

Named Entity ◽

Artificial Intelligence Technology

Along with studies on artificial intelligence technology, research is also being carried out actively in the field of natural language processing to understand and process people’s language, in other words, natural language. For computers to learn on their own, the skill of understanding natural language is very important. There are a wide variety of tasks involved in the field of natural language processing, but we would like to focus on the named entity recognition and relation extraction task, which is considered to be the most important in understanding sentences. We propose DeNERT-KG, a model that can extract subject, object, and relationships, to grasp the meaning inherent in a sentence. Based on the BERT language model and Deep Q-Network, the named entity recognition (NER) model for extracting subject and object is established, and a knowledge graph is applied for relation extraction. Using the DeNERT-KG model, it is possible to extract the subject, type of subject, object, type of object, and relationship from a sentence, and we verify this model through experiments.


Automated classification of NASA anomalies using natural language processing techniques

2013 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) ◽

10.1109/issrew.2013.6688849 ◽

2013 ◽

Cited By ~ 1

Author(s):

Davide Falessi ◽

Lucas Layman

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Automated Classification ◽

Processing Techniques


Natural language processing: State of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2013.06.004 ◽

2013 ◽

Vol 46 (5) ◽

pp. 765-773 ◽

Cited By ~ 49

Author(s):

Carol Friedman ◽

Thomas C. Rindflesch ◽

Milton Corn

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Significant Progress ◽

National Library


Natural language processing technologies in artificial intelligence: The science and industry perspective

Microelectronics Reliability ◽

10.1016/0026-2714(90)90423-k ◽

1990 ◽

Vol 30 (3) ◽

pp. 610-611

Author(s):

Florin Popentiu

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Processing Technologies
