Automated Mapping of Environmental Higher Education Ranking Systems Indicators to SDGs Indicators using Natural Language Processing and Document Similarity

Author(s):  
Anwaar Buzaboon ◽  
Hanan Alboflasa ◽  
Waheeb Alnaser ◽  
Safwan Shatnawi ◽  
Khawla Albinali
Author(s):  
Soumya Raychaudhuri

Using algorithms to analyze natural language text is a challenging task. Recent advances in algorithms and the increased availability of computational power and online text have resulted in incremental progress in text analysis (Rosenfeld 2000). For certain specific applications, natural language processing algorithms can rival human performance. Even the simplest algorithms and approaches can glean information from text and do so at a rate much faster than humans. In the case of functional genomics, where an individual assay might include thousands of genes, and tens of thousands of documents pertinent to those genes, the speed of text mining approaches offers a great advantage to investigators trying to understand the data. In this chapter, we will focus on techniques to convert text into simple numerical vectors to facilitate computation. Then we will go on to discuss how these vectors can be combined into textual profiles for genes; these profiles offer additional biologically meaningful information that can complement available genomics data sets. The previous chapter introduced methods to analyze gene expression data and sequence data. The focus of many analytical methods was comparing and grouping genes by similarity. Some sequence analysis methods like dynamic programming and BLAST offer opportunities to compare two sequences, while multiple sequence alignment and weight matrices provide a means to compare families of sequences. Similarly, gene expression array analysis approaches are mostly contingent on distance metrics that compare gene expression profiles to each other; clustering and classification algorithms provide a means to group similar genes. The primary goal of applying these methods was to transfer knowledge between similar genes. We can think of the scientific literature as yet another data type and define document similarity metrics.
Algorithms that tap the knowledge locked in the scientific literature require sophisticated natural language processing approaches. Assessing document similarity, on the other hand, is a comparatively easy task. A measure of document similarity that corresponds to semantic similarity between documents can also be powerful. For example, we might conclude that two genes are related if the documents that refer to them are semantically similar.
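
As a rough illustration of the idea described above (text converted to numerical vectors, vectors combined into per-gene profiles, and profiles compared by similarity), the following is a minimal sketch rather than the chapter's actual method. It assumes raw term counts as the vector representation and cosine similarity as the metric; the function names `term_vector`, `cosine_similarity`, and `gene_profile` are hypothetical, and the example sentences are invented toy data.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def gene_profile(documents):
    """Sum the term vectors of all documents that refer to one gene."""
    profile = Counter()
    for doc in documents:
        profile.update(term_vector(doc))
    return profile


# Toy data (invented for illustration): documents referring to three genes.
gene_a = gene_profile(["Kinase activity regulates the cell cycle.",
                       "The kinase phosphorylates cyclin."])
gene_b = gene_profile(["Cyclin levels control cell cycle progression."])
gene_c = gene_profile(["Ribosomal protein assembly in the nucleolus."])

# Genes whose documents share vocabulary score higher than unrelated ones.
print(cosine_similarity(gene_a, gene_b) > cosine_similarity(gene_a, gene_c))
```

Raw counts are the simplest possible weighting; in practice a scheme that down-weights common words (for example TF-IDF) would likely be preferred, but that choice is outside what the abstract specifies.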


2020 ◽  
Vol 27 (10) ◽  
pp. 1576-1584 ◽  
Author(s):  
Long Chen ◽  
Wenbo Fu ◽  
Yu Gu ◽  
Zhiyong Sun ◽  
Haodan Li ◽  
...  

Abstract. Objective: Normalizing clinical mentions to concepts in standardized medical terminologies is, in general, challenging due to the complexity and variety of the terms in narrative medical records. In this article, we introduce our work on a clinical natural language processing (NLP) system that automatically normalizes clinical mentions to concept unique identifiers in the Unified Medical Language System. This work was part of the 2019 n2c2 (National NLP Clinical Challenges) Shared-Task and Workshop on Clinical Concept Normalization. Materials and Methods: We developed a hybrid clinical NLP system that combines a generic multilevel matching framework, customizable matching components, and machine learning ranking systems. We explored 2 machine learning ranking systems, based on either an ensemble of various similarity features extracted from pretrained encoders or a Siamese attention network, targeting efficient and fast semantic searching/ranking. We also evaluated the performance of a general-purpose clinical NLP system based on the Unstructured Information Management Architecture. Results: The systems were evaluated as part of the 2019 n2c2 challenge, and our original best system in the challenge obtained an accuracy of 0.8101, ranking fifth in the challenge. The improved system, with a newly designed machine learning ranking based on a Siamese attention network, improved the accuracy to 0.8209. Conclusions: We demonstrate the successful practice of combining multilevel matching and machine learning ranking for clinical concept normalization. Our results indicate the capability and interpretability of our proposed approach, as well as its limitations, suggesting opportunities to achieve better performance by combining general clinical NLP systems.


2003 ◽  
Vol 17 (5) ◽  
Author(s):  
Paola Merlo ◽  
James Henderson ◽  
Gerold Schneider ◽  
Eric Wehrli

The recent considerable growth in the amount of easily available on-line text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problem of organizing documents into meaningful groups according to their content and of visualizing a text collection, providing an overview of the range of documents and of their relationships, so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Creating these maps poses considerable efficiency challenges. We study linguistically-motivated ways of reducing the representation of a document to increase efficiency, and ways to disambiguate the words in the documents.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing the current knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 1243-P
Author(s):  
JIANMIN WU ◽  
FRITHA J. MORRISON ◽  
ZHENXIANG ZHAO ◽  
XUANYAO HE ◽  
MARIA SHUBINA ◽  
...  

Author(s):  
Pamela Rogalski ◽  
Eric Mikulin ◽  
Deborah Tihanyi

In 2018, we overheard many CEEA-ACEG members stating that they have "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys, most recently by Nelson and Brennan (2018), to understand, beyond what members say about themselves, what makes the CEEA-ACEG community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly diverted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour" while the review of proceedings centred on "subject" and "object"; both found "instruments," although NLP did so with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.

