Information Retrieval in Biomedicine
Latest Publications


Total documents: 20 (five years: 0)
H-index: 2 (five years: 0)

Published by IGI Global
ISBN: 9781605662749, 9781605662756

Author(s):  
Christophe Jouis ◽  
Magali Roux-Rouquié ◽  
Jean-Gabriel Ganascia

Identical molecules can play different roles depending on the relations they have with different partners, embedded in different processes, at different times and/or locations. Systems biology is an emerging field that aims to understand these dynamic interactions, and thus the intricate networks that account for the complexity of living systems, from knowledge of their components and the relations between those components. Among the main issues in systems biology, knowledge of the spatial relations between entities is important for assessing the topology of biological networks, and mining data and texts can afford specific clues. To address this issue, we examine the use of the contextual exploration method to develop extraction rules that retrieve information on relations between biological entities in the scientific literature. We propose Seekbio, a system that can be plugged into PubMed output as an interface between the results of a PubMed query and the selection of articles matching spatial-relationship requests.
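Seekbio's rule base is not reproduced in the abstract, but a minimal Python sketch of a contextual-exploration-style extraction rule could look like the following; the rule structure, trigger patterns, and example sentence are illustrative assumptions, not the system's actual rules.

```python
import re
from dataclasses import dataclass

# A contextual-exploration rule pairs a triggering indicator with
# complementary clues that must co-occur in the same sentence.
@dataclass
class ExplorationRule:
    name: str         # spatial relation the rule signals
    trigger: str      # main linguistic indicator (regex)
    clues: list       # complementary context clues (regexes)

# Illustrative rules only; Seekbio's actual rule base is not shown here.
RULES = [
    ExplorationRule("localization", r"\blocali[sz]ed\b",
                    [r"\bin\b", r"\b(membrane|nucleus|cytoplasm)\b"]),
    ExplorationRule("inclusion", r"\bembedded\b", [r"\b(within|in)\b"]),
]

def apply_rules(sentence: str):
    """Return the spatial relations whose trigger and all clues match."""
    hits = []
    for rule in RULES:
        if re.search(rule.trigger, sentence, re.I) and all(
                re.search(c, sentence, re.I) for c in rule.clues):
            hits.append(rule.name)
    return hits

print(apply_rules("The protein is localized in the inner membrane."))
# ['localization']
```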


Author(s):  
Asanee Kawtrakul ◽  
Chaveevarn Pechsiri ◽  
Sachit Rajbhandari ◽  
Frederic Andres

Valuable knowledge is distributed in heterogeneous formats across many different Web sites and other sources on the Internet. However, finding the needed information is a complex task because these sources lack semantic relations and organization. This chapter presents a problem-solving map framework for extracting and integrating knowledge from unstructured documents on the Internet by exploiting the semantic links between problems, the methods for solving them, and the people who could solve them. This challenging area of research requires both complex natural language processing, including deep semantic relation interpretation, and the participation of end-users in annotating the answers scattered across the Web. The framework is evaluated by generating problem-solving maps for rice and human diseases.
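As a rough illustration of the underlying data structure, here is a minimal sketch of a problem-solving map linking problems to methods and methods to people; the class and the rice-disease entries are hypothetical, not the chapter's implementation.

```python
from collections import defaultdict

# A problem-solving map links a problem to candidate solution methods
# and each method to the people associated with it. The structure and
# the example entries below are illustrative assumptions only.
class ProblemSolvingMap:
    def __init__(self):
        self.methods = defaultdict(list)   # problem -> [method, ...]
        self.experts = defaultdict(list)   # method  -> [person, ...]

    def add(self, problem, method, expert):
        self.methods[problem].append(method)
        self.experts[method].append(expert)

    def solutions(self, problem):
        """Yield (method, experts) pairs for a given problem."""
        for method in self.methods[problem]:
            yield method, self.experts[method]

psm = ProblemSolvingMap()
psm.add("rice blast", "apply fungicide", "Dr. A (plant pathology)")
psm.add("rice blast", "plant resistant cultivar", "Dr. B (breeding)")
for method, who in psm.solutions("rice blast"):
    print(method, "->", who)
```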


Author(s):  
Burr Settles

ABNER (A Biomedical Named Entity Recognizer) is an open-source software tool for text mining in the molecular biology literature. It processes unstructured biomedical documents in order to discover and annotate mentions of genes, proteins, cell types, and other entities of interest. This task, known as named entity recognition (NER), is an important first step for many larger information management goals in biomedicine, such as extraction of biochemical relationships, document classification, and information retrieval. To accomplish this task, ABNER uses state-of-the-art machine learning models for sequence labeling called conditional random fields (CRFs). The software distribution comes bundled with two models that are pre-trained on standard evaluation corpora. ABNER can run as a stand-alone application with a graphical user interface, or be accessed as a Java API, allowing it to be re-trained with new labeled corpora and incorporated into other, higher-level applications. This chapter describes the software and its features, presents an overview of the underlying technology, and discusses some of the more advanced natural language processing systems in which ABNER has been used as a component. ABNER is freely available from http://pages.cs.wisc.edu/~bsettles/abner/
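ABNER itself is a Java tool; to keep the examples in this listing in one language, the following Python sketch shows CRF sequence labelling of the kind ABNER performs, using the separate sklearn-crfsuite package. The feature template and toy BIO-tagged corpus are assumptions for illustration only.

```python
# Minimal CRF sequence-labelling sketch in the style ABNER uses.
# Requires the third-party sklearn-crfsuite package, which is not
# part of ABNER; the toy corpus and features are illustrative only.
import sklearn_crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_upper": w.isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
    }

# Tokens labelled with BIO tags for biomedical entities (toy data).
train = [(["The", "p53", "protein", "binds", "DNA"],
          ["O", "B-PROTEIN", "O", "O", "B-DNA"])]

X = [[token_features(s, i) for i in range(len(s))] for s, _ in train]
y = [labels for _, labels in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

test = ["The", "BRCA1", "protein"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]]))
```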


Author(s):  
Jon Patrick ◽  
Pooyan Asgari

There have been few studies of large corpora of narrative notes collected from health clinicians working at the point of care. This chapter describes the principal issues in analysing a corpus of 44 million words of clinical notes drawn from the Intensive Care Service of a Sydney hospital. The study identifies many of the difficulties in processing written materials that, in contrast to formally published material, are highly informal, were written under significant time pressure, and contain a large technical lexicon. Recommendations on the processing tasks needed to turn such materials into a more usable form are provided. The chapter argues that these problems require a return to issues of 30 years ago that have been largely solved for computational linguists but need to be revisited for this entirely new genre of materials. By returning to the past and studying the contents of these materials in retrospective studies, we can plan for a future of technologies that better support clinicians in producing lexically and grammatically higher-quality texts, which can then be leveraged successfully for advanced translational research, thereby bolstering its momentum.
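One of the processing tasks such informal notes call for is lexical normalization of clinical shorthand. A minimal sketch, assuming a small hand-built abbreviation lexicon (the entries below are illustrative, not the chapter's lexicon):

```python
import re

# Tiny lexicon of informal ICU shorthand mapped to canonical forms.
# The entries are illustrative assumptions, not the chapter's lexicon.
ABBREVIATIONS = {
    "pt": "patient",
    "hx": "history",
    "sob": "shortness of breath",
    "abx": "antibiotics",
}

def normalise(note: str) -> str:
    """Expand known shorthand tokens, leaving unknown tokens untouched."""
    def expand(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), token)
    return re.sub(r"[A-Za-z]+", expand, note)

print(normalise("Pt has hx of SOB, started abx."))
# patient has history of shortness of breath, started antibiotics.
```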


Author(s):  
Francisco M. Couto ◽  
Mário J. Silva ◽  
Vivian Lee ◽  
Emily Dimmer ◽  
Evelyn Camon ◽  
...  

Molecular biology research projects have produced vast amounts of data, part of which is preserved in a variety of public databases. However, a large portion of the data contains a significant number of errors and therefore requires careful verification by curators, a painstaking and costly task, before valid conclusions can be derived from it. At the same time, research in biomedical information retrieval and information extraction is now delivering text mining solutions that can help curators work more efficiently and deliver better data resources. Over the past decades, automatic text processing systems have successfully exploited the biomedical scientific literature to reduce the effort researchers spend keeping up to date, but many of these systems still rely on domain knowledge that is integrated manually, leading to unnecessary overheads and restrictions in their use. A more efficient approach is to acquire the domain knowledge automatically from publicly available biological sources, such as BioOntologies, rather than inserting it manually. An example of this approach is GOAnnotator, a tool that assists in the verification of uncurated protein annotations. In evaluation it provided curators with correct evidence text at 93% precision, a promising result. GOAnnotator is implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.
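The core step GOAnnotator supports is matching a GO term from an uncurated annotation against GO terms extracted from candidate evidence sentences. A minimal sketch, assuming a simple token-overlap similarity and an arbitrary threshold (neither is claimed to be GOAnnotator's actual measure):

```python
# Sketch of the matching step: compare a GO term from an uncurated
# annotation against GO terms found in candidate evidence sentences.
# The similarity measure and threshold are illustrative choices,
# not GOAnnotator's published internals.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def find_evidence(annotation_term, sentences, extracted_terms,
                  threshold=0.5):
    """Return sentences whose extracted GO term resembles the annotation."""
    hits = []
    for sentence, term in zip(sentences, extracted_terms):
        score = jaccard(annotation_term, term)
        if score >= threshold:
            hits.append((score, sentence))
    return sorted(hits, reverse=True)

sentences = ["P53 participates in DNA damage response ..."]
terms = ["response to DNA damage stimulus"]
print(find_evidence("DNA damage response", sentences, terms))
```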


Author(s):  
Laura I. Furlong ◽  
Ferran Sanz

SNPs constitute key elements in genetic epidemiology and pharmacogenomics. While data about genetic variation is found in sequence databases, functional and phenotypic information on the consequences of these variations resides in the literature. Literature mining, however, is hampered mainly by the terminology problem; automatic systems for identifying mentions of allelic variants of genes in biomedical texts are therefore required. We previously reported the development of OSIRIS, a system aimed at retrieving literature about allelic variants of genes, which has since evolved into a new version incorporating a new entity recognition module. The new version is based on a terminology of variations and a pattern-based search algorithm that identifies variation terms and disambiguates them to dbSNP identifiers. OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is suitable for collecting current knowledge on gene sequence variations to support the functional annotation of variation databases.
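A minimal sketch of the pattern-based recognition step might look like this; the regular expressions and disambiguation table are illustrative assumptions, not OSIRIS's actual terminology (though MTHFR C677T/Ala222Val does correspond to dbSNP rs1801133):

```python
import re

# Patterns for common ways a SNP is written in text; illustrative only.
VARIANT_PATTERNS = [
    re.compile(r"\brs\d+\b"),                          # dbSNP identifier
    re.compile(r"\b[ACGT]\d+[ACGT]\b"),                # e.g. C677T
    re.compile(r"\b[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b"),  # e.g. Ala222Val
]

# Hypothetical disambiguation table from surface form to dbSNP id.
DBSNP_LOOKUP = {"C677T": "rs1801133", "Ala222Val": "rs1801133"}

def find_variants(text):
    """Return (mention, dbSNP id or None) for each variant mention."""
    found = []
    for pat in VARIANT_PATTERNS:
        for m in pat.finditer(text):
            mention = m.group(0)
            rsid = mention if mention.startswith("rs") \
                   else DBSNP_LOOKUP.get(mention)
            found.append((mention, rsid))
    return found

print(find_variants("The MTHFR C677T (Ala222Val) polymorphism ..."))
# [('C677T', 'rs1801133'), ('Ala222Val', 'rs1801133')]
```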


Author(s):  
Dimosthenis Kyriazis ◽  
Anastasios Doulamis ◽  
Theodora Varvarigou

In this chapter, a non-linear relevance feedback mechanism is proposed for increasing the performance and reliability of information (medical content) retrieval systems. In this interactive framework, the user who searches for information is considered part of the retrieval process: the user evaluates the results provided by the system, and the system automatically updates its performance based on that feedback. To achieve this, we propose an adaptively trained neural network (NN) architecture that implements the non-linear feedback. The term “adaptively” refers to the ability of the neural network to update its weights based on the user's content selections and thereby optimize its performance.
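A minimal sketch of such an adaptive feedback loop, using scikit-learn's MLPClassifier with incremental partial_fit updates in place of the chapter's NN architecture; the feature vectors and the simulated user are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Small network that re-ranks retrieved items and is updated after each
# round of user feedback via partial_fit. The 8-dimensional feature
# vectors standing in for content descriptors are assumptions.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8))          # retrieved items as features

net = MLPClassifier(hidden_layer_sizes=(16,), random_state=0)

for round_ in range(3):
    # Rank items by predicted relevance (random before any feedback).
    if round_ == 0:
        order = rng.permutation(len(docs))
    else:
        order = np.argsort(-net.predict_proba(docs)[:, 1])
    shown = docs[order[:10]]
    # Simulated user: items with a positive first feature are relevant.
    labels = (shown[:, 0] > 0).astype(int)
    # Non-linear feedback step: adapt the network weights online.
    net.partial_fit(shown, labels, classes=[0, 1])
```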


Author(s):  
Yves Kodratoff ◽  
Jérôme Azé ◽  
Lise Fontaine

This chapter argues that in order to extract significant knowledge from masses of technical texts, it is necessary to provide field specialists with programming tools that they themselves can use to build their text analysis tools. Besides supporting the specialists' programming effort, these tools must also help them gather the field knowledge necessary for defining and retrieving what they consider significant knowledge. This necessary field knowledge must be held in a well-structured and easy-to-use part of the programming tool. In this chapter, we present CorTag, a programming tool designed to correct existing tags in a text and to assist field specialists in retrieving the knowledge and/or information they are looking for.
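As a rough analogue of tag correction, the following sketch rewrites an existing tag when its context matches a pattern; the rule format is a hypothetical stand-in, not CorTag's actual rule language:

```python
import re

# A tag-correction rule rewrites an existing tag when its context
# matches a pattern. This rule format is a hypothetical stand-in
# for CorTag's own rule language.
RULES = [
    # Retag <gene> as <protein> when followed by the word "protein".
    (re.compile(r"<gene>([^<]+)</gene>(\s+protein)"),
     r"<protein>\1</protein>\2"),
]

def correct_tags(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(correct_tags("The <gene>p53</gene> protein binds DNA."))
# The <protein>p53</protein> protein binds DNA.
```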


Author(s):  
Nadine Lucas

This chapter presents the challenge of integrating knowledge at levels of discourse higher than the sentence, to avoid “missing the forest for the trees”. Characterisation tasks aimed at filtering collections are introduced, showing the use of the whole set of layout constituents, from sentence to text body. Only a few text descriptors encapsulating knowledge of text properties are used at each granularity level. Text processing differs according to the task, whether mining an individual document or tagging small or large collections prior to information extraction. Very shallow, domain-independent techniques are used to tag collections, saving the costs of sentence parsing and manual semantic annotation. This approach achieves satisfactory characterisation of text types, for example reviews versus clinical reports, or argumentation-type versus explanation-type articles. These collection-filtering techniques are suited to a domain of biomedical literature wider than genomics.
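A minimal sketch of computing a few shallow, layout-level descriptors without any sentence parsing; the particular descriptor set is an assumption, not the chapter's:

```python
# A few shallow descriptors computed from layout alone, without parsing.
# The descriptor set is an illustrative assumption, and the sentence
# split is deliberately crude, in keeping with "very shallow" processing.
def text_descriptors(document: str) -> dict:
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    sentences = [s for s in document.replace("\n", " ").split(". ") if s]
    words = document.split()
    return {
        "n_paragraphs": len(paragraphs),
        "mean_par_len": len(words) / max(len(paragraphs), 1),
        "mean_sent_len": len(words) / max(len(sentences), 1),
        "question_ratio": document.count("?") / max(len(sentences), 1),
    }

report = "Patient admitted.\n\nBP stable. Started abx.\n\nDischarged."
print(text_descriptors(report))
```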


Author(s):  
Yitao Zhang ◽  
Jon Patrick

The fast-growing body of online articles reporting clinical case studies provides a useful source for extracting domain-specific knowledge to improve healthcare systems. However, current studies focus mostly on the abstracts of published case studies, which contain little of the detailed case profile of a patient, such as symptoms and signs or the important laboratory test results from diagnostic and treatment procedures. This chapter proposes a novel category set covering a wide variety of semantics in the description of clinical case studies, distinguishing each unique patient case. A manually annotated corpus of over 5,000 sentences from 75 journal articles of clinical case studies has been created, and a gold standard for assessing automatic classifications has been established from this annotation. A sentence classification system that identifies 13 classes of clinically relevant content has been developed. A maximum entropy (MaxEnt) classifier is shown to produce better results than a support vector machine (SVM) classifier on the corpus.
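A maximum-entropy classifier is equivalent to (multinomial) logistic regression, so a minimal sketch of the sentence classification setup can use scikit-learn; the toy sentences and three stand-in labels below are assumptions in place of the 13-class annotated corpus:

```python
# Maximum-entropy sentence classification is multinomial logistic
# regression; this sketch uses scikit-learn. The toy sentences and
# the three labels stand in for the 13-class annotated corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The patient complained of chest pain and dyspnoea.",
    "Serum creatinine was 2.1 mg/dL on admission.",
    "Intravenous antibiotics were started on day two.",
]
labels = ["symptom", "lab_result", "treatment"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["White cell count was elevated."]))
```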

