Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT

2014 ◽  
Vol 22 (3) ◽  
pp. 640-648 ◽  
Author(s):  
Jonathan M Mortensen ◽  
Evan P Minty ◽  
Michael Januszyk ◽  
Timothy E Sweeney ◽  
Alan L Rector ◽  
...  

Abstract Objectives The verification of biomedical ontologies is an arduous process that typically involves peer review by subject-matter experts. This work evaluated the ability of crowdsourcing methods to detect errors in SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) and to address the challenges of scalable ontology verification. Methods We developed a methodology to crowdsource ontology verification that uses micro-tasking combined with a Bayesian classifier. We then conducted a prospective study in which both the crowd and domain experts verified a subset of SNOMED CT comprising 200 taxonomic relationships. Results The crowd identified errors as well as any single expert at about one-quarter of the cost. The inter-rater agreement (κ) between the crowd and the experts was 0.58; the inter-rater agreement between experts themselves was 0.59, suggesting that the crowd is nearly indistinguishable from any one expert. Furthermore, the crowd identified 39 previously undiscovered, critical errors in SNOMED CT (eg, ‘septic shock is a soft-tissue infection’). Discussion The results show that the crowd can indeed identify errors in SNOMED CT that experts also find, and the results suggest that our method will likely perform well on similar ontologies. The crowd may be particularly useful in situations where an expert is unavailable, budget is limited, or an ontology is too large for manual error checking. Finally, our results suggest that the online anonymous crowd could successfully complete other domain-specific tasks. Conclusions We have demonstrated that the crowd can address the challenges of scalable ontology verification, completing not only intuitive, common-sense tasks, but also expert-level, knowledge-intensive tasks.
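
The verification pipeline described above combines micro-task votes with a Bayesian classifier and reports agreement as Cohen's κ. The following is a minimal illustrative sketch of those two ingredients, not the authors' actual pipeline; the worker accuracies, votes, and labels are invented.

```python
# Minimal sketch (not the authors' pipeline): aggregate crowd votes on a
# "child IS-A parent" statement with a naive Bayesian update, and compute
# Cohen's kappa between two raters. Worker accuracies and votes are made up.

from collections import Counter

def bayes_verify(votes, accuracies, prior_correct=0.5):
    """Posterior probability that a taxonomic relationship is correct,
    given independent worker votes (True = 'relationship holds')."""
    p_correct, p_wrong = prior_correct, 1.0 - prior_correct
    for vote, acc in zip(votes, accuracies):
        # Likelihood of this vote if the relationship is correct / wrong.
        p_correct *= acc if vote else (1.0 - acc)
        p_wrong *= (1.0 - acc) if vote else acc
    return p_correct / (p_correct + p_wrong)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Example: seven workers vote on "septic shock IS-A soft-tissue infection".
votes = [False, False, True, False, False, False, True]
accuracies = [0.8, 0.7, 0.6, 0.75, 0.8, 0.65, 0.7]
print(f"P(relationship correct) = {bayes_verify(votes, accuracies):.3f}")

crowd = ["error", "ok", "error", "ok", "error"]
expert = ["error", "ok", "ok", "ok", "error"]
print(f"kappa = {cohens_kappa(crowd, expert):.2f}")
```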

Author(s):  
Nur Zareen Zulkarnain ◽  
Farid Meziane

There is an abundance of existing biomedical ontologies such as the National Cancer Institute Thesaurus and the Systematized Nomenclature of Medicine-Clinical Terms. Implementing these ontologies in a particular system, however, may cause unnecessarily high memory usage and slow down the system's performance. On the other hand, building a new ontology from scratch requires additional time and effort. Therefore, this research explores the ontology reuse approach in order to develop an Abdominal Ultrasound Ontology by extracting concepts from existing biomedical ontologies. This article presents the reader with a step-by-step method for reusing ontologies, together with suggestions of off-the-shelf tools that can be used to ease the process. The results show that ontology reuse is beneficial, especially in the biomedical field, as it allows developers from non-technical backgrounds to build and use domain-specific ontologies with ease. It also allows developers with technical backgrounds to develop ontologies with minimal involvement from domain experts.
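
The reuse step described here amounts to carving a relevant fragment out of a large source ontology instead of authoring terms from scratch. Below is a minimal sketch of that idea, assuming the source ontology is available as a simple parent-to-children map; the concepts shown are invented, and a real reuse workflow would read them from the SNOMED CT or NCI Thesaurus release files or dedicated tools.

```python
# Illustrative sketch only: extract the fragment of a source ontology rooted at
# a seed concept, so it can be reused in a smaller domain ontology.
# The tiny parent->children map below is invented for the example.

def collect_descendants(root, children):
    """Return the root concept plus all of its descendants (a reusable subtree)."""
    fragment, stack = set(), [root]
    while stack:
        concept = stack.pop()
        if concept in fragment:
            continue
        fragment.add(concept)
        stack.extend(children.get(concept, []))
    return fragment

# Hypothetical fragment of a large biomedical ontology.
children = {
    "Imaging finding": ["Ultrasound finding"],
    "Ultrasound finding": ["Abdominal ultrasound finding"],
    "Abdominal ultrasound finding": ["Liver lesion", "Gallstone"],
}

# Reuse only the abdominal-ultrasound branch in the new ontology.
print(collect_descendants("Ultrasound finding", children))
```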


2020 ◽  
Vol 20 (S10) ◽  
Author(s):  
Ankur Agrawal ◽  
Licong Cui

Abstract Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge to provide standardization of terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques for these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue on quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, the National Cancer Institute Thesaurus, the Unified Medical Language System, the North American Association of Central Cancer Registries and OBO Foundry ontologies.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pilar López-Úbeda ◽  
Alexandra Pomares-Quimbaya ◽  
Manuel Carlos Díaz-Galiano ◽  
Stefan Schulz

Abstract Background Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain such as the UMLS Metathesaurus or SNOMED CT are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong. Results This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries that generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called the Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining improvements of 6 percentage points in the F-measure compared to the baseline using a Multilayer Perceptron, thus supporting the hypothesis that a specialized term set improves NLP tasks. Conclusion The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.
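
The core of the method is weighting candidate n-grams per clinical sub-domain by how relevant and how discriminative they are. The sketch below illustrates that idea with a simple frequency-times-inverse-sub-domain-frequency weight; it is not the paper's exact metric set, and the tiny Spanish corpora are invented.

```python
# Illustrative sketch (not the paper's exact metrics): weight candidate terms
# per clinical sub-domain by their relative frequency in that sub-domain's
# corpus versus the other sub-domains, a simple stand-in for the paper's
# relevance / discriminatory-power measures. The corpora below are invented.

import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-záéíóúñ]+", text.lower())

def term_weights(corpora):
    """corpora: {subdomain: list of documents}. Returns {subdomain: {term: weight}}."""
    counts = {d: Counter(t for doc in docs for t in tokenize(doc))
              for d, docs in corpora.items()}
    totals = {d: sum(c.values()) for d, c in counts.items()}
    n_domains = len(corpora)
    weights = {}
    for d, c in counts.items():
        weights[d] = {}
        for term, freq in c.items():
            tf = freq / totals[d]                        # relevance inside the sub-domain
            df = sum(term in counts[o] for o in counts)  # how many sub-domains use it
            idf = math.log(n_domains / df)               # discriminatory power across sub-domains
            weights[d][term] = tf * idf
    return weights

corpora = {
    "cardiología": ["insuficiencia cardíaca y fibrilación auricular",
                    "dolor torácico con troponina elevada"],
    "neumología": ["neumonía adquirida en la comunidad",
                   "disnea y derrame pleural"],
}
w = term_weights(corpora)
print(sorted(w["cardiología"].items(), key=lambda kv: -kv[1])[:3])
```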


2021 ◽  
Vol 11 (12) ◽  
pp. 5476
Author(s):  
Ana Pajić Simović ◽  
Slađan Babarogić ◽  
Ognjen Pantelić ◽  
Stefan Krstović

Enterprise resource planning (ERP) systems are often seen as viable sources of data for process mining analysis. To perform most of the existing process mining techniques, it is necessary to obtain a valid event log that is fully compliant with the eXtensible Event Stream (XES) standard. In ERP systems, such event logs are not available as the concept of business activity is missing. Extracting event data from an ERP database is not a trivial task and requires in-depth knowledge of the business processes and underlying data structure. Therefore, domain experts require proper techniques and tools for extracting event data from ERP databases. In this paper, we present the full specification of a domain-specific modeling language for facilitating the extraction of appropriate event data from transactional databases by domain experts. The modeling language has been developed to support complex ambiguous cases when using ERP systems. We demonstrate its applicability using a case study with real data and show that the language includes constructs that enable a domain expert to easily model data of interest in the log extraction step. The language provides sufficient information to extract and transform data from transactional ERP databases to the XES format.
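
The end product of the extraction that the language supports is an XES event log: traces grouped by case, each containing events with an activity name and a timestamp. As a point of reference, the sketch below (not the paper's modeling language) builds a minimal XES file from rows that might have been selected from a transactional ERP table; the table contents and file name are invented.

```python
# Illustrative sketch of the extraction target only (not the paper's modeling
# language): turn rows pulled from a transactional ERP table into a minimal
# XES event log with xml.etree. Table rows and file name are invented.

import xml.etree.ElementTree as ET

# Rows as they might be selected from an ERP database: (case id, activity, timestamp).
rows = [
    ("PO-1001", "Create purchase order", "2021-03-01T09:15:00"),
    ("PO-1001", "Approve purchase order", "2021-03-01T11:40:00"),
    ("PO-1001", "Receive goods", "2021-03-04T08:05:00"),
    ("PO-1002", "Create purchase order", "2021-03-02T10:00:00"),
]

log = ET.Element("log", {"xes.version": "1.0"})
traces = {}
for case_id, activity, timestamp in rows:
    if case_id not in traces:
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        traces[case_id] = trace
    event = ET.SubElement(traces[case_id], "event")
    ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})

ET.ElementTree(log).write("purchase_orders.xes", xml_declaration=True, encoding="utf-8")
```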


2017 ◽  
Vol 41 (S1) ◽  
pp. s834-s834 ◽  
Author(s):  
S. Khouadja ◽  
R. Ben Soussia ◽  
S. Younes ◽  
A. Bouallagui ◽  
I. Marrag ◽  
...  

Introduction Treatment resistance to clozapine is estimated at 40–70% of the treated population. Several clozapine potentiation strategies have come into clinical practice, although often without evidence-based support. Objective The aim of our work was to identify the potentiation strategies used in ultra-resistant schizophrenia depending on the subtype of schizophrenia. Methodology This is a prospective study conducted on patients with a diagnosis of schizophrenia, based on DSM-IV-TR criteria, hospitalized in the psychiatric department of the university hospital in Mahdia, Tunisia. The study sample consisted of patients meeting the resistant schizophrenia criteria as defined by the National Institute for Clinical Excellence (NICE), in whom clozapine prescribed for 6 to 8 weeks produced no significant improvement. Results We collected 10 patients. The mean serum level of clozapine was 462.25 mg/L. The potentiation strategies differed depending on the subtype of schizophrenia. For undifferentiated schizophrenia, we chose ECT sessions. For disorganized schizophrenia, we opted for amisulpride and aripiprazole. For the paranoid forms, we chose the combination of risperidone and ECT. A psychometric improvement was noted on the BPRS, ranging from 34 to 40%. Conclusion Every potentiation strategy entails a cost, whether it is an additional monetary cost, adverse effects or greater stress to caregivers. The cost/benefit equation should be thoroughly evaluated and discussed before commencing a strategy. Disclosure of interest The authors have not supplied their declaration of competing interest.


2018 ◽  
Author(s):  
Andre Lamurias ◽  
Luka A. Clarke ◽  
Francisco M. Couto

Abstract Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In the Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalizing existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These resources contain supplementary information that may not yet be encoded in training data, particularly in domains with limited labeled data. We propose a new model, BO-LSTM, that takes advantage of domain-specific ontologies by representing each entity as the sequence of its ancestors in the ontology. We implemented BO-LSTM as a recurrent neural network with long short-term memory units, using an open biomedical ontology, which in our case study was Chemical Entities of Biological Interest (ChEBI). We assessed the performance of BO-LSTM on detecting and classifying drug-drug interactions in a publicly available corpus from an international challenge, composed of 792 drug descriptions and 233 scientific abstracts. By using the domain-specific ontology in addition to word embeddings and WordNet, BO-LSTM improved the F1-score of both the detection and the classification of drug-drug interactions, particularly in a document set with a limited number of annotations. Our findings demonstrate that, besides the high performance of current deep learning techniques, domain-specific ontologies can still be useful to mitigate the lack of labeled data. Author summary A high quantity of biomedical information is only available in documents such as scientific articles and patents. Due to the rate at which new documents are produced, we need automatic methods to extract useful information from them. Text mining is a subfield of information retrieval which aims at extracting relevant information from text. Scientific literature is a challenge to text mining because of the complexity and specificity of the topics approached. In recent years, deep learning has obtained promising results in various text mining tasks by exploring large datasets. On the other hand, ontologies provide a detailed and sound representation of a domain and have been developed for diverse biomedical domains. We propose a model that combines deep learning algorithms with biomedical ontologies to identify relations between concepts in text. We demonstrate the potential of this model to extract drug-drug interactions from abstracts and drug descriptions. This model can be applied to other biomedical domains, using an annotated corpus of documents and an ontology related to that domain to train a new classifier.
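
The central representation trick in BO-LSTM is replacing each entity with the sequence of its ancestors in the ontology before feeding it to a recurrent network. The following sketch illustrates that idea with a toy ChEBI-like hierarchy and a small PyTorch encoder; it is not the authors' implementation, and all names and dimensions are invented.

```python
# Minimal sketch of the ancestor-sequence idea behind BO-LSTM (not the authors'
# code): each entity is replaced by the path from the ontology root to that
# entity, and the path is encoded with an embedding layer plus an LSTM.
# The toy ChEBI-like hierarchy, vocabulary, and dimensions are invented.

import torch
import torch.nn as nn

# Hypothetical child -> parent links in a ChEBI-like ontology.
parent = {
    "warfarin": "coumarin",
    "coumarin": "chemical entity",
    "aspirin": "benzoic acid",
    "benzoic acid": "chemical entity",
}

def ancestor_sequence(entity):
    """Path from the root down to the entity, e.g. ['chemical entity', 'coumarin', 'warfarin']."""
    path = [entity]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))

vocab = {name: i for i, name in enumerate(
    sorted({c for seq in map(ancestor_sequence, ["warfarin", "aspirin"]) for c in seq}))}

class AncestorEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, ids):            # ids: (batch, seq_len)
        _, (h, _) = self.lstm(self.emb(ids))
        return h[-1]                   # one vector per entity

encoder = AncestorEncoder(len(vocab))
ids = torch.tensor([[vocab[c] for c in ancestor_sequence("warfarin")]])
print(encoder(ids).shape)              # torch.Size([1, 32])
```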


2019 ◽  
Vol 56 (2) ◽  
pp. 440-443
Author(s):  
Mircea Dorin Vasilescu

The aim of this work is to highlight how the technological parameters of DLP 3D printing influence the generation of gear wheels made from resin-type material. The first part of the paper presents how to generate, in terms of dimensional aspects, specific designs of cylindrical, conical, and worm gears. Generating these elements is intended to reduce their manufacturing cost. The specific components produced in this work are put to the test on a laboratory test stand, which is presented in the third part of the paper. The tested gears were produced by 3D printing with either the FDM or the DLP technique. After the constructive aspects, the paper proceeds to identify the relevant quantities, which have an impact both on mechanical strength and on kinematics, in order to achieve a product with the kinematic features and functional behaviour the specific domain requires. The next part carries out an analysis of the layers generated by the DLP and FDM methods using an optical microscope with magnification up to 500 times, specially adapted to allow both visualization and measurement of specific elements. The final part highlights the main issues and gives specific recommendations for obtaining such mechanical elements.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8010
Author(s):  
Ismail Butun ◽  
Yusuf Tuncel ◽  
Kasim Oztoprak

This paper investigates and proposes a solution for the Protocol Independent Switch Architecture (PISA) to process application-layer data, enabling the inspection of application content. PISA is a novel approach in networking where the switch does not run any embedded binary code but rather interpreted code written in a domain-specific language. The main motivation behind this approach is that telecommunication operators do not want to be locked in by a vendor for any type of networking equipment, and want to develop their own networking code in a hardware environment that is not governed by a single equipment manufacturer. This approach also eases the modeling of equipment in a simulation environment, as all of the components of a hardware switch run the same compatible code in a software-modeled switch. The novel techniques in this paper exploit the main functions of a programmable switch and combine them with a streaming data processor to create the desired effect from a telecommunication operator's perspective: lowering costs and governing the network in a comprehensive manner. The results indicate that the proposed solution using PISA switches enables application visibility with outstanding performance. This ability helps operators to remove a fundamental gap between flexibility and scalability by making the best use of limited compute resources in identifying applications and responding to them. The experimental study indicates that, without any optimization, the proposed solution increases the performance of application identification systems by 5.5 to 47.0 times. This study promises that DPI, NGFW (Next-Generation Firewall), and similar application-layer systems, which have quite high costs per unit of traffic volume and could not scale to the Tbps level, can be combined with PISA to overcome the cost and scalability issues.
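
In a PISA switch the matching logic would be written in the switch's own domain-specific language (such as P4); the sketch below only illustrates, in Python for readability, the kind of application-layer prefix matching such a pipeline performs. The signatures and payloads are invented examples.

```python
# Illustrative sketch only: the kind of application-layer match a PISA pipeline
# would express in its own domain-specific language, written here in Python
# for readability. Signatures and payloads are invented examples.

SIGNATURES = {
    b"GET ":     "http",
    b"POST ":    "http",
    b"\x16\x03": "tls",      # TLS handshake record header
    b"SSH-":     "ssh",
}

def identify_application(payload: bytes) -> str:
    """Classify a packet by matching known prefixes of its application payload."""
    for prefix, app in SIGNATURES.items():
        if payload.startswith(prefix):
            return app
    return "unknown"

print(identify_application(b"GET /index.html HTTP/1.1\r\nHost: example.org"))  # http
print(identify_application(b"\x16\x03\x01\x02\x00\x01\x00"))                   # tls
```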


2016 ◽  
Vol 50 (2) ◽  
pp. 302-308 ◽  
Author(s):  
Maynara Fernanda Carvalho Barreto ◽  
Mara Solange Gomes Dellaroza ◽  
Gilselena Kerbauy ◽  
Cintia Magalhães Carvalho Grion

Abstract OBJECTIVE To estimate the cost of hospitalization of patients with severe sepsis or septic shock admitted or diagnosed in the Urgent and Emergency sector at a university hospital and followed until the clinical outcome. METHOD An epidemiological, prospective, observational study conducted in a public hospital in southern Brazil over a period of one year (August 2013 to August 2014). Sepsis notification forms, medical records and data from the cost sector were used for the collection of clinical and epidemiological data. RESULTS The sample comprised 95 patients, resulting in a high total cost of hospitalization (R$ 3,692,421.00) and an average of R$ 38,867.60 per patient. Over half of the total value of the treatment of sepsis (R$ 2,215,773.50) was assigned to patients who progressed to death (59.0%). The higher costs were related to discharge, diagnosis of severe sepsis, the pulmonary focus of infection and the age group of up to 59 years. CONCLUSION The high cost of the treatment of sepsis justifies investments in training actions and the institution of protocols that can direct preventive actions, and optimize diagnosis and treatment in infected and septic patients.


2015 ◽  
Vol 22 (3) ◽  
pp. 649-658 ◽  
Author(s):  
Kin Wah Fung ◽  
Julia Xu

Abstract Objective Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is the emergent international health terminology standard for encoding clinical information in electronic health records. The CORE Problem List Subset was created to facilitate the terminology’s implementation. This study evaluates the CORE Subset’s coverage and examines its growth pattern as source datasets are being incorporated. Methods Coverage of frequently used terms and the corresponding usage of the covered terms were assessed by “leave-one-out” analysis of the eight datasets constituting the current CORE Subset. The growth pattern was studied using a retrospective experiment, growing the Subset one dataset at a time and examining the relationship between the size of the starting subset and the coverage of frequently used terms in the incoming dataset. Linear regression was used to model that relationship. Results On average, the CORE Subset covered 80.3% of the frequently used terms of the left-out dataset, and the covered terms accounted for 83.7% of term usage. There was a significant positive correlation between the CORE Subset’s size and the coverage of the frequently used terms in an incoming dataset. This implies that the CORE Subset will grow at a progressively slower pace as it gets bigger. Conclusion The CORE Problem List Subset is a useful resource for the implementation of Systematized Nomenclature of Medicine Clinical Terms in electronic health records. It offers good coverage of frequently used terms, which account for a high proportion of term usage. If future datasets are incorporated into the CORE Subset, it is likely that its size will remain small and manageable.
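
The abstract's two analyses, leave-one-out coverage and a linear model of coverage against starting-subset size, can be illustrated with a short sketch; the datasets below are invented stand-ins for the eight CORE source datasets used in the study.

```python
# Minimal sketch (invented data): the leave-one-out coverage analysis and the
# size-vs-coverage regression described in the abstract, using plain Python
# and numpy's least-squares fit.

import numpy as np

# Hypothetical frequently used term sets from four source datasets.
datasets = {
    "site_a": {"asthma", "diabetes", "hypertension", "migraine"},
    "site_b": {"asthma", "diabetes", "anemia"},
    "site_c": {"hypertension", "anemia", "fracture", "asthma"},
    "site_d": {"diabetes", "fracture", "copd"},
}

# Leave-one-out: build the subset from all but one dataset, then measure how
# many of the left-out dataset's frequent terms it covers.
sizes, coverages = [], []
for left_out, terms in datasets.items():
    core = set().union(*(t for name, t in datasets.items() if name != left_out))
    coverage = len(terms & core) / len(terms)
    sizes.append(len(core))
    coverages.append(coverage)
    print(f"leave out {left_out}: subset size {len(core)}, coverage {coverage:.2f}")

# Linear model of coverage as a function of starting-subset size.
slope, intercept = np.polyfit(sizes, coverages, 1)
print(f"coverage ≈ {slope:.3f} * size + {intercept:.3f}")
```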

