Gold-standard ontology-based anatomical annotation in the CRAFT Corpus

Database ◽  
2017 ◽  
Vol 2017 ◽  
Author(s):  
Michael Bada ◽  
Nicole Vasilevsky ◽  
William A Baumgartner ◽  
Melissa Haendel ◽  
Lawrence E Hunter

Abstract Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks. Building on its demonstrated utility, we have expanded the CRAFT Corpus with a large set of manually created semantic annotations relying on Uberon, an ontology representing anatomical entities and life-cycle stages of multicellular organisms across species as well as types of multicellular organisms defined in terms of life-cycle stage and sexual characteristics. This newly created set of annotations, which has been added for v2.1 of the corpus, is by far the largest publicly available collection of gold-standard anatomical markup and is the first large-scale effort at manual markup of biomedical text relying on the entirety of an anatomical terminology, as opposed to annotation with a small number of high-level anatomical categories, as performed in previous corpora. In addition to presenting and discussing this newly available resource, we apply it to provide a performance baseline for the automatic annotation of anatomical concepts in biomedical text using a prominent concept recognition system. The full corpus, released with a CC BY 3.0 license, may be downloaded from http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. 
Database URL: http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml

2020 ◽  
Author(s):  
Olessia Jouravlev ◽  
Alexander J.E. Kell ◽  
Zachary Mineroff ◽  
A.J. Haskins ◽  
Dima Ayyash ◽  
...  

Abstract One of the few replicated functional brain differences between individuals with autism spectrum disorders (ASD) and neurotypical (NT) controls is reduced language lateralization. However, most prior reports relied on comparisons of group-level activation maps or functional markers that had not been validated at the individual-subject level, and/or used tasks that do not isolate language processing from other cognitive processes, complicating interpretation. Furthermore, few prior studies have examined functional responses in other functional networks, as needed to determine the selectivity of the effect. Using fMRI, we compared language lateralization between 28 ASD participants and carefully pairwise-matched controls, with the language regions defined individually with a well-validated language localizer. ASD participants showed less lateralized responses due to stronger right hemisphere activations. Further, this effect did not stem from a ubiquitous reduction in lateralization across the brain: ASD participants did not differ from controls in the lateralization of two other large-scale networks (the Theory of Mind network and the Multiple Demand network). Finally, in an exploratory study, we tested whether reduced language lateralization may also be present in NT individuals with high autistic trait load. Indeed, autistic trait load in a large set of NT participants (n=189) was associated with less lateralized language activations. These results suggest that reduced language lateralization is a robust and spatially selective neural marker of autism, present in individuals with ASD, but also in NT individuals with higher genetic liability for ASD, in line with a continuum model of underlying genetic risk.


2020 ◽  
Vol 59 (06) ◽  
pp. 219-226
Author(s):  
Mindy K. Ross ◽  
Henry Zheng ◽  
Bing Zhu ◽  
Ailina Lao ◽  
Hyejin Hong ◽  
...  

Abstract Objectives Asthma is a heterogeneous condition with significant diagnostic complexity, including variations in symptoms and temporal criteria. The disease can be difficult for clinicians to diagnose accurately. Properly identifying asthma patients from the electronic health record is consequently challenging, as current algorithms (computable phenotypes) rely on diagnostic codes (e.g., International Classification of Disease, ICD) in addition to other criteria (e.g., inhaler medications) but presume an accurate diagnosis. As such, there is no universally accepted or rigorously tested computable phenotype for asthma. Methods We compared two established asthma computable phenotypes: the Chicago Area Patient-Outcomes Research Network (CAPriCORN) and Phenotype KnowledgeBase (PheKB). We established a large-scale, consensus gold standard (n = 1,365) from the University of California, Los Angeles Health System's clinical data warehouse for patients 5 to 17 years old. Results were manually reviewed and predictive performance (positive predictive value [PPV], sensitivity/specificity, F1-score) determined. We then examined the classification errors to gain insight for future algorithm optimizations. Results As applied to our final cohort of 1,365 expert-defined gold standard patients, the CAPriCORN algorithm performed with a balanced PPV = 95.8% (95% CI: 94.4–97.2%), sensitivity = 85.7% (95% CI: 83.9–87.5%), and harmonized F1 = 90.4% (95% CI: 89.2–91.7%). The PheKB algorithm performed with a balanced PPV = 83.1% (95% CI: 80.5–85.7%), sensitivity = 69.4% (95% CI: 66.3–72.5%), and F1 = 75.4% (95% CI: 73.1–77.8%). Four categories of errors were identified related to method limitations, disease definition, human error, and design implementation. Conclusion The performance of the CAPriCORN and PheKB algorithms was lower than previously reported as applied to pediatric data (PPV = 97.7 and 96%, respectively). There is room to improve the performance of current methods, including targeted use of natural language processing and clinical feature engineering.
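As a rough illustration of how the F1-score reported above relates to PPV and sensitivity, the following minimal sketch computes the plain harmonic mean of the two point estimates quoted in the abstract. The function name `f1_score` and the dictionary layout are illustrative assumptions, not the authors' code; the abstract describes its figures as "balanced" and "harmonized", so the published values may differ slightly from this simple calculation.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Point estimates quoted in the abstract, as (PPV, sensitivity) fractions.
reported = {
    "CAPriCORN": (0.958, 0.857),
    "PheKB": (0.831, 0.694),
}

# Hypothetical helper structure: F1 per algorithm from the quoted estimates.
f1_by_algorithm = {name: f1_score(p, s) for name, (p, s) in reported.items()}

for name, value in f1_by_algorithm.items():
    print(f"{name}: F1 = {value:.3f}")
```

The results land close to the F1 values stated in the abstract; small gaps are expected because this sketch works from rounded point estimates only.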


2021 ◽  
Author(s):  
Mayla R Boguslav ◽  
Nourah M Salem ◽  
Elizabeth K White ◽  
Sonia M Leach ◽  
Lawrence E Hunter

Motivation: Science progresses by posing good questions, yet work in biomedical text mining has not focused on them much. We propose a novel idea for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize ignorance statements, statements where scientific knowledge is missing or incomplete. The creation of such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as the first step towards these goals. Results: We present a novel ignorance taxonomy driven by the role ignorance statements play in the research, identifying specific goals for future scientific knowledge. Using this taxonomy and reliable annotation guidelines (inter-annotator agreement above 80%), we created a gold standard ignorance corpus of 60 full-text documents from the prenatal nutrition literature with over 10,000 annotations and used it to train classifiers that achieved over 0.80 F1 scores. Availability: Corpus and source code freely available for download at https://github.com/UCDenver-ccp/Ignorance-Question-Work. The source code is implemented in Python.


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
I Diemberger ◽  
C Martignani ◽  
G Massaro ◽  
S Lorenzetti ◽  
J De Bie

Abstract Introduction Atrial fibrillation and flutter (AF/AFl) are the most common sustained arrhythmias in the elderly. ESC guidelines underline the need for large-scale screening strategies, especially to improve primary prevention of thromboembolic complications. However, the current gold standard for identification of AF/AFl is a 12-lead ECG reviewed by an appropriately trained physician. Automatic discrimination between AF/AFl and sinus rhythm (SR) by the automatic diagnostic computer programs (ACPs) implemented in current 12-lead ECG recorders is a possible way to improve this process. Aim To assess the reliability and agreement of the main ACPs available worldwide and implemented in current 12-lead ECG recorders in discriminating between AF/AFl and SR in a large dataset of real-world ECGs. Methods We assessed seven ECG interpretation programs from seven different manufacturers (GE 12SL, Glasgow, MEANS, Midmark, Mortara VERITAS, Philips DXL and Schiller). We created a large set of representative ECGs converted from previously recorded digital ECGs that had been acquired with equipment complying with the requirements of International Electrotechnical Commission standard IEC 60601-2-51:2003 and that were representative of those in hospital settings. We excluded ECGs from pacemaker carriers. We used a specific device for playing back ECGs to 12-lead ECG recorders implementing the seven programs. Each statement from the automatic diagnosis provided by each device was recorded and combined appropriately for the purpose of this analysis: identification of AF/AFl vs. SR. The gold standard was built by independent re-assessment by three different reviewers. Results We collected 2064 10-s 12-lead ECGs with SR (1882) or AF/AFl (182) that were analyzed by seven different ACPs. ECGs with other arrhythmias were excluded from this analysis (to increase transferability of the results). All seven programs agreed on SR in 1645 cases (87.4%) and on AF/AFl in 139 cases (76.4%) (Figure 1, panel A). In 280 cases (13.6%), at least one program did not agree with the others. After revision by cardiologists, 237 were found to be SR and 43 AF/AFl. Sensitivity for AF/AFl ranged from 90% to 97%, and the rate of false-positive diagnoses ranged from 0.4% to 3.4%. Notably, the chance of obtaining a wrong diagnosis from at least one device was 280/2064 (13.6%), with a number of possible false AF/AFl diagnoses greater than the real prevalence of AF/AFl (Figure 1, panel B). Conclusions Despite the generally good reliability of each single ACP for AF/AFl recognition, the chance of between-device discordance is not negligible, and the risk of a false-positive automatic diagnosis of AF/AFl should be considered when managing real-world patients, especially when deciding to start oral anticoagulation.
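The agreement figures in this abstract can be checked with simple arithmetic. The sketch below recomputes the percentages from the counts quoted in the text; the variable names and the `pct` helper are illustrative assumptions and do not come from the study.

```python
# Counts quoted in the abstract.
total_ecgs = 2064
sr_ecgs, af_ecgs = 1882, 182        # gold-standard SR and AF/AFl recordings
agreed_sr, agreed_af = 1645, 139    # ECGs on which all seven programs agreed


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# ECGs on which at least one program disagreed with the others.
discordant = total_ecgs - (agreed_sr + agreed_af)

print("Unanimous SR:", pct(agreed_sr, sr_ecgs), "% of SR ECGs")
print("Unanimous AF/AFl:", pct(agreed_af, af_ecgs), "% of AF/AFl ECGs")
print("Discordant:", discordant, "ECGs,", pct(discordant, total_ecgs), "% of all ECGs")
```

The denominators differ by row: the unanimous-agreement rates are relative to each rhythm class, while the discordance rate is relative to all ECGs.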


2011 ◽  
Vol 37 (4) ◽  
pp. 753-809 ◽  
Author(s):  
David Vadas ◽  
James R. Curran

Noun phrases (NPs) are a crucial part of natural language, and can have a very complex structure. However, this NP structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult, and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through much experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale NP Bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex NPs that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser. Our NP Bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus, and show that many NLP applications can now make use of NP structure.


2014 ◽  
Vol 22 (1) ◽  
pp. 166-178 ◽  
Author(s):  
Yizhao Ni ◽  
Stephanie Kennebeck ◽  
Judith W Dexheimer ◽  
Constance M McAneney ◽  
Huaxiu Tang ◽  
...  

Abstract Objectives (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials. Data and methods We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields including demographics, laboratory data, and clinical notes from the electronic health record (EHR) to represent profiles of all 202795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria to reduce the pool of candidates for staff screening. The performance was validated on both a physician-generated gold standard of trial–patient matches and a reference standard of historical trial–patient enrollment decisions, where workload, mean average precision (MAP), and recall were assessed. Results Compared with the case without automation, the workload with automated ES was reduced by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial–patient matches. Discussion and conclusion By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES could successfully identify patients for clinical trials.


2018 ◽  
Author(s):  
Wasila Dahdul ◽  
Prashanti Manda ◽  
Hong Cui ◽  
James P. Balhoff ◽  
T. Alexander Dececchi ◽  
...  

Abstract Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in biological literature. Expressing these phenotypes as logical statements using formal ontologies would enable large-scale analysis on phenotypic information from diverse systems. However, considerable human effort is required to make the semantics of phenotype descriptions amenable to machine reasoning by (a) recognizing appropriate ontological terms for entities in text and (b) stringing these terms into logical statements. Most existing Natural Language Processing tools stop at entity recognition, leaving a need for tools that can assist with both aspects of the task. The recently described Semantic CharaParser aims to meet this need. We describe the first expert-curated Gold Standard corpus for ontology-based annotation of phenotypes from the systematics literature. We use it to evaluate Semantic CharaParser’s annotations and explore differences in performance between humans and machine. We use four annotation accuracy metrics that can account for both semantically identical and similar matches. We found that machine-human consistency was significantly lower than inter-curator (human–human) consistency. Surprisingly, allowing curators access to external information that was not available to Semantic CharaParser did not significantly increase the similarity of their annotations to the Gold Standard nor have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the Gold Standard increased after new ontology terms relevant to the input text had been added. Evaluation by the original authors of the character descriptions indicated that the Gold Standard annotations came closer to representing their intended meaning than did either the curator or machine annotations. These findings point toward ways to better design software to augment human curators, and the Gold Standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.


2019 ◽  
Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties) and adsorbate (e.g. force field parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.


2016 ◽  
Author(s):  
Vrushali Shah ◽  
Sarang Shankar Bhola
