current annotation
Recently Published Documents



JAMIA Open ◽  
2021 ◽  
Author(s):  
Himanshu S Sahoo ◽  
Greg M Silverman ◽  
Nicholas E Ingraham ◽  
Monica I Lupei ◽  
Michael A Puskarich ◽  
...  

Abstract: Objective: With COVID-19 there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution. Materials and Methods: Performance, resource utilization and runtime of the rule-based gazetteer were compared with five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP and MedTagger. Results: This rule-based gazetteer was fastest, had a low resource footprint and showed similar performance for weighted micro-average and macro-average measures of precision, recall and f1-score compared to the other annotation systems. Discussion: Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for faster runtime. Conclusion: This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of health care settings for surveillance of acute COVID-19 symptoms for integration into prognostic modeling. Such a system is currently being leveraged for monitoring of post-acute sequelae of COVID-19 (PASC) progression in COVID-19 survivors. This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime and similar weighted micro-average and macro-average measures for precision, recall and f1-score compared to industry standard annotation systems.
Lay Summary: With COVID-19 came an unprecedented need to identify symptoms of COVID-19 patients under investigation (PUIs) in a time-sensitive, resource-efficient and accurate manner. While available annotation systems perform well for smaller healthcare settings, they fail to scale in larger healthcare systems where 10,000+ clinical notes are generated a day. This study covers 3 improvements addressing key limitations of current annotation systems. (1) High resource utilization and poor scalability of existing annotation systems. The presented rule-based gazetteer is a high-throughput annotation system for processing a high volume of notes, thus providing an opportunity for clinicians to make more informed time-sensitive decisions around patient care. (2) Equally important, our rule-based gazetteer performs similarly to or better than current annotation systems for symptom identification. (3) Due to the minimal resource needs of the rule-based gazetteer, it could be deployed at healthcare sites lacking a robust infrastructure, where industry standard annotation systems cannot be deployed because of low resource availability.
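The abstract above reports micro-average and macro-average precision, recall and f1-score without showing how the two averages differ. The following is a minimal sketch, not the authors' evaluation code: the function names `prf` and `micro_macro` and the symptom labels are hypothetical, and any weighting the study applied is not reproduced. Micro-averaging pools the counts over all labels before computing the scores, while macro-averaging computes the scores per label and then takes their unweighted mean.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred):
    """gold, pred: one set of symptom labels per note (hypothetical format)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in g & p:   # annotated and predicted
            tp[label] += 1
        for label in p - g:   # predicted but not annotated
            fp[label] += 1
        for label in g - p:   # annotated but missed
            fn[label] += 1
    labels = sorted(set(tp) | set(fp) | set(fn))
    # Micro: pool the counts over all labels, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: score each label separately, then take the unweighted mean.
    per_label = [prf(tp[l], fp[l], fn[l]) for l in labels]
    n = len(per_label)
    macro = tuple(sum(s[i] for s in per_label) / n if n else 0.0 for i in range(3))
    return micro, macro


if __name__ == "__main__":
    gold = [{"fever", "cough"}, {"cough"}, {"fever"}]
    pred = [{"fever"}, {"cough", "fever"}, {"fever"}]
    print(micro_macro(gold, pred))
```

Because macro-averaging gives every label equal weight, rare symptoms influence it as much as common ones, which is one reason to report both averages side by side.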


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Tianjia Liu ◽  
Muzi Li ◽  
Zhongchi Liu ◽  
Xiaoyan Ai ◽  
Yongping Li

Abstract: Cultivated strawberry (Fragaria × ananassa) is an important fruit crop species whose fruits are enjoyed by many worldwide. An octoploid of hybrid origin, the complex genome of this species was recently sequenced, serving as a key reference genome for cultivated strawberry and related species of the Rosaceae family. The current annotation of the F. ananassa genome mainly relies on ab initio predictions and, to a lesser extent, transcriptome data. Here, we present the structural and functional reannotation of the F. ananassa genome based on one PacBio full-length RNA library and ninety-two Illumina RNA-Seq libraries. This improved annotation of the F. ananassa genome, v1.0.a2, comprises a total of 108,447 gene models, with 97.85% complete BUSCOs. The models of 19,174 genes were modified, 360 new genes were identified, and 11,044 genes were found to have alternatively spliced isoforms. Additionally, we constructed a strawberry genome database (SGD) for strawberry gene homolog searching and annotation downloading. Finally, the transcriptomes of the receptacles and achenes of F. ananassa at four developmental stages were reanalyzed and qualified, and the expression profiles of all the genes in this annotation are also provided. Together, this study provides an updated annotation of the F. ananassa genome, which will facilitate genomic analyses across the Rosaceae family and gene functional studies in cultivated strawberry.


Author(s):  
Thomas Stricker ◽  
Ron Bonner ◽  
Frédérique Lisacek ◽  
Gérard Hopfgartner

Abstract: Annotation and interpretation of full scan electrospray mass spectra of metabolites is complicated by the presence of a wide variety of ions. Not only protonated, deprotonated, and neutral loss ions but also sodium, potassium, and ammonium adducts as well as oligomers are frequently observed. This diversity challenges automatic annotation and is often poorly addressed by current annotation tools. In many cases, annotation is integrated in metabolomics workflows and is based on specific chromatographic peak-picking tools. We introduce mzAdan, a nonchromatography-based multipurpose standalone application that was developed for the annotation and exploration of convolved high-resolution ESI-MS spectra. The tool annotates single or multiple accurate mass spectra using a customizable adduct annotation list and outputs a list of [M+H]+ candidates. MzAdan was first tested with a collection of 408 analytes acquired with flow injection analysis. This resulted in 402 correct [M+H]+ identifications and, with combinations of sodium, ammonium, and potassium adducts and water and ammonia losses within a tolerance of 10 mmu, explained close to 50% of the total ion current. False positives were monitored with mass accuracy and bias as well as chromatographic behavior which led to the identification of adducts with calcium instead of the expected potassium. MzAdan was then integrated in a workflow with XCMS for the untargeted LC-MS data analysis of a 52 metabolite standard mix and a human urine sample. The results were benchmarked against three other annotation tools, CAMERA, findMAIN, and CliqueMS: findMAIN and mzAdan consistently produced higher numbers of [M+H]+ candidates compared with CliqueMS and CAMERA, especially with co-eluting metabolites. Detection of low-intensity ions and correct grouping were found to be essential for annotation performance.
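The general idea of proposing [M+H]+ candidates from an adduct list can be illustrated with a small sketch. This is not mzAdan's algorithm, which the abstract does not describe in detail: the function `mh_candidates`, the `ADDUCT_SHIFTS` table and the example peak list are assumptions made for illustration. Each peak is treated as a hypothetical [M+H]+ ion, and the other peaks are searched for the sodium, ammonium and potassium adducts and the water and ammonia losses named in the abstract, using its 10 mmu (0.010 Da) tolerance.

```python
# Mass shifts (Da) relative to [M+H]+, from standard monoisotopic masses.
# The table is illustrative; a real adduct list would be user-configurable.
ADDUCT_SHIFTS = {
    "[M+Na]+": 21.981942,
    "[M+NH4]+": 17.026547,
    "[M+K]+": 37.955882,
    "[M+H-H2O]+": -18.010565,
    "[M+H-NH3]+": -17.026549,
}


def mh_candidates(mz_list, tol=0.010):
    """Rank peaks as [M+H]+ candidates by the number of supporting adducts.

    Returns a list of (candidate_mz, {adduct_name: matched_mz}) tuples,
    keeping only candidates with at least one supporting peak.
    """
    peaks = sorted(mz_list)
    results = []
    for base in peaks:
        found = {}
        for name, shift in ADDUCT_SHIFTS.items():
            target = base + shift
            # The closest observed peak to the expected adduct m/z.
            match = min(peaks, key=lambda m: abs(m - target))
            if abs(match - target) <= tol:
                found[name] = match
        if found:
            results.append((base, found))
    # Candidates explained by more adducts come first.
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical spectrum: caffeine-like [M+H]+, [M+Na]+, [M+K]+ plus noise.
    spectrum = [195.0877, 217.0696, 233.0435, 150.0]
    for mz, adducts in mh_candidates(spectrum):
        print(mz, adducts)
```

A linear scan per adduct is adequate for a single spectrum; for large peak lists a binary search over the sorted peaks would be the more natural choice.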


2017 ◽  
Vol 5 (32) ◽  
Author(s):  
Isabelle Caldelari ◽  
Béatrice Chane-Woon-Ming ◽  
Céline Noirot ◽  
Karen Moreau ◽  
Pascale Romby ◽  
...  

Abstract: Staphylococcus aureus is an opportunistic Gram-positive pathogen responsible for a wide range of infections from minor skin abscesses to life-threatening diseases. Here, we report the draft genome assembly and current annotation of the HG001 strain, a derivative of the RN1 (NCT8325) strain with restored rbsU (a positive activator of SigB).


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2689 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Pedro J. Martínez-García ◽  
Abhaya M. Dandekar

Background: The transcriptome, a treasure trove of gene space information, remains severely under-used by current genome annotation methods. Methods: Here, we present an annotation method in the YeATS suite (YeATSAM), based on information encoded by the transcriptome, that demonstrates artifacts of the assembler, which must be addressed to achieve proper annotation. Results and Discussion: YeATSAM was applied to the transcriptome obtained from twenty walnut tissues and compared to MAKER-P annotation of the recently published walnut genome sequence (WGS). MAKER-P and YeATSAM both failed to annotate several hundred proteins found by the other. Although many of these unannotated proteins have repetitive sequences (possibly transposable elements), other crucial proteins were excluded by each method. An egg cell-secreted protein and a homer protein were undetected by YeATSAM, although these did not produce any transcripts. Importantly, MAKER-P failed to classify key photosynthesis-related proteins, which we show emanated from Trinity assembly artifacts potentially not handled by MAKER-P. Also, no proteins from the large berberine bridge enzyme (BBE) family were annotated by MAKER-P. BBE is implicated in the biosynthesis of several alkaloid metabolites, such as the anti-microbial berberine. As further validation, YeATSAM identified ~1000 genes that are not annotated in the NCBI database by Gnomon. YeATSAM used an RNA-seq derived chickpea (Cicer arietinum L.) transcriptome assembled using Newbler v2.3. Conclusions: Since the current version of YeATSAM does not have an ab initio module, we suggest a combined annotation scheme using both MAKER-P and YeATSAM to comprehensively and accurately annotate the WGS.


2016 ◽  
Vol 7 (2) ◽  
pp. 1-28 ◽  
Author(s):  
Merel C.J. Scholman ◽  
Jacqueline Evers-Vermeul ◽  
Ted J.M. Sanders

Over the last decades, annotating discourse coherence relations has attracted increasing interest from the linguistics research community. Because of the complexity of coherence relations, there is no agreement on an annotation standard. Current annotation methods often lack a systematic order of coherence relations. In this article, we investigate the usability of the cognitive approach to coherence relations, developed by Sanders et al. (1992, 1993), for discourse annotation. The theory proposes a taxonomy of coherence relations in terms of four cognitive primitives. In this paper, we first develop a systematic, step-wise annotation process. The reliability of this annotation scheme is then tested in an annotation experiment with non-trained, non-expert annotators. An implicit and an explicit version of the annotation instruction were created to determine whether the type of instruction influences annotator agreement. The results show that two of the four primitives, polarity and order of the segments, can be applied reliably by non-trained annotators. The other two primitives, basic operation and source of coherence, are more problematic. Participants using the explicit instruction show higher agreement on the primitives than participants who used the implicit instruction. These results are comparable to agreement statistics of other discourse corpora annotated by trained, expert annotators. Given that non-trained, non-expert annotators show similar amounts of agreement, these results indicate that the cognitive approach to coherence relations is a promising method for annotating discourse.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Mikhail A. Golyshev ◽  
Eugene V. Korotkov

In recent years, a great number of bacterial genomes have been sequenced. One of the most important challenges of computational genomics is now the functional annotation of nucleic acid sequences. In this study, we present a computational method and an annotation system for predicting biological functions using phylogenetic profiles. The phylogenetic profile of a gene was created by searching for similarities between the nucleotide sequence of the gene and 1204 reference genomes, followed by estimation of the statistical significance of the similarities found. The profiles of genes with known functions were used to predict possible functions and functional groups for new genes. We conducted the functional annotation for genes from 104 bacterial genomes and compared the functions predicted by our system with the already known functions. For genes that had already been annotated, the known function matched the function we predicted in 63% of cases, and in 86% of cases the known function was found within the top five predicted functions. In addition, our system increased the share of annotated genes by 19%. The developed system may be used as an alternative or complementary system to the current annotation systems.
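As an illustration of function prediction from phylogenetic profiles, the sketch below ranks candidate functions for a query gene by comparing its profile with the profiles of genes of known function. The abstract does not state how profiles are compared, so the Jaccard similarity used here is an assumption rather than the authors' method, and the names `jaccard`, `top_functions` and the toy genome identifiers are hypothetical. A profile is modelled simply as the set of reference genomes in which a significant similarity was found.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_functions(query_profile, known, k=5):
    """Return up to k (function, score) pairs, best score first.

    known maps a gene name to (profile, function), where profile is the
    set of reference genomes with a significant hit for that gene.
    """
    best = {}
    for profile, function in known.values():
        score = jaccard(query_profile, profile)
        # Keep the best-scoring annotated gene for each function.
        if score > best.get(function, -1.0):
            best[function] = score
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    known = {
        "geneA": ({"g1", "g2", "g3"}, "transport"),
        "geneB": ({"g1", "g4"}, "signalling"),
        "geneC": ({"g2", "g3"}, "transport"),
    }
    print(top_functions({"g2", "g3"}, known))
```

Returning a ranked list rather than a single label mirrors the way the abstract evaluates predictions, both on the top prediction and on the top five.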


Microbiology ◽  
2014 ◽  
Vol 160 (5) ◽  
pp. 832-843 ◽  
Author(s):  
Steven R. Cockerell ◽  
Alex C. Rutkovsky ◽  
Josiah P. Zayner ◽  
Rebecca E. Cooper ◽  
Lindsay R. Porter ◽  
...  

The polyamines norspermidine and spermidine are among the environmental signals that regulate Vibrio cholerae biofilm formation. The effects of these polyamines are mediated by NspS, a member of the bacterial periplasmic solute binding protein superfamily. Almost all members of this superfamily characterized to date are components of ATP-binding cassette-type transporters involved in nutrient uptake. Consequently, in the current annotation of the V. cholerae genome, NspS has been assigned a function in transport. The objective of this study was to further characterize NspS and investigate its potential role in transport. Our results support a role for NspS in signal transduction in response to norspermidine and spermidine, but not their transport. In addition, we provide evidence that these polyamine signals are processed by c-di-GMP signalling networks in the cell. Furthermore, we present comparative genomics analyses which reveal the presence of NspS-like proteins in a variety of bacteria, suggesting that periplasmic ligand binding proteins may be widely utilized for sensory transduction.


Microbiology ◽  
2011 ◽  
Vol 157 (10) ◽  
pp. 2831-2840 ◽  
Author(s):  
Mollie W. Jewett ◽  
Sunny Jain ◽  
Angelika K. Linowski ◽  
Amit Sarkar ◽  
Patricia A. Rosa

The conversion of nicotinamide to nicotinic acid by nicotinamidase enzymes is a critical step in maintaining NAD+ homeostasis and contributes to numerous important biological processes in diverse organisms. In Borrelia burgdorferi, the nicotinamidase enzyme, PncA, is required for spirochaete survival throughout the infectious cycle. Mammals lack nicotinamidases and therefore PncA may serve as a therapeutic target for Lyme disease. Contrary to the in vivo importance of PncA, the current annotation for the pncA ORF suggests that the encoded protein may be inactive due to the absence of an N-terminal aspartic acid residue that is a conserved member of the catalytic triad of characterized PncA proteins. Herein, we have used genetic and biochemical strategies to determine the N-terminal sequence of B. burgdorferi PncA. Our data demonstrate that the PncA protein is 24 aa longer than the currently annotated sequence and that pncA translation is initiated from the rare, non-canonical initiation codon AUU. These findings are an important first step in understanding the catalytic function of this in vivo-essential protein.


2007 ◽  
Vol 35 (Database) ◽  
pp. D595-D598 ◽  
Author(s):  
K. Hervold ◽  
A. Martin ◽  
R. A. Kirkpatrick ◽  
P. F. Mc Kenna ◽  
F. A. Ramirez-Weber
