annotation strategy
Recently Published Documents


Total documents: 28 (five years: 12)
H-index: 7 (five years: 3)

2021, Vol 21 (1)
Author(s): Xuan Gu, Zhengya Sun, Wensheng Zhang

Abstract
Background: Symptom phrase recognition is essential for making better use of unstructured medical consultation corpora in the development of automated question answering systems. Most previous work requires sufficient manually annotated training data or a symptom dictionary that is as complete as possible. In real scenarios, however, such methods face a dilemma because annotated textual resources are scarce and spoken-language expressions are diverse.
Methods: In this paper, we propose a composition-driven method that recognizes symptom phrases in Chinese medical consultation corpora without any annotations. The basic idea is to learn models that directly capture composition, i.e., the arrangement of symptom components (semantic units of words). We introduce an automatic annotation strategy for standard symptom phrases collected from multiple data sources. In particular, we combine position information with interaction scores between symptom components to characterize symptom phrases. Equipped with such models, we can robustly extract symptom phrases that have not been seen before.
Results: Without any manual annotations, our method achieves strong results on symptom phrase recognition tasks. Experiments also show that the method has great potential when plenty of corpora are available.
Conclusions: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels.
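The abstract does not say how the standard symptom phrases are turned into training labels. A common baseline for annotating text automatically from a phrase list is greedy longest-match dictionary tagging with BIO labels; the sketch below illustrates only that baseline, assuming pre-tokenized input. It is not the authors' composition-driven model (which also uses position information and component interaction scores), and the function name `bio_annotate` and the `SYM` label are hypothetical.

```python
from typing import List, Set, Tuple

def bio_annotate(tokens: List[str], phrases: Set[Tuple[str, ...]], label: str = "SYM") -> List[str]:
    """Tag tokens with B-/I-/O labels by greedy longest match against a phrase dictionary."""
    max_len = max((len(p) for p in phrases), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so a longer phrase wins over its prefix.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                matched = n
                break
        if matched:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + matched):
                tags[j] = f"I-{label}"
            i += matched  # skip past the matched phrase
        else:
            i += 1
    return tags

# Toy example with English tokens; for Chinese text the tokens could be characters or segmented words.
dictionary = {("head", "ache"), ("fever",), ("mild", "fever")}
print(bio_annotate(["severe", "head", "ache", "and", "mild", "fever"], dictionary))
# prints ['O', 'B-SYM', 'I-SYM', 'O', 'B-SYM', 'I-SYM']
```

Pure dictionary matching cannot label phrases absent from the dictionary, which is the gap the abstract says composition-aware models are meant to close.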


F1000Research, 2021, Vol 10, pp. 1194
Author(s): Daniel Longhi Fernandes Pedro, Tharcisio Soares Amorim, Alessandro Varani, Romain Guyot, Douglas Silva Domingues, ...

Advances in genomic sequencing have recently opened vast opportunities for biological exploration, unraveling evolution and improving our understanding of Earth's biodiversity. Plant species differ widely in genome size, ploidy and heterozygosity, and transposable elements (TEs) are common features of many of their genomes. TEs are ubiquitous, dispersed repetitive DNA sequences that frequently affect the evolution and composition of the genome, mainly through their redundancy and rearrangements. In this study, we provide an atlas of TE data through an easy-to-use portal (the APTE website). To our knowledge, this is the most extensive and standardized analysis of TEs in plant genomes. We evaluated 67 plant genomes assembled at chromosome scale, recovering a total of 49,802,023 TE records that span 47,992,091,043 base pairs (bp), about 47.62% of the total genomic space. Compared with other data repositories, we identified and annotated new types of TEs. By establishing a standardized catalog of TE annotations for 67 genomes, this resource enables new hypotheses and further exploration of TE data and their influence on genomes, which may lead to a better understanding of TE function and processes. All original code, along with an example of how the TE annotation strategy was developed, is available on GitHub (Extended data).
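The entry reports both a record count and a base-pair total. Because individual TE records can overlap (for example, nested insertions), a genome-fraction figure is normally computed over the union of annotated intervals rather than by summing record lengths. The abstract does not say exactly how APTE computes its percentage; the sketch below assumes half-open, BED-style `[start, end)` coordinates, and the function names `covered_bp` and `genome_fraction` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open

def covered_bp(intervals: List[Interval]) -> int:
    """Count base pairs covered by at least one interval, merging overlaps per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total

def genome_fraction(intervals: List[Interval], genome_size: int) -> float:
    """Fraction of the genome covered by the union of annotated intervals."""
    return covered_bp(intervals) / genome_size

# Toy records: two overlapping TEs on chr1 are counted once over their union.
records = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 0, 10)]
print(covered_bp(records))  # prints 210
```

Summing the raw record lengths in this toy case would give 260 bp instead of 210 bp, which is why the union matters when reporting a percentage of genomic space.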


2021, pp. 100153
Author(s): Shiori Kuraoka, Hideyuki Higashi, Yoshihiro Yanagihara, Abhijeet R. Sonawane, Shin Mukai, ...


GigaScience, 2020, Vol 9 (3)
Author(s): Matthias Hörtenhuber, Abdul K Mukarram, Marcus H Stoiber, James B Brown, Carsten O Daub

Abstract
Background: Over the past few years, the variety of experimental designs and protocols for sequencing experiments has increased greatly. To ensure that the data produced remain widely usable beyond an individual project, rich and systematic annotation of the underlying experiments is crucial.
Findings: We first developed an annotation structure that captures the overall experimental design as well as the relevant details of each step, from the biological sample through library preparation and the sequencing procedure to the sequencing and processed files. Through design features such as controlled vocabularies and different field requirements, we ensured high annotation quality, comparability, and ease of annotation. The structure can easily be adapted to a wide variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality.
Conclusions: We present an annotation structure and user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts.
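Controlled vocabularies and per-field requirements are what let an annotation structure reject inconsistent entries at input time. The sketch below shows that idea on a single flat record, assuming a hypothetical schema: the field names, allowed values, and the `validate_record` function are illustrative and are not taken from the platform described above.

```python
from typing import Dict, List, Set

# Hypothetical schema: field names and allowed values are illustrative only.
CONTROLLED_VOCAB: Dict[str, Set[str]] = {
    "library_strategy": {"RNA-Seq", "ChIP-Seq", "ATAC-Seq", "WGS"},
    "sequencing_platform": {"Illumina", "PacBio", "Oxford Nanopore"},
}
REQUIRED_FIELDS: Set[str] = {"sample_id", "species", "library_strategy"}

def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors: List[str] = []
    # Requirement check: every mandatory field must be present and non-empty.
    for field in sorted(REQUIRED_FIELDS):
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    # Vocabulary check: optional fields are skipped when absent, but if given must use an allowed term.
    for field, allowed in CONTROLLED_VOCAB.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(f"{field}: '{value}' not in controlled vocabulary")
    return errors

print(validate_record({"sample_id": "S2", "library_strategy": "RNAseq"}))
```

The usage call flags both the missing `species` field and the free-text spelling `RNAseq`, the kind of drift a controlled vocabulary is meant to prevent.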


2019
Author(s): Adriano de Bernardi Schneider, Denis Jacob Machado, Daniel Janies

The ongoing and severe public health threat posed by viruses of the family Flaviviridae, including dengue, hepatitis C, West Nile, yellow fever, and Zika, demands a greater understanding of how these viruses evolve, emerge, and spread so that we can respond. Central to this understanding is an updated phylogeny of the entire family. Unfortunately, most cladograms of Flaviviridae focus on specific lineages, ignore outgroups, and rely on midpoint rooting, which hampers their ability to test ingroup monophyly and estimate ingroup relationships. This problem stems partly from the lack of fully annotated Flaviviridae genomes: because the family's genera have slightly different gene content, genome analysis is hindered without partitioning. To tackle these problems, we developed an annotation pipeline for Flaviviridae that combines ab initio and homology-based strategies. The pipeline recovered 100% of the genes in reference genomes and annotated over 97% of the expected genes in the remaining non-curated sequences. We further demonstrate that the combined analysis of genomes from all genera of Flaviviridae (Flavivirus, Hepacivirus, Pegivirus, and Pestivirus), made possible by our annotation strategy, enhances the phylogenetic analyses of these viruses under all optimality criteria we tested (parsimony, maximum likelihood, and posterior probability). The final tree sheds light on the phylogenetic relationships of viruses that are divergent from most Flaviviridae and should be reclassified, especially soybean cyst nematode virus 5 (SbCNV-5) and Tamana bat virus. We also corroborate the close phylogenetic relationship of dengue and Zika viruses with an unprecedented degree of support.
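The abstract states that ab initio and homology-based strategies are combined and reports recovery as a percentage of expected genes, but it does not give the merge rule. The sketch below assumes one simple rule (a homology-based call overrides an ab initio call for the same gene, and ab initio fills the rest) and then computes the recovery rate. The function names, gene sets, and coordinates are hypothetical toy values, not output of the authors' pipeline.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]  # (start, end) on the genome; toy coordinates only

def combine_predictions(homology: Dict[str, Span], ab_initio: Dict[str, Span]) -> Dict[str, Span]:
    """Assumed merge rule: start from ab initio calls, then let homology calls override them."""
    combined = dict(ab_initio)
    combined.update(homology)  # homology hits replace ab initio predictions for the same gene
    return combined

def recovery_rate(annotated: Dict[str, Span], expected: Set[str]) -> float:
    """Fraction of expected gene names that received an annotation."""
    if not expected:
        return 0.0
    return len(expected.intersection(annotated)) / len(expected)

homology_calls = {"C": (100, 400), "E": (900, 2400)}
ab_initio_calls = {"E": (880, 2400), "NS1": (2400, 3450)}
merged = combine_predictions(homology_calls, ab_initio_calls)
print(recovery_rate(merged, {"C", "prM", "E", "NS1"}))  # prints 0.75
```

Neither source alone covers three of the four expected genes here, which illustrates why combining the two strategies can raise recovery.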


2019
Author(s): Patricia P. Chan, Brian Y. Lin, Allysia J. Mak, Todd M. Lowe

Abstract
tRNAscan-SE has been widely used for whole-genome transfer RNA gene prediction for nearly two decades. With the increased availability of new genomes, a vastly larger training set has enabled the creation of nearly one hundred specialized isotype-specific models, greatly improving tRNAscan-SE's ability to identify and classify both typical and atypical tRNAs. We employ a new multi-model annotation strategy in which predicted tRNAs are scored against a full set of isotype-specific covariance models. A post-filtering feature also better identifies tRNA-derived SINEs, which are abundant in many eukaryotic genomes, and provides a "high confidence" tRNA gene set that improves upon prior pseudogene prediction. These enhancements will provide researchers with more accurate detection and more comprehensive annotation of tRNA genes.
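In a multi-model strategy, each predicted tRNA receives one score per isotype-specific model, and a natural way to turn those scores into a label is to take the best-scoring model. The sketch below assumes exactly that argmax rule plus a minimum-score cutoff; the function name `classify_isotype`, the `min_score` value, and the example scores are hypothetical and do not reproduce tRNAscan-SE's actual decision logic or defaults.

```python
from typing import Dict, Optional, Tuple

def classify_isotype(scores: Dict[str, float], min_score: float = 20.0) -> Tuple[Optional[str], float]:
    """Pick the isotype model with the highest score; return (None, best) if it is below the cutoff."""
    if not scores:
        return None, float("-inf")
    best_isotype = max(scores, key=lambda iso: scores[iso])
    best_score = scores[best_isotype]
    if best_score < min_score:  # no model fits well enough to assign an isotype
        return None, best_score
    return best_isotype, best_score

# Hypothetical scores of one predicted tRNA against three isotype models.
print(classify_isotype({"Ala": 45.2, "Gly": 61.8, "Ser": 30.0}))  # prints ('Gly', 61.8)
```

Keeping the best score alongside the label lets a downstream filter, such as a high-confidence set, apply its own stricter threshold.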

