annotation quality
Recently Published Documents


Author(s):  
Francesca Lizzi ◽  
Abramo Agosti ◽  
Francesca Brero ◽  
Raffaella Fiamma Cabini ◽  
Maria Evelina Fantacci ◽  
...  

Abstract
Purpose: This study aims at exploiting artificial intelligence (AI) for the identification, segmentation and quantification of COVID-19 pulmonary lesions. Limited data availability and annotation quality are relevant factors in training AI methods. We investigated the effects of using multiple datasets, heterogeneously populated and annotated according to different criteria.
Methods: We developed an automated analysis pipeline, the LungQuant system, based on a cascade of two U-nets. The first one (U-net$_1$) is devoted to the identification of the lung parenchyma; the second one (U-net$_2$) acts on a bounding box enclosing the segmented lungs to identify the areas affected by COVID-19 lesions. Different public datasets were used to train the U-nets and to evaluate their segmentation performance, which was quantified in terms of the Dice Similarity Coefficient (DSC). The accuracy of the LungQuant system in predicting the CT-Severity Score (CT-SS) was also evaluated.
Results: Both the volumetric DSC (vDSC) and the accuracy showed a dependency on the annotation quality of the released data samples. On an independent dataset (COVID-19-CT-Seg), both the vDSC and the surface DSC (sDSC) were measured between the masks predicted by the LungQuant system and the reference ones. vDSC (sDSC) values of 0.95±0.01 and 0.66±0.13 (0.95±0.02 and 0.76±0.18, with 5 mm tolerance) were obtained for the segmentation of lungs and COVID-19 lesions, respectively. The system achieved an accuracy of 90% in CT-SS identification on this benchmark dataset.
Conclusion: We analysed the impact of using data samples with different annotation criteria in training an AI-based quantification system for pulmonary involvement in COVID-19 pneumonia. In terms of vDSC measures, the U-net segmentation strongly depends on the quality of the lesion annotations. Nevertheless, the CT-SS can be accurately predicted on independent test sets, demonstrating the satisfactory generalization ability of LungQuant.
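
For readers unfamiliar with the volumetric Dice Similarity Coefficient mentioned in this abstract, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), applied to two binary masks. It is not the LungQuant code: the function name `volumetric_dice`, the representation of a mask as a set of (z, y, x) voxel indices, and the choice to return 1.0 when both masks are empty are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# A voxel is identified by its (z, y, x) index; this representation is an
# assumption made to keep the sketch free of external dependencies.
Voxel = Tuple[int, int, int]


def volumetric_dice(pred: Iterable[Voxel], ref: Iterable[Voxel]) -> float:
    """Return the Dice coefficient 2|A & B| / (|A| + |B|) of two voxel sets."""
    pred_set, ref_set = set(pred), set(ref)
    total = len(pred_set) + len(ref_set)
    if total == 0:
        # Both masks are empty: treated here as perfect agreement.
        # This is a convention chosen for the sketch, not taken from the paper.
        return 1.0
    return 2.0 * len(pred_set & ref_set) / total


if __name__ == "__main__":
    predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    reference = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
    # Two voxels overlap and each mask holds three voxels: 2 * 2 / 6.
    print(round(volumetric_dice(predicted, reference), 3))
```

A score of 1 means the masks coincide voxel for voxel and 0 means they do not overlap at all. The surface DSC with a distance tolerance, also reported in the abstract, is a different quantity and is not covered by this sketch.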


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-16
Author(s):  
Federico Bonetti ◽  
Sara Tonelli

Gamification has been recently growing in popularity among researchers investigating Information and Communication Technologies. Scholars have been trying to take advantage of this approach in the field of natural language processing (NLP), developing Games With A Purpose (GWAPs) for corpus annotation that have obtained encouraging results both in annotation quality and overall cost. However, GWAPs implement gamification in different ways and to different degrees. We propose a new framework to investigate the mechanics employed in the gamification process and their magnitude in terms of complexity. This framework is based on an analysis of some of the most important contributions in the field of NLP-related gamified applications and GWAP theory. Its primary purpose is to provide a first step towards classifying mechanics that mimic mainstream video games and may require skills that are not relevant to the annotation task, defined as orthogonal mechanics. In order to test our framework, we develop and evaluate Spacewords, a linguistic space game for synonymy annotation.


Insects ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 748
Author(s):  
Surya Saha ◽  
Amanda M. Cooksey ◽  
Anna K. Childers ◽  
Monica F. Poelchau ◽  
Fiona M. McCarthy

Genome sequencing of a diverse array of arthropod genomes is already underway, and these genomes will be used to study human health, agriculture, biodiversity, and ecology. These new genomes are intended to serve as community resources and provide the foundational information required to apply ‘omics technologies to a more diverse set of species. However, biologists require genome annotation to use these genomes and derive a better understanding of complex biological systems. Genome annotation incorporates two related, but distinct, processes: Demarcating genes and other elements present in genome sequences (structural annotation); and associating a function with genetic elements (functional annotation). While there are well-established and freely available workflows for structural annotation of gene identification in newly assembled genomes, workflows for providing the functional annotation required to support functional genomics studies are less well understood. Genome-scale functional annotation is required for functional modeling (enrichment, networks, etc.). A first-pass genome-wide functional annotation effort can rapidly identify under-represented gene sets for focused community annotation efforts. We present an open-source, open access, and containerized pipeline for genome-scale functional annotation of insect proteomes and apply it to various arthropod species. We show that the performance of the predictions is consistent across a set of arthropod genomes with varying assembly and annotation quality.


2021 ◽  
Author(s):  
Roman Martin ◽  
Hagen Dreßler ◽  
Georges Hattab ◽  
Thomas Hackl ◽  
Matthias G Fischer ◽  
...  

Due to the rapidly growing amount of available genomic information, the need for accessible and easy-to-use analysis tools is increasing. To facilitate eukaryotic genome annotations, we created MOSGA. In this work, we show how MOSGA 2 was developed by including several advanced analyses for genomic data. Since genomic data quality greatly impacts annotation quality, we included multiple tools to validate and ensure high-quality user-submitted genome assemblies. Moreover, thanks to the integration of comparative genomics methods, users can benefit from a broader genomic view by analyzing multiple genomic data sets simultaneously. Further, we demonstrate the new functionalities of MOSGA 2 through different use cases and practical examples. MOSGA 2 extends the already established application to the quality control of genomic data, and it integrates and analyzes multiple genomes in a larger context, e.g., by phylogenetics.


2021 ◽  
Vol 8 (2) ◽  
pp. 76-106
Author(s):  
Samreen Anjum ◽  
Ambika Verma ◽  
Brandon Dang ◽  
Danna Gurari

We investigate what, if any, benefits arise from employing hybrid algorithm-crowdsourcing approaches over conventional approaches of relying exclusively on algorithms or crowds to annotate images.  We introduce a framework that enables users to investigate different hybrid workflows for three popular image analysis tasks: image classification, object detection, and image captioning.   Three hybrid approaches are included that are based on having workers: (i) verify predicted labels, (ii) correct predicted labels, and (iii) annotate images for which algorithms have low confidence in their predictions.  Deep learning algorithms are employed in these workflows since they offer high performance for image annotation tasks.  Each workflow is evaluated with respect to annotation quality and worker time to completion on images coming from three diverse datasets (i.e., VOC, MSCOCO, VizWiz). Inspired by our findings, we offer recommendations regarding when and how to employ deep learning with crowdsourcing to achieve desired quality and efficiency for image annotation.
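
The third hybrid approach listed in this abstract, sending workers only the images for which the algorithm has low confidence, can be pictured with a minimal sketch. Everything here is hypothetical and not taken from the authors' framework: the `Prediction` record, the `route_by_confidence` function, and the 0.8 threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """A model output for one image (hypothetical record layout)."""
    image_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route_by_confidence(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (kept as-is, sent to crowd workers).

    Predictions at or above the threshold keep the algorithm's label;
    the rest are queued for human annotation. The threshold value is an
    arbitrary assumption for this sketch.
    """
    accepted: List[Prediction] = []
    to_crowd: List[Prediction] = []
    for pred in predictions:
        if pred.confidence >= threshold:
            accepted.append(pred)
        else:
            to_crowd.append(pred)
    return accepted, to_crowd


if __name__ == "__main__":
    batch = [
        Prediction("img-1", "dog", 0.97),
        Prediction("img-2", "cat", 0.55),
        Prediction("img-3", "bus", 0.80),
    ]
    kept, queued = route_by_confidence(batch)
    print([p.image_id for p in kept], [p.image_id for p in queued])
```

In such a scheme the threshold trades worker time against annotation quality: raising it sends more images to workers, lowering it accepts more algorithmic labels unchecked.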


Author(s):  
Madhvi Venkatraman ◽  
Robert C Fleischer ◽  
Mirian T N Tsuchiya

Abstract Introduced into Hawaii in the early 1900s, the Japanese white-eye or warbling white-eye (Zosterops japonicus) is now the most abundant land bird in the archipelago. Here, we present the first Z. japonicus genome, sequenced from an individual in its invasive range. This genome provides an important resource for future studies in invasion genomics. We annotated the genome using two workflows – standalone AUGUSTUS and BRAKER2. We found that AUGUSTUS was more conservative with gene predictions when compared to BRAKER2. The final number of annotated gene models was similar between the two workflows, but standalone AUGUSTUS had over 70% of gene predictions with Blast2GO annotations versus under 30% using BRAKER2. Additionally, we tested whether using RNA-seq data from 47 samples had a significant impact on annotation quality when compared to data from a single sample, as generating RNA-seq data for genome annotation can be expensive and requires well preserved tissue. We found that more data did not significantly change the number of annotated genes using AUGUSTUS but using BRAKER2 the number increased substantially. The results presented here will aid researchers in annotating draft genomes of non-model species as well as those studying invasion success.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hoi-Yan Wu ◽  
Kwun-Tin Chan ◽  
Grace Wing-Chiu But ◽  
Pang-Chui Shaw

Abstract DNA-based methods are promising tools for species identification and are widely used in various fields. DNA barcoding has already been included in different pharmacopoeias for the identification of medicinal materials or botanicals. The accuracy and validity of DNA-based methods rely on the accuracy and taxonomic reliability of the DNA sequences in the database they are compared against. Here we evaluated the annotation quality and taxonomic reliability of selected barcode loci (rbcL, matK, psbA-trnH, trnL-trnF and ITS) of 41 medicinal Dendrobium species downloaded from GenBank. Annotations of most accessions are incomplete. Only 53.06% of the 2041 accessions downloaded contain a reference to a voucher specimen. Only 31.60% and 4.8% of the entries are annotated with country of origin and collector or assessor, respectively. The taxonomic reliability of the sequences was evaluated by a Megablast search based on similarity to sequences submitted by other research groups. A small number of sequences (211, 7.14%) were regarded as highly doubted. Moreover, 10 out of 60 complete chloroplast genomes contain highly doubted sequences. Our findings suggest that sequences from GenBank should be used with caution for species-level identification. The scientific community should provide more of the important information regarding the identity and traceability of the sample when depositing sequences in public databases.


2021 ◽  
Author(s):  
Holly Lopez Long ◽  
Alexandra O’Neil ◽  
Sandra Kübler ◽  
...  
