human annotator
Recently Published Documents

TOTAL DOCUMENTS: 11 (FIVE YEARS: 8)
H-INDEX: 2 (FIVE YEARS: 1)

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jonathan Shapey ◽  
Aaron Kujawa ◽  
Reuben Dorent ◽  
Guotai Wang ◽  
Alexis Dimitriadis ◽  
...  

Automatic segmentation of vestibular schwannomas (VS) from magnetic resonance imaging (MRI) could significantly improve clinical workflow and assist patient management. We have previously developed a novel artificial intelligence framework based on a 2.5D convolutional neural network, achieving results equivalent to those of an independent human annotator. Here, we provide the first publicly available annotated imaging dataset of VS by releasing the data and annotations used in our prior work. This collection contains a labelled dataset of 484 MR images collected from 242 consecutive patients with a VS undergoing Gamma Knife Stereotactic Radiosurgery at a single institution. The data include all segmentations and contours used in treatment planning, together with details of the administered dose. Our automated segmentation algorithm is implemented with MONAI, a freely available open-source framework for deep learning in healthcare imaging. These data will facilitate the development and validation of automated segmentation frameworks for VS and may also be used to develop other multi-modal algorithmic models.
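
The abstract names MONAI as the implementation framework. As a rough illustration (not the authors' released code), a MONAI-based segmentation setup for MR images might look like the sketch below; the plain 3D UNet, its layer sizes, and the chosen transforms are illustrative assumptions, since the paper's own network is a 2.5D architecture.

```python
# Minimal MONAI segmentation sketch; network shape and transforms are
# illustrative assumptions, not the paper's released configuration.
import torch
from monai.networks.nets import UNet
from monai.losses import DiceLoss
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd

# Load an MR image and its VS segmentation mask, then normalise intensities.
transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityd(keys=["image"]),
])

model = UNet(
    spatial_dims=3,              # the paper uses a 2.5D network; 3D kept for brevity
    in_channels=1,               # single MR modality
    out_channels=2,              # background vs. vestibular schwannoma
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)
loss_fn = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```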


2021 ◽  
Vol 14 (11) ◽  
pp. 2410-2418
Author(s):  
Yinjun Wu ◽  
James Weimer ◽  
Susan B. Davidson

High-quality labels are expensive to obtain for many machine learning tasks, such as medical image classification. Probabilistic (weak) labels produced by weak-supervision tools are therefore used to seed a process in which influential samples with weak labels are identified and cleaned by several human annotators to improve model performance. To lower the overall cost and computational overhead of this process, we propose a solution called CHEF (CHEap and Fast label cleaning), which consists of three components. First, to reduce the cost of human annotators, we use INFL, which prioritizes the most influential training samples for cleaning and provides cleaned labels, saving the cost of one human annotator. Second, to accelerate the sample-selector and model-constructor phases, we use Increm-INFL to incrementally produce influential samples and DeltaGrad-L to incrementally update the model. Third, we redesign the typical label-cleaning pipeline so that human annotators iteratively clean smaller batches of samples rather than one big batch. This yields better overall model performance and enables early termination once the expected model performance has been achieved. Extensive experiments show that our approach gives good model prediction performance while achieving significant speed-ups.
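
As a rough sketch of the redesigned pipeline (small iterative cleaning batches with early termination), the loop below uses a simple disagreement score as a stand-in for influence; CHEF's actual INFL, Increm-INFL and DeltaGrad-L components are not reproduced here, and the `ask_human` callback is a hypothetical oracle.

```python
# Simplified CHEF-style cleaning loop; the influence proxy and all names
# are illustrative assumptions, not the paper's algorithms.
import numpy as np
from sklearn.linear_model import LogisticRegression

def influence_proxy(model, X, y_weak):
    # Stand-in for INFL: disagreement between the model and the weak label
    # (assumes integer labels 0..K-1). Higher score = more worth cleaning.
    proba = model.predict_proba(X)
    return 1.0 - proba[np.arange(len(y_weak)), y_weak]

def clean_iteratively(X, y_weak, ask_human, X_val, y_val,
                      batch=50, rounds=10, target_acc=0.95):
    y = y_weak.copy()
    cleaned = np.zeros(len(y), dtype=bool)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(rounds):
        scores = influence_proxy(model, X, y)
        scores[cleaned] = -np.inf              # never re-query cleaned samples
        picks = np.argsort(scores)[-batch:]    # small batch, not one big batch
        y[picks] = ask_human(picks)            # human annotators supply true labels
        cleaned[picks] = True
        model.fit(X, y)                        # full refit; DeltaGrad-L would be incremental
        if model.score(X_val, y_val) >= target_acc:
            break                              # early termination
    return model, y
```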


2021 ◽  
Vol 2 ◽  
Author(s):  
Henrique Santos ◽  
Mayank Kejriwal ◽  
Alice M. Mulvehill ◽  
Gretchen Forbush ◽  
Deborah L. McGuinness

Abstract Developing agents capable of commonsense reasoning is an important goal in Artificial Intelligence (AI) research. Because commonsense is broadly defined, a computational theory that can formally categorize the various kinds of commonsense knowledge is critical for enabling fundamental research in this area. In a recent book, Gordon and Hobbs described such a categorization, argued to be reasonably complete. However, the theory’s reliability has not been independently evaluated through human annotator judgments. This paper describes such an experimental study, whereby annotations were elicited across a subset of eight foundational categories proposed in the original Gordon-Hobbs theory. We avoid bias by eliciting annotations on 200 sentences from a commonsense benchmark dataset independently developed by an external organization. The results show that, while humans agree on relatively concrete categories like time and space, they disagree on more abstract concepts. The implications of these findings are briefly discussed.
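
A per-category agreement analysis of the kind the study implies can be sketched as below; the annotation matrix is fabricated for illustration, and the paper's actual categories and statistics are not reproduced.

```python
# Toy inter-annotator agreement check; the data is invented for illustration.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = sentences, columns = annotators; values = chosen category index
# (e.g. 0 = time, 1 = space, higher indices = more abstract categories).
annotations = np.array([
    [0, 0, 0],   # concrete categories tend to attract agreement
    [1, 1, 1],
    [2, 3, 4],   # abstract categories tend to attract disagreement
    [2, 2, 5],
])
table, _ = aggregate_raters(annotations)
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```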


2020 ◽  
Vol 34 (05) ◽  
pp. 8123-8130
Author(s):  
Caterina Lacerra ◽  
Michele Bevilacqua ◽  
Tommaso Pasini ◽  
Roberto Navigli

Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate, even for an experienced human annotator. In this paper we address this long-standing problem by introducing the Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining a high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks which we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.
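
The core idea, collapsing fine-grained WordNet synsets into a small coarse label set before disambiguation, can be sketched as follows; the mapping below is invented for illustration, and CSI's actual 45 labels and synset assignments are published with the paper.

```python
# Toy coarse-sense mapping in the spirit of CSI; the label set and the
# synset-to-label assignments are illustrative assumptions.
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

COARSE = {
    "bank.n.01": "GEOGRAPHY",                            # sloping land beside water
    "depository_financial_institution.n.01": "FINANCE",  # the financial institution
}

def coarse_senses(word):
    # Distinct coarse labels are far easier to tell apart than raw synsets.
    return {COARSE.get(s.name(), "OTHER") for s in wn.synsets(word, pos=wn.NOUN)}

print(coarse_senses("bank"))  # e.g. {'GEOGRAPHY', 'FINANCE', 'OTHER'}
```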


Algorithms ◽  
2019 ◽  
Vol 12 (10) ◽  
pp. 217 ◽  
Author(s):  
Alaa E. Abdel Hakim ◽  
Wael Deabes

In supervised Activities of Daily Living (ADL) recognition systems, annotating collected sensor readings is an essential yet exhausting task. Readings are collected from activity-monitoring sensors around the clock, and the resulting dataset is so large that it is almost impossible for a human annotator to assign a definite label to every single instance. This leaves annotation gaps in the input data to the learning system, and these gaps degrade the recognition system's performance. In this work, we propose and investigate three paradigms for handling them. In the first paradigm, the gaps are removed by dropping all unlabeled readings. In the second, a single "Unknown" or "Do-Nothing" label is assigned to all unlabeled readings. The third paradigm gives each contiguous gap a unique label derived from the certain labels that surround it. We also propose a semantic preprocessing method for annotation gaps that combines some of these paradigms for further performance improvement. The performance of the three paradigms and their hybrid combination is evaluated on an ADL benchmark dataset containing more than 2.5 × 10⁶ sensor readings collected over more than nine months. The evaluation results highlight the performance differences between the paradigms and support a specific gap-handling approach for better performance.
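
The three paradigms are easy to state precisely on a toy label sequence; the sketch below is an illustrative reading of the abstract, not the paper's code, and the activity labels are invented.

```python
# Three gap-handling paradigms on a toy ADL label sequence (None = gap).
def drop_gaps(readings, labels):
    # Paradigm 1: discard every unlabeled reading.
    return [(r, l) for r, l in zip(readings, labels) if l is not None]

def unknown_label(readings, labels):
    # Paradigm 2: map every gap to a single "Unknown" class.
    return [(r, l if l is not None else "Unknown") for r, l in zip(readings, labels)]

def contextual_gap_labels(readings, labels):
    # Paradigm 3: give each contiguous gap a unique label derived from the
    # certain labels that surround it.
    out, i = [], 0
    while i < len(labels):
        if labels[i] is not None:
            out.append((readings[i], labels[i]))
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] is None:
            j += 1
        prev = labels[i - 1] if i > 0 else "Start"
        nxt = labels[j] if j < len(labels) else "End"
        out.extend((readings[k], f"Gap({prev}->{nxt})") for k in range(i, j))
        i = j
    return out

labels = ["Sleep", "Sleep", None, None, "Cook", None, "Eat"]
print(contextual_gap_labels(range(len(labels)), labels))
```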


Author(s):  
Yanbing Xue ◽  
Milos Hauskrecht

In this paper, we study the problem of learning multi-class classification models from a limited set of labeled examples obtained from a human annotator. We propose a new machine learning framework that learns multi-class classification models from ordered class sets, which the annotator may use to express not only her top class choice but also other competing classes still under consideration. Such ordered sets of competing classes are common, for example, in various diagnostic tasks. We first develop strategies for learning multi-class classification models from examples associated with ordered class-set information. We then develop an active learning strategy that takes such feedback into account. We evaluate the benefit of the framework on multiple datasets and show that class-order feedback and active learning can reduce the annotation cost both individually and jointly.
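
One simple way to exploit such ordered class-set feedback is to turn the ranked candidates into a soft target distribution; the geometric weighting below is an illustrative assumption, not the strategy developed in the paper.

```python
# Ordered class-set feedback as a soft target (requires PyTorch >= 1.10,
# where cross_entropy accepts probability targets). The weighting scheme
# is an illustrative assumption.
import torch
import torch.nn.functional as F

def ordered_set_target(ordered_classes, num_classes, decay=0.5):
    # ordered_classes: candidate class indices, most likely first.
    w = torch.tensor([decay ** i for i in range(len(ordered_classes))])
    target = torch.zeros(num_classes)
    target[torch.tensor(ordered_classes)] = w / w.sum()
    return target

def ordered_set_loss(logits, ordered_classes):
    target = ordered_set_target(ordered_classes, logits.shape[-1])
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

logits = torch.randn(5)                  # model scores over 5 classes
print(ordered_set_loss(logits, [2, 0]))  # annotator: "class 2, but maybe class 0"
```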


2019 ◽  
Vol 8 (4) ◽  
pp. 161 ◽  
Author(s):  
Morteza Karimzadeh ◽  
Alan MacEachren

Ground-truth datasets are essential for training and evaluating any automated algorithm. As such, gold-standard annotated corpora underlie most advances in natural language processing (NLP). However, only a few relatively small (geo-)annotated datasets are available for geoparsing, i.e., the automatic recognition and geolocation of place references in unstructured text. Creating geoparsing corpora that include both the recognition of place names in text and the matching of those names to toponyms in a geographic gazetteer (a process we call geo-annotation) is a laborious, time-consuming and expensive task. The field lacks efficient geo-annotation tools to support corpus building, as well as design guidelines for developing such tools. Here, we present the iterative design of GeoAnnotator, a web-based, semi-automatic and collaborative visual analytics platform for geo-annotation. GeoAnnotator facilitates the collaborative, multi-annotator creation of large corpora of geo-annotated text by providing computationally generated pre-annotations that human annotators can improve. The resulting corpora can be used to improve and benchmark geoparsing algorithms as well as various other spatial-language-related methods. Further, the iterative design process and the resulting design decisions can inform annotation platforms tailored to other application domains of NLP.
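
The pre-annotation step such a platform automates can be caricatured in a few lines: candidate place names are matched against a gazetteer, and the ambiguous candidates are handed to human annotators for resolution. The gazetteer entries and the naive exact-string matching below are illustrative assumptions.

```python
# Toy geoparsing pre-annotation; gazetteer and matching are illustrative.
GAZETTEER = {
    "Cairo": [("Cairo, Egypt", 30.0444, 31.2357),
              ("Cairo, Illinois", 37.0053, -89.1763)],
    "Paris": [("Paris, France", 48.8566, 2.3522),
              ("Paris, Texas", 33.6609, -95.5555)],
}

def pre_annotate(text):
    # Emit (span, surface form, candidate toponyms); a human picks the match.
    hits = []
    for name, candidates in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            hits.append({"span": (start, start + len(name)),
                         "surface": name,
                         "candidates": candidates})
    return hits

print(pre_annotate("Flooding was reported near Cairo on Tuesday."))
```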


2012 ◽  
Vol 7 ◽  
Author(s):  
Masood Ghayoomi

In this paper, we describe ongoing research to develop an HPSG-based treebank for Persian. To this end, we use a bootstrapping approach for data annotation. In the first step, a set of seed rules is defined as regular expressions in the CLaRK system, and the data is shallow-processed with these rules. A human annotator then completes the annotation of the sentences manually. To increase the degree of automatic annotation, we extract the manually applied rules and iteratively augment the seed rules with those applied most frequently during manual annotation. Our experiments in building the Persian treebank, which currently contains 1000 sentences, show that the proposed method reduces human intervention from 74.05% in the first iterations to 39.01% in the last.
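
The bootstrapping loop can be sketched as follows, with toy regular expressions standing in for the CLaRK seed rules; the rule patterns, tags and promotion threshold are illustrative assumptions.

```python
# Toy bootstrapped rule annotation; patterns, tags and the threshold are
# illustrative stand-ins for the CLaRK seed rules described above.
import re
from collections import Counter

seed_rules = [(re.compile(r"\brā\b"), "OM")]   # e.g. the Persian object marker "rā"

def shallow_annotate(sentence, rules):
    # First pass: apply every rule, collecting (span, tag) pre-annotations.
    return [(m.span(), tag) for pattern, tag in rules
            for m in pattern.finditer(sentence)]

# Rules the human annotator applied by hand, counted over a finished batch.
manual_counts = Counter({(r"\bke\b", "COMP"): 17, (r"\bva\b", "CONJ"): 3})

def promote(rules, counts, threshold=10):
    # Bootstrapping step: fold frequently used manual rules into the seeds.
    frequent = [(re.compile(p), t) for (p, t), n in counts.items() if n >= threshold]
    return rules + frequent

seed_rules = promote(seed_rules, manual_counts)
print(shallow_annotate("u goft ke miāyad", seed_rules))  # [((7, 9), 'COMP')]
```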


2011 ◽  
Vol 2 ◽  
Author(s):  
Øistein E. Andersen

Manual error annotation of learner corpora is time-consuming and error-prone, whereas existing automatic techniques cannot reliably detect and correct all types of error. This paper shows that the two methods can successfully complement each other: automatic detection and partial correction of trivial errors relieves the human annotator of the laborious task of incessantly marking up oft-committed mistakes and enables him or her to focus on errors that cannot, or cannot yet, be handled mechanically, thus enabling more consistent annotation with considerably less manual time and effort expended.
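
The division of labour described here is easy to illustrate: trivially detectable errors are corrected mechanically, and whatever remains goes to the human annotator. The two checks below are illustrative examples of "trivial" error types, not the paper's system.

```python
# Toy automatic pre-annotation of trivial learner errors; the checks are
# illustrative, not the paper's detection rules.
import re

def auto_annotate(sentence):
    corrections = []
    # Doubled words ("the the") are mechanically detectable and correctable.
    fixed, n = re.subn(r"\b(\w+)\s+\1\b", r"\1", sentence, flags=re.IGNORECASE)
    if n:
        corrections.append(("doubled word", n))
    # So is stray whitespace before punctuation.
    fixed, n = re.subn(r"\s+([,.;:!?])", r"\1", fixed)
    if n:
        corrections.append(("space before punctuation", n))
    return fixed, corrections

# The tense error ("have seen ... yesterday") is left for the human annotator.
print(auto_annotate("I have seen the the film yesterday ."))
```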

