2021 ◽  
Vol 11 (8) ◽  
pp. 996
Author(s):  
James P. Trujillo ◽  
Judith Holler

During natural conversation, people must quickly understand the meaning of what the other speaker is saying. This concerns not just the semantic content of an utterance, but also the social action (i.e., what the utterance is doing—requesting information, offering, evaluating, checking mutual understanding, etc.) that the utterance is performing. The multimodal nature of human language raises the question of whether visual signals may contribute to the rapid processing of such social actions. However, while previous research has shown that how we move reveals the intentions underlying instrumental actions, we do not know whether the intentions underlying fine-grained social actions in conversation are also revealed in our bodily movements. Using a corpus of dyadic conversations combined with manual annotation and motion tracking, we analyzed the kinematics of the torso, head, and hands during the asking of questions. Manual annotation categorized these questions into six fine-grained social action types (i.e., request for information, other-initiated repair, understanding check, stance or sentiment, self-directed, active participation). We demonstrate, for the first time, that the kinematics of the torso, head, and hands differ between some of these social action categories, based on a 900 ms time window that captures movements starting slightly prior to or within 600 ms after utterance onset. These results provide novel insights into the extent to which our intentions shape the way we move, and open new avenues for understanding how this phenomenon may facilitate the fast communication of meaning in conversational interaction.
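The 900 ms analysis window described above can be illustrated with a small sketch. The marker data layout, the sampling rate, and the choice of peak speed as the kinematic feature are illustrative assumptions, not the study's exact feature set:

```python
import numpy as np

def peak_speed(positions, fps, t_onset, pre_ms=300, post_ms=600):
    """Peak speed (units/s) of one tracked marker inside a window spanning
    pre_ms before to post_ms after utterance onset (900 ms by default).

    positions: (n_frames, 3) array of x/y/z marker coordinates
    fps: motion-tracking sampling rate in frames per second
    t_onset: utterance onset time in seconds from recording start
    """
    start = max(0, int((t_onset - pre_ms / 1000) * fps))
    end = min(len(positions), int((t_onset + post_ms / 1000) * fps))
    window = positions[start:end]
    # frame-to-frame displacement magnitude, scaled to units per second
    speeds = np.linalg.norm(np.diff(window, axis=0), axis=1) * fps
    return speeds.max()
```

A feature like this, computed per articulator (torso, head, hands), is the kind of quantity one would then compare across social action categories.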


Author(s):  
Shumin Shi ◽  
Dan Luo ◽  
Xing Wu ◽  
Congjun Long ◽  
Heyan Huang

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which remains extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and existing resources are currently obtained through manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under these scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the treebank obtained from the preliminary conversion. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is effective in a low-resource setting, meaning that we not only address the scarcity of Tibetan dependency treebanks but also avoid needless manual annotation. The method embodies the regularity of strongly knowledge-guided linguistic analysis, which is of great significance for advancing research on Tibetan information processing.
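For reference, the LAS and UAS figures reported above are the standard attachment scores for dependency parsing. A minimal sketch of how they are computed follows; the tuple-based sentence representation is an assumption for illustration:

```python
def attachment_scores(gold, pred):
    """Unlabeled and labeled attachment scores over parsed sentences.

    gold, pred: lists of sentences; each sentence is a list of
    (head_index, dependency_label) tuples, one per token.
    UAS counts tokens with the correct head; LAS additionally
    requires the correct dependency label.
    Returns (uas, las) as fractions of all tokens.
    """
    total = correct_head = correct_both = 0
    for g_sent, p_sent in zip(gold, pred):
        for (g_head, g_lab), (p_head, p_lab) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                correct_head += 1
                if g_lab == p_lab:
                    correct_both += 1
    return correct_head / total, correct_both / total
```

By definition LAS can never exceed UAS, which matches the 96% LAS vs. 97.85% UAS figures above.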


Author(s):  
Yunfei Fu ◽  
Hongchuan Yu ◽  
Chih-Kuo Yeh ◽  
Tong-Yee Lee ◽  
Jian J. Zhang

Brushstrokes are viewed as the artist’s “handwriting” in a painting. In many applications, such as style learning and transfer, painting imitation, and painting authentication, it is highly desirable to quantitatively and accurately identify brushstroke characteristics in old masters’ pieces using computer programs. However, because hundreds or thousands of brushstrokes intermingle in a painting, the task remains challenging. This article proposes an efficient algorithm for brush Stroke extraction based on a Deep neural network, i.e., DStroke. Compared to the state of the art, the main merit of the proposed DStroke is that it automatically and rapidly extracts brushstrokes from a painting without manual annotation, while accurately approximating the real brushstrokes with high reliability. Notably, faithfully recovering the soft transitions between brushstrokes is often ignored by other methods. In fact, the details of brushstrokes in a masterpiece (e.g., shapes, colors, texture, overlaps) are highly valued by artists, since they hold promise to enhance and extend artists’ powers, just as microscopes extend biologists’ powers. To demonstrate the high efficiency of the proposed DStroke, we apply it to a set of real scans of paintings and a set of synthetic paintings, respectively. Experiments show that the proposed DStroke is noticeably faster and more accurate at identifying and extracting brushstrokes, outperforming the other methods.


2013 ◽  
Vol 51 ◽  
pp. 381-389 ◽  
Author(s):  
Terry Malone ◽  
Bryn Hubbard ◽  
Derek Merton-Lyn ◽  
Paul Worthington ◽  
Reyer Zwiggelaar

Author(s):  
Di Wu ◽  
Xiao-Yuan Jing ◽  
Haowen Chen ◽  
Xiaohui Kong ◽  
Jifeng Xuan

An Application Programming Interface (API) tutorial is an important API learning resource. To help developers learn APIs, an API tutorial is often split into a number of consecutive units that describe the same topic (i.e., tutorial fragments). We regard a tutorial fragment explaining an API as a relevant fragment of that API. Automatically recommending relevant tutorial fragments can help developers learn how to use an API. However, existing approaches typically employ supervised or unsupervised methods to recommend relevant fragments, which either require much manual annotation effort or produce inaccurate recommendations. Furthermore, these approaches only allow developers to input exact API names. In practice, developers often do not know which APIs to use, and are therefore more likely to describe API-related questions in natural language. In this paper, we propose a novel approach, called Tutorial Fragment Recommendation (TuFraRec), to effectively recommend relevant tutorial fragments for API-related natural language questions without much manual annotation effort. For each API tutorial, we split it into fragments and extract APIs from each fragment to build API-fragment pairs. Given a question, TuFraRec first generates several clarification APIs that are related to the question. We use the clarification APIs and the API-fragment pairs to construct candidate API-fragment pairs. Then, we design a semi-supervised metric learning (SML)-based model to find relevant API-fragment pairs in the candidate list; the model works well with a few labeled API-fragment pairs and a large number of unlabeled ones, so the manual effort of labeling the relevance of API-fragment pairs is reduced. Finally, we sort and recommend relevant API-fragment pairs according to a recommendation strategy. We evaluate TuFraRec on 200 API-related natural language questions and two public tutorial datasets (Java and Android). The results demonstrate that, on average, TuFraRec improves NDCG@5 by 0.06 and 0.09, and Mean Reciprocal Rank (MRR) by 0.07 and 0.09, on the two tutorial datasets compared with the state-of-the-art approach.
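The NDCG@5 and MRR gains reported above can be made concrete with minimal reference implementations of the two ranking metrics (standard definitions; the binary relevance inputs are illustrative):

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k: discounted cumulative gain of the top-k ranked results,
    normalized by the DCG of the ideal (best-first) ordering.

    relevances: relevance grades of results in ranked order.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    idcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / idcg if idcg > 0 else 0.0

def mrr(ranked_relevance_lists):
    """Mean reciprocal rank: average over queries of 1/rank of the
    first relevant result (0 if no result is relevant)."""
    total = 0.0
    for rels in ranked_relevance_lists:
        for i, r in enumerate(rels):
            if r:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_relevance_lists)
```

On a 0-to-1 scale, absolute improvements of 0.06-0.09 in these metrics indicate that relevant fragments move noticeably higher in the recommended list.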


Author(s):  
Hao Zheng ◽  
Lin Yang ◽  
Jianxu Chen ◽  
Jun Han ◽  
Yizhe Zhang ◽  
...  

Deep learning has been applied successfully to many biomedical image segmentation tasks. However, due to the diversity and complexity of biomedical image data, manual annotation for training common deep learning models is very time-consuming and labor-intensive, especially because normally only biomedical experts can annotate image data well. Human experts are often involved in a long and iterative annotation process, as in active-learning-style annotation schemes. In this paper, we propose representative annotation (RA), a new deep learning framework for reducing annotation effort in biomedical image segmentation. RA uses unsupervised networks for feature extraction and selects representative image patches for annotation in the latent space of the learned feature descriptors, which implicitly characterizes the underlying data while minimizing redundancy. A fully convolutional network (FCN) is then trained on the annotated selected image patches for image segmentation. Our RA scheme offers three compelling advantages: (1) it leverages the ability of deep neural networks to learn better representations of image data; (2) it performs one-shot selection for manual annotation and frees annotators from the iterative process of common active-learning-based annotation schemes; (3) it can be deployed to 3D images with simple extensions. We evaluate our RA approach on three datasets (two 2D and one 3D) and show that our framework yields competitive segmentation results compared with state-of-the-art methods.
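The one-shot selection of representative patches in latent space can be sketched with a simple coverage heuristic. Greedy farthest-point sampling is used here as a stand-in assumption, not necessarily the paper's exact selection criterion:

```python
import numpy as np

def select_representatives(features, n_select):
    """Pick patches whose latent features cover the dataset while
    avoiding redundancy, via greedy farthest-point sampling.

    features: (n_patches, d) latent descriptors from an unsupervised encoder
    Returns a list of indices of the selected patches.
    """
    # seed with the patch closest to the dataset mean
    first = int(np.argmin(np.linalg.norm(features - features.mean(0), axis=1)))
    chosen = [first]
    dists = np.linalg.norm(features - features[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dists))  # farthest from everything chosen so far
        chosen.append(nxt)
        # each patch keeps its distance to the nearest selected patch
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen
```

Only the selected patches are handed to annotators, replacing the iterative query rounds of active learning with a single selection pass.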


2020 ◽  
Vol 25 (1) ◽  
pp. 101-123
Author(s):  
Dirk Speelman ◽  
Stefan Grondelaers ◽  
Benedikt Szmrecsanyi ◽  
Kris Heylen

In this paper, we revisit earlier analyses of the distribution of er ‘there’ in adjunct-initial sentences to demonstrate the merits of computational upscaling in syntactic variation research. Contrary to previous studies, in which major semantic and pragmatic predictors (viz. adjunct type, adjunct concreteness, and verb specificity) had to be coded manually, the present study operationalizes these predictors on the basis of distributional analysis: instead of hand-coding for specific semantic classes, we determine the semantic class of the adjunct, verb, and subject automatically by clustering the lexemes in those slots on the basis of their ‘semantic passport’ (as established from their distributional behaviour in a reference corpus). These clusters are subsequently interpreted as proxies for semantic classes. In addition, the pragmatic factor ‘subject predictability’ is operationalized automatically on the basis of collocational attraction measures, as well as distributional similarity between the other slots and the subject. We demonstrate that the distribution of er can be modelled as successfully with the automated approach as in manual annotation-based studies. Crucially, the new method replicates our earlier findings that the Netherlandic data are easier to model than the Belgian data, and that lexical collocations play a bigger role in the Netherlandic than in the Belgian data. On a methodological level, the proposed automatization opens up a range of opportunities. Most important is its scalability: it allows a larger gamut of alternations to be investigated in one study, and much larger datasets to represent each alternation.
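Clustering lexemes by their distributional behaviour, as described above, can be sketched with a toy k-means over co-occurrence vectors. The vectors, the length normalization, and k-means itself are illustrative assumptions rather than the study's exact pipeline:

```python
import numpy as np

def cluster_lexemes(vectors, k, n_iter=50):
    """Toy k-means over distributional vectors: lexemes with similar
    corpus co-occurrence profiles land in the same cluster, which is
    then treated as a proxy for a semantic class.

    vectors: dict mapping lexeme -> co-occurrence/embedding vector
    Returns a dict mapping lexeme -> cluster id.
    """
    words = list(vectors)
    X = np.asarray([vectors[w] for w in words], dtype=float)
    # length-normalize so Euclidean distance tracks cosine dissimilarity
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # deterministic farthest-point initialization of the k centers
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.asarray(centers)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return dict(zip(words, labels.tolist()))
```

Cluster ids obtained this way can then stand in for hand-coded semantic classes as predictors in a variation model.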


2011 ◽  
Vol 17 (8) ◽  
pp. 39-42
Author(s):  
M. Gokul Prasad ◽  
T. Sumathi ◽  
M. Hemalatha

2019 ◽  
Vol 8 (2S8) ◽  
pp. 1346-1350

The research literature on sentiment analysis methodologies has grown exponentially in recent years. In any research area where new concepts and techniques are constantly being introduced, it is therefore of interest to analyze the latest trends in the literature. In particular, we focus primarily on the literature of the last five years, on annotation methodologies, including frequently used datasets and the sources from which they were obtained. The survey suggests that researchers mostly rely on manual annotation when building sentiment corpora. As for datasets, English-language data taken from social media such as Twitter still dominate. Much in this research area remains to be explored, such as semi-automatic annotation methods, which are still very rarely used by researchers. Also, less-studied languages, such as Malay, Korean, and Japanese, still require corpora for sentiment analysis research.

