An XML engine to model and query multimodal concurrent linguistic annotations
This paper presents an XML engine designed to model and query multimodal concurrent annotated data. The work is part of the OTIM (Tools for Multimodal Annotation) project, which brings together Social Science and Computer Science researchers and aims to develop conventions and tools for the multimodal annotation of a large corpus of conversational French speech. Within OTIM, our objective is to provide linguists with a single framework for encoding and manipulating numerous linguistic domains: morpho-syntax, prosody, phonetics, disfluencies, discourse, gesture and posture. To that end, it must be possible to bring together and align all the different pieces of information (called annotations) associated with a corpus. We propose a complete pipeline from the annotation step to the management of the data within an XML Information System. This pipeline first relies on the formalisation of the linguistic knowledge and data in an OTIM-specific XML format. A Java framework is then proposed for interfacing with both the linguists' specific annotation tools and the XML Information System. Finally, the querying of multimodal annotations within the XML Information System using XQuery is presented. As annotations are time-aligned, an extension of XQuery to Allen temporal relations is proposed. The paper concludes with a discussion of the value of a pure XML approach for linguistic annotation information systems and of the question of integrating semantics within the pipeline.
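
To make the notion of Allen temporal relations concrete, the following is a minimal Java sketch that classifies the relation holding between two time-aligned annotations. It is an illustration only, not the XQuery extension proposed in the paper: the class `AllenRelations`, the `Annotation` type, its fields (tier, label, start and end in milliseconds) and the sample values are all assumptions introduced for this example.

```java
// Illustrative sketch only: names and data layout are hypothetical,
// not taken from the OTIM XML format or its XQuery extension.
final class AllenRelations {

    /** The thirteen Allen relations between two proper intervals. */
    enum Relation {
        BEFORE, MEETS, OVERLAPS, STARTS, DURING, FINISHES, EQUALS,
        FINISHED_BY, CONTAINS, STARTED_BY, OVERLAPPED_BY, MET_BY, AFTER
    }

    /** Hypothetical time-aligned annotation; times are in milliseconds. */
    static final class Annotation {
        final String tier;
        final String label;
        final long start;
        final long end;

        Annotation(String tier, String label, long start, long end) {
            if (start >= end) {
                throw new IllegalArgumentException("start must be strictly before end");
            }
            this.tier = tier;
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the Allen relation of annotation a with respect to annotation b. */
    static Relation relate(Annotation a, Annotation b) {
        // Disjoint or merely adjacent intervals.
        if (a.end < b.start) return Relation.BEFORE;
        if (a.end == b.start) return Relation.MEETS;
        if (b.end < a.start) return Relation.AFTER;
        if (b.end == a.start) return Relation.MET_BY;

        // From here on the two intervals share part of their interior.
        if (a.start == b.start) {
            if (a.end == b.end) return Relation.EQUALS;
            return a.end < b.end ? Relation.STARTS : Relation.STARTED_BY;
        }
        if (a.end == b.end) {
            return a.start > b.start ? Relation.FINISHES : Relation.FINISHED_BY;
        }
        if (a.start < b.start) {
            return a.end < b.end ? Relation.OVERLAPS : Relation.CONTAINS;
        }
        // Remaining case: a starts strictly after b starts.
        return a.end < b.end ? Relation.DURING : Relation.OVERLAPPED_BY;
    }

    public static void main(String[] args) {
        // Made-up sample: a token uttered while a gesture is being performed.
        Annotation token = new Annotation("token", "oui", 1200, 1800);
        Annotation gesture = new Annotation("gesture", "head-nod", 1000, 2500);
        System.out.println(relate(token, gesture)); // prints DURING
    }
}
```

A query such as "tokens uttered during a gesture" then amounts to filtering pairs of annotations from two tiers on the value returned by `relate`. Integer milliseconds are used here so that boundary equality (for the meets, starts and finishes relations) is exact.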