Managing, storing, and sharing long-form recordings and their annotations
The technique of long-form recordings via wearables is gaining momentum in different fields of research, notably linguistics and pathology. This technique, however, poses several technical challenges, some of which are amplified by the peculiarities of the data, including their sensitivity and their volume. In this paper, we begin by outlining key problems related to the management, storage, and sharing of the corpora that emerge when using this technique. We continue by proposing a multi-component solution to these problems, specifically in the case of daylong recordings of children. As part of this solution, we release ChildProject, a Python package for performing the operations typically required by such datasets and for evaluating the reliability of annotations using a number of measures commonly used in speech processing and linguistics. Our proposal could be generalized to other populations.