ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings

2020 ◽  
Author(s):  
Okko Räsänen ◽  
Shreyas Seshadri ◽  
Marvin Lavechin ◽  
Alejandrina Cristia ◽  
Marisa Casillas

Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measuring linguistic input—estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, allowing automatic phoneme, syllable, and word count estimation from child-centered audio recordings.
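The reported agreement between automatic and human counts is straightforward to inspect once per-segment counts exist on both sides. Below is a minimal sketch of such a comparison, assuming hypothetical CSV files (alice_estimates.csv, human_annotations.csv) with a segment_id column plus phonemes/syllables/words columns; ALICE's actual output format and file names may differ.

```python
# A minimal sketch (not ALICE's actual API) of comparing per-segment counts produced
# by an automatic estimator against human-annotated reference counts.
# File names, column names, and CSV layout are assumptions for illustration only.
import csv
from statistics import correlation  # Pearson r, available in Python >= 3.10

def load_counts(path, column):
    """Read one count column (e.g. 'phonemes', 'syllables', 'words') keyed by segment id."""
    with open(path, newline="") as f:
        return {row["segment_id"]: float(row[column]) for row in csv.DictReader(f)}

def compare(est_path, ref_path, unit):
    est = load_counts(est_path, unit)
    ref = load_counts(ref_path, unit)
    shared = sorted(set(est) & set(ref))          # align segments present in both files
    x = [est[s] for s in shared]
    y = [ref[s] for s in shared]
    return correlation(x, y)                      # Pearson correlation of estimate vs. annotation

if __name__ == "__main__":
    for unit in ("phonemes", "syllables", "words"):
        r = compare("alice_estimates.csv", "human_annotations.csv", unit)
        print(f"{unit}: r = {r:.2f}")
```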

2019 ◽  
Vol 113 ◽  
pp. 63-80 ◽  
Author(s):  
Okko Räsänen ◽  
Shreyas Seshadri ◽  
Julien Karadayi ◽  
Eric Riebling ◽  
John P. Bunce ◽  
...  

Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable ASR systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech. As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE.
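The language-dependent back end of such a pipeline can be illustrated with a deliberately simplified sketch: fit a mapping from automatic syllable counts to reference word counts on a small amount of transcribed speech, then apply it to new utterances. The system described above additionally uses other acoustic features and its own estimation model, so the linear fit and the numbers below are illustrative assumptions only.

```python
# A minimal sketch of the core idea behind syllable-based word count estimation:
# a language-independent front end produces syllable counts per utterance, and a small
# amount of transcribed speech is used to fit a language-dependent mapping to word counts.
import numpy as np

# Hypothetical adaptation data: automatic syllable counts and reference word counts
# for a handful of orthographically transcribed utterances in the target language.
syllables = np.array([4, 9, 15, 7, 22, 11, 3], dtype=float)
words     = np.array([3, 6, 10, 5, 14,  8, 2], dtype=float)

# Fit words ~ a * syllables + b by ordinary least squares.
A = np.vstack([syllables, np.ones_like(syllables)]).T
(a, b), *_ = np.linalg.lstsq(A, words, rcond=None)

def estimate_words(syllable_count: float) -> float:
    """Map an automatic syllable count to an estimated word count."""
    return max(0.0, a * syllable_count + b)

print(f"fitted mapping: words ~ {a:.2f} * syllables + {b:.2f}")
print("estimate for 18 syllables:", round(estimate_words(18.0), 1))
```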


2021 ◽  
Vol 14 (3) ◽  
pp. 4-11
Author(s):  
Evgeniy Anikeev

Various methods of collecting passenger traffic data are reviewed together with their advantages and disadvantages. It is shown that improving the quality of transport services requires regular collection and refinement of passenger traffic data. The goals and methods of obtaining passenger traffic information in the municipal passenger transport system are outlined. Existing methods fall into three categories: data collection with technical means, data collection by human counters and volunteers, and interpretation of fare payment records. The methods presented in the article are compared in terms of labor intensity, cost, and accuracy of the results, and the advantages and disadvantages of each are discussed. The general structure of an automated system for collecting passenger traffic data is presented, and the case is made for a centralized data collection and processing system linked to all passenger transport control systems. The tasks solved by such a system at all levels of passenger transport service are described, with each task assigned to one of three service levels: pre-transport, transport, and post-transport. It is shown that only by solving problems at all three levels can high-quality operation of the municipal passenger transport system be ensured.
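As an illustration of the third category (interpretation of fare payments), the sketch below aggregates hypothetical electronic ticketing records into boardings per route and hour; the record layout is an assumption, not the schema of any real fare collection system.

```python
# A minimal sketch of the fare-payment approach to passenger flow estimation:
# electronic ticketing records are aggregated into boardings per route and hour.
from collections import Counter
from datetime import datetime

# Hypothetical fare-payment transactions: (route id, payment timestamp).
transactions = [
    ("12", "2021-03-01T07:42:10"),
    ("12", "2021-03-01T07:55:03"),
    ("7",  "2021-03-01T08:02:41"),
    ("12", "2021-03-01T08:15:22"),
]

boardings = Counter()
for route, ts in transactions:
    hour = datetime.fromisoformat(ts).replace(minute=0, second=0)
    boardings[(route, hour)] += 1              # one paid boarding counted per transaction

for (route, hour), n in sorted(boardings.items()):
    print(f"route {route}, {hour:%Y-%m-%d %H:00}: {n} boardings")
```

A design caveat of this approach is that it counts only paid boardings, so fare evasion and fare-exempt passengers must be corrected for separately.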


Author(s):  
Anthony A. Piña

In this chapter, the reader is taken through a macro-level view of learning management systems, with a particular emphasis on systems offered by commercial vendors. Included is a consideration of the growth of learning management systems during the past decade, the common features and tools contained within these systems, and a look at the advantages and disadvantages that learning management systems provide to institutions. In addition, the reader is presented with specific resources and options for evaluating, selecting, and deploying learning management systems. A section highlighting the possible advantages and disadvantages of selecting a commercial versus an open source system is followed by a series of brief profiles of the leading vendors of commercial and open source learning management systems.


Author(s):  
Shahriar Shams

There has been significant development in the area of free and open source geospatial software. Over the decades, research has shifted from vendor-dependent software to open source software, with researchers paying increasing attention to maximizing the value of their data. It is often difficult to choose a particular open source GIS (OGIS) package from among the many emerging OGIS options, so it is important to characterise the projects according to some unified criteria. Each package has certain advantages and disadvantages, and it is time-consuming to identify exactly which one to select for a specific purpose. This chapter focuses on the assessment criteria enabling developers, researchers, and GIS users to select suitable OGIS software to meet their requirements for the analysis and design of geospatial applications in multidisciplinary fields. It highlights the importance of the assessment criteria, followed by an explanation of each criterion and its significance, with examples from existing OGIS software.
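One common way to operationalize such assessment criteria is a simple weighted scoring matrix. The sketch below is illustrative only: the criteria names, weights, and scores are placeholders, not the chapter's own criteria or recommendations.

```python
# A minimal sketch of multi-criteria scoring for open source GIS software selection.
criteria_weights = {          # weights sum to 1.0; tune to the project's priorities
    "functionality": 0.30,
    "documentation": 0.20,
    "community_activity": 0.20,
    "interoperability": 0.15,
    "ease_of_installation": 0.15,
}

# Scores on a 0-5 scale for each candidate package (illustrative values only).
candidates = {
    "Package A": {"functionality": 5, "documentation": 3, "community_activity": 4,
                  "interoperability": 4, "ease_of_installation": 3},
    "Package B": {"functionality": 3, "documentation": 4, "community_activity": 3,
                  "interoperability": 5, "ease_of_installation": 5},
}

def weighted_score(scores: dict) -> float:
    """Weighted sum of criterion scores for one candidate."""
    return sum(criteria_weights[c] * scores[c] for c in criteria_weights)

for name, scores in sorted(candidates.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```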


2009 ◽  
pp. 603-619
Author(s):  
Walt Scacchi

This study examines the development of open source software supporting e-commerce (EC) or e-business (EB) capabilities. This entails a case study within a virtual organization engaged in an organizational initiative to develop, deploy, and support free/open source software systems for EC or EB services, like those supporting enterprise resource planning. The objective of this study is to identify and characterize the resource-based software product development capabilities that lie at the center of the initiative, rather than the software itself, or the effectiveness of its operation in a business enterprise. By learning what these resources are, and how they are arrayed into product development capabilities, we can provide the knowledge needed to understand what resources are required to realize the potential of free EC and EB software applications. In addition, the resource-based view draws attention to those resources and capabilities that provide potential competitive advantages and disadvantages to the organization in focus.


2019 ◽  
Vol 19 (3) ◽  
pp. 237-243 ◽  
Author(s):  
Sofia Z. Sheikh

It can be difficult to develop an effective and balanced search strategy in SETI, especially from a funding perspective, given the diverse methodologies and myriad orthogonal proposals for the best technosignatures. Here I propose a framework to compare the relative advantages and disadvantages of various proposed technosignatures based on nine 'axes of merit'. This framework was first developed at the NASA Technosignatures Workshop in Houston in 2018 and published in that workshop's report. I give the definition and rationale behind the nine axes, as well as the history of each axis in the SETI and technosignature literature. These axes are then applied to three classes of technosignature searches as an illustration of their use. An open-source software tool is available to allow technosignature researchers to make their own version of the figure.
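As a rough illustration of how a single search concept might be profiled along several axes, the sketch below draws a polar ('radar') chart with matplotlib. The axis labels and scores are hypothetical placeholders rather than the paper's nine axes; the open-source tool mentioned above is the appropriate way to reproduce the published figure.

```python
# A minimal sketch of plotting one search concept's scores along several "axes of merit".
# Axis names and scores are hypothetical placeholders, not the paper's nine axes.
import math
import matplotlib.pyplot as plt

axes = ["Cost", "Detectability", "Ambiguity", "Duration", "Ancillary benefits", "Scalability"]
scores = [3, 4, 2, 5, 3, 4]           # one 0-5 rating per axis for a single search concept

angles = [2 * math.pi * i / len(axes) for i in range(len(axes))]
angles_closed = angles + angles[:1]   # repeat the first point to close the polygon
scores_closed = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles_closed, scores_closed)
ax.fill(angles_closed, scores_closed, alpha=0.2)
ax.set_xticks(angles)
ax.set_xticklabels(axes)
ax.set_yticks(range(0, 6))
ax.set_title("Example axes-of-merit profile (hypothetical scores)")
plt.savefig("axes_of_merit.png", dpi=150)
```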


Geosciences ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. 397
Author(s):  
Marcin Budzynski ◽  
Kazimierz Jamroz ◽  
Jerzy Pyrchla ◽  
Wojciech Kustra ◽  
Adam Inglot ◽  
...  

This paper presents the results of research conducted to develop an automated system capable of determining the parameters of horizontal curves. The system calculates the actual course of a road by means of a two-stage positioning of recorded points along the road. In the first stage, measurements were taken with a Real-Time Network (RTN) receiver installed in a research vehicle. In the second stage, pictures from three cameras, also installed in the vehicle, were analyzed in order to refine the location of the measurement points along the road. The RTN messages and the pictures from the cameras were sent to a mobile workstation which integrated the received signals in an ArcGIS (Esri) environment. The system provides a way to quickly accumulate highly accurate data on the actual geometric parameters of a road. Computer scripts developed by the authors automatically determine the parameters of the horizontal curves from the acquired data. The solution was tested in the field, and its advantages and disadvantages are discussed in this paper. Automating the acquisition of data on the course of a road provides effective input for mathematical models that include the effect of horizontal curve parameters on road safety, which in turn could be used to implement more effective ways of improving road safety.
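One computational step such a system needs is estimating curve radius from positioned road points. The sketch below fits a circle to planar coordinates with an algebraic (Kåsa) least-squares fit; the coordinates are synthetic, and the fusion of RTN and camera data described above is not modeled here.

```python
# A minimal sketch of estimating a horizontal curve's radius by fitting a circle to
# road points given in planar (projected) coordinates, e.g. metres.
import numpy as np

def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit; returns (cx, cy, radius)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = x**2 + y**2
    c, d, e = np.linalg.lstsq(A, b, rcond=None)[0]   # x^2 + y^2 = c*x + d*y + e
    cx, cy = c / 2.0, d / 2.0
    r = np.sqrt(e + cx**2 + cy**2)
    return cx, cy, r

# Synthetic measurement points along a curve of radius ~200 m with small positioning noise.
theta = np.linspace(0.1, 0.6, 25)
x = 200.0 * np.cos(theta) + np.random.normal(0, 0.05, theta.size)
y = 200.0 * np.sin(theta) + np.random.normal(0, 0.05, theta.size)

cx, cy, r = fit_circle(x, y)
print(f"estimated curve radius: {r:.1f} m (centre at {cx:.1f}, {cy:.1f})")
```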

