The segmentation of speech and its implications for the emergence of language structure

2001 ◽  
Vol 4 (2) ◽  
pp. 161-182 ◽  
Author(s):  
Caroline Lyon ◽  
Bob Dickerson ◽  
Chrystopher L. Nehaniv

This paper reports a phenomenon supporting the hypothesis that the emergence of structure in the evolution of language was a staged process. To develop a grammatical structure it seems necessary to first have discrete constituents which can be the building blocks of a hierarchical system. By analysing observed speech we show that the development of a linear sequence of grammatical constituents has its own advantage, before a possible next stage when constituents are integrated into a hierarchical structure. A stream of speech sounds has to be segmented to allow for breathing. This segmentation has further developed in a certain way that makes it easier for the hearer to decode than if it were not segmented, or if it were segmented in an arbitrary manner. Well known tools from Information Theory are employed to analyse the ease of decoding speech. Segmentation depends on prosodic discontinuities, such as pauses and intonation marked by tone unit boundaries. These discontinuities usually mark groups of words with some syntactic cohesion, such as phrases and clauses. We show that in a modern corpus of spoken language observed segmentation facilitates the effective transfer of information, while lack of segmentation or arbitrary segmentation imposed on a stream of words makes decoding less efficient. This supports the hypothesis that the necessary constituents of a grammatical structure may have evolved as a consequence of developments favouring more efficient decoding of a linear stream of spoken words. The source material for this investigation is taken from the prosodically marked up Machine Readable Spoken English Corpus (MARSEC).
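As a rough illustration of the kind of information-theoretic comparison the abstract describes, the sketch below estimates the conditional entropy of the next token given the previous one, for a word stream with and without boundary markers. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the `<b>` marker, and the toy word list are all hypothetical, and a real study would use the MARSEC annotations and a far larger sample.

```python
import math
from collections import Counter


def conditional_entropy(tokens):
    """Estimate H(next | previous) in bits from bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (prev, _nxt), n in bigrams.items():
        p_joint = n / total          # P(prev, next)
        p_cond = n / contexts[prev]  # P(next | prev)
        h -= p_joint * math.log2(p_cond)
    return h


def insert_boundaries(tokens, positions, marker="<b>"):
    """Return a copy of tokens with a marker inserted before each index in positions."""
    out = []
    for i, tok in enumerate(tokens):
        if i in positions:
            out.append(marker)
        out.append(tok)
    return out


# Toy data (made up for illustration only).
words = "the cat sat on the mat and the dog sat on the rug".split()
observed = insert_boundaries(words, {3, 6, 10})   # hypothetical prosodic boundaries
arbitrary = insert_boundaries(words, {2, 5, 9})   # hypothetical arbitrary boundaries

for name, stream in [("none", words), ("observed", observed), ("arbitrary", arbitrary)]:
    print(name, round(conditional_entropy(stream), 3))
```

Under the abstract's framing, a lower value means the next token is easier to predict. On a toy sample of this size the numbers carry no evidential weight. The point is only to show how the three conditions can be put on the same scale.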

2015 ◽  
Vol 60 (8) ◽  
pp. 12-24
Author(s):  
Renata Bielak ◽  
Ewa Czumaj

The mission of official statistics is to provide credible, reliable, independent and high-quality information on the state of, and changes in, society, the economy and the environment, and to meet the needs of domestic and foreign users. Fulfillment of this obligation is reflected both in the current work of official statistics and in its development activities. Monitoring socio-economic development requires continuous adaptation of statistics to the changing reality and to the description of phenomena and processes. The role of official statistics in the modern world goes far beyond carrying out research. Measuring sustainable development and providing information support for development policy are tangible examples of the challenges undertaken. The form and manner of data presentation are increasingly important for the effective transfer of information. To meet the need for better communication with users, the CSO is implementing new information-sharing systems. The article describes, among others, "Strateg" and "Knowledge Databases".


Author(s):  
Alexandros Ioannidis-Pantopikos ◽  
Donat Agosti

In the landscape of general-purpose repositories, Zenodo was built at the European Laboratory for Particle Physics' (CERN) data center to facilitate the sharing and preservation of the long tail of research across all disciplines and scientific domains. Despite Zenodo’s long tradition of making research artifacts FAIR (Findable, Accessible, Interoperable, and Reusable), there are still challenges in applying these principles effectively when serving the needs of specific research domains. Plazi’s biodiversity taxonomic literature processing pipeline liberates data from publications, making it FAIR via extensive metadata, the minting of a DataCite Digital Object Identifier (DOI), a licence and both human- and machine-readable output provided by Zenodo, and makes it accessible via the Biodiversity Literature Repository community at Zenodo. The deposits (e.g., taxonomic treatments, figures) are an example of how local networks of information can be formally linked to explicit resources in a broader context of other platforms like GBIF (Global Biodiversity Information Facility). In the context of biodiversity taxonomic literature data workflows, a general-purpose repository’s traditional submission approach is not enough to preserve rich metadata and to capture highly interlinked objects, such as taxonomic treatments and digital specimens. As a prerequisite for serving these use cases and ensuring that the artifacts remain FAIR, Zenodo introduced the concept of custom metadata, which allows submissions such as figures or taxonomic treatments (see as an example the treatment of Eurygyrus peloponnesius) to be enhanced with custom keywords, based on terms from common biodiversity vocabularies like Darwin Core and Audubon Core and with an explicit link to the respective vocabulary term.
The aforementioned pipelines and features are designed to be served first and foremost using public Representational State Transfer Application Programming Interfaces (REST APIs) and open web technologies like webhooks. This approach allows researchers and platforms to integrate existing and new automated workflows into Zenodo and thus empowers research communities to create self-sustained cross-platform ecosystems. The BiCIKL project (Biodiversity Community Integrated Knowledge Library) exemplifies how repositories and tools can become building blocks for broader adoption of the FAIR principles. Starting with the literature processing pipeline described above, we will explain the underlying concepts and the resulting FAIR data, with a focus on the custom metadata used to enhance the deposits.


Author(s):  
Roy Gelbard ◽  
Israel Spiegler

The research proposes a model for the representation and storage of motion data that enables the communication, storage, and analysis of patterns of motion, as with spoken and written languages. The basic problem is the lack of a machine-readable motion alphabet. We thus set out to define the elemental components and building blocks of motion, coming up with what we call the motion byte as the basis for a motion language that has words, phrases, and sentences. The binary-based model we develop, which is significantly different from the common “key frames” approach, is also a method of storing motion data. Comparison with a standard motion system, based on key frames, indicates a significant advantage for our binary model.


2020 ◽  
Vol 35 (2) ◽  
pp. 61-81
Author(s):  
Jaejoo Lim ◽  
Jim R. Wollscheid ◽  
Ramakrishna Ayyagari

Purpose: Consumers often encounter issues of perceived ambiguity and performance risk when attempting to evaluate experience goods being offered online. Sellers try to alleviate this knowledge gap, often seen in a medium of low naturalness, by engaging in effective compensatory adaptation. This research theoretically looks into three primary aspects of compensatory adaptation and their potential in securing communication of high-quality information between the online seller and consumer. Design/methodology/approach: Utilizing survey data and structural equation modeling, this study tests the effectiveness of different aspects of compensatory adaptation in alleviating the knowledge gap in a medium of low naturalness. Findings: Drawing on media naturalness theory and the tripartite model of attitude, this paper identifies three theoretical components that significantly affect the effectiveness of compensatory adaptation. They are information retrieval capability from the cognitive/logical aspect, information richness from the affective/audiovisual aspect and interactivity from the behavioral aspect. The effectiveness of compensatory adaptation proves to have a positive impact on perceived information quality. Originality/value: To the best of our knowledge, this is the first paper in the information systems literature to examine the compensatory adaptation tools for effective transfer of information. This study contributes to academia by providing three handles for improving the effectiveness of compensatory adaptation toward information quality. We focus on three compensatory adaptation tools in cognitive/logical, affective/audiovisual and behavioral aspects, and this compensation perspective leads to three practical factors that affect effective transfer of information between online sellers and consumers. The result of this study complements the nomological network of the enablers and impediments of e-commerce.


2012 ◽  
Vol 12 (2) ◽  
pp. 247-291 ◽  
Author(s):  
Irit Meir ◽  
Assaf Israel ◽  
Wendy Sandler ◽  
Carol A. Padden ◽  
Mark Aronoff

By comparing two sign languages of approximately the same age but which arose and developed under different social circumstances, we are able to identify possible relationships between social factors and language structure. We argue that two structural properties of these languages are related to the size and the heterogeneity versus homogeneity of their respective communities: use of space in grammatical structure and degree of lexical and sublexical variability. A third characteristic, the tendency toward single-argument clauses, appears to be a function of a different social factor: language age. Our study supports the view that language is not just a structure in the brain, nor is it strictly the domain of the individual. It is very much a socio-cultural artifact. Keywords: community and language structure; sign languages; ISL; ABSL; variation; space; argument structure


1993 ◽  
Vol 23 (2) ◽  
pp. 47-54 ◽  
Author(s):  
Peter Roach ◽  
Gerry Knowles ◽  
Tamas Varadi ◽  
Simon Arnfield

The purpose of this paper is to describe a new version of the Spoken English Corpus which will be of interest to phoneticians and other speech scientists. The Spoken English Corpus is a well-known collection of spoken-language texts that was collected and transcribed in the 1980s in a joint project involving IBM UK and the University of Lancaster (Alderson and Knowles forthcoming, Knowles and Taylor 1988). One valuable aspect of it is that the recorded material on which it was based is fairly freely available and the recording quality is generally good. At the time when the recordings were made, the idea of storing all the recorded material in digital form suitable for computer processing was of limited practicality. Although storage on digital tape was certainly feasible, this did not provide rapid computer access. The arrival of optical disk technology, with the possibility of storing very large amounts of digital data on a compact disk at relatively low cost, has brought about a revolution in ideas on database construction and use. It seemed to us that the recordings of the Spoken English Corpus (hereafter SEC) should now be converted into a form which would enable the user to gain access to the acoustic signal without the laborious business of winding through large amounts of tape. Once this was done, we should be able not only to listen to the recordings in a very convenient way, but also to carry out many automatic analyses of the material by computer.


2021 ◽  
Vol 58 (1) ◽  
pp. 94-111
Author(s):  
Dmitry V. Zaitsev ◽  

In this paper, I attempt to offer a general outline of my views on the origin and evolution of language. I do not pretend in any way to a completely new conception of language evolution. It seems to me that all the most important and productive hypotheses about the origin of language have already been made, and it is only a matter of putting the pieces of the puzzle together correctly. As far as I can see, the evolution of language is directly related to the embedded and embodied emotional types, which served as the basis for the subsequent categorization of perceived objects, and thus laid the ground for the formation of first an internal language (of thought), and then an external verbal language. Consistent with this, the paper is organized as follows. In the Introduction I briefly describe the problem I am facing in this article and outline a plan for solving it. The next section comprises a survey of relevant empirical findings related primarily to the processing and understanding of abstract terms and concepts. In my view, it supports the idea of a close connection between the processing of abstract terms, and thus language comprehension, and emotional states. The third section provides relevant theoretical considerations of the relationship between emotions, cognition, and language. Considering in turn various theories of emotions and concepts of language formation, I pay attention to the connection between affective states and language as a sign system. In the fourth section, my views are presented directly. In so doing, I illustrate my approach with a telling example that shows how, in the course of evolution, embedded and embodied emotional responses and reactions could become the building blocks first for the internal language of thought, and then for the external natural language.


2016 ◽  
Vol 61 (8) ◽  
pp. 79-90
Author(s):  
Grzegorz Kończak

Computing power has increased steadily in recent decades. Along with it, the volume of collected and transmitted data sets has grown. Large collections of information force senders to be selective. At the same time, it is not possible to receive all the information that is generated. The article presents the risks associated with uncritical acceptance of information on economic and social issues. The method of data presentation may be particularly helpful in transmitting information to a mass audience. Modern software makes it possible to develop graphical presentations which, through interaction with the user, can contribute to the effective dissemination of relevant information on socio-economic conditions.


Author(s):  
Reto Gmür ◽  
Donat Agosti

Taxonomic treatments, sections of publications documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions, and published in scientific journals, shape our understanding of global biodiversity (Catapano 2019). Treatments are the building blocks of the evolving scientific consensus on taxonomic entities. The semantics of these treatments and their relationships are highly structured: taxa are introduced, merged, made obsolete, split, renamed, associated with specimens and so on. Plazi makes this content available in machine-readable form using the Resource Description Framework (RDF). RDF is the standard model for Linked Data and the Semantic Web. RDF can be exchanged in different formats (aka concrete syntaxes) such as RDF/XML or Turtle. The data model describes graph structures and relies on Internationalized Resource Identifiers (IRIs); ontologies such as the Darwin Core basic vocabulary are used to assign meaning to the identifiers. For Synospecies, we unite all treatments into one large knowledge graph, modelling taxonomic knowledge and its evolution with complete references to quotable treatments. This knowledge graph expresses much more than any individual treatment could convey, because every referenced entity is linked to every other relevant treatment. On synospecies.plazi.org, we provide a user-friendly interface to find the names and treatments related to a taxon. An advanced mode allows execution of queries using the SPARQL query language.
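To make the graph idea concrete, the following sketch stores a few subject-predicate-object triples and finds every treatment that refers to a given taxon. It is a toy model only: it uses plain Python tuples rather than an RDF library or SPARQL, and every identifier (the `ex:` prefix, the predicate names, the function names) is hypothetical and not taken from Plazi's data or from Darwin Core.

```python
# Toy triples (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = [
    ("ex:treatment1", "ex:definesTaxon", "ex:taxonA"),
    ("ex:treatment2", "ex:deprecates", "ex:taxonA"),
    ("ex:treatment2", "ex:definesTaxon", "ex:taxonB"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def treatments_about(triples, taxon):
    """Return the sorted subjects of all triples whose object is the given taxon."""
    return sorted({s for s, _p, _o in match(triples, o=taxon)})


print(treatments_about(TRIPLES, "ex:taxonA"))
```

Because both treatments point at the same taxon identifier, a single pattern lookup reaches both of them. This is the sense in which a merged graph conveys more than any one treatment. A SPARQL query performs the same kind of pattern matching over a real RDF store.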


Author(s):  
Zanna Clay ◽  
Emilie Genty

Our capacity for language is a central aspect of what it means to be human and sets us apart from the rest of the animal kingdom. Given that language does not fossilize, one way to understand how and when it first evolved is to examine the communicative capacities of our closest living relatives, the great apes. This chapter reviews recent research exploring natural communication in our least understood but closest living relative, the bonobo (Pan paniscus). It primarily focuses on what natural bonobo communication can tell us about their underlying social awareness and how this relates to the evolution of language. Examining vocal and gestural communication, we report findings that highlight considerable communicative complexity, flexibility, and intentionality which, cumulatively, suggest that many of the building blocks for language are deeply rooted in our primate past.

