Building the evidence

Author(s):  
Marilyn May Vihman

This chapter presents data from six children learning American English at two developmental points: first word use and the end of the single-word period, when templates typically first begin to be identifiable. The chapter lays out procedures for identifying prosodic structures and variants and also consonant inventories, which give insight into the child’s resources for word production. Analysis of the most frequently used prosodic structures is followed by an analysis of each child’s data to permit template identification, based primarily on high proportionate use and adaptation. A developmental comparison of the two data sets shows continued reliance, by all the children, on the default or simplest CV structure, but advances in use of one- and two-syllable structures with codas. Consonant variegation is found to be the single greatest challenge for early word formation.

2018 ◽  
Vol 6 (2) ◽  
pp. 99-115
Author(s):  
Borislav Marušić ◽  
Sanda Katavić-Čaušić

Abstract The aim of this paper is to examine the word class adjective in one segment of English for Specific Purposes (ESP), namely Business English, and more precisely in online English business magazines. It is an empirical study based on a corpus drawn from a variety of online business magazines. The empirical analysis offers comprehensive insight into the word class adjective in this variety of Business English and contributes to the study of English syntax, semantics and word formation. The syntactic part analyses the position of the adjective in the sentence. The semantic part of the study identifies the most common adjectives that appear in online English business magazines. Most of the analysis is devoted to the word formation of the adjectives found in the corpus, which are divided into compounds, derivatives and conversions. The results obtained in this way give a comprehensive picture of the word class adjective in this type of Business English and can act as a starting point for further research on the word class adjective.


Phonetica ◽  
2021 ◽  
Vol 78 (1) ◽  
pp. 65-94
Author(s):  
Katsura Aoyama ◽  
Barbara L. Davis

Abstract The aim of the present study was to investigate relationships between characteristics of children’s target words and their actual productions during the single-word period in American English. Word productions in spontaneous and functional speech from 18 children acquiring American English were analyzed. Consonant sequences in 3,328 consonant-vowel-consonant (C1VC2) target words were analyzed in terms of global place of articulation (labials, coronals, and dorsals). Children’s actual productions of place sequences were compared between target words containing repeated place sequences (e.g., mom, map, dad, not) and target words containing variegated place sequences (e.g., mat, dog, cat, nap). Overall, when the target word contained two consonants at the same global place of articulation (e.g., labial-labial, map; coronal-coronal, not), approximately 50% of children’s actual productions matched consonant place characteristics. Conversely, when the target word consisted of variegated place sequences (e.g., mat, dog, cat, nap), only about 20% of the productions matched the target consonant sequences. These results suggest that children’s actual productions are influenced by their own production abilities as well as by the phonetic forms of target words.
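The distinction between repeated and variegated place sequences described above can be illustrated with a minimal sketch. The `PLACE` table, the function names, and the simplified one-symbol-per-consonant transcription are assumptions made for illustration only; they are not the study's coding scheme.

```python
# Hypothetical, simplified mapping from consonant symbols to global place
# of articulation (labial, coronal, dorsal). Not the study's own coding.
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "n": "coronal",
    "s": "coronal", "z": "coronal", "l": "coronal",
    "k": "dorsal", "g": "dorsal",
}


def place_sequence(c1, c2):
    """Return the global place of C1 and C2 in a C1VC2 word."""
    return PLACE[c1], PLACE[c2]


def sequence_type(c1, c2):
    """Label a C1VC2 word as 'repeated' or 'variegated' by place."""
    p1, p2 = place_sequence(c1, c2)
    return "repeated" if p1 == p2 else "variegated"


def matches_target(target, produced):
    """True if the produced form keeps the target's place sequence."""
    return place_sequence(*target) == place_sequence(*produced)


# Example words from the abstract, given as (C1, C2) pairs.
TARGETS = {
    "mom": ("m", "m"), "map": ("m", "p"), "dad": ("d", "d"), "not": ("n", "t"),
    "mat": ("m", "t"), "dog": ("d", "g"), "cat": ("k", "t"), "nap": ("n", "p"),
}

if __name__ == "__main__":
    for word, (c1, c2) in TARGETS.items():
        print(word, sequence_type(c1, c2))
```

Under these assumptions, a child form with the consonants (d, d) for the target dog would count as a place mismatch, since the coronal-dorsal target sequence is replaced by a coronal-coronal one.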


2020 ◽  
Author(s):  
Kary Ocaña ◽  
Micaella Coelho ◽  
Guilherme Freire ◽  
Carla Osthoff

Bayesian phylogenetic algorithms are computationally intensive. BEAST 1.10 inferences made use of the BEAGLE 3 high-performance library for efficient likelihood computations. This strategy supports phylogenetic inference and dating relevant to the current understanding of SARS-CoV-2 transmission. In follow-up simulations on hybrid resources of the Santos Dumont supercomputer, using four phylogenomic data sets, we characterize the scaling performance of BEAST 1.10. Our results provide insight into species tree and MCMC chain length estimation, identifying preferable requirements to improve the use of high-performance computing resources. Ongoing steps involve analyses of SARS-CoV-2 using BEAST 1.8 on multiple GPUs.


2020 ◽  
Author(s):  
Garrett Stubbings ◽  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract Frailty indices (FI) based on continuous-valued health data, such as those obtained from blood and urine tests, have been shown to be predictive of adverse health outcomes. However, creating FI from such biomarker data requires a binarization treatment that is difficult to standardize across studies. In this work, we explore a “quantile” methodology for the generic treatment of biomarker data that allows us to construct an FI without preexisting medical knowledge (i.e. risk thresholds) of the included biomarkers. We show that our quantile approach performs as well as, or even slightly better than, established methods for the National Health and Nutrition Examination Survey (NHANES) and the Canadian Study of Health and Aging (CSHA) data sets. Furthermore, we show that our approach is robust to cohort effects within studies as compared to other data-based methods. The success of our binarization approaches provides insight into the robustness of the FI as a health measure and the upper limits of the FI observed in various data sets, and highlights general difficulties in obtaining absolute scales for comparing FI between studies.
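The idea of binarizing biomarkers at a cohort quantile can be sketched as follows. This is only an illustration under stated assumptions: the cut at the 0.75 quantile, the nearest-rank quantile rule, the assumption that higher values mean higher risk, and the biomarker names and values are all hypothetical and are not taken from the paper.

```python
import math


def quantile_threshold(values, q):
    """Nearest-rank value at quantile q (0 < q <= 1) of the cohort."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def frailty_indices(cohort, q=0.75):
    """Return one FI per person: the fraction of biomarkers in deficit.

    cohort maps a biomarker name to a list of values, one per person.
    Assumption: a value strictly above the cohort's q-quantile counts
    as a deficit (higher is riskier); no clinical threshold is used.
    """
    thresholds = {name: quantile_threshold(vals, q) for name, vals in cohort.items()}
    n_people = len(next(iter(cohort.values())))
    indices = []
    for i in range(n_people):
        deficits = [1 if cohort[name][i] > thresholds[name] else 0 for name in cohort]
        indices.append(sum(deficits) / len(deficits))
    return indices


# Hypothetical cohort of four people and two biomarkers.
COHORT = {
    "glucose": [90, 95, 100, 140],
    "creatinine": [1.6, 0.9, 1.0, 0.8],
}

if __name__ == "__main__":
    print(frailty_indices(COHORT))
```

Because the thresholds come from the cohort itself, the sketch needs no medical risk cut-offs, which is the property the abstract emphasizes. A real analysis would also have to decide the risk direction for each biomarker.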


2018 ◽  
Author(s):  
Brian Hie ◽  
Bryan Bryson ◽  
Bonnie Berger

Abstract Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems [1–4] and every cell type in the human body [5]. Leveraging this data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist [6,7], they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, inspired by algorithms for panorama stitching, that overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.


Author(s):  
O. Majgaonkar ◽  
K. Panchal ◽  
D. Laefer ◽  
M. Stanley ◽  
Y. Zaki

Abstract. Classifying objects within aerial Light Detection and Ranging (LiDAR) data is an essential task to which machine learning (ML) is applied increasingly. ML has been shown to be more effective on LiDAR than imagery for classification, but most efforts have focused on imagery because of the challenges presented by LiDAR data. LiDAR datasets are of higher dimensionality, discontinuous, heterogeneous, spatially incomplete, and often scarce. As such, there has been little examination of the fundamental properties of the training data required for acceptable performance of classification models tailored for LiDAR data. The quantity of training data is one such crucial property, because training on different sizes of data provides insight into a model’s performance with differing data sets. This paper assesses the impact of training data size on the accuracy of PointNet, a widely used ML approach for point cloud classification. Subsets of ModelNet ranging from 40 to 9,843 objects were validated on a test set of 400 objects. Accuracy improved logarithmically; decelerating from 45 objects onwards, it slowed significantly at a training size of 2,000 objects, corresponding to 20,000,000 points. This work contributes to the theoretical foundation for development of LiDAR-focused models by establishing a learning curve, suggesting the minimum quantity of manually labelled data necessary for satisfactory classification performance and providing a path for further analysis of the effects of modifying training data characteristics.


Author(s):  
Antje S. Meyer ◽  
Eva Belke

Current models of word form retrieval converge on central assumptions. They all distinguish between morphological, phonological, and phonetic representations and processes; they all assume morphological and phonological decomposition, and agree on the main processing units at these levels. In addition, all current models of word form postulate the same basic retrieval mechanisms: activation and selection of units. Models of word production often distinguish between processes concerning the selection of a single word unit from the mental lexicon and the retrieval of the associated word form. This article explores lexical selection and word form retrieval in language production. Following the distinctions in linguistic theory, it discusses morphological encoding, phonological encoding, and phonetic encoding. The article also considers the representation of phonological knowledge, building of phonological representations, segmental retrieval, retrieval of metrical information, generating the phonetic code of words, and a model of word form retrieval.


Author(s):  
Marilyn May Vihman

This chapter presents data from four to eight children each learning one of six languages: British English, Estonian, Finnish, French, Italian, and Welsh. As a basis for cross-linguistic comparison, the chapter first considers similarities and differences in the target forms of the first words of these children. It then presents the children’s later prosodic structures, including American English in the comparison. The chapter considers the developmental changes apparent from comparing the first words with the later structures and quantifies the extent of variegation in first word targets and later child word forms. The chapter concludes that common resources are strongly in evidence in the first words but that by the later point there is good evidence of ambient language influence as well as of individual differences within the groups.


Author(s):  
Susan Reichelt

Abstract This study explores marked affixation as a possible cue for characterization in scripted television dialogue. The data used here is the newly compiled TV Corpus, which encompasses over 265 million words in its North American English context. An initial corpus-based analysis quantifies the innovative use of affixes in word-formation processes across the corpus to allow for comparison with a following character analysis, which investigates how derivational word-formation supports characterization patterns within a specific series, Buffy the Vampire Slayer. For this, a list of productive prefixes (e.g. de-, un-) and suffixes (e.g. -y, -ish) is used to elicit relevant contexts. The study thus combines two approaches to word-formation processes in scripted contexts. On a large scale, it shows how derivational neologisms are spread across TV dialogue and on a much smaller scale, it highlights particular instances where these neologisms are used to aid character construction.

