How Data Drive Early Word Learning: A Cross-Linguistic Waiting Time Analysis

The extent to which word learning is delayed by maturation as opposed to accumulating data is a longstanding question in language acquisition. Further, the precise way in which data influence learning on a large scale is unknown: experimental results reveal that children can rapidly learn words from single instances as well as by aggregating ambiguous information across multiple situations. We analyze Wordbank, a large cross-linguistic dataset of word acquisition norms, using a statistical waiting time model to quantify the role of data in early language learning, building on Hidaka (2013). We find that the model both fits and accurately predicts the shape of children’s growth curves. Further analyses of model parameters suggest a primarily data-driven account of early word learning. The parameters of the model directly characterize both the amount of data required and the rate at which informative data occur. With high statistical certainty, words require on the order of ∼10 learning instances, which occur on average once every two months. Our method is extremely simple, statistically principled, and broadly applicable to modeling data-driven learning effects in development.
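
The two quantities reported above (about 10 learning instances, roughly one every two months) can be turned into a rough growth curve. The sketch below assumes that informative instances arrive as a Poisson process starting at birth, so the waiting time for the k-th instance follows an Erlang (integer-shape gamma) distribution; that distributional choice, the start at birth, and the names `erlang_cdf`, `K` and `RATE` are illustrative assumptions, not details taken from the abstract.

```python
import math


def erlang_cdf(t, k, rate):
    """Probability that the k-th informative instance has occurred by time t."""
    if t <= 0:
        return 0.0
    x = rate * t
    # Probability that fewer than k Poisson events have occurred by time t.
    fewer_than_k = sum(math.exp(-x) * x**n / math.factorial(n) for n in range(k))
    return 1.0 - fewer_than_k


K = 10      # learning instances required (order of magnitude from the abstract)
RATE = 0.5  # informative instances per month (one every two months)

# Expected waiting time of an Erlang variable is shape / rate.
mean_wait_months = K / RATE

if __name__ == "__main__":
    print(f"mean waiting time: {mean_wait_months:.0f} months")
    for age in (12, 18, 24, 30):
        print(age, round(erlang_cdf(age, K, RATE), 3))
```

Under these assumptions the expected waiting time is 10 / 0.5 = 20 months, and the printed values trace the S-shaped proportion of children expected to know a word at each age.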

Study on the Assembling Regularity of Passengers at Large-Scale Railway Station

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.253-255.1231 ◽ 2012 ◽ Vol 253-255 ◽ pp. 1231-1234

Author(s): Le Huang ◽ Yong Bo Lv ◽ Yuan Ren

Keyword(s): Normal Distribution ◽ Waiting Time ◽ Large Scale ◽ Distribution Theory ◽ Distribution Model ◽ Time Model ◽ Railway Station ◽ Normal Distribution Model

This paper is based on surveys of Beijing West Railway Station, Guangzhou Railway Station, Xi'an Railway Station, Zhengzhou Railway Station, Jinan Railway Station and Shangqiu Railway Station. Passengers' tickets were examined and the times at which they entered the stations were recorded, and a waiting time model is established on the basis of log-normal distribution theory. The number of passengers assembling in a station is obtained by using the log-normal distribution model to describe passenger arrivals and applying a uniform movement law to passenger departures.
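
The assembling calculation described above can be sketched for a single train. The code assumes that each passenger's lead time (minutes between entering the station and departure) is log-normally distributed and that boarding removes passengers at a constant rate over a fixed window before departure. The function names and every parameter value (`mu`, `sigma`, `boarding_window`, `total`) are hypothetical and are not taken from the paper's surveys.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal variable whose logarithm is Normal(mu, sigma)."""
    if x <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(x))


def passengers_in_station(minutes_before_departure, total, mu, sigma, boarding_window):
    """Passengers of one train waiting in the station at a given time."""
    # A passenger has arrived if their lead time is at least the time remaining.
    arrived = total * (1.0 - lognormal_cdf(minutes_before_departure, mu, sigma))
    # Boarding starts `boarding_window` minutes before departure and is uniform.
    if minutes_before_departure >= boarding_window:
        boarded_fraction = 0.0
    else:
        boarded_fraction = 1.0 - minutes_before_departure / boarding_window
    boarded = min(arrived, total * boarded_fraction)
    return arrived - boarded


if __name__ == "__main__":
    # Hypothetical train: 1000 passengers, median lead time 45 minutes.
    for t in (120, 60, 30, 10, 0):
        n = passengers_in_station(t, 1000, math.log(45), 0.5, 20)
        print(f"{t:>3} min before departure: {n:.0f} passengers")
```

Summing this quantity over all trains in a timetable would give a station-wide estimate under the same assumptions.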

Quantifying Lexical Ambiguity in Speech To and From English-Learning Children

10.31234/osf.io/zxkm2 ◽ 2021

Author(s): Stephan Meylan ◽ Jessica Mankewitz ◽ Sammy Floyd ◽ Hugh Rabagliati ◽ Mahesh Srinivasan

Keyword(s): Language Learning ◽ Word Learning ◽ Large Scale ◽ First Language ◽ Small Sample ◽ English Learning ◽ Ambiguous Words ◽ Multiple Meanings ◽ Small Sample Sizes ◽ Word Senses

Because words have multiple meanings, language users must often choose appropriate meanings according to the context of use. How this potential ambiguity affects first language learning, especially word learning, is unknown. Here, we present the first large-scale study of how children are exposed to, and themselves use, ambiguous words in their actual language learning environments. We tag 180,000 words in two longitudinal child language corpora with word senses from WordNet, focusing on ages between 9 and 51 months and limiting to words from a popular parental vocabulary report. We then compare the diversity of sense usage in adult speech around children to that observed in a sample of adult-directed language, as well as the diversity of sense usage in children's own productions. To accomplish this we use a Bayesian model-based estimate of sense entropy, a measure of diversity that takes into account uncertainty inherent in small sample sizes. This reveals that sense diversity in caregivers' speech to children is similar to that observed in a sample of adult-directed written material, and that children's use of nouns, but not verbs, is similarly diverse to that of adults. Finally, we show that sense entropy is a significant predictor of vocabulary development: children begin to produce words with a higher diversity of adult sense usage at later ages. We discuss the implications of our findings for theories of word learning.
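
To illustrate what a Bayesian estimate of sense entropy can look like with small samples, the sketch below places a symmetric Dirichlet prior on a word's sense probabilities and approximates the posterior distribution of entropy by Monte Carlo sampling. The abstract does not specify the estimator, so the Dirichlet-multinomial model, the prior strength `alpha`, and the function names are assumptions made for illustration only.

```python
import math
import random


def entropy_bits(probs):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior_entropy(counts, alpha=1.0, draws=2000, seed=0):
    """Posterior mean and 95% interval of entropy under a Dirichlet prior."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent gamma draws, normalized.
        gammas = [rng.gammavariate(c + alpha, 1.0) for c in counts]
        total = sum(gammas)
        samples.append(entropy_bits([g / total for g in gammas]))
    samples.sort()
    mean = sum(samples) / draws
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical sense counts for one word: three senses, four tagged tokens.
    mean, lower, upper = posterior_entropy([3, 1, 0])
    print(f"posterior mean {mean:.2f} bits, 95% interval [{lower:.2f}, {upper:.2f}]")
```

With only four tokens the interval is wide, which is the kind of small-sample uncertainty a plug-in entropy estimate would hide.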

Vocal Communications of a Developmentally Delayed Child

Language Speech and Hearing Services in Schools ◽ 10.1044/0161-1461.1802.112 ◽ 1987 ◽ Vol 18 (2) ◽ pp. 112-130

Author(s): Mary Ann Romski ◽ Sharon Ellis Joyner ◽ Rose A. Sevcik

Keyword(s): Language Learning ◽ Developmental Delays ◽ Developmentally Delayed ◽ Language Impaired ◽ Diary Data ◽ Word Acquisition ◽ Diary Studies ◽ Usage Patterns ◽ Vocal Communications ◽ Impaired Children

Studies of first-word acquisition in typical language-learning children frequently take the form of diary studies. Comparable diary data from language-impaired children with developmental delays, however, are not currently available. This report describes the spontaneous vocalizations of a child with a developmental delay for 14 months, from the time he was age 6:5 to age 7:7. From a corpus of 285 utterances, 47 phonetic forms were identified and categorized. Analysis focused on semantic, communicative, and phonological usage patterns.

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽ 10.3390/a14050154 ◽ 2021 ◽ Vol 14 (5) ◽ pp. 154

Author(s): Marcus Walldén ◽ Masao Okita ◽ Fumihiko Ino ◽ Dimitris Drikakis ◽ Ioannis Kokkinakis

Keyword(s): Large Scale ◽ Data Driven ◽ Data Sets ◽ Output Constraints ◽ Data Driven Approach ◽ Scientific Simulations ◽ Multiple Metrics ◽ In Transit ◽ Multiple Compression ◽ Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.
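
The general idea of choosing a compression method per region from an importance score can be sketched as follows. This is not the authors' method: the importance metric (value range within a region), the threshold, the two compression paths (lossless doubles versus 8-bit quantization, both passed through `zlib`), and all function names are stand-ins chosen only to make the idea concrete.

```python
import struct
import zlib


def importance(region):
    """Hypothetical importance metric: spread of values inside the region."""
    return max(region) - min(region)


def compress_region(region, threshold):
    """Compress one region, picking the method from its importance score."""
    if importance(region) >= threshold:
        # Important region: keep full double precision (lossless).
        payload = struct.pack(f"{len(region)}d", *region)
        return "lossless", zlib.compress(payload)
    # Unimportant region: quantize to 8 bits per value (lossy).
    low, high = min(region), max(region)
    span = (high - low) or 1.0
    quantized = bytes(round(255 * (v - low) / span) for v in region)
    return "quantized", zlib.compress(struct.pack("2d", low, high) + quantized)


def compress_field(values, region_size, threshold):
    """Split a 1D field into fixed-size regions and compress each one."""
    regions = [values[i:i + region_size] for i in range(0, len(values), region_size)]
    return [compress_region(r, threshold) for r in regions]


if __name__ == "__main__":
    # A flat region followed by a region with structure.
    field = [0.0] * 64 + [float(i) for i in range(64)]
    for tag, blob in compress_field(field, region_size=64, threshold=0.5):
        print(tag, len(blob), "bytes")
```

A real in-transit pipeline would also need the load balancing mentioned in the abstract, which this sketch leaves out.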

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽ 10.1007/s40593-021-00267-x ◽ 2021

Author(s): Ekaterina Kochmar ◽ Dung Do Vu ◽ Robert Belfer ◽ Varun Gupta ◽ Iulian Vlad Serban ◽ ...

Keyword(s): Machine Learning ◽ Student Performance ◽ Language Processing ◽ Intelligent Tutoring Systems ◽ Large Scale ◽ Intelligent Tutoring ◽ Performance Outcomes ◽ Data Driven ◽ Personalized Feedback ◽ Tutoring Systems

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need for expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽ 10.1145/3357492.3358632 ◽ 2019

Author(s): Bin Wang ◽ Cy Chan ◽ Divya Somasi ◽ Jane Macfarlane ◽ Eric Rask

Keyword(s): Large Scale ◽ Energy Use ◽ Transportation Networks ◽ Data Driven

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽ 10.1136/bmjoq-2020-001087 ◽ 2021 ◽ Vol 10 (1) ◽ pp. e001087

Author(s): Tarek F Radwan ◽ Yvette Agyako ◽ Alireza Ettefaghian ◽ Tahira Kamran ◽ Omar Din ◽ ...

Keyword(s): Type 2 Diabetes ◽ Primary Care ◽ Large Scale ◽ Education Programme ◽ Educational Programme ◽ Data Driven ◽ Treatment Targets ◽ Care Processes ◽ Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D through improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven large-scale multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.

Why ability point estimates can be pointless: a primer on using skill measures from large-scale assessments in secondary analyses

Measurement Instruments for the Social Sciences ◽ 10.1186/s42409-020-00020-5 ◽ 2021 ◽ Vol 3 (1)

Author(s): Clemens M. Lechner ◽ Nivedita Bhaktha ◽ Katharina Groskurth ◽ Matthias Bluemke

Keyword(s): Measurement Error ◽ Statistical Models ◽ Test Scores ◽ Large Scale ◽ Equation Modeling ◽ Model Parameters ◽ Advantages And Disadvantages ◽ Point Estimates ◽ Secondary Analyses ◽ Large Scale Assessments

Measures of cognitive or socio-emotional skills from large-scale assessment surveys (LSAS) are often based on advanced statistical models and scoring techniques unfamiliar to applied researchers. Consequently, applied researchers working with data from LSAS may be uncertain about the assumptions and computational details of these statistical models and scoring techniques and about how to best incorporate the resulting skill measures in secondary analyses. The present paper is intended as a primer for applied researchers. After a brief introduction to the key properties of skill assessments, we give an overview of the three principal methods with which secondary analysts can incorporate skill measures from LSAS in their analyses: (1) as test scores (i.e., point estimates of individual ability), (2) through structural equation modeling (SEM), and (3) in the form of plausible values (PVs). We discuss the advantages and disadvantages of each method based on three criteria: fallibility (i.e., control for measurement error and unbiasedness), usability (i.e., ease of use in secondary analyses), and immutability (i.e., consistency of test scores, PVs, or measurement model parameters across different analyses and analysts). We show that although none of the methods are optimal under all criteria, methods that result in a single point estimate of each respondent’s ability (i.e., all types of “test scores”) are rarely optimal for research purposes. Instead, approaches that avoid or correct for measurement error, especially PV methodology, stand out as the method of choice. We conclude with practical recommendations for secondary analysts and data-producing organizations.
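
Analyses with plausible values are usually run once per PV and then pooled with the standard multiple-imputation combining rules (Rubin's rules). The sketch below shows that pooling step; the function name and the example numbers are hypothetical, and the abstract itself does not prescribe this particular code.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Pool per-PV estimates of one parameter using Rubin's rules.

    estimates: the parameter estimate obtained with each plausible value.
    sampling_variances: the squared standard error from each analysis.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(sampling_variances)  # average sampling variance
    between = variance(estimates)      # variance across PVs (divides by m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance ** 0.5


if __name__ == "__main__":
    # Hypothetical regression coefficient estimated with five plausible values.
    coefs = [0.42, 0.45, 0.40, 0.44, 0.43]
    squared_ses = [0.0025, 0.0024, 0.0026, 0.0025, 0.0025]
    estimate, se = pool_plausible_values(coefs, squared_ses)
    print(f"pooled estimate {estimate:.3f}, standard error {se:.3f}")
```

The between-PV term is what carries measurement uncertainty into the standard error; analysing a single point estimate per respondent drops it.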

Power-to-Green Methanol via CO2 Hydrogenation—A Concept Study Including Oxyfuel Fluidized Bed Combustion of Biomass

Energies ◽ 10.3390/en14154638 ◽ 2021 ◽ Vol 14 (15) ◽ pp. 4638

Author(s): Simon Pratschner ◽ Pavel Skopec ◽ Jan Hrdlicka ◽ Franz Winter

Keyword(s): Large Scale ◽ Positive Impact ◽ Renewable Energy Sources ◽ Air Separation ◽ Model Parameters ◽ Processing Unit ◽ Fluidized Bed Combustion ◽ Separation Unit ◽ Oxyfuel Combustion ◽ Wind Park

There is no alternative to a revolution of the global energy industry if the climate crisis is to be solved. However, renewable energy sources typically show significant seasonal and daily fluctuations. This paper provides a system concept model of a decentralized power-to-green methanol plant consisting of a biomass heating plant with a thermal input of 20 MWth. (oxyfuel or air mode), a CO2 processing unit (DeOxo reactor or MEA absorption), an alkaline electrolyzer, a methanol synthesis unit, an air separation unit and a wind park. Applying oxyfuel combustion has the potential to directly utilize O2 generated by the electrolyzer, which was analyzed by varying critical model parameters. A major objective was to determine whether applying oxyfuel combustion has a positive impact on the plant’s power-to-liquid (PtL) efficiency rate. For cases utilizing more than 70% of CO2 generated by the combustion, the oxyfuel’s O2 demand is fully covered by the electrolyzer, making oxyfuel a viable option for large-scale applications. Conventional air combustion is recommended for small wind parks and scenarios using surplus electricity. Maximum PtL efficiencies of ηPtL,Oxy = 51.91% and ηPtL,Air = 54.21% can be realized. Additionally, a case study for one year of operation has been conducted, yielding an annual output of about 17,000 t/a methanol and 100 GWhth./a thermal energy for an input of 50,500 t/a woodchips and a wind park size of 36 MWp.

Word Learning and Word Acquisition

SpringerBriefs in Psychology - Words as Social Tools: An Embodied View on Abstract Concepts ◽ 10.1007/978-1-4614-9539-0_4 ◽ 2014 ◽ pp. 71-93

Author(s): Anna M. Borghi ◽ Ferdinand Binkofski

Keyword(s): Word Learning ◽ Word Acquisition
