Analysis and lessons identified on critical infrastructures and dependencies from an empirical data set

Abstract Georeferenced digital trace data offer unprecedented flexibility in migration estimation. Because of their high temporal granularity, many migration estimates can be generated from the same data set by changing the definition parameters. Yet despite the growing application of digital trace data to migration research, strategies for taking advantage of their temporal granularity remain largely underdeveloped. In this paper, we provide a general framework for converting digital trace data into estimates of migration transitions and for systematically analyzing their variation along a quasi-continuous time scale, analogous to a survival function. From migration theory, we develop two simple hypotheses regarding how we expect our estimated migration transition functions to behave. We then test our hypotheses on simulated data and empirical data from three platforms in two internal migration contexts: geotagged Tweets and Gowalla check-ins in the United States, and cell-phone call detail records in Senegal. Our results demonstrate the need for evaluating the internal consistency of migration estimates derived from digital trace data before using them in substantive research. At the same time, however, common patterns across our three empirical data sets point to an emergent research agenda using digital trace data to study the specific functional relationship between estimates of migration and time and how this relationship varies by geography and population characteristics.

Authentic Empathy: A Cultural Basis for the Development of Empathy in Children

Journal of Humanistic Psychology ◽

10.1177/0022167820934222 ◽

2020 ◽

pp. 002216782093422

Author(s):

Tracey Woolrych ◽

Michelle J. Eady ◽

Corinne A. Green

Keyword(s):

Thematic Analysis ◽

Perspective Taking ◽

Empirical Data ◽

Primary Schools ◽

Research Paradigm ◽

Top Down ◽

Data Set ◽

Cultural Elements ◽

Rich Data ◽

Gain Access

Culture is important for the development of social skills in children, including empathy. Although empathy has long been linked with prosocial behaviors and attitudes, there is little research that links culture with development of empathy in children. This project sought to investigate and identify specific culturally related empathy elements in a sample of Dene and Inuit children from Northern Canada. Across seven different grade (primary) schools, 92 children aged 7 to 9 years participated in the study. Children’s drawings, and interviews about those pictures, were uniquely employed as empirical data which allowed researchers to gain access to the children’s perspective about what aspects of culture were important to them. Using empathy as the theoretical framework, a thematic analysis was conducted in a top-down deductive approach. The research paradigm elicited a rich data set revealing three major themes: sharing; knowledge of self and others; and acceptance of differences. The identified themes were found to have strong links with empathy constructs such as sharing, helping, perspective-taking, and self–other knowledges, revealing the important role that culture may play in the development of empathy. Findings from this study can help researchers explore and identify specific cultural elements that may contribute to the development of empathy in children.

Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences

Social Science Computer Review ◽

10.1177/0894439319888445 ◽

2019 ◽

pp. 089443931988844

Author(s):

Ranjith Vijayakumar ◽

Mike W.-L. Cheung

Keyword(s):

Machine Learning ◽

Empirical Data ◽

Fixed Effects ◽

Predictive Accuracy ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Replication Studies ◽

Machine Learning Methods ◽

Accuracy Measure

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with the use of an empirical data set, assessing the replication success of a predictive accuracy measure, namely, R² on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection will be supported if the 95% confidence interval (CI) of R² difference estimates between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machines performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
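
The three replication aims described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the confidence intervals are computed, so a normal approximation from a difference estimate and its standard error is assumed here, and the function names, the `margin` parameter (the half-width of the region of equivalence), and the inverse-variance pooling are all hypothetical choices rather than the authors' procedure.

```python
from statistics import NormalDist


def normal_ci(estimate, std_err, level):
    """Two-sided normal-approximation CI (an assumption, not the paper's method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_err, estimate + z * std_err


def is_inconsistent(diff, std_err):
    """Test of inconsistency: the 95% CI of the R^2 difference excludes zero."""
    low, high = normal_ci(diff, std_err, 0.95)
    return low > 0.0 or high < 0.0


def is_consistent(diff, std_err, margin):
    """Test of consistency: the 90% CI lies fully inside (-margin, +margin)."""
    low, high = normal_ci(diff, std_err, 0.90)
    return -margin < low and high < margin


def fixed_effect_interval(estimates, std_errs, level=0.95):
    """Fixed-effect meta-analytic interval via inverse-variance weighting."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return normal_ci(pooled, pooled_se, level)


if __name__ == "__main__":
    # Hypothetical numbers: replication R^2 minus original R^2, with its standard error.
    print(is_inconsistent(-0.10, 0.02))
    print(is_consistent(-0.01, 0.02, margin=0.05))
    print(fixed_effect_interval([0.30, 0.34], [0.02, 0.02]))
```

Note that a single replication can fail both tests at once (the 95% CI contains zero but the 90% CI is wider than the equivalence region), in which case the result is simply inconclusive.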

Throwing the baby out with the bathwater: Problems in modeling aggregated eye-movement data

Behavioral and Brain Sciences ◽

10.1017/s0140525x03280109 ◽

2003 ◽

Vol 26 (4) ◽

pp. 482-483 ◽

Cited By ~ 2

Author(s):

Gary Feng

Keyword(s):

Eye Movement ◽

Empirical Data ◽

Data Set ◽

Severe Problem ◽

Movement Data ◽

Parameter Values

Parameters in E-Z Reader models are estimated on the basis of a simple data set consisting of 30 means. Because of heavy aggregation, the data have a severe problem of multicollinearity and are unable to adequately constrain parameter values. This could give the model more power than the empirical data warrant. Future models should exploit the richness of eye movement data and avoid excessive aggregation.

Computing the Internode Certainty and related measures from partial gene trees

10.1101/022053 ◽

2015 ◽

Cited By ~ 2

Author(s):

Kassian Kobert ◽

Leonidas Salichos ◽

Antonis Rokas ◽

Alexandros Stamatakis

Keyword(s):

Empirical Data ◽

Data Sets ◽

Gene Trees ◽

Data Set ◽

Reference Tree ◽

Full Species

Abstract We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any data set should also include trees containing the full species set.

A prior-based approach for hypothesis comparison and its utility to discern among temporal scenarios of divergence

10.1101/302539 ◽

2018 ◽

Cited By ~ 1

Author(s):

Eugenia Zarza ◽

Robert B. O’Hara ◽

Annette Klussmann-Kolb ◽

Markus Pfenninger

Keyword(s):

DNA Sequence ◽

Empirical Data ◽

Sequence Length ◽

Path Sampling ◽

Historical Events ◽

Data Set ◽

Stepping Stone ◽

Biogeographic Patterns ◽

Correct Hypothesis ◽

Competing Hypotheses

Abstract One of the major problems in evolutionary biology is to elucidate the relationships between historical events and the tempo and mode of lineage divergence. The development of relaxed molecular clock models and the increasing availability of DNA sequences resulted in more accurate estimations of taxa divergence times. However, finding the link between competing historical events and divergence is still challenging. Here we investigate assigning constrained-age priors to nodes of interest in a time-calibrated phylogeny as a means of hypothesis comparison. These priors are equivalent to historic scenarios for lineage origin. The hypothesis that best explains the data can be selected by comparing the likelihood values of the competing hypotheses, modelled with different priors. A simulation approach was taken to evaluate the performance of the prior-based method and to compare it with an unconstrained approach. We explored the effect of DNA sequence length and the temporal placement and span of competing hypotheses (i.e. historic scenarios) on selection of the correct hypothesis and the strength of the inference. Competing hypotheses were compared applying a posterior simulation analogue of the Akaike Information Criterion and Bayes factors (obtained after calculation of the marginal likelihood with three estimators: Harmonic Mean, Stepping Stone and Path Sampling). We illustrate the potential application of the prior-based method on an empirical data set to compare competing geological hypotheses explaining the biogeographic patterns in Pleurodeles newts. The correct hypothesis was selected on average 89% of the time. The best performance was observed with DNA sequence lengths of 3500-10000 bp. The prior-based method is most reliable when the hypotheses compared are not temporally too close. The strongest inferences were obtained when using the Stepping Stone and Path Sampling estimators.
The prior-based approach proved effective in discriminating between competing hypotheses when used on empirical data. The unconstrained analysis performed well but probably requires additional computational effort. Researchers applying this approach should rely only on inferences with moderate to strong support. The prior-based approach could be applied to biogeographical and phylogeographical studies where robust methods for historical inferences are still lacking.

Empirical Analysis of Transformers in the Development of a Storyboarding Methodology

Volume 5: 35th Design Automation Conference, Parts A and B ◽

10.1115/detc2009-87420 ◽

2009 ◽

Cited By ~ 2

Author(s):

Dennis Wang ◽

Rachel Kuhr ◽

Kristen Kaufman ◽

Richard Crawford ◽

Kristin L. Wood ◽

...

Keyword(s):

Empirical Analysis ◽

Empirical Data ◽

Design Theory ◽

Consumer Products ◽

New Method ◽

Data Set ◽

Multiple Functions ◽

The Creation ◽

Change State

Transforming products, or more generally transformers, are devices that change state in order to facilitate new, or enhance an existing, functionality. Mechanical transformers relate to products that reconfigure and can be advantageous by providing multiple functions, while often conserving space. A basic example is a foldable chair that can be stowed when not in use, but provides ergonomic and structural seating when deployed. Utilizing transformation can also lead to novel designs that combine functions across domains, such as an amphibious vehicle that provides both terrestrial and aquatic transportation. In order to harness these assets of transformation, the Transformational Design Theory [1] was developed. This theory outlines a set of principles and facilitators that describe and embody transformation for the purpose of systematically assisting the design of transformers. To build on this theory, this paper analyzes a repository of popular transformer toys. Transformer toys are chosen for this study because of their richness in displaying a variety of kinematic aspects of transformation. Through this process, new definitions to describe transformation are garnered and a set of guidelines is developed to further aid designers. The empirical data set of transformer toys is rich in information and provides a basis for application to other fields, such as robotics and consumer products. These insights, in conjunction with the use of storyboarding, create a new method of designing transformers. This paper presents the method and concludes with a validation exercise in the creation of a new transformer toy.

The Modified Baumol Equation: Theory and Evidence

Review of European Studies ◽

10.5539/res.v10n1p25 ◽

2018 ◽

Vol 10 (1) ◽

pp. 25

Author(s):

Tchai Tavor ◽

Limor D. Gonen ◽

Michal Weber ◽

Uriel Spiegel

Keyword(s):

Empirical Data ◽

Income Elasticity ◽

Cost Minimization ◽

High Income ◽

Point Of View ◽

Original Equation ◽

Full Time ◽

Demand For Money ◽

Data Set ◽

Luxury Goods

Baumol developed an equation of demand for money for the transaction motive. It is affected positively by cost per withdrawal and negatively by the interest loss resulting from holding cash. The present paper suggests modifying the basic and simplified Baumol approach by adding another element to the transaction equation. Availability of cash encourages spontaneous purchases resulting in customer losses. Through cost minimization with respect to three elements instead of two as in the original Baumol equation, a new modified Baumol equation was created. It is examined by using an empirical data set and the results support the modified version of the Baumol equation. Customers respond positively to cash availability when they spend more on luxury goods. This is especially prominent among unmarried and most likely young customers. Due to high income elasticity, spontaneous purchasing is higher among wealthier customers and full-time workers who maintain a steady and secure employment position. Since such customers have a weakness for spontaneously and sometimes even carelessly buying luxury items, from their point of view they create a good and efficient buffer by decreasing the available cash in hand and thereby reducing or possibly even preventing their wasteful behavior. The new version is robust and statistically more significant than the original equation presented in 1952.
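
For orientation, the original two-element trade-off can be written out as a short derivation. This is the textbook square-root rule, not the paper's modified equation. The symbols are notation assumed here: T is the total value of transactions per period, b the fixed cost per withdrawal, i the interest rate forgone on cash, and C the size of each withdrawal. The abstract does not give the form of the third element (losses from spontaneous purchases), so it is left out.

```latex
\begin{align*}
\text{Cost}(C) &= \frac{bT}{C} + \frac{iC}{2} \\
\frac{d\,\text{Cost}}{dC} &= -\frac{bT}{C^{2}} + \frac{i}{2} = 0
\quad\Longrightarrow\quad C^{*} = \sqrt{\frac{2bT}{i}} \\
\bar{M} &= \frac{C^{*}}{2} = \sqrt{\frac{bT}{2i}}
\end{align*}
```

Average cash holdings rise with the cost per withdrawal b and fall with the interest rate i, which matches the signs described in the abstract.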

Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference

Molecular Biology and Evolution ◽

10.1093/molbev/msz291 ◽

2019 ◽

Vol 37 (4) ◽

pp. 1202-1210 ◽

Cited By ~ 6

Author(s):

David A Duchêne ◽

K Jun Tong ◽

Charles S P Foster ◽

Sebastián Duchêne ◽

Robert Lanfear ◽

...

Keyword(s):

Empirical Data ◽

Sequence Data ◽

Branch Length ◽

Data Sets ◽

Gene Trees ◽

Data Set ◽

Branch Lengths ◽

Statistical Support ◽

Distinct Branch ◽

Consistent Support

Abstract Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.

Delineating the Average Rate of Change in Longitudinal Models

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998607306074 ◽

2008 ◽

Vol 33 (3) ◽

pp. 307-332 ◽

Cited By ~ 5

Author(s):

Ken Kelley ◽

Scott E. Maxwell

Keyword(s):

Longitudinal Data ◽

Empirical Data ◽

Average Rate ◽

Rate Of Change ◽

Change Model ◽

Individual Change ◽

Longitudinal Models ◽

Data Set ◽

Straight Line ◽

Two Measures

The average rate of change is a concept that has been misunderstood in the literature. This article attempts to clarify the concept and show unequivocally the mathematical definition and meaning of the average rate of change in longitudinal models. The slope from the straight-line change model has at times been interpreted as if it were always the average rate of change. It is shown, however, that this is generally not the case and holds true in only a limited number of situations. General equations are presented for two measures of discrepancy when the slope from the straight-line change model is used to estimate the average rate of change. The importance of fitting an appropriate individual change model is discussed, as are the benefits provided by models nonlinear in their parameters for longitudinal data. An empirical data set is used to illustrate the analytic developments.
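
The gap between the straight-line slope and the average rate of change can be seen in a small numerical sketch. The assumptions are loud ones: the average rate of change is taken here as net change divided by elapsed time, the trajectories are noise-free and invented for illustration, and the function names are hypothetical. None of this reproduces the article's general equations.

```python
def average_rate_of_change(times, values):
    """Net change over the interval divided by elapsed time (assumed definition)."""
    return (values[-1] - values[0]) / (times[-1] - times[0])


def ols_slope(times, values):
    """Least-squares slope of the straight-line change model."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]

    # Truly linear change: the two quantities coincide.
    linear = [2 * t + 1 for t in times]
    print(average_rate_of_change(times, linear), ols_slope(times, linear))

    # Cubic change: the straight-line slope no longer equals the average rate.
    cubic = [t ** 3 for t in times]
    print(average_rate_of_change(times, cubic), ols_slope(times, cubic))
```

For the cubic trajectory the average rate of change is 16 while the fitted slope is 15.4, a discrepancy that arises purely from fitting a straight line to nonlinear change.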
