Combining Data from Multiple Sources to Define a Respondent: The Case of Education Data

Abstract: Irradiation of reactor pressure vessel (RPV) steels causes the formation of nanoscale microstructural features (termed radiation damage), which affect the mechanical properties of the vessel. A key tool for characterizing these nanoscale features is atom probe tomography (APT), owing to its high spatial resolution and its ability to identify different chemical species in three dimensions. Microstructural observations made with APT can underpin the development of a mechanistic understanding of defect formation. However, there are currently multiple methods for analyzing atom probe data. This can lead to inconsistencies between results obtained by different researchers and to unnecessary scatter when data from multiple sources are combined, which complicates interpretation and makes calibrating radiation damage models challenging. In this work, simulations of a range of different microstructures are used to compare different cluster analysis algorithms directly and to identify their strengths and weaknesses.

Combining data from multiple sources to study mechanisms of aneurysm disease: Tools and techniques

International Journal for Numerical Methods in Biomedical Engineering ◽

10.1002/cnm.3133 ◽

2018 ◽

Vol 34 (11) ◽

pp. e3133 ◽

Cited By ~ 5

Author(s):

Juan R. Cebral ◽

Fernando Mut ◽

Piyusha Gade ◽

Fangzhou Cheng ◽

Yasutaka Tobe ◽

...

Keyword(s):

Multiple Sources ◽

Combining Data ◽

Tools And Techniques

Upsetting the agenda: the clout of external focusing events in the European Council

Journal of Public Policy ◽

10.1017/s0143814x15000197 ◽

2015 ◽

Vol 35 (3) ◽

pp. 505-530 ◽

Cited By ~ 2

Author(s):

Petya Alexandrova

Keyword(s):

European Union ◽

Natural Disasters ◽

Large Scale ◽

European Council ◽

The European Union ◽

Multiple Sources ◽

Time And Space ◽

Focusing Events ◽

Combining Data

Abstract: Focusing events are sudden, striking, large-scale occurrences that attract political attention. However, not all potential focusing events reach the agenda. Combining data from multiple sources, this study analyses the determinants of the prioritisation of external focusing events in the European Council over more than two decades. The results demonstrate that decisions about placing crises on the agenda are underpinned by both exogenous (humanitarian) and endogenous (geopolitical interest) considerations. Events with a higher likelihood of agenda access include man-made incidents (versus natural disasters), events with larger death tolls, and crises in the neighbourhood. Stronger competition between potential focusing events across time and space reduces the chances of access. The level of attention each event receives depends purely on strategic interests: focusing events in neighbouring countries gain a larger share of attention, as do occurrences in states with a larger volume of trade with the European Union.

DynOmics to identify delays and co-expression patterns across time course experiments

10.1101/076257 ◽

2016 ◽

Author(s):

Jasmin Straube ◽

Bevan Emma Huang ◽

Kim-Anh Lê Cao

Keyword(s):

Time Course ◽

Expression Patterns ◽

Time Integration ◽

Omics Data ◽

Multiple Sources ◽

Combining Data ◽

The Difference ◽

Changes Over Time ◽

High Degree ◽

Molecular Expression

Abstract: Dynamic changes in biological systems can be captured by measuring molecular expression at different levels (e.g., genes and proteins) across time. Integration of such data aims to identify molecules that show similar expression changes over time; such molecules may be co-regulated and thus involved in similar biological processes. Combining data sources offers a systematic approach to studying molecular behaviour: it can compensate for missing data in one source and can reduce false positives when multiple sources highlight the same pathways. However, integrative approaches must accommodate the challenges inherent in ‘omics’ data, including high dimensionality, noise, and timing differences in expression. Because current methods for identifying co-expression cannot cope with this level of complexity, we developed a novel algorithm called DynOmics. DynOmics is based on the fast Fourier transform, from which the difference in expression initiation between trajectories can be estimated. This delay can then be used to realign the trajectories and identify those that show a high degree of correlation. Through extensive simulations, we demonstrate that DynOmics is efficient and accurate compared with existing approaches. We consider two case studies highlighting its application: identifying regulatory relationships across ‘omics’ data within an organism, and comparative gene expression analysis across organisms.
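The delay-estimation idea described above can be illustrated with a generic FFT-based cross-correlation. This is a minimal sketch under stated assumptions, not the DynOmics implementation: the function names (`estimate_delay`, `realign`, `pearson`), the mean-centring, the requirement that both trajectories share one equal-length time grid, and the toy trajectories are all hypothetical. It zero-pads both trajectories, uses the cross-correlation theorem to find the lag that best aligns them, then trims the non-overlapping ends and scores the realigned pair with a Pearson correlation.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (len(x) must be a power of two).
    With invert=True it returns the unnormalised inverse transform."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def estimate_delay(a: Sequence[float], b: Sequence[float]) -> int:
    """Return the lag d maximising sum_t a[t] * b[t + d] (b trails a by d steps).
    Assumption: both trajectories are sampled on the same, equal-length time grid."""
    if len(a) != len(b):
        raise ValueError("trajectories must have equal length")
    n = len(a)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear correlation
        size *= 2
    mean_a, mean_b = sum(a) / n, sum(b) / n
    pa = [v - mean_a for v in a] + [0.0] * (size - n)
    pb = [v - mean_b for v in b] + [0.0] * (size - n)
    fa, fb = fft(pa), fft(pb)
    # Cross-correlation theorem for real inputs: r = IFFT(conj(FFT(a)) * FFT(b)).
    corr = fft([x.conjugate() * y for x, y in zip(fa, fb)], invert=True)
    scores = [c.real / size for c in corr]
    # Negative lags wrap around to the end of the padded array.
    return max(range(-(n - 1), n), key=lambda d: scores[d % size])


def realign(a: Sequence[float], b: Sequence[float], d: int) -> Tuple[List[float], List[float]]:
    """Trim the non-overlapping ends so that a[t] is paired with b[t + d]."""
    if d >= 0:
        return list(a[:len(a) - d]), list(b[d:])
    return list(a[-d:]), list(b[:len(b) + d])


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy trajectories: b is a copy of a delayed by three time points.
a = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0] + a[:-3]
d = estimate_delay(a, b)
x, y = realign(a, b, d)
print(d, round(pearson(x, y), 6))
```

Here the estimated delay should be 3, and once `b` is shifted back the overlapping parts match exactly, so their correlation is 1. A plain radix-2 FFT keeps the sketch dependency-free; a real analysis would more likely call a library FFT and handle unevenly sampled time points.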

Combining data from multiple sources, with applications to environmental risk assessment

Statistics in Medicine ◽

10.1002/sim.3053 ◽

2008 ◽

Vol 27 (5) ◽

pp. 698-710 ◽

Cited By ~ 13

Author(s):

Louise Ryan

Keyword(s):

Risk Assessment ◽

Environmental Risk ◽

Environmental Risk Assessment ◽

Multiple Sources ◽

Combining Data

Error in geometric morphometric data collection: Combining data from multiple sources

American Journal of Physical Anthropology ◽

10.1002/ajpa.23257 ◽

2017 ◽

Vol 164 (1) ◽

pp. 62-75 ◽

Cited By ~ 26

Author(s):

Chris Robinson ◽

Claire E. Terhune

Keyword(s):

Data Collection ◽

Morphometric Data ◽

Multiple Sources ◽

Geometric Morphometric ◽

Combining Data

Combining data from multiple sources to design a raptor census - the first national survey of the Montagu’s Harrier Circus pygargus in Poland

Bird Conservation International ◽

10.1017/s0959270917000235 ◽

2017 ◽

Vol 28 (3) ◽

pp. 350-362 ◽

Cited By ~ 2

Author(s):

NATALIA KRÓLIKOWSKA ◽

DOMINIK KRUPIŃSKI ◽

LECHOSŁAW KUCZYŃSKI

Keyword(s):

Population Size ◽

Large Scale ◽

The European Union ◽

Multiple Sources ◽

Circus Pygargus ◽

Combining Data ◽

Montagu’S Harrier ◽

The Status ◽

Using Data ◽

Montagu's Harrier

Summary: The effective conservation management of vulnerable taxa requires up-to-date evaluation of population size. Montagu’s Harrier Circus pygargus is a farmland raptor of high conservation concern that is threatened by agricultural intensification. However, in many European countries, including Poland, the status of this species remains unknown or questionable, and information on its breeding is incomplete or imprecise. Here, we estimate the size of the national population of the Montagu’s Harrier and argue that using data from multiple sources can help in designing national bird surveys and identifying population trends. We built a predictive model based on presence-absence data obtained from volunteer-based citizen-science projects conducted in Poland during 2000–2012. From the set of 10 km × 10 km squares with high predicted habitat suitability, 100 sampling plots were then randomly chosen and regularly surveyed by experienced ornithologists in 2013 and 2014. Evaluating fieldwork efficiency with the double-observer approach allowed detectability to be estimated and accounted for when estimating population size. We estimated the Polish Montagu’s Harrier population at almost 3,400 breeding pairs (95% CI: 2,700–4,300), constituting 20% of the European Union (EU) population. Furthermore, we showed that publicly gathered data from multiple sources offer great potential for regular surveys to obtain large-scale estimates of population size.
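The double-observer step mentioned above can be sketched as follows. The summary does not state which estimator the authors used, so this assumes the textbook independent double-observer estimator, which is algebraically identical to the Lincoln-Petersen estimate n1·n2/m. The class and function names and the counts are hypothetical, and the sketch covers only the detectability correction on surveyed plots, not the extrapolation to a national population.

```python
from dataclasses import dataclass


@dataclass
class DoubleObserverCounts:
    """Hypothetical counts from independent double-observer surveys of the same plots."""
    seen_by_1: int     # breeding pairs recorded by observer 1
    seen_by_2: int     # breeding pairs recorded by observer 2
    seen_by_both: int  # breeding pairs recorded by both observers


def detection_probability(c: DoubleObserverCounts) -> float:
    """Probability that at least one of two independent observers detects a pair."""
    if c.seen_by_both == 0:
        raise ValueError("no overlap between observers: detectability is not estimable")
    p1 = c.seen_by_both / c.seen_by_2  # share of observer 2's pairs also found by observer 1
    p2 = c.seen_by_both / c.seen_by_1  # share of observer 1's pairs also found by observer 2
    return 1 - (1 - p1) * (1 - p2)


def corrected_count(c: DoubleObserverCounts) -> float:
    """Detection-corrected number of pairs on the surveyed plots."""
    detected = c.seen_by_1 + c.seen_by_2 - c.seen_by_both  # distinct pairs found by either
    return detected / detection_probability(c)


counts = DoubleObserverCounts(seen_by_1=40, seen_by_2=36, seen_by_both=30)
print(round(detection_probability(counts), 4), round(corrected_count(counts), 1))
```

With these made-up counts the combined detection probability is about 0.958 and the corrected total equals 40 × 36 / 30 = 48 pairs, the Lincoln-Petersen estimate. Small overlaps make this estimator unstable; bias-corrected variants such as Chapman's are often preferred in practice.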

Understanding the coverage of Statistics New Zealand's Integrated Data Infrastructure

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.222 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Gareth Minshall ◽

Antony Gomez ◽

Bycroft Christine

Keyword(s):

New Zealand ◽

Census Data ◽

Target Population ◽

Data Sources ◽

Multiple Sources ◽

Unique Identifier ◽

Data Infrastructure ◽

Individual Level ◽

Education Data ◽

The Impact

Abstract:

Objectives: Statistics New Zealand’s Integrated Data Infrastructure (IDI) combines information from a range of government agencies (such as tax, health and education data) in order to provide the insights government needs to improve social and economic outcomes for New Zealanders. New Zealand has no national population register or unique identifier used in common across these multiple data sources, so probabilistic linkage is a feature of the IDI. A challenge for researchers is to understand the impact of linkage errors and coverage issues present in the linked data, and to develop the rules necessary to define their target population. We outline the statistical infrastructure Statistics New Zealand is developing to help researchers navigate these issues.

Approach: A method has been developed to identify NZ residents at a given time from the much larger number of individuals present in the IDI. Census data linked to the IDI offer insight into the coverage of key population groups and the quality of the attribute information held in the IDI (e.g. location and ethnicity). We are assessing ways that Statistics New Zealand could use these findings to help researchers form their population of interest and assess the potential for bias.

Results: The derived administrative resident population is compared with the official population figures, and patterns of under- and over-coverage are identified at both aggregate and individual levels. Some coverage discrepancies may be reduced by reducing linkage errors. Comparison with census data reveals some significant quality issues with location and ethnicity variables in administrative collections. Work is underway to improve methods for combining information from multiple sources of varying quality.

Conclusion: Identifying NZ residents at a given time and quantifying errors in administrative data sources will help researchers recognise and adjust for these errors in their analyses. Simply quantifying (often for the first time) the limitations of administrative sources also provides impetus to improve the collection of these variables at source.
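Because the IDI has no common identifier, records are joined by probabilistic linkage. As a generic illustration only, the sketch below scores a candidate record pair with textbook Fellegi-Sunter agreement and disagreement weights. The IDI's actual linkage fields, probabilities and thresholds are not given in the abstract; the field names, m/u probabilities and cut-offs here are hypothetical. Pairs landing between the two thresholds are the uncertain links whose errors the abstract says researchers need to understand.

```python
import math
from typing import Dict, Mapping, Tuple

# Hypothetical per-field probabilities (m, u):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
}


def match_weight(agreement: Mapping[str, bool],
                 probs: Mapping[str, Tuple[float, float]] = FIELD_PROBS) -> float:
    """Sum of log2 likelihood ratios over the comparison fields (Fellegi-Sunter)."""
    total = 0.0
    for field, (m, u) in probs.items():
        if agreement[field]:
            total += math.log2(m / u)  # agreement weight: positive evidence of a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight: negative evidence
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: above upper = link, below lower = non-link, else review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Example pair: names disagree (e.g. a recorded name change), birth date and sex agree.
pair = {"surname": False, "given_name": False, "birth_date": True, "sex": True}
w = match_weight(pair)
print(round(w, 2), classify(w))
```

The weights treat fields as conditionally independent, which is the standard Fellegi-Sunter simplification; in real linkage work the m and u probabilities are usually estimated from the data (for example with an EM algorithm) rather than fixed by hand.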

Language-agnostic integrated queries in a managed polyglot runtime

Proceedings of the VLDB Endowment ◽

10.14778/3457390.3457405 ◽

2021 ◽

Vol 14 (8) ◽

pp. 1414-1426

Author(s):

Filippo Schiavio ◽

Daniele Bonetta ◽

Walter Binder

Keyword(s):

Programming Languages ◽

Type System ◽

Database Systems ◽

General Purpose ◽

Multiple Sources ◽

Server Side ◽

Query Engine ◽

Combining Data ◽

Dynamic Languages ◽

Memory Object

Language-integrated query (LINQ) frameworks offer a convenient programming abstraction for processing in-memory collections of data, allowing developers to express declarative queries concisely in general-purpose programming languages. Existing LINQ frameworks rely on the well-defined type system of statically typed languages such as C# or Java to perform query compilation and execution. As a consequence of this design, they do not support dynamic languages such as Python, R, or JavaScript. Such languages are, however, very popular among data scientists, who would benefit from LINQ frameworks in data analytics applications. In this work we bridge the gap between dynamic languages and LINQ frameworks. We introduce DynQ, a novel query engine designed for dynamic languages. DynQ is language-agnostic, since it can execute SQL queries in a polyglot language runtime. Moreover, DynQ can execute queries that combine data from multiple sources, namely in-memory object collections, on-file data, and external database systems. Our evaluation shows that DynQ's performance is comparable to that of equivalent hand-optimized code and in line with common data-processing libraries and embedded databases, making DynQ an appealing query engine for standalone analytics applications and data-intensive server-side workloads.

Progression of Cognitive-Affective States During Learning in Kindergarteners: Bringing Together Physiological, Observational and Performance Data

Education Sciences ◽

10.3390/educsci10070177 ◽

2020 ◽

Vol 10 (7) ◽

pp. 177

Author(s):

Priyashri Kamlesh Sridhar ◽

Suranga Nanayakkara

Keyword(s):

Data Sources ◽

Physiological Data ◽

Affective States ◽

Multiple Sources ◽

Multiple Data ◽

Combining Data ◽

And Performance ◽

And Mathematics ◽

Future Work ◽

Physiological Markers

It has been shown that combining data from multiple sources, such as observations, self-reports, and performance, with physiological markers offers better insight into cognitive-affective states during the learning process. Through a study with 12 kindergarteners, we explore the use of insights from multiple data sources as a potential way to supplement and complement existing assessment methods for understanding cognitive-affective states across two main pedagogical approaches (constructionist and instructionist) as children learned a chosen Science, Technology, Engineering and Mathematics (STEM) concept. We present the trends that emerged across pedagogies from different data sources and illustrate the potential value of additional data channels through case illustrations. We also offer several recommendations for such studies, particularly when collecting physiological data, and summarize key challenges that provide potential avenues for future work.
