Combining Data from Multiple Sources to Define a Respondent: The Case of Education Data

Author(s):  
Peter Siegel ◽  
Darryl Creel ◽  
James Chromy
2017 ◽  
Vol 23 (2) ◽  
pp. 366-375 ◽  
Author(s):  
Jonathan M. Hyde ◽  
Gérald DaCosta ◽  
Constantinos Hatzoglou ◽  
Hannah Weekes ◽  
Bertrand Radiguet ◽  
...  

Abstract: Irradiation of reactor pressure vessel (RPV) steels causes the formation of nanoscale microstructural features (termed radiation damage), which affect the mechanical properties of the vessel. A key tool for characterizing these nanoscale features is atom probe tomography (APT), due to its high spatial resolution and its ability to identify different chemical species in three dimensions. Microstructural observations using APT can underpin the development of a mechanistic understanding of defect formation. However, there are currently multiple methods for analyzing atom probe data. This can result in inconsistencies between results obtained by different researchers and unnecessary scatter when combining data from multiple sources, which makes interpretation of results more complex and calibration of radiation damage models challenging. In this work, simulations of a range of different microstructures are used to directly compare different cluster analysis algorithms and identify their strengths and weaknesses.


2018 ◽  
Vol 34 (11) ◽  
pp. e3133 ◽  
Author(s):  
Juan R. Cebral ◽  
Fernando Mut ◽  
Piyusha Gade ◽  
Fangzhou Cheng ◽  
Yasutaka Tobe ◽  
...  

2015 ◽  
Vol 35 (3) ◽  
pp. 505-530 ◽  
Author(s):  
Petya Alexandrova

Abstract: Focusing events are sudden, striking large-scale occurrences that attract political attention. However, not all potential focusing events appear on the agenda. Combining data from multiple sources, this study conducts an analysis of the determinants of prioritisation of external focusing events in the European Council over a period longer than two decades. The results demonstrate that decisions regarding the placement of crises on the agenda are underscored by exogenous (humanitarian) and endogenous (geopolitical interest) considerations. Those events with a higher likelihood of agenda access include manmade incidents (versus natural disasters), events with larger death tolls and crises in the neighbourhood. Stronger competition between potential focusing events across time and space reduces the chances of access. The level of attention each event receives depends on purely strategic interests. Focusing events in neighbouring countries gain a higher portion of attention, as do occurrences in states having a larger trade exchange with the European Union.


2016 ◽  
Author(s):  
Jasmin Straube ◽  
Bevan Emma Huang ◽  
Kim-Anh Lê Cao

Abstract: Dynamic changes in biological systems can be captured by measuring molecular expression from different levels (e.g., genes and proteins) across time. Integration of such data aims to identify molecules that show similar expression changes over time; such molecules may be co-regulated and thus involved in similar biological processes. Combining data sources presents a systematic approach to study molecular behaviour. It can compensate for missing data in one source, and can reduce false positives when multiple sources highlight the same pathways. However, integrative approaches must accommodate the challenges inherent in ‘omics’ data, including high-dimensionality, noise, and timing differences in expression. As current methods for identification of co-expression cannot cope with this level of complexity, we developed a novel algorithm called DynOmics. DynOmics is based on the fast Fourier transform, from which the difference in expression initiation between trajectories can be estimated. This delay can then be used to realign the trajectories and identify those which show a high degree of correlation. Through extensive simulations, we demonstrate that DynOmics is efficient and accurate compared to existing approaches. We consider two case studies highlighting its application, identifying regulatory relationships across ‘omics’ data within an organism and for comparative gene expression analysis across organisms.
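
The general idea of estimating a delay between two trajectories with the fast Fourier transform can be illustrated with a minimal sketch. This is not the DynOmics implementation, whose details the abstract does not give; it is a generic FFT cross-correlation, and the function name `estimate_delay` and the synthetic Gaussian pulses are assumptions made purely for illustration.

```python
import numpy as np


def estimate_delay(x, y):
    """Estimate the lag (in samples) by which y trails x.

    Hypothetical helper: a generic FFT cross-correlation,
    not the method published as DynOmics.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Centre both series so the correlation reflects shape, not offset.
    x = x - x.mean()
    y = y - y.mean()
    # Zero-pad to twice the length to avoid circular wrap-around.
    size = 2 * n
    fx = np.fft.rfft(x, size)
    fy = np.fft.rfft(y, size)
    # cc[k] = sum_t x[t] * y[t + k] (cross-correlation theorem).
    cc = np.fft.irfft(np.conj(fx) * fy, size)
    # Indices 0..n-1 are lags 0..n-1; indices n..2n-1 are lags -n..-1.
    lags = np.concatenate((np.arange(0, n), np.arange(-n, 0)))
    return int(lags[np.argmax(cc)])


# Synthetic example: two identical pulses, the second delayed by 3 samples.
t = np.arange(32)
x = np.exp(-((t - 10) ** 2) / (2 * 1.5 ** 2))
y = np.exp(-((t - 13) ** 2) / (2 * 1.5 ** 2))
delay = estimate_delay(x, y)
aligned_y = np.roll(y, -delay)  # realign y onto x before correlating
```

Once the delay is known, the second trajectory can be shifted back and an ordinary correlation computed on the realigned pair, which mirrors the realign-then-correlate step the abstract describes.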


2017 ◽  
Vol 28 (3) ◽  
pp. 350-362 ◽  
Author(s):  
NATALIA KRÓLIKOWSKA ◽  
DOMINIK KRUPIŃSKI ◽  
LECHOSŁAW KUCZYŃSKI

Summary: The effective conservation management of vulnerable taxa requires up-to-date evaluation of population size. Montagu’s Harrier Circus pygargus is a farmland raptor of high conservation concern, threatened by agricultural intensification. However, within many European countries, including Poland, the status of this species remains unknown or questionable and information on its breeding is incomplete or imprecise. Here, we estimate the size of the national population of the Montagu’s Harrier and argue that using data from multiple sources may help to design national bird surveys and better contribute to identifying population trends. We built a predictive model based on presence-absence data obtained by volunteer-based citizen-science projects conducted in Poland during 2000–2012. Afterwards, from the set of 10 km × 10 km squares of high predicted habitat suitability, 100 sampling plots were randomly chosen and regularly surveyed by experienced ornithologists in 2013 and 2014. The evaluation of fieldwork efficiency by the double-observer approach allowed detectability to be estimated and accounted for while estimating population size. We estimated the Polish Montagu’s Harrier population at almost 3,400 breeding pairs (95% CI: 2,700–4,300), thus constituting 20% of the European Union (EU) population. Furthermore, we showed that public-gathered data originating from multiple sources offered great potential for regular surveys to obtain large-scale estimates of population size.


Author(s):  
Gareth Minshall ◽  
Antony Gomez ◽  
Christine Bycroft

Abstract
Objectives: Statistics New Zealand’s Integrated Data Infrastructure (IDI) combines information from a range of government agencies (such as tax, health and education data) in order to provide the insights government needs to improve social and economic outcomes for New Zealanders. New Zealand has no national population register or unique identifier used in common across these multiple data sources, and probabilistic linkages are a feature of the IDI. A challenge for researchers is to understand the impact of linkage errors and coverage issues present in the linked data, and to develop the rules necessary to define their target population. We outline the statistical infrastructure Statistics New Zealand is developing to help researchers navigate these issues.
Approach: A method has been developed to identify NZ residents at a given time from the much larger number of individuals present in the IDI. Census data linked to the IDI offers insight into the coverage of key population groups and the quality of the attribute information held in the IDI (e.g. location and ethnicity). We are assessing ways that Statistics New Zealand could use these findings to assist researchers in forming their population of interest and assessing the potential for bias.
Results: The derived administrative resident population is compared with the official population figures, and patterns of under- and over-coverage are identified at both the aggregate and the individual level. Some coverage discrepancies may be improved through reducing linkage errors. Comparison with census data reveals some significant quality issues with location and ethnicity variables in administrative collections. Work is underway to improve methods for combining information from multiple sources of varying quality.
Conclusion: Identifying NZ residents at a given time, and quantifying errors in administrative data sources, will assist researchers’ ability to recognise and adjust for these errors in their analysis. Simply quantifying (often for the first time) the limitations of administrative sources also provides impetus to improving the collection of these variables at source.


2021 ◽  
Vol 14 (8) ◽  
pp. 1414-1426
Author(s):  
Filippo Schiavio ◽  
Daniele Bonetta ◽  
Walter Binder

Language-integrated query (LINQ) frameworks offer a convenient programming abstraction for processing in-memory collections of data, allowing developers to concisely express declarative queries using general-purpose programming languages. Existing LINQ frameworks rely on the well-defined type system of statically-typed languages such as C# or Java to perform query compilation and execution. As a consequence of this design, they do not support dynamic languages such as Python, R, or JavaScript. Such languages are, however, very popular among data scientists, who would certainly benefit from LINQ frameworks in data analytics applications. In this work we bridge the gap between dynamic languages and LINQ frameworks. We introduce DynQ, a novel query engine designed for dynamic languages. DynQ is language-agnostic, since it is able to execute SQL queries in a polyglot language runtime. Moreover, DynQ can execute queries combining data from multiple sources, namely in-memory object collections as well as on-file data and external database systems. Our evaluation of DynQ shows performance comparable with equivalent hand-optimized code, and in line with common data-processing libraries and embedded databases, making DynQ an appealing query engine for standalone analytics applications and for data-intensive server-side workloads.


2020 ◽  
Vol 10 (7) ◽  
pp. 177
Author(s):  
Priyashri Kamlesh Sridhar ◽  
Suranga Nanayakkara

It has been shown that combining data from multiple sources, such as observations, self-reports, and performance, with physiological markers offers better insights into cognitive-affective states during the learning process. Through a study with 12 kindergarteners, we explore the role of insights from multiple data sources as a potential arsenal to supplement and complement existing assessment methods in understanding cognitive-affective states across two main pedagogical approaches (constructionist and instructionist) as children explored learning a chosen Science, Technology, Engineering and Mathematics (STEM) concept. We present the trends that emerged across pedagogies from different data sources and illustrate the potential value of additional data channels through case illustrations. We also offer several recommendations for such studies, particularly when collecting physiological data, and summarize key challenges that provide potential avenues for future work.

