Administrative Data Format Standardization for Efficient Analytics

Author(s):  
Ryan Mackenzie White

Adoption of non-traditional data sources to augment or replace traditional survey vehicles can reduce respondent burden, provide more timely information for policy makers, and reveal insights into society that might otherwise be hidden or missed by traditional survey vehicles. The use of non-traditional data sources imposes several technological challenges due to the volume, velocity and quality of the data. The lack of an applied industry-standard data format is a limiting factor that affects the reception, processing and analysis of these data sources. Adopting a standardized, cross-language, in-memory data format that is organized for efficient analytic operations on modern hardware as the system of record for all administrative data sources has several implications: it enables the efficient use of computational resources related to I/O, processing and storage; improves data sharing, management and governance capabilities; and increases analysts' access to tools, technologies and methods. Statistics Canada developed a framework for selecting computing architecture models for efficient data processing, based on benchmark data pipelines representative of common administrative data processes. The data pipelines demonstrate the benefits of a standardized data format for data management and the efficient use of computational resources. The data pipelines define the preprocessing requirements, data ingestion, data conversion and metadata modeling for integration into a common computing architecture. The integration of a standardized data format into a distributed data processing framework based on container technologies is discussed as a general technique for processing large volumes of administrative data.
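The abstract does not name a specific format; Apache Arrow is one example of a standardized, cross-language, in-memory columnar format of the kind described. As a minimal stdlib-only sketch, the core idea of organizing data for efficient analytics is storing each field contiguously (column-oriented) rather than record by record (row-oriented), so an aggregate touches a single buffer:

```python
from array import array

# Row-oriented records, as administrative data often arrives (e.g. CSV rows).
rows = [("A", 10.0), ("B", 12.5), ("A", 7.5)]

# Columnar layout: each field stored contiguously. This is the organization
# used by standardized in-memory formats such as Apache Arrow (named here as
# an illustrative assumption; the abstract does not specify the format).
codes = [r[0] for r in rows]
amounts = array("d", (r[1] for r in rows))

# Analytic operations (a column aggregate, a category count) each scan one
# contiguous buffer instead of touching every record.
total = sum(amounts)
a_count = codes.count("A")
```

Real columnar formats add typed schemas, null bitmaps, and zero-copy sharing across languages, which is what makes them attractive as a system of record.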

2011 ◽  
Vol 314-316 ◽  
pp. 2253-2258
Author(s):  
Dong Gen Cai ◽  
Tian Rui Zhou

Data processing and conversion play an important role in rapid prototyping (RP) processes, in which the choice of data format determines the data processing procedure and method. In this paper, the formats and features of commonly used interface standards such as STL, IGES and STEP are introduced. Data conversion experiments on CAD models are carried out in the Pro/E system; the conversion effects of the different data formats are compared and analyzed, and the most suitable data conversion format is proposed.


2020 ◽  
Author(s):  
Christian Zeeden ◽  
Christian Laag ◽  
Pierre Camps ◽  
Yohan Guyodo ◽  
Ulrich Hambach ◽  
...  

Paleomagnetic data are used in different data formats, adapted to the data output of a variety of devices and specific analysis software. This includes widely used, openly available software, e.g. PMag.py/MagIC, AGICO/.jr6 & .ged, and PuffinPlot/.ppl. Besides these, individual laboratories have established their own software and data formats.

Here we compare different data formats, identify similarities, and create a common and interchangeable data basis. We introduce the idea of a paleomagnetic object (pmob), a simple data table that can include any and all data relevant to the user. We propose a basic nomenclature of abbreviations for the most common paleomagnetic data to merge different data formats. For this purpose, we introduce a set of automated routines for paleomagnetic data conversion. Our routines bring several data formats into a common data format (pmob) and also allow reversion into selected formats. We propose creating similar routines for all existing paleomagnetic data formats; our suite of computational tools will provide the basis to facilitate the inclusion of further data formats. Furthermore, automated data processing allows quality assessment of the data.
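The pmob idea can be sketched as a plain table with a shared column nomenclature plus a mapping from each lab's field names. The column abbreviations and field names below are illustrative assumptions, not the paper's actual naming scheme or routines:

```python
def to_pmob(records, column_map):
    """Convert lab-specific records into a common "pmob" nomenclature.

    column_map maps one lab's field names onto the shared pmob column names;
    unmapped fields are carried through unchanged, since a pmob is meant to
    hold any and all data relevant to the user.
    """
    return [{column_map.get(k, k): v for k, v in rec.items()}
            for rec in records]

def from_pmob(pmob, column_map):
    """Revert a pmob back into a lab-specific format (inverse mapping)."""
    inverse = {v: k for k, v in column_map.items()}
    return [{inverse.get(k, k): v for k, v in rec.items()} for rec in pmob]

# Hypothetical example: one lab calls declination "Dg" and inclination "Ig".
lab_rows = [{"sample": "LPS-01", "Dg": 354.2, "Ig": 61.0, "M": 2.3e-6}]
mapping = {"sample": "specimen", "Dg": "dec", "Ig": "inc", "M": "moment"}
converted = to_pmob(lab_rows, mapping)
```

Because each format only needs a mapping onto the shared nomenclature, adding support for a new format does not require changing any existing converter.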


2021 ◽  
pp. 1-22
Author(s):  
Emily Berg ◽  
Johgho Im ◽  
Zhengyuan Zhu ◽  
Colin Lewis-Beck ◽  
Jie Li

Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise due to differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing for saving data collection costs, reducing respondent burden, and improving the coherence of estimates produced by statistical and administrative agencies. Model-based techniques for combining multiple data sources, such as small area estimation and measurement error models, have the benefits of transparency, reproducibility, and the ability to provide an estimate of uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples, while the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model to synthesize the administrative and survey data into a unified estimate of crop area. Limitations of the available data preclude a genuine, thorough application; nonetheless, our illustration of the methodology holds potential use for the general practitioner.
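The paper's structural measurement error model is more elaborate (it also models bias in the administrative source), but the core of combining two estimates of the same quantity can be sketched with inverse-variance weighting. The crop-area figures below are hypothetical, not from the paper:

```python
def composite_estimate(survey_est, survey_var, admin_est, admin_var):
    """Inverse-variance weighted combination of two estimates of the same
    parameter. A simplified stand-in for a structural measurement error
    model: each source is weighted by the precision (1/variance) it carries,
    and the combined variance is smaller than either input variance.
    """
    w_survey = 1.0 / survey_var
    w_admin = 1.0 / admin_var
    est = (w_survey * survey_est + w_admin * admin_est) / (w_survey + w_admin)
    var = 1.0 / (w_survey + w_admin)
    return est, var

# Hypothetical crop-area estimates (thousand hectares) and variances.
est, var = composite_estimate(250.0, 16.0, 270.0, 64.0)
```

The unified estimate lands closer to the more precise survey figure, and its variance (12.8) is below the survey variance (16.0), which is the coherence-and-efficiency argument for integration.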


Author(s):  
Jonathan M Snowden ◽  
Audrey Lyndon ◽  
Peiyi Kan ◽  
Alison El Ayadi ◽  
Elliott Main ◽  
...  

Abstract Severe maternal morbidity (SMM) is a composite outcome measure that indicates serious, potentially life-threatening maternal health problems. There is great interest in defining SMM using administrative data for surveillance and research. In the US, one common way of defining SMM at the population level is an index developed by the Centers for Disease Control and Prevention. Modifications have been proposed to this index (e.g., excluding maternal transfusion), and some research defines SMM using an index introduced by Bateman et al. Birth certificate data are also increasingly being used to define SMM. We compared commonly used US definitions of SMM with each other among all California births, 2007-2012, using the Kappa statistic and other measures. We also evaluated agreement between maternal morbidity fields on the birth certificate and claims data. Concordance was generally low between the 7 definitions of SMM analyzed (i.e., κ < 0.4 for 13 of 21 two-way comparisons). Low concordance was particularly driven by the presence/absence of transfusion and by claims-based versus birth certificate-based definitions. Low agreement between administrative data-based definitions of SMM highlights that results can be expected to differ between them. Further research is needed on the validity of SMM definitions, using more fine-grained data sources.
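The agreement measure used here, Cohen's kappa, corrects raw agreement for the agreement expected by chance. A minimal sketch for two binary SMM indicators (the data are toy values, not the California cohort):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length binary indicator lists
    (1 = birth flagged as SMM under that definition)."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n  # prevalence under definition a
    pb1 = sum(b) / n  # prevalence under definition b
    # Chance agreement: both flag, plus both do not flag.
    p_expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two definitions that agree on 6 of 8 births.
def_a = [1, 1, 0, 0, 0, 0, 0, 0]
def_b = [1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohens_kappa(def_a, def_b)
```

Even with 75% raw agreement, kappa here is only 1/3, illustrating why rare outcomes like SMM can show high raw agreement yet low chance-corrected concordance (κ < 0.4).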


2021 ◽  
Vol 2 (3) ◽  
pp. 59
Author(s):  
Susanti Krismon ◽  
Syukri Iska

This article discusses the implementation of agricultural wage payment in Nagari Bukit Kandung, X Koto Diatas Subdistrict, Solok Regency, from the perspective of muamalah fiqh. The study is field research. The primary data sources are farmers and farm labourers: interviews were conducted with 8 farmers and 4 farm labourers. Secondary data were obtained from documents related to this research, in the form of the Nagari Bukit Kandung Profile, to supplement and strengthen the primary data. The data collection techniques used were observation, interviews and documentation, and the data were processed qualitatively. Based on the results of this study, wage payment in agriculture in Nagari Bukit Kandung, X Koto Diatas Subdistrict, Solok Regency is practised as follows: farm labourers ask for their wages in advance, before carrying out their work, without any prior agreement that wages would be paid at the outset. Because wages are given up front, many labourers do not work as the farmers expect, and some do not complete the work on time. Under muamalah fiqh, this implementation of wages is not permissible, because there is an element of gharar (uncertainty) in the contract and one party, the owner of the fields, is disadvantaged.


2018 ◽  
Vol 62 (7) ◽  
pp. 1044-1060 ◽  
Author(s):  
Alex Bogatu ◽  
Norman W Paton ◽  
Alvaro A A Fernandes ◽  
Martin Koehler

Abstract Data wrangling is the process whereby data are cleaned and integrated for analysis. Data wrangling, even with tool support, is typically a labour-intensive process. One aspect of data wrangling involves carrying out format transformations on attribute values, for example so that names or phone numbers are represented consistently. Recent research has developed techniques for synthesizing format transformation programs from examples of the source and target representations. This is valuable, but still requires a user to provide suitable examples, something that may be challenging in applications with huge datasets or numerous data sources. In this paper, we investigate the automatic discovery of examples that can be used to synthesize format transformation programs. In particular, we propose two approaches to identifying candidate data examples and validating the transformations that are synthesized from them. The approaches are evaluated empirically using datasets from open government data.
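The validation step can be sketched as follows: given candidate transformations (which a synthesizer would generate; here they are hand-written stand-ins) and automatically discovered (source, target) example pairs, keep only the candidates consistent with every pair. The phone-number scenario is an illustrative assumption, not the paper's dataset:

```python
import re

def digits_only(s):
    """Candidate 1: strip everything but digits."""
    return re.sub(r"\D", "", s)

def dashed_phone(s):
    """Candidate 2: normalize a 10-digit US phone number to XXX-XXX-XXXX."""
    d = re.sub(r"\D", "", s)
    return f"{d[0:3]}-{d[3:6]}-{d[6:10]}" if len(d) == 10 else s

CANDIDATES = [digits_only, dashed_phone]

def validate(candidates, examples):
    """Keep only transformations consistent with every (source, target)
    example pair, mirroring validation of synthesized programs against
    automatically discovered examples."""
    return [f for f in candidates
            if all(f(src) == tgt for src, tgt in examples)]

examples = [("(202) 555 0143", "202-555-0143"),
            ("202.555.0188", "202-555-0188")]
valid = validate(CANDIDATES, examples)
```

The quality of the discovered examples is what matters: a pair that is itself wrong would eliminate the correct transformation, which is why discovery and validation are treated together.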


2014 ◽  
Vol 23 (01) ◽  
pp. 27-35 ◽  
Author(s):  
S. de Lusignan ◽  
S-T. Liaw ◽  
C. Kuziemsky ◽  
F. Mold ◽  
P. Krause ◽  
...  

Summary Background: Generally, the benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however, some rarer and longer-term events require new methods. Big data, generated by increasingly affordable personalised computing and by pervasive computing devices, is growing rapidly, and low-cost, high-volume cloud computing makes the processing of these data inexpensive. Objective: To describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. Method: We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases that illustrate the benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal. We used flu vaccination and pre-school childhood immunisation as exemplars. Results: We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) big data processing using crowd-sourcing, distributed big data processing, and predictive analytics; (ii) data integration from heterogeneous big data sources, e.g. the increasing range of devices in the "internet of things"; and (iii) real-time monitoring, for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. Conclusions: Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance.


2015 ◽  
Vol 31 (2) ◽  
pp. 231-247 ◽  
Author(s):  
Matthias Schnetzer ◽  
Franz Astleithner ◽  
Predrag Cetkovic ◽  
Stefan Humer ◽  
Manuela Lenk ◽  
...  

Abstract This article contributes a framework for the quality assessment of imputations within a broader structure for evaluating the quality of register-based data. Four quality-related hyperdimensions examine the data processing from the raw-data level to the final statistics. Our focus lies on the quality assessment of the different imputation steps and their influence on overall data quality. We suggest classification rates as a measure of the accuracy of imputation and derive several computational approaches.
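The suggested accuracy measure can be sketched directly: hold out register records whose true category is known, impute them, and report the share of imputed values matching the truth. The evaluation setup and category labels below are illustrative assumptions:

```python
def classification_rate(true_values, imputed_values):
    """Share of imputed categorical values that match the true category --
    a simple accuracy measure for imputations in register-based data."""
    matches = sum(t == i for t, i in zip(true_values, imputed_values))
    return matches / len(true_values)

# Toy hold-out evaluation: known register values vs. imputed ones.
true_vals    = ["employed", "employed", "inactive", "unemployed"]
imputed_vals = ["employed", "inactive", "inactive", "unemployed"]
rate = classification_rate(true_vals, imputed_vals)
```

Computing this rate separately for each imputation step makes it possible to trace which step degrades overall data quality, which is the role the measure plays in the framework.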

