Data Quality Assessment
Recently Published Documents

TOTAL DOCUMENTS: 439 (last five years: 123)
H-INDEX: 23 (last five years: 5)

2022 ◽  
Author(s):  
Gérard Ancellet ◽  
Sophie Godin-Beekmann ◽  
Herman G. J. Smit ◽  
Ryan M. Stauffer ◽  
Roeland Van Malderen ◽  
...  

Abstract. The Observatoire de Haute Provence (OHP) weekly Electrochemical Concentration Cell (ECC) ozonesonde data have been homogenized for the period 1991–2020 according to the recommendations of the Ozonesonde Data Quality Assessment (O3S-DQA) panel. The benefit of the ECC homogenization has been assessed using comparisons with ground-based instruments measuring ozone at the same station (lidar, surface measurements) and with collocated satellite observations of the O3 vertical profile by the Microwave Limb Sounder (MLS). The major differences between uncorrected and homogenized ECC data are related to a change of ozonesonde type in 1997, removal of the pressure dependency of the ECC background current, and correction of the internal ozonesonde temperature. The 3–4 ppbv positive bias between ECC and lidar in the troposphere is corrected by the homogenization. The 30-year trends of the seasonally adjusted ECC ozone concentrations are also significantly improved, both in the troposphere and the stratosphere, when the ECC concentrations are homogenized, as shown by the ECC/lidar and ECC/surface ozone trend comparisons. A −0.29 % per year negative trend of the normalization factor (NT), calculated using independent measurements of the total ozone column (TOC) at OHP, disappears after homogenization of the ECC data. There is, however, a remaining −5 % negative bias in the TOC, which is likely related to an underestimate of the ECC concentrations in the stratosphere above 50 hPa, as shown by direct comparison with the OHP lidar and MLS. The reason for this bias is still unclear, but a possible explanation is freezing or evaporation of the sonde solution in the stratosphere. Both the comparisons with lidar and satellite observations suggest that homogenization increases the negative bias of the ECC to up to 10 % above 28 km.
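To make the quantities concrete, the sketch below shows one plausible way to compute a normalization factor and its linear trend in percent per year, assuming NT is the ratio of an independently measured total ozone column to the column integrated from the sonde profile. This is an illustration only, not the authors' processing code, and the example NT series is invented.

```python
# Illustrative sketch (not the authors' code): a normalization factor NT
# taken as reference TOC divided by the sonde-derived ozone column, and its
# linear trend expressed in percent per year.
import numpy as np

def normalization_factor(toc_reference_du, sonde_column_du):
    """NT for one sounding: reference TOC divided by the ozone column
    integrated from the sonde profile (both in Dobson units)."""
    return toc_reference_du / sonde_column_du

def nt_trend_percent_per_year(years, nt_values):
    """Least-squares linear trend of NT against fractional year,
    normalized by the mean NT and expressed in percent per year."""
    slope, _ = np.polyfit(years, nt_values, deg=1)
    return 100.0 * slope / np.mean(nt_values)

# Hypothetical example: a slowly drifting NT series over 30 years.
rng = np.random.default_rng(0)
years = np.arange(1991, 2021, dtype=float)
nt = 1.02 - 0.0029 * (years - years[0]) + rng.normal(0, 0.005, years.size)
print(f"NT trend: {nt_trend_percent_per_year(years, nt):+.2f} % per year")
```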


JAMIA Open ◽  
2022 ◽  
Vol 5 (1) ◽  
Author(s):  
Sophia Z Shalhout ◽  
Farees Saqlain ◽  
Kayla Wright ◽  
Oladayo Akinyemi ◽  
David M Miller

Abstract Objective To develop a clinical informatics pipeline designed to capture large-scale structured Electronic Health Record (EHR) data for a national patient registry. Materials and Methods The EHR-R-REDCap pipeline is implemented using R statistical software to remap and import structured EHR data into the Research Electronic Data Capture (REDCap)-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary. Results Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Laboratory values (Labs) were transformed, remapped, and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482 450 results were imported into the registry for 1109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N = 176) using this clinical informatics pipeline. Conclusion We demonstrate feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.
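The actual eLAB pipeline is implemented in R; the Python sketch below only illustrates the kind of remapping the abstract describes, i.e. mapping local lab test names to registry fields via a data dictionary and reshaping results for import. All column names, field names, and values are hypothetical placeholders, not the MCC registry's real data dictionary.

```python
# Minimal sketch of remapping an EHR lab extract to registry fields via a
# data dictionary; names and values here are hypothetical placeholders.
import pandas as pd

# Hypothetical EHR lab extract (one row per result).
labs = pd.DataFrame({
    "mrn": ["001", "001", "002"],
    "test_name": ["WBC", "LDH", "WBC"],
    "result_value": [6.1, 210.0, 7.4],
    "result_date": ["2020-01-05", "2020-01-05", "2020-02-11"],
})

# Hypothetical data dictionary mapping local test names to registry fields.
data_dictionary = {"WBC": "lab_wbc", "LDH": "lab_ldh"}

# Remap test names to registry field names and pivot to one row per
# patient/date, the shape a structured registry import typically expects.
labs["registry_field"] = labs["test_name"].map(data_dictionary)
registry_records = (
    labs.pivot_table(index=["mrn", "result_date"],
                     columns="registry_field",
                     values="result_value")
        .reset_index()
)
print(registry_records)
```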


2021 ◽  
Vol 27 (12) ◽  
pp. 1300-1324
Author(s):  
Mohamed Talha ◽  
Anas Abou El Kalam

Big Data often refers to a set of technologies dedicated to dealing with large volumes of data. Data quality and data security are two essential aspects of any Big Data project. While Data Quality Management Systems put in place a set of processes to assess and improve characteristics of data such as accuracy, consistency, completeness, and timeliness, security systems are designed to protect the confidentiality, integrity, and availability of data. In a Big Data environment, data quality processes can be blocked by data security mechanisms: data is often collected from external sources that may impose their own security policies. Many research works have recognized that merging and integrating access control policies are real challenges for Big Data projects. To address this issue, we propose in this paper a framework to secure data collection in collaborative platforms. Our framework extends and combines two existing frameworks, namely PolyOrBAC and SLA-Framework. PolyOrBAC is a framework intended for the protection of collaborative environments. SLA-Framework, for its part, is an implementation of the WS-Agreement specification, the standard for managing bilaterally negotiable SLAs (Service Level Agreements) in distributed systems; its integration into PolyOrBAC automates the implementation and application of security rules. The resulting framework will then be incorporated into a data quality assessment system to create a secure and dynamic collaborative activity in the Big Data context.
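As a rough illustration of the idea of gating a collection request on both an access control policy and a negotiated SLA, consider the toy sketch below. It uses a simplified OrBAC-style abstract permission tuple (organization, role, activity, view, context) plus one SLA clause; it is not the PolyOrBAC or SLA-Framework API, and every name in it is hypothetical.

```python
# Toy illustration only: an OrBAC-style abstract permission checked together
# with a negotiated SLA clause before a data collection request is allowed.
# This is not the PolyOrBAC or SLA-Framework API; all names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Permission:
    organization: str
    role: str
    activity: str
    view: str
    context: str

# Abstract security policy of the data-providing organization.
policy = {
    Permission("hospital_A", "quality_analyst", "read", "lab_results", "working_hours"),
}

# Clause agreed in the SLA between the collaborating organizations.
sla = {"max_records_per_request": 10_000, "allowed_views": {"lab_results"}}

def is_permitted(request: Permission, n_records: int) -> bool:
    """Allow a collection request only if an abstract permission matches
    and the SLA limits are respected."""
    return (request in policy
            and request.view in sla["allowed_views"]
            and n_records <= sla["max_records_per_request"])

req = Permission("hospital_A", "quality_analyst", "read", "lab_results", "working_hours")
print(is_permitted(req, 5_000))   # True
print(is_permitted(req, 50_000))  # False: exceeds the SLA limit
```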


2021 ◽  
Vol 3 ◽  
pp. 1-3
Author(s):  
Elias Nasr Naim Elias ◽  
Fabricio Rosa Amorim ◽  
Marcio Augusto Reolon Schmidt ◽  
Silvana Philippi Camboim


Author(s):  
Abdurrahman Coskun ◽  
Sverre Sandberg ◽  
Ibrahim Unsal ◽  
Fulya G. Yavuz ◽  
Coskun Cavusoglu ◽  
...  

Abstract For many measurands, physicians depend on population-based reference intervals (popRI) when assessing laboratory test results. The availability of personalized reference intervals (prRI) may provide a means to improve the interpretation of laboratory test results for an individual. prRI can be calculated using estimates of biological and analytical variation and previous test results obtained in a steady-state situation. In this study, we aim to outline the statistical approaches and considerations required when establishing and implementing prRI in clinical practice. Data quality assessment, including analysis for outliers and trends, is required before previous test results are used to estimate the homeostatic set point. To calculate the prRI limits, two different statistical models based on 'prediction intervals' can be applied. The first model utilizes estimates of 'within-person biological variation', which are based on an individual's own data; it requires a minimum of five previous test results to generate the prRI. The second model is based on estimates of 'within-subject biological variation', which represent an average estimate for a population and can be found, for most measurands, in the EFLM Biological Variation Database. This model can also be applied when fewer previous test results are available. The prRI offers physicians the opportunity to improve the interpretation of an individual's test results, though studies are required to demonstrate whether using prRI leads to better clinical outcomes. We recommend that both popRI and prRI are included in laboratory reports to aid the evaluation of laboratory test results in the follow-up of patients.
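For the first model, a minimal sketch of a prediction interval built from an individual's own steady-state results is shown below. It uses the generic t-based prediction-interval formula, mean ± t·s·√(1 + 1/n); the paper's exact treatment of analytical versus within-person biological variation may differ, and the example values are invented.

```python
# Sketch of the first model: a prediction interval from an individual's own
# previous steady-state results (at least five values). Generic formula only;
# the paper's exact handling of analytical variation may differ.
import numpy as np
from scipy import stats

def personalized_reference_interval(previous_results, confidence=0.95):
    x = np.asarray(previous_results, dtype=float)
    n = x.size
    if n < 5:
        raise ValueError("At least five previous results are required for this model.")
    mean, sd = x.mean(), x.std(ddof=1)
    t = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    half_width = t * sd * np.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width

# Hypothetical serum creatinine results (umol/L) from a steady-state patient.
print(personalized_reference_interval([78, 82, 80, 75, 79, 81]))
```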


Author(s):  
Sebastian Meister ◽  
Jan Stüve ◽  
Roger M. Groves

Abstract Automated fibre layup techniques are often applied for the production of complex structural components. In order to ensure sufficient component quality, a subsequent visual inspection is necessary, especially in the aerospace industry. The use of automated optical inspection systems can reduce the inspection effort by up to 50 %. Laser line scan sensors, which capture the topology of the surface, are particularly advantageous for this purpose. These sensors project a laser beam at an angle onto the surface and detect its position via a camera. The optical properties of the observed surface can strongly influence the quality of the recorded data. This is especially relevant for dark or highly scattering materials such as Carbon Fibre Reinforced Plastics (CFRP). For this reason, in this study we investigate in detail the optical reflection and transmission properties of the commonly used Hexcel HexPly 8552 IM7 prepreg CFRP. To this end, we utilise a gonioreflectometer to measure these optical characteristics of the material for different fibre orientations, illumination directions and detection angles. In this way, specific scattering information of the material over the hemispherical space is recorded. The major novelty of this research is the set of findings about the scattering behaviour of the fibre composite material, which can be used as a more precise input for the image data quality assessment methods from our previous research and is therefore particularly valuable for developers and users of camera-based inspection systems for CFRP components.
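The triangulation principle mentioned above (a beam projected at an angle, its spot position detected by a camera) can be sketched in its simplest form as follows. This assumes the laser is inclined at a known angle to the surface normal and the camera views along the normal; real sensors calibrate this relation rather than using the bare geometry, and the numbers are hypothetical.

```python
# Simplified triangulation geometry for a laser line scan sensor, assuming
# the laser is projected at angle `projection_angle_deg` to the surface
# normal and the camera views along the normal.
import math

def height_from_spot_shift(spot_shift_mm: float, projection_angle_deg: float) -> float:
    """Surface height change implied by a lateral shift of the laser spot."""
    return spot_shift_mm / math.tan(math.radians(projection_angle_deg))

# Hypothetical example: a 0.35 mm spot shift with a 30 degree projection angle.
print(f"{height_from_spot_shift(0.35, 30.0):.3f} mm")
```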


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Erik Tute ◽  
Nagarajan Ganapathy ◽  
Antje Wulff

Abstract Background Data quality assessment is important but complex and task dependent. Identifying suitable measurement methods and reference ranges for assessing their results is challenging. Both manual inspection of measurement results and current data-driven approaches for learning which results indicate data quality issues have considerable limitations, e.g. in identifying task-dependent thresholds for measurement results that indicate data quality issues. Objectives To explore the applicability and potential benefits of a data-driven approach to learn task-dependent knowledge about suitable measurement methods and the assessment of their results. Such knowledge could be useful for others to determine whether a local data stock is suitable for a given task. Methods We started by creating artificial data with previously defined data quality issues and applied a set of generic measurement methods to these data (e.g. a method counting the number of values in a certain variable, or computing their mean value). We trained decision trees on the exported measurement methods' results and corresponding outcome data (data indicating the data's suitability for a use case). For evaluation, we derived rules for potential measurement methods and reference values from the decision trees and compared them regarding their coverage of the true data quality issues artificially created in the dataset. Three researchers independently derived these rules, one with knowledge about the present data quality issues and two without. Results Our self-trained decision trees were able to indicate rules for 12 of the 19 previously defined data quality issues. The learned knowledge about measurement methods and their assessment was complementary to manual interpretation of measurement methods' results. Conclusions Our data-driven approach derives sensible knowledge for task-dependent data quality assessment and complements other current approaches. Based on labeled measurement methods' results as training data, our approach successfully suggested applicable rules for checking data quality characteristics that determine whether a dataset is suitable for a given task.
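A minimal sketch of this kind of approach is shown below: train a decision tree on measurement-method results labeled with task suitability, then read candidate rules and thresholds off the tree. The feature names and the tiny training set are hypothetical, and the snippet stands in for the paper's actual pipeline only at the level of the general technique.

```python
# Sketch: train a decision tree on measurement-method results labeled with
# suitability for a task, then derive human-readable rules from its splits.
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["n_values_heart_rate", "mean_heart_rate", "missing_fraction"]
X = [
    [1000, 72.0, 0.01],   # suitable
    [ 980, 70.5, 0.02],   # suitable
    [ 400, 71.0, 0.35],   # unsuitable: too much missing data
    [1010, 15.0, 0.01],   # unsuitable: implausible mean value
]
y = ["suitable", "suitable", "unsuitable", "unsuitable"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The printed splits (e.g. "missing_fraction <= ...") are the candidate
# measurement methods and reference values a reviewer would inspect.
print(export_text(tree, feature_names=feature_names))
```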


Author(s):  
Ben Norton

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the world wide web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or consumed, in API speak) through a web-based transaction, where a client makes a request and a server responds. Within the scope of biodiversity informatics, Web APIs can be loosely grouped into two categories based on purpose. First, Product APIs deliver data products to end-users; examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. The second type, web-based Service APIs (referred to here as Service APIs), are designed and built to solve specific problems and are the focus of this presentation. Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include the Elasticsearch Suggester API and geolocation, a service that delivers geographic locations from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized web-based Service APIs that adhere to best practices within the scope of biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community, describing how Service APIs can address each individually. The main topics include standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation. Fundamentally, the value of any innovative technical solution can be measured by the extent of community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in the construction of clients that utilize Service APIs, and willingness of the community to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because of an inability to solve problems (Verner et al. 2008), but because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide the blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often overlooked non-technical aspects of this technical endeavor: specifically, how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.
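The client-request/server-response pattern described above can be illustrated with a short client call to a biodiversity service, here GBIF's public name-matching endpoint (the v1 path and response fields are assumed from GBIF's current public API and may change; this is an illustration, not part of the presentation's own tooling).

```python
# Minimal client-side example of the request/response pattern of a Service
# API, using GBIF's public species-match endpoint (path assumed from the
# current v1 API; adjust if it changes).
import requests

response = requests.get(
    "https://api.gbif.org/v1/species/match",
    params={"name": "Puma concolor"},
    timeout=10,
)
response.raise_for_status()
match = response.json()

# A name-matching service like this can support data quality assessment and
# remediation workflows, e.g. by flagging taxon names that fail to match.
print(match.get("matchType"), match.get("scientificName"))
```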


2021 ◽  
pp. 1-24
Author(s):  
Kelly McMann ◽  
Daniel Pemstein ◽  
Brigitte Seim ◽  
Jan Teorell ◽  
Staffan Lindberg

Abstract Political scientists routinely face the challenge of assessing the quality (validity and reliability) of measures in order to use them in substantive research. While stand-alone assessment tools exist, researchers rarely combine them comprehensively. Further, while a large literature informs data producers, data consumers lack guidance on how to assess existing measures for use in substantive research. We delineate a three-component practical approach to data quality assessment that integrates complementary multimethod tools to assess: (1) content validity; (2) the validity and reliability of the data generation process; and (3) convergent validity. We apply our quality assessment approach to the corruption measures from the Varieties of Democracy (V-Dem) project, both illustrating our rubric and unearthing several quality advantages and disadvantages of the V-Dem measures, compared to other existing measures of corruption.
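Of the three components, convergent validity is the most directly computable: compare a measure against alternative measures of the same concept and inspect how strongly they agree. The sketch below shows one plausible way to do this with pairwise rank correlations; the country-year values are invented for illustration and do not come from V-Dem or any other dataset.

```python
# Sketch of the convergent-validity component only: pairwise rank
# correlations between alternative measures of the same concept.
# The values below are invented for illustration.
import pandas as pd

measures = pd.DataFrame({
    "vdem_corruption": [0.10, 0.55, 0.80, 0.30, 0.65],
    "alt_measure_a":   [0.15, 0.50, 0.75, 0.35, 0.70],
    "alt_measure_b":   [0.20, 0.40, 0.85, 0.25, 0.60],
})

# High pairwise correlations are evidence of convergent validity; large
# divergences flag cases worth examining with the other two components.
print(measures.corr(method="spearman").round(2))
```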

