Data Quality Metrics
Recently Published Documents

Total documents: 44 (last five years: 18)
H-index: 7 (last five years: 2)

2021, Vol 6 (1), pp. 25-44
Author(s): Menna Ibrahim Gabr, Yehia M. Helmy, Doaa Saad Elzanfaly, ...

Achieving a high level of data quality is considered one of the most important assets for organizations of any size. Data quality is a central concern for both practitioners and researchers who deal with traditional or big data, and its level is measured through several quality dimensions. A high percentage of current studies focus on assessing and applying data quality to traditional data. In the era of big data, attention should also be paid to the tremendous volume of generated and processed data, of which roughly 80% is unstructured. However, initiatives for creating big data quality evaluation models are still under development. This paper investigates the data quality dimensions most commonly used for both traditional and big data, in order to identify the metrics and techniques used to measure and handle each dimension. A complete definition of each traditional and big data quality dimension, along with its metrics and handling techniques, is presented. Many data quality dimensions can be applied to both traditional and big data, while a small number apply only to one or the other. Current works present few data quality metrics and hardly any handling techniques.
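
For illustration (not drawn from the paper itself), a minimal Python sketch of how two common dimensions, completeness and validity, are often scored as simple ratios over a tabular dataset; the column names and domain rules below are hypothetical:

```python
import pandas as pd

# Hypothetical records with some missing and invalid values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "email": ["a@x.com", None, "c@x.com", "not-an-email", "e@x.com"],
    "age": [34, 29, None, 51, -3],
})

# Completeness: fraction of non-missing cells, per column.
completeness = df.notna().mean()

# Validity: fraction of values satisfying an assumed domain rule.
email_validity = df["email"].str.contains("@", na=False).mean()
age_validity = df["age"].between(0, 120).mean()

print("Completeness per column:")
print(completeness)
print(f"Email validity: {email_validity:.2f}, age validity: {age_validity:.2f}")
```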


2021
Author(s): Thomas Naake, Wolfgang Huber

Motivation: First-line data quality assessment and exploratory data analysis are integral parts of any data analysis workflow. In high-throughput quantitative omics experiments (e.g. transcriptomics, proteomics, metabolomics), the data are, after initial processing, typically presented as a matrix of numbers (feature IDs × samples). Efficient and standardized calculation and visualization of data quality metrics are key to tracking the within-experiment quality of these rectangular data types and to guaranteeing high-quality data sets for subsequent biological question-driven inference. Results: We present MatrixQCvis, which provides interactive visualization of data quality metrics at the per-sample and per-feature level using R's shiny framework. It provides efficient and standardized ways to analyze the data quality of quantitative omics data types that come in a matrix-like format (feature IDs × samples). MatrixQCvis builds upon the Bioconductor SummarizedExperiment S4 class and thus facilitates integration into existing workflows. Availability: MatrixQCvis is implemented in R. It is available via Bioconductor and released under the GPL v3.0 license.
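
As a rough illustration of the kind of per-sample and per-feature metrics such a tool reports (MatrixQCvis itself is an R/Bioconductor package; the Python sketch below is an assumed stand-in, not its API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical omics matrix: rows are feature IDs, columns are samples.
intensities = rng.lognormal(mean=10, sigma=1, size=(500, 6))
intensities[rng.random(intensities.shape) < 0.1] = np.nan  # simulate missing values

# Per-sample metrics: fraction of missing values and median intensity.
missing_per_sample = np.mean(np.isnan(intensities), axis=0)
median_per_sample = np.nanmedian(intensities, axis=0)

# Per-feature metric: coefficient of variation across samples.
cv_per_feature = np.nanstd(intensities, axis=1) / np.nanmean(intensities, axis=1)

print("Missing fraction per sample:", np.round(missing_per_sample, 2))
print("Median intensity per sample:", np.round(median_per_sample, 1))
print("Features with CV > 0.5:", int(np.sum(cv_per_feature > 0.5)))
```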


2021
Author(s): Katrin Hafner, Dave Wilson, Rob Mellors, Pete Davis

The decades-long recordings of high-quality open data from the Global Seismographic Network (GSN) have facilitated studies of Earth structure and earthquake processes, as well as monitoring of earthquakes and explosions worldwide. These data have also enabled a wide range of transformative, cross-disciplinary research that far exceeded the original expectations and design goals of the network, including studies of slow earthquakes, landslides, the Earth's "hum", glacial earthquakes, sea state, climate change, and induced seismicity.

The GSN continues to produce high-quality waveform data, metadata, and multiple data quality metrics such as timing quality and noise levels. This requires encouraging equipment vendors to develop modern instrumentation, upgrading the stations with new seismic sensors and infrastructure, implementing consistent and well-documented calibrations, and monitoring noise performance. A Design Goals working group is convening to evaluate how well the GSN has met its original 1985 and 2002 goals, and how the network should evolve to meet the requirements for enabling new research and monitoring capabilities.

In collaboration with GEOFON and GEOSCOPE, the GSN is also reviewing the current global distribution and performance of the very-broadband and broadband stations that comprise these three networks. We are working to exchange expertise and experience with new technologies and deployment techniques, and to identify regions where we could collaborate to make operations more efficient, where current efforts overlap, or where we have similar needs for relocating stations.


BMJ Open, 2020, Vol 10 (12), pp. e038174
Author(s): Antoinette Alas Bhattacharya, Elizabeth Allen, Nasir Umar, Ahmed Audu, Habila Felix, ...

Objectives: Primary objective: to assess nine data quality metrics for 14 maternal and newborn health data elements, following implementation of an integrated, district-focused data quality intervention. Secondary objective: to consider whether assessing data quality metrics beyond completeness and accuracy of facility reporting offered new insight into reviewing routine data quality. Design: Before-and-after study design. Setting: Primary health facilities in Gombe State, Northeastern Nigeria. Participants: Monitoring and evaluation officers and maternal, newborn and child health coordinators for state level and all 11 local government areas (district equivalents) overseeing 492 primary care facilities offering maternal and newborn care services. Intervention: Between April 2017 and December 2018, we implemented an integrated data quality intervention which included: introduction of job aids and regular self-assessment of data quality, peer review and feedback, learning workshops, work planning for improvement, and ongoing support through social media. Outcome measures: Nine metrics for the data quality dimensions of completeness and timeliness, internal consistency of reported data, and external consistency. Results: The data quality intervention was associated with improvements in seven of the nine data quality metrics assessed, including availability and timeliness of reporting, completeness of data elements, accuracy of facility reporting, consistency between related data elements, and frequency of outliers reported. Improvement differed by data element type, with content-of-care and commodity-related data improving more than contact-related data. Increases in the consistency between related data elements demonstrated improved internal consistency within and across facility documentation. Conclusions: An integrated, district-focused data quality intervention, including regular self-assessment of data quality, peer review and feedback, learning workshops, work planning for improvement, and ongoing support through social media, can increase the completeness, accuracy and internal consistency of facility-based routine data.
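
As an illustrative aside (not code from the study), the sketch below shows how two such routine-data metrics, reporting completeness and the frequency of outliers, might be computed from monthly facility reports; the data, column names, and outlier rule are hypothetical:

```python
import pandas as pd

# Hypothetical monthly facility reports: one row per facility per month.
reports = pd.DataFrame({
    "facility": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "month": ["2018-01", "2018-02", "2018-03", "2018-01", "2018-02",
              "2018-03", "2018-01", "2018-02"],
    "deliveries": [40, 42, 39, 15, 300, 18, 22, 25],  # 300 looks like an outlier
})

# Completeness of reporting: reports received / reports expected.
expected = reports["facility"].nunique() * reports["month"].nunique()
completeness = len(reports) / expected

def is_outlier(series: pd.Series) -> pd.Series:
    # Simple assumed rule: a value more than three times the
    # facility's median for that data element is flagged.
    return series > 3 * series.median()

outlier_rate = reports.groupby("facility")["deliveries"].transform(is_outlier).mean()

print(f"Reporting completeness: {completeness:.0%}")
print(f"Share of monthly values flagged as outliers: {outlier_rate:.0%}")
```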


Information, 2020, Vol 11 (11), pp. 532
Author(s): Timo Homburg

A continuing question in the geospatial community is how to evaluate the fitness for use of map data for a variety of use cases. While data quality metrics and dimensions have been discussed broadly in the geospatial community and have been modelled in semantic web vocabularies, an ontological connection between use cases and data quality expressions, one that would allow reasoning approaches to determine the fitness for use of semantic web map data, has not yet been attempted. This publication introduces such an ontological model to represent situations and link them to geospatial data quality metrics in order to evaluate thematic map contents. The ontology model constitutes the data storage element of a framework for use-case-based data quality assurance, which creates suggestions for data quality evaluations that are verified and improved upon by end users. The resulting requirement profiles are associated with and shared as semantic web concepts, and therefore contribute to a pool of linked data describing situation-based data quality assessments, which may be used by a variety of applications. The framework is tested on two test scenarios, which are evaluated and discussed in a wider context.
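
To make the idea of a requirement profile concrete, here is a small hypothetical Python sketch (not the paper's ontology or framework) in which a use case's minimum quality requirements are compared against a dataset's measured metrics; all names and thresholds are illustrative assumptions:

```python
# Hypothetical requirement profile for a use case and measured metrics
# for a map dataset.
requirement_profile = {
    "routing": {"completeness": 0.95, "positional_accuracy_m": 5.0},
}

dataset_metrics = {
    "completeness": 0.97,          # share of expected features present
    "positional_accuracy_m": 8.2,  # mean positional error in metres
}

def fit_for_use(use_case: str) -> bool:
    """A dataset fits a use case if every required metric meets its threshold.
    Lower-is-better metrics (here: positional error in metres) are checked inversely."""
    reqs = requirement_profile[use_case]
    ok_completeness = dataset_metrics["completeness"] >= reqs["completeness"]
    ok_accuracy = dataset_metrics["positional_accuracy_m"] <= reqs["positional_accuracy_m"]
    return ok_completeness and ok_accuracy

print("Fit for routing:", fit_for_use("routing"))  # False: positional error too large
```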


2020
Author(s): Heung-Sik Kang, Chang-Ki Min, Inhyuk Nam, Bonggi Oh, Gyujin Kim, ...

Abstract: We demonstrate a hard-X-ray self-seeded (HXRSS) free-electron laser (FEL) at the Pohang Accelerator Laboratory with an unprecedented peak brightness (3.2 × 10³⁵ photons/(s·mm²·mrad²·0.1% BW)). The self-seeded FEL generates hard X-ray pulses with improved spectral purity: the average pulse energy was 0.85 mJ at 9.7 keV, almost as high as in self-amplified spontaneous emission (SASE) mode; the bandwidth (0.19 eV) is about 1/70 as wide and the peak spectral brightness is 40 times higher than in SASE mode; and the stability is excellent, with more than 94% of shots exceeding the average SASE intensity. Using this self-seeded XFEL, we conducted serial femtosecond crystallography (SFX) experiments on lysozyme; data quality metrics for the SFX data, such as Rsplit, multiplicity, and signal-to-noise ratio, were substantially improved. We precisely map out the structure of the lysozyme protein with substantially better statistics for the diffraction data and significantly sharper electron density maps compared with maps obtained using SASE mode.


Proteomes, 2020, Vol 8 (3), pp. 21
Author(s): David C. L. Handler, Flora Cheng, Abdulrahman M. Shathili, Paul A. Haynes

PeptideWitch is a Python-based web module that introduces several key graphical and technical improvements to the Scrappy software platform, which is designed for label-free quantitative shotgun proteomics analysis using normalised spectral abundance factors. The program inputs are low-stringency protein identification lists output from peptide-to-spectrum matching search engines for 'control' and 'treated' samples. Through a combination of spectral count summation and inner joins, PeptideWitch processes the low-stringency data and outputs high-stringency data suitable for downstream quantitation. Data quality metrics are generated, and a series of statistical analyses and graphical representations are presented, aimed at defining and presenting the differences between the two sample proteomes.
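
For readers unfamiliar with the approach, the sketch below (illustrative only, not PeptideWitch code) shows an inner join of control and treated identification lists followed by a normalised spectral abundance factor (NSAF) calculation; the column names and values are assumptions:

```python
import pandas as pd

# Hypothetical low-stringency identification lists with spectral counts
# and protein lengths (needed for NSAF).
control = pd.DataFrame({
    "protein": ["P1", "P2", "P3"],
    "spectral_counts": [120, 15, 4],
    "length": [450, 300, 600],
})
treated = pd.DataFrame({
    "protein": ["P1", "P2", "P4"],
    "spectral_counts": [80, 60, 7],
    "length": [450, 300, 250],
})

# Inner join keeps only proteins identified in both samples,
# which raises the effective stringency of the combined list.
merged = control.merge(treated, on="protein", suffixes=("_ctrl", "_treat"))

# NSAF: (spectral counts / protein length), normalised so each sample sums to 1.
for cond in ("ctrl", "treat"):
    saf = merged[f"spectral_counts_{cond}"] / merged[f"length_{cond}"]
    merged[f"nsaf_{cond}"] = saf / saf.sum()

print(merged[["protein", "nsaf_ctrl", "nsaf_treat"]])
```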


Assessment, 2020, pp. 107319112091393
Author(s): Paula M. McLaughlin, Kelly M. Sunderland, Derek Beaton, Malcolm A. Binns, Donna Kwan, ...

As large research initiatives designed to generate big data on clinical cohorts become more common, there is an increasing need to establish standard quality assurance (QA; preventing errors) and quality control (QC; identifying and correcting errors) procedures for critical outcome measures. The present article describes the QA and QC approach developed and implemented for the neuropsychology data collected as part of the Ontario Neurodegenerative Disease Research Initiative study. We report on the efficacy of our approach and provide data quality metrics. Our findings demonstrate that even with a comprehensive QA protocol, the proportion of data errors can still be high. Additionally, we show that several widely used neuropsychological measures are particularly susceptible to error. These findings highlight the need for large research programs to put in place active, comprehensive, and separate QA and QC procedures before, during, and after protocol deployment. Detailed recommendations and considerations for future studies are provided.


GigaScience, 2020, Vol 9 (4)
Author(s): Thomas McGowan, James E Johnson, Praveen Kumar, Ray Sajulga, Subina Mehta, ...

Abstract: Background: Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. Findings: MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. Conclusions: MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.
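
As a rough illustration of the filtering step described above (the actual mzSQLite schema is not given in the abstract, so the table and column names below are hypothetical), a short Python/SQLite sketch of filtering peptide identifications by a quality threshold:

```python
import sqlite3

# Build a small in-memory database standing in for an mzSQLite-style input;
# the schema here is an assumption, not the real mzSQLite layout.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE psm (
        peptide TEXT,
        protein TEXT,
        score REAL,       -- search engine identification score
        q_value REAL      -- false discovery rate estimate
    )
""")
con.executemany(
    "INSERT INTO psm VALUES (?, ?, ?, ?)",
    [
        ("LVNELTEFAK", "ALBU_HUMAN", 85.2, 0.002),
        ("AEFVEVTK", "ALBU_HUMAN", 40.1, 0.060),
        ("VLSPADKTNVK", "HBA_HUMAN", 72.9, 0.008),
    ],
)

# Keep only peptides passing a user-defined quality threshold (q-value <= 1%),
# analogous to interactive filtering before sending results back for analysis.
passing = con.execute(
    "SELECT peptide, protein, score FROM psm WHERE q_value <= ? ORDER BY score DESC",
    (0.01,),
).fetchall()

for row in passing:
    print(row)
```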

