quality data Latest Research Papers

Estimating Degradation of Machine Learning Data Assets

Large-scale adoption of Artificial Intelligence and Machine Learning (AI-ML) models fed by heterogeneous, possibly untrustworthy data sources has spurred interest in estimating degradation of such models due to spurious, adversarial, or low-quality data assets. We propose a quantitative estimate of the severity of classifiers’ training set degradation: an index expressing the deformation of the convex hulls of the classes computed on a held-out dataset generated via an unsupervised technique. We show that our index is computationally light, can be calculated incrementally, and complements existing quality measures for ML data assets well. As an experiment, we present the computation of our index on a benchmark convolutional image classifier.

Automated Annotations for AI Data and Model Transparency

Journal of Data and Information Quality ◽

10.1145/3460000 ◽

2022 ◽

Vol 14 (1) ◽

pp. 1-9

Author(s):

Saravanan Thirumuruganathan ◽

Mayuresh Kunjir ◽

Mourad Ouzzani ◽

Sanjay Chawla

Keyword(s):

Open Data ◽

Ease Of Use ◽

Quality Data ◽

Key Factors ◽

Data Governance ◽

Policy Compliance ◽

The Public ◽

Use Of Data ◽

Challenges And Opportunities ◽

Data Transparency

The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even an individual developer to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework to synthesize ideas from various domains such as data transparency, data quality, data governance among others to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to get visibility of potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate its key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.

The Impact of Human Service Provider Quality on the Personal Outcomes of People With Intellectual and Developmental Disabilities

Frontiers in Rehabilitation Sciences ◽

10.3389/fresc.2021.780168 ◽

2022 ◽

Vol 2 ◽

Author(s):

Carli Friedman

Keyword(s):

Quality Of Life ◽

Developmental Disabilities ◽

Human Service ◽

Service Providers ◽

Quality Data ◽

Support Needs ◽

Intellectual And Developmental Disabilities ◽

Human Service Providers ◽

Provider Quality

Background: Quality of life is multidimensional, influenced by individual, organizational, and environmental factors. As such, when examining personal outcomes, it is also important to consider meso and macro factors that contribute to people with intellectual and developmental disabilities' (IDD's) quality of life. While it is widely acknowledged that organizational factors contribute to people's quality of life, there is less research directly examining how the quality of human service providers contributes to people with IDD's personal outcomes. For these reasons, the aim of this study was to explore the relationship between provider quality and people with IDD's personal quality of life outcomes. Methods: Using a multilevel linear regression, we analyzed secondary Personal Outcome Measures® (personal outcomes) and Basic Assurances® (provider quality) data from 2,900 people with IDD served by 331 human service providers. Results: People with IDD's personal outcomes, regardless of their support needs or other demographics, were significantly impacted by the quality of the human service providers they received services from: the higher the quality of the provider, the more personal outcomes they had present. In addition, the following demographic covariates were correlated with personal outcomes: gender; race; complex support needs; residence type; and organizations that offered therapy services. Discussion: While quality improvement initiatives may require a great deal of cost and time commitment from providers, our findings suggest the effort translates to improved personal outcomes among people with IDD. The ultimate goal of service providers should be improvement of quality of life among those they support.

Digitalization of culturally significant buildings: ensuring high-quality data exchanges in the heritage domain using OpenBIM

Heritage Science ◽

10.1186/s40494-021-00640-y ◽

2022 ◽

Vol 10 (1) ◽

Author(s):

Laurens Jozef Nicolaas Oostwegel ◽

Štefan Jaud ◽

Sergej Muhič ◽

Katja Malovrh Rebec

Keyword(s):

Data Exchange ◽

Open Data ◽

Quality Data ◽

Semantic Data ◽

Information Models ◽

Building Information ◽

Heritage Building ◽

Conservation Plan ◽

The Creation ◽

Information Delivery Manual

Cultural heritage building information models (HBIMs) incorporate specific geometric and semantic data that are mandatory for supporting the workflows and decision making during a heritage study. The Industry Foundation Classes (IFC) open data exchange standard can be used to migrate these data between different software solutions as an openBIM approach, and has the potential to mitigate data loss. Specific data-exchange scenarios can be supported by first developing an Information Delivery Manual (IDM) and subsequently filtering portions of the IFC schema and producing a specialized Model View Definition (MVD). This paper showcases the creation of a specialized IDM for the heritage domain in consultation with experts in the restoration and preservation of built heritage. The IDM was then translated into a pilot MVD for heritage. We tested our developments on an HBIM case study, where a historic building was semantically enriched with information about the case study’s conservation plan and then checked against the specified IDM requirements using the developed MVD. We concluded that the creation of an IDM and then the MVD for the heritage domain are achievable and will bring us one step closer to BIM standardisation in the field of digitised cultural buildings.

Structure of the Lysinibacillus sphaericus Tpp49Aa1 pesticidal protein elucidated from natural crystals using MHz-SFX

10.1101/2022.01.14.476343 ◽

2022 ◽

Author(s):

Lainey J Williamson ◽

Marina Galchenkova ◽

Hannah L Best ◽

Richard J Bean ◽

Anna Munke ◽

...

Keyword(s):

Potential Interaction ◽

Quality Data ◽

Target Range ◽

X Ray Diffraction ◽

Two Component System ◽

Lysinibacillus Sphaericus ◽

X Ray ◽

Mosquitocidal Activity ◽

Family Protein ◽

First Time

Tpp49Aa1 from Lysinibacillus sphaericus is a Toxin_10 family protein that must interact with Cry48Aa1, a 3-domain crystal protein, to produce potent mosquitocidal activity, specifically against Culex quinquefasciatus mosquitoes. We use Culex cell lines to demonstrate, for the first time, transient detrimental effects of individual toxin components and widen the known target range of the proteins. MHz serial femtosecond crystallography at a nano-focused X-ray free electron laser allowed rapid and high-quality data collection to determine the Tpp49Aa1 structure at 2.2 Å resolution from the merged X-ray diffraction data. The structure revealed the packing of Cry49Aa1 within the natural nanocrystals isolated from sporulated bacteria, as a homodimer with a large intermolecular interface. We then modelled the potential interaction between Tpp49Aa1 and Cry48Aa1. The structure sheds light on natural crystallisation and, along with cell-based assays, broadens our understanding of this two-component system.

Studying Up Machine Learning Data

Proceedings of the ACM on Human-Computer Interaction ◽

10.1145/3492853 ◽

2022 ◽

Vol 6 (GROUP) ◽

pp. 1-14

Author(s):

Milagros Miceli ◽

Julian Posada ◽

Tianling Yang

Keyword(s):

Machine Learning ◽

Social Contexts ◽

Quality Data ◽

Research Focus ◽

Labor Conditions ◽

The Social ◽

Data Documentation ◽

Societal Problems ◽

Data Design ◽

Learning Data

Research in machine learning (ML) has argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI and CSCW work to support our argument, critically analyze previous research, and point to two co-existing lines of work within our research community: one bias-centered, the other power-aware. We highlight the need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the first area, we argue that reducing societal problems to "bias" misses the context-based nature of data. In the second one, we highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets. Finally, we propose expanding current transparency-oriented efforts in dataset documentation to reflect the social contexts of data design and production.

Distance-Entropy: An Effective Indicator for Selecting Informative Data

Frontiers in Plant Science ◽

10.3389/fpls.2021.818895 ◽

2022 ◽

Vol 12 ◽

Author(s):

Yang Li ◽

Xuewei Chao

Keyword(s):

Quality Assessment ◽

Image Quality Assessment ◽

Data Gathering ◽

Recognition Task ◽

Entropy Method ◽

Quality Data ◽

Practical Applications ◽

Crucial Information ◽

Smart Agriculture ◽

Base Data

Smart agriculture is inseparable from data gathering, analysis, and utilization. High-quality data improve the efficiency of intelligent algorithms and help reduce the costs of data collection and transmission. However, current image quality assessment research focuses on visual quality while ignoring the crucial aspect of information content. In this work, taking the crop pest recognition task as an example, we propose an effective indicator, distance-entropy, to distinguish good data from bad data from the perspective of information. Many comparative experiments, considering the mapping feature dimensions and base data sizes, were conducted to verify the validity and robustness of this indicator. Both the numerical and the visual results demonstrate the effectiveness and stability of the proposed distance-entropy method. In general, this study is a relatively cutting-edge work in smart agriculture, which calls for attention to the quality assessment of the information in data and provides some inspiration for subsequent research on data mining, as well as for dataset optimization in practical applications.

The non-fatal burden of cancer in Belgium, 2004–2019: a nationwide registry-based study

BMC Cancer ◽

10.1186/s12885-021-09109-4 ◽

2022 ◽

Vol 22 (1) ◽

Author(s):

Vanessa Gorasso ◽

Geert Silversmit ◽

Marc Arbyn ◽

Astrid Cornez ◽

Robby De Pauw ◽

...

Keyword(s):

Skin Cancer ◽

Burden Of Disease ◽

Quality Data ◽

Years Of Life Lost ◽

Non Melanoma Skin Cancer ◽

Life Years ◽

National Burden ◽

Disease Study ◽

The Impact ◽

Melanoma Skin

Background: The importance of assessing and monitoring the health status of a population has grown in the last decades. Consistent and high quality data on the morbidity and mortality impact of a disease represent the key element for this assessment. Being increasingly used in global and national burden of diseases (BoD) studies, the Disability-Adjusted Life Year (DALY) is an indicator that combines healthy life years lost due to living with disease (Years Lived with Disability; YLD) and due to dying prematurely (Years of Life Lost; YLL). As a step towards a comprehensive national burden of disease study, this study aims to estimate the non-fatal burden of cancer in Belgium using national data. Methods: We estimated the Belgian cancer burden from 2004 to 2019 in terms of YLD, using national population-based cancer registry data and international disease models. We developed a microsimulation model to translate incidence- into prevalence-based estimates, and used expert elicitation to integrate the long-term impact of increased disability due to surgical treatment. Results: The age-standardized non-fatal burden of cancer increased from 2004 to 2019 by 6% and 3% for incidence- and prevalence-based YLDs, respectively. In 2019, in Belgium, breast cancer had the highest morbidity impact among women, followed by colorectal and non-melanoma skin cancer. Among men, prostate cancer had the highest morbidity impact, followed by colorectal and non-melanoma skin cancer. Between 2004 and 2019, non-melanoma skin cancer significantly increased for both sexes in terms of age-standardized incidence-based YLD per 100,000, from 49 to 111 for men and from 15 to 44 for women. Important decreases were seen for colorectal cancer for both sexes in terms of age-standardized incidence-based YLD per 100,000, from 105 to 84 for men and from 66 to 58 for women.
Conclusions: Breast and prostate cancers represent the greatest proportion of cancer morbidity, while for both sexes the morbidity burden of skin cancer has shown an important increase from 2004 onwards. Integrating the current study in the Belgian national burden of disease study will allow monitoring of the burden of cancer over time, highlighting new trends and assessing the impact of public health policies.
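The relationship between the three indicators named in this abstract can be written as a single identity. This is the standard burden-of-disease definition rather than anything specific to the Belgian study, which estimates only the YLD (non-fatal) term:

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}
```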

Network-Based Topological Exploration of the Impact of Pollution Sources on Surface Water Bodies

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.723997 ◽

2022 ◽

Vol 9 ◽

Author(s):

Viktor Sebestyén ◽

Tímea Czvetkó ◽

János Abonyi

Keyword(s):

Water Quality ◽

Surface Water ◽

Wastewater Treatment Plants ◽

Pollution Sources ◽

Water Bodies ◽

Quality Data ◽

Critical Surface ◽

Water Quality Data ◽

Surface Water Bodies ◽

The Impact

We developed a digital water management toolkit to evaluate the importance of the connections between water bodies and the impacts caused by pollution sources. By representing water bodies in a topological network, the relationship between point loads and basic water quality parameters is examined as a labelled network. The labels are defined based on the classification of the water bodies and pollution sources. The analysis of the topology of the network can provide information on how the possible paths of the surface water network influence the water quality. The extracted information can be used to develop a monitoring- and evidence-based decision support system. The methodological development is presented through the analysis of the physical-chemical parameters of all surface water bodies in Hungary, using the emissions of industrial plants and wastewater treatment plants. Changes in water quality are comprehensively assessed based on the water quality data recorded over the past 10 years. The results illustrate that the developed method can identify critical surface water bodies where the impact of local pollution sources is more significant. One hundred six critical water bodies have been identified, where special attention should be given to water quality improvement.

Price, quality, and market dynamics of malaria rapid diagnostic tests: analysis of Global Fund 2009–2018 data

Malaria Journal ◽

10.1186/s12936-021-04008-2 ◽

2022 ◽

Vol 21 (1) ◽

Author(s):

Rachel Wittenauer ◽

Spike Nowak ◽

Nick Luter

Keyword(s):

Product Quality ◽

Market Share ◽

Diagnostic Tests ◽

Global Fund ◽

Rapid Diagnostic Tests ◽

Product Type ◽

Quality Data ◽

The Past ◽

The Relationship ◽

Over Time

Background: Rapid diagnostic tests (RDTs) for malaria are a vital part of global malaria control. Over the past decade, RDT prices have declined, and quality has improved. However, the relationship between price and product quality and their larger implications on the market have yet to be characterized. This analysis used purchase data from the Global Fund together with product quality data from the World Health Organization (WHO) and Foundation for Innovative New Diagnostics (FIND) Malaria RDT Product Testing Programme to understand three unanswered questions: (1) Has the market share by quality of RDTs in the Global Fund’s procurement orders changed over time? (2) What is the relationship between unit price and RDT quality? (3) Has the market for RDTs financed by the Global Fund become more concentrated over time? Methods: Data from 10,075 procurement transactions in the Global Fund’s database, which include year, product, volume, and price, were merged with product quality data from all eight rounds of the WHO-FIND programme, which evaluated 227 unique RDT products. First, descriptive statistics were used to analyse trends in market share by RDT quality level from 2009 to 2018. A generalized linear regression model was then applied to characterize the relationship between price and panel detection score (PDS), adjusting for order volume, year purchased, product type, and manufacturer. Third, a Herfindahl–Hirschman Index (HHI) score was calculated to characterize the degree of market concentration. Results: Lower-quality RDTs lost market share between 2009 and 2018, as did the highest-quality RDTs. No statistically significant relationship between price per test and PDS was found when adjusting for order volume, product type, and year of purchase. The HHI was 3,570, indicating a highly concentrated market.
Conclusions: Advancements in RDT affordability, quality, and access over the past decade risk stagnation if the health of the RDT market as a whole is neglected. These results suggest that from 2009 to 2018, this market was highly concentrated and that quality was not a distinguishing feature between RDTs. This information adds to previous reports noting concerns about the long-term sustainability of this market. Further research is needed to understand the causes and implications of these trends.
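For readers unfamiliar with the Herfindahl–Hirschman Index mentioned above, the following is a minimal sketch of the usual calculation: the sum of squared market shares, with shares expressed in percent so the index runs from near 0 to 10,000. The function name, the supplier names, and the volumes are hypothetical illustrations and are not taken from the Global Fund data; the abstract does not state how the authors defined market share, so volume-based shares are an assumption here.

```python
def herfindahl_hirschman_index(volumes):
    """Return the HHI (0 to 10,000 scale) from per-supplier volumes.

    `volumes` maps a supplier name to its purchased volume. Shares are
    computed from volume, which is an assumption made for illustration.
    """
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    # Square each supplier's percentage share and sum the squares.
    return sum((100.0 * v / total) ** 2 for v in volumes.values())


# Hypothetical volumes for three suppliers (shares of 50%, 30%, 20%).
example = {"supplier_a": 50, "supplier_b": 30, "supplier_c": 20}
hhi = herfindahl_hirschman_index(example)
print(round(hhi))  # 2500 + 900 + 400 = 3800
```

A single supplier holding the whole market gives the maximum of 10,000, while many small suppliers push the index toward zero.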
