Poor Quality Data
Recently Published Documents

Total documents: 40 (five years: 13) ◽ H-index: 6 (five years: 1)

2021 ◽ Vol 11 (1) ◽ Author(s): M. A. Dakka, T. V. Nguyen, J. M. M. Hall, S. M. Diakiw, M. VerMilyea, et al.

Abstract: The detection and removal of poor-quality data in a training set is crucial to achieving high-performing AI models. In healthcare, data can be inherently poor-quality due to uncertainty or subjectivity; moreover, as is often the case, data-privacy requirements restrict AI practitioners from accessing raw training data, meaning manual visual verification of private patient data is not possible. Here we describe a novel method for automated identification of poor-quality data, called Untrainable Data Cleansing. This method is shown to have numerous benefits, including protection of private patient data; improvement in AI generalizability; and reduction in the time, cost, and data needed for training; all while offering a truer reporting of AI performance itself. Additionally, results show that Untrainable Data Cleansing could be useful as a triage tool to identify difficult clinical cases that may warrant in-depth evaluation or additional testing to support a diagnosis.
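The abstract does not spell out how Untrainable Data Cleansing decides that a sample is untrainable. As a minimal sketch of the general idea only, the code below flags samples that are consistently misclassified across repeated cross-validated training runs; the criterion, the flag_untrainable name, and the 0.8 threshold are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only: "untrainable" samples are approximated here as
    # those misclassified in most held-out predictions across repeated runs.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    def flag_untrainable(X, y, n_repeats=5, threshold=0.8, seed=0):
        """Return a boolean mask of samples misclassified in at least
        `threshold` fraction of their held-out predictions."""
        rng = np.random.RandomState(seed)
        errors = np.zeros(len(y))
        for _ in range(n_repeats):
            cv = StratifiedKFold(n_splits=5, shuffle=True,
                                 random_state=rng.randint(1 << 30))
            for train_idx, test_idx in cv.split(X, y):
                model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
                pred = model.predict(X[test_idx])
                errors[test_idx] += (pred != y[test_idx])
        return (errors / n_repeats) >= threshold  # consistently wrong -> flag

    # mask = flag_untrainable(X, y)
    # X_clean, y_clean = X[~mask], y[~mask]  # retrain on the cleansed set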


Electronics ◽ 2021 ◽ Vol 10 (17) ◽ pp. 2049 ◽ Author(s): Kennedy Edemacu, Jong Wook Kim

Nowadays, the Internet of Things (IoT) generates data in several application domains. Logistic regression, a standard machine learning algorithm with a wide application range, is often built on such data. Nevertheless, building a powerful and effective logistic regression model requires large amounts of data, so collaboration between multiple IoT participants has often been the go-to approach. However, privacy concerns and poor data quality are two challenges that threaten the success of such a setting. Several studies have proposed different methods to address the privacy concern, but to the best of our knowledge, little attention has been paid to the poor-data-quality problem in multi-party logistic regression. Thus, in this study, we propose a multi-party privacy-preserving logistic regression framework with poor-quality-data filtering for IoT data contributors that addresses both problems. Specifically, we propose a new metric, gradient similarity, in a distributed setting, which we employ to filter out parameters from data contributors with poor-quality data. To solve the privacy challenge, we employ homomorphic encryption. Theoretical analysis and experimental evaluations using real-world datasets demonstrate that our proposed framework is privacy-preserving and robust against poor-quality data.
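As a rough illustration of gradient-similarity filtering in a multi-party setting, the sketch below scores each contributor's gradient by cosine similarity to the aggregate and drops low-scoring parties. The exact metric, the threshold, and the names are assumptions, not the paper's protocol; in particular, the paper's scheme additionally runs under homomorphic encryption, which is omitted here for clarity.

    # Illustrative plaintext sketch of filtering contributors whose gradients
    # disagree with the aggregate direction (a proxy for poor-quality data).
    import numpy as np

    def filter_contributors(gradients, threshold=0.5):
        """gradients: list of per-party gradient vectors of equal shape.
        Returns indices of parties retained for aggregation."""
        G = np.stack(gradients)
        mean_grad = G.mean(axis=0)
        sims = G @ mean_grad / (np.linalg.norm(G, axis=1)
                                * np.linalg.norm(mean_grad) + 1e-12)
        return [i for i, s in enumerate(sims) if s >= threshold]

    # Aggregation then uses only the retained parties' gradients:
    # kept = filter_contributors(party_grads)
    # update = np.mean([party_grads[i] for i in kept], axis=0)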


Author(s): Scott Davis, Sumit Mohan

Patients who receive a kidney transplant commonly experience failure of their allograft. Transplant failure often comes with complex management decisions, such as when and how to wean immunosuppression and start the transition to a second transplant or to dialysis. These decisions are made in the context of important concerns about competing risks, including sensitization and infection. Unfortunately, the management of the failed allograft is, at present, guided by relatively poor-quality data and, as a result, practice patterns are variable and suboptimal given that patients with failed allografts experience excess morbidity and mortality compared with their transplant-naive counterparts. In this review, we summarize the management strategies through the often-precarious transition from transplant to dialysis, highlighting the paucity of data and the critical gaps in our knowledge that are necessary to inform the optimal care of the patient with a failing kidney transplant.


2020 ◽ Vol 73 (6) ◽ pp. 1372-1386 ◽ Author(s): Zihan Peng, Chengfa Gao, Rui Shang

The tight combination model improves the positioning accuracy of the Global Navigation Satellite System (GNSS) in complex environments by increasing the redundancy of observations. However, the ambiguity cannot be calculated directly because of its correlation with the phase differential inter-system bias (DISB) in the model. This paper proposes a method of DISB estimation based on the principle of maximum ratio. The data analysis shows that, for the standard deviation of the code DISB, the method yields an improvement of up to 0.179 m on poor-quality data. In addition, compared with the parameter combination method, the proposed method decreased the standard deviation of all the phase DISBs. For the phase DISB of GPS L1/Galileo E1, the standard deviation decreased from 0.014/0.022/0.009/0.051 cycles to 0.006/0.015/0.004/0.029 cycles over the four baselines, improvements of 57.14/31.82/55.56/43.14%. For the phase DISB of GPS L1/BDS B1, the standard deviation decreased from 0.014/0.061/0.010/0.052 cycles to 0.002/0.005/0.009/0.004 cycles over the four baselines, improvements of 85.71/91.80/10.00/92.31%.
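The quoted percentages follow from the standard relative reduction in standard deviation; for the first GPS L1/Galileo E1 baseline, for example:

\[
\frac{\sigma_\text{old}-\sigma_\text{new}}{\sigma_\text{old}}\times 100\%
= \frac{0.014-0.006}{0.014}\times 100\% \approx 57.14\%
\]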


2020 ◽ Vol 9 (1) ◽ pp. 2535-2539

Data is very valuable and is generated in large volumes. Using high-quality data for decision making has become a major task, as it helps people make better decisions, analyses, and predictions. We are surrounded by data containing errors; data cleaning is a slow, complicated task and is considered costly. Data polishing is important because errors must be removed from the data before it is transferred to the data warehouse, where poor-quality data is eliminated to obtain the desired results. Error-free data produces precise and accurate results when queried, so consistent and proper data is required for decision making. The two components of data polishing are data repairing and data association. Association is defined as identifying homogeneous objects and linking each to its most closely associated object. Repairing is defined as making the database reliable by finding and fixing faults. In big data applications, we do not use all the existing data, only subsets of appropriate data; association is the process of converting extensive amounts of raw data into such useful subsets. Once the appropriate data is obtained, it is analyzed, and this leads to knowledge [14]. Multiple approaches are used to associate the given data and to derive meaningful, useful knowledge for fixing or repairing it [12]. Maintaining this polished quality of data is referred to as data polishing. The objectives of data polishing are usually not properly defined; this paper discusses the goals of data cleaning and different approaches for data cleaning platforms.
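As a generic illustration of the repairing and association steps described above (not the paper's platform), here is a minimal pandas sketch; the column names, records, and rules are hypothetical.

    # Repairing fixes faults in records; association links records that
    # refer to the same real-world object so they can be deduplicated.
    import pandas as pd

    df = pd.DataFrame({
        "name": ["Acme Corp", "ACME corp.", "Beta LLC", None],
        "revenue": [1200.0, 1200.0, -50.0, 300.0],
    })

    # Repairing: detect and fix faults (missing values, out-of-range entries).
    df["name"] = df["name"].fillna("unknown")
    df.loc[df["revenue"] < 0, "revenue"] = float("nan")  # negative revenue is a fault

    # Association: normalize a key so homogeneous objects match, then deduplicate.
    df["name_key"] = df["name"].str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
    print(df.drop_duplicates(subset=["name_key", "revenue"]))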


Author(s): Arnelle Etienne, Tarana Laroia, Harper Weigle, Amber Afelin, Shawn K Kelly, et al.

Abstract: EEG is a powerful and affordable brain sensing and imaging tool used extensively for the diagnosis of neurological disorders (e.g. epilepsy), brain-computer interfacing, and basic neuroscience. Unfortunately, most EEG electrodes and systems are not designed to accommodate the coarse and curly hair common in individuals of African descent. This can lead to poor-quality data that might be discarded in scientific studies recording from a broader population and, in clinical settings, to an uncomfortable and/or emotionally taxing experience and, in the worst cases, misdiagnosis. In this work, we design a system that explicitly accommodates coarse and curly hair, and demonstrate that, across time, our electrodes, in conjunction with appropriate braiding, attain substantially (~10x) lower impedance than state-of-the-art systems. This builds on our prior work, which demonstrated that braiding hair in patterns consistent with the clinical standard 10-20 arrangement improves impedance with existing systems.


2020 ◽ Vol 17 (1) ◽ pp. 253-269 ◽ Author(s): El Alaoui, El Fazziki, Fatima Ennaji, Mohamed Sadgal

The ubiquity of mobile devices and their advanced features have increased the use of crowdsourcing in many areas, such as mobility in smart cities. With the advent of high-quality sensors on smartphones, online communities can easily collect and share information. This information is of great importance to institutions, which must analyze the facts, for example by facilitating the collection of data on crimes and criminals. This paper proposes an approach to developing a crowdsensing framework that allows wider collaboration between citizens and the authorities. The framework takes advantage of an objectivity analysis to ensure the participants' credibility and the information's reliability, as law enforcement is often affected by unreliable and poor-quality data. In addition, the proposed framework ensures the protection of users' private data through a de-identification process. Experimental results show that the proposed framework is an interesting tool for improving the quality of crowdsensing information in a government context.
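As a minimal sketch of the kind of de-identification step the framework describes, the code below replaces a reporter's identifier with a salted hash and drops direct identifiers before sharing; all field names and the salting scheme are illustrative assumptions, not the paper's design.

    # Illustrative de-identification: pseudonymize the reporter and strip
    # direct identifiers while keeping the report's content usable.
    import hashlib

    def deidentify(report, salt):
        """Return a copy of `report` with the user ID replaced by a salted
        hash and direct identifiers removed."""
        cleaned = dict(report)
        user_id = cleaned.pop("user_id")
        cleaned["reporter_token"] = hashlib.sha256((salt + user_id).encode()).hexdigest()
        for field in ("name", "phone", "email"):  # direct identifiers
            cleaned.pop(field, None)
        return cleaned

    report = {"user_id": "u42", "name": "Jane Doe",
              "text": "Incident at 5th Ave", "geo": (33.58, -7.62)}
    print(deidentify(report, salt="per-deployment-secret"))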


2020 ◽ Vol 182 ◽ pp. 127-134 ◽ Author(s): Zhonghyun Kim, Heewon Jeong, Sora Shin, Jinho Jung, Joon Ha Kim, et al.
