dataset shift
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS: 27)
H-INDEX: 7 (FIVE YEARS: 2)

Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6774
Author(s):  
Doyoung Kim ◽  
Inwoong Lee ◽  
Dohyung Kim ◽  
Sanghoon Lee

Action recognition models have achieved strong performance on various video datasets. Nevertheless, because existing datasets lack rich data on the target actions, they are insufficient for the action recognition applications required by industry. Datasets composed of highly available target actions have been created to satisfy this requirement, but because their videos are generated in a specific environment, they struggle to capture the varied characteristics of real environments. In this paper, we introduce the new ETRI-Activity3D-LivingLab dataset, which provides action sequences recorded in real environments and helps address the network generalization problem caused by dataset shift. When an action recognition model is trained on the ETRI-Activity3D and KIST SynADL datasets and evaluated on the ETRI-Activity3D-LivingLab dataset, performance can degrade severely because the datasets were captured in different environments. To reduce this dataset shift between the training and testing datasets, we propose a close-up of maximum activation, which magnifies the most activated part of a video input in detail. In addition, we present experimental results and analyses that illustrate the dataset shift and demonstrate the effectiveness of the proposed method.
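The abstract does not give implementation details, but the general idea of a "close-up of maximum activation" can be sketched as follows: find the spatial position where a backbone's feature activations peak, crop a window around it, and upsample the crop back to the model's input resolution. The function name, the use of a truncated ResNet-18 as the feature extractor, and the crop size are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def close_up_of_max_activation(frames, backbone, crop=112, out_size=224):
    """Zoom each frame in on the region where the backbone's activations peak.
    frames: (T, 3, H, W) float tensor. Hypothetical sketch, not the paper's code."""
    zoomed = []
    with torch.no_grad():
        for frame in frames:
            feats = backbone(frame.unsqueeze(0))          # (1, C, h, w) feature maps
            heat = feats.mean(dim=1)[0]                   # (h, w) channel-averaged activation map
            # Map the argmax of the activation map back to input-pixel coordinates.
            hy, hx = divmod(int(heat.argmax()), heat.shape[1])
            cy = int((hy + 0.5) / heat.shape[0] * frame.shape[1])
            cx = int((hx + 0.5) / heat.shape[1] * frame.shape[2])
            half = crop // 2
            top = max(0, min(frame.shape[1] - crop, cy - half))
            left = max(0, min(frame.shape[2] - crop, cx - half))
            patch = frame[:, top:top + crop, left:left + crop]
            # Magnify the cropped region back to the model's input resolution.
            zoomed.append(F.interpolate(patch.unsqueeze(0), size=out_size,
                                        mode="bilinear", align_corners=False)[0])
    return torch.stack(zoomed)

# Usage with a truncated ResNet-18 as a stand-in feature extractor and random frames.
backbone = torch.nn.Sequential(*list(resnet18(weights=None).children())[:-2]).eval()
video = torch.rand(8, 3, 224, 224)                        # 8 placeholder frames
print(close_up_of_max_activation(video, backbone).shape)  # torch.Size([8, 3, 224, 224])
```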


2021 ◽  
Author(s):  
Dipanwita Sinha Mukherjee ◽  
Divyanshy Bhandari ◽  
Naveen Yeri

Predictive software is typically deployed under the hypothesis that the test data distribution will not differ from the training data distribution. Real-world scenarios often violate this assumption, which leads to inconsistent and non-transferable results and makes dataset shift a growing concern. In this paper, we explore the recent concepts of label shift detection and classifier correction using Black Box Shift Detection (BBSD), Black Box Shift Estimation (BBSE), and Black Box Shift Correction (BBSC). The digits dataset from "sklearn" and the "LogisticRegression" classifier were used for this investigation. Knock-out shift was clearly detected by applying the Kolmogorov–Smirnov test for BBSD. After applying BBSE and BBSC, the classifier's overall accuracy improved from 91% to 97%.
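As a rough illustration of the BBSD step described above (not the authors' code), the sketch below trains a LogisticRegression on sklearn's digits data, simulates a knock-out shift by removing most examples of one class from the target split, and applies a two-sample Kolmogorov–Smirnov test to the classifier's black-box outputs. The split sizes, the knocked-out class, and the knock-out fraction are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)
# Split the remainder into a source validation set and a target set to be shifted.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

# Simulate knock-out shift: drop 90% of class "3" from the target set.
keep = np.ones(len(y_test), dtype=bool)
idx3 = np.where(y_test == 3)[0]
keep[idx3[: int(0.9 * len(idx3))]] = False
X_test, y_test = X_test[keep], y_test[keep]

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# BBSD: compare the distribution of black-box outputs (here, the predicted
# probability of class 3) between source validation data and shifted target data.
p_val = clf.predict_proba(X_val)[:, 3]
p_test = clf.predict_proba(X_test)[:, 3]
stat, pvalue = ks_2samp(p_val, p_test)
print(f"KS statistic={stat:.3f}, p-value={pvalue:.3g}")  # a small p-value flags shift
```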


GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Jérôme Dockès ◽  
Gaël Varoquaux ◽  
Jean-Baptiste Poline

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning–extracted biomarkers, as well as detection and correction strategies.


2021 ◽  
Vol 12 (04) ◽  
pp. 808-815
Author(s):  
Lin Lawrence Guo ◽  
Stephen R. Pfohl ◽  
Jason Fries ◽  
Jose Posada ◽  
Scott Lanyon Fleming ◽  
...  

Objective: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts.
Methods: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.
Results: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.
Conclusion: There has been limited research on preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.
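The review summarizes strategies rather than prescribing code, but probability recalibration, one of the common model-level approaches it identifies, can be sketched roughly as follows: keep a model fitted on historical data fixed and re-map its scores with a small logistic calibrator fitted on recent labelled data (Platt scaling). The synthetic data, the simulated label shift, and all names below are illustrative assumptions, not drawn from the reviewed studies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a clinical cohort split by time.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_old, y_old = X[:2000], y[:2000]            # historical data used to fit the model
X_new, y_new = X[2000:], y[2000:]            # more recent data

# Simulate temporal label shift: keep only ~30% of positive cases in recent data.
pos = np.flatnonzero(y_new == 1)
keep = np.setdiff1d(np.arange(len(y_new)), pos[int(0.3 * len(pos)):])
X_new, y_new = X_new[keep], y_new[keep]

# The base model fitted on historical data stays untouched.
base = LogisticRegression(max_iter=1000).fit(X_old, y_old)

# Platt scaling: fit a 1-D logistic regression on the base model's scores
# computed on the recent labelled sample, then use it to re-map probabilities.
scores_new = base.decision_function(X_new).reshape(-1, 1)
calibrator = LogisticRegression().fit(scores_new, y_new)

def calibrated_proba(X):
    return calibrator.predict_proba(base.decision_function(X).reshape(-1, 1))[:, 1]

print("mean predicted risk, original model:", round(base.predict_proba(X_new)[:, 1].mean(), 3))
print("mean predicted risk, recalibrated:  ", round(calibrated_proba(X_new).mean(), 3))
print("observed event rate:                ", round(y_new.mean(), 3))
```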


2021 ◽  
Vol 385 (3) ◽  
pp. 283-286
Author(s):  
Samuel G. Finlayson ◽  
Adarsh Subbaswamy ◽  
Karandeep Singh ◽  
John Bowers ◽  
Annabel Kupke ◽  
...  
