Resilience of clinical text de-identified with “hiding in plain sight” to hostile reidentification attacks by human readers

2020, Vol 27 (9), pp. 1374-1382
Author(s): David S Carrell, Bradley A Malin, David J Cronkite, John S Aberdeen, Cheryl Clark, ...

Abstract

Objective: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this “residual PII problem.” HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

Materials and Methods: Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.

Results: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.

Discussion and Conclusions: Approximately 70% of leaked PII “hiding” in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario. This is more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
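The recall and precision figures above can be read as a comparison between the set of PII that truly leaked and the set of items a reader flagged as leaked. The following is a minimal sketch of that comparison, not the authors' scoring code: the function name `attack_scores`, the span representation, and the toy data are all assumptions made for illustration. It also shows how a resilience figure follows from recall as 1 minus recall, which is presumably how the reported mean recall of 26.8% relates to the "approximately 70%" conclusion.

```python
def attack_scores(leaked, guessed):
    """Return (recall, precision) of a reader's guesses against truly leaked PII.

    Both arguments are collections of hashable span identifiers.
    """
    leaked, guessed = set(leaked), set(guessed)
    hits = len(leaked & guessed)  # leaked items the reader correctly flagged
    recall = hits / len(leaked) if leaked else 0.0
    precision = hits / len(guessed) if guessed else 0.0
    return recall, precision


# Toy data, invented for illustration: (document id, start offset, end offset).
leaked = {("doc1", 10, 18), ("doc1", 40, 50), ("doc2", 5, 12), ("doc3", 7, 15)}
guessed = {("doc1", 10, 18), ("doc2", 30, 36), ("doc3", 60, 70)}

recall, precision = attack_scores(leaked, guessed)
resilience = 1 - recall  # share of leaked PII the reader failed to find

print(f"recall={recall:.2f} precision={precision:.2f} resilience={resilience:.2f}")
```

Exact span matching is the simplest possible criterion; a real evaluation might credit partial overlaps, which this sketch does not attempt.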

Author(s): Keiko Itano, Koji Ochiai, Koichi Takahashi, Takahide Matsushima, Hiroshi Asahara

Abstract

In many biological laboratories, biologists analyze images and identify cell or organ states manually. This manual approach has several problems, including a shortage of personnel and high experimental costs, and identification results vary from person to person. Solving these problems requires automating biologists’ operations and making identification quantitative. Here, a cell-foci-phenotype identification system is developed by applying image processing and machine learning methods to fluorescent cell images. With this system, cell-foci phenotypes can be predicted with high accuracy, and the effort biologists spend on image analysis can be reduced.


Author(s): M.A. Basyrov, A.V. Akinshin, I.R. Makhmutov, Yu.D. Kantemirov, ...
