DE-IDENTIFICATION OF PROTECTED HEALTH INFORMATION PHI FROM FREE TEXT IN MEDICAL RECORDS

2019 ◽  
Vol 08 (02) ◽  
pp. 01-11
Author(s):  
Geetha Mahadevaiah ◽  
M.S Dinesh ◽  
Rithesh Sreenivasan ◽  
Sana Moin ◽  
Andre Dekker


2017 ◽  
Vol 27 (11) ◽  
pp. 3304-3324 ◽  
Author(s):  
Luca Bonomi ◽  
Xiaoqian Jiang

Modern medical research relies on multi-institutional collaborations that enhance knowledge discovery and data reuse. While these collaborations allow researchers to perform analytics that would be impossible on individual datasets, they often pose significant challenges in the data integration process. Because no unique identifier is available, data integration solutions often have to rely on patients’ protected health information (PHI). In many situations, such information cannot leave the institutions or must be strictly protected. Furthermore, noisy values in these attributes may result in poor overall utility. While much research has addressed these challenges, most current solutions are designed for a static setting and do not consider the temporal information in the data (e.g. EHR). In this work, we propose a novel approach that uses non-PHI data to link longitudinal patient data. Specifically, our technique captures diagnosis dependencies using patterns, which are shown to provide important indications for linking patient records. Our solution can be used as a standalone technique to perform temporal record linkage using non-protected health information, or it can be combined with Privacy Preserving Record Linkage (PPRL) solutions when protected health information is available. In the latter case, our approach can resolve ambiguities in the results. Experimental evaluations on real datasets demonstrate the effectiveness of our technique.
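To make the idea of linking on diagnosis patterns more concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes that a "pattern" is a pair of consecutive diagnosis codes in a patient's time-ordered record, and that two records are linked when the Jaccard similarity of their pattern sets reaches a threshold. The function names, the threshold and the toy records are all hypothetical.

```python
def diagnosis_patterns(codes):
    """Return the set of consecutive diagnosis pairs in a time-ordered code list."""
    return {(a, b) for a, b in zip(codes, codes[1:])}


def jaccard(x, y):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def link(records_a, records_b, threshold=0.5):
    """Match each record in A to its most similar record in B at or above the threshold."""
    links = {}
    for id_a, codes_a in records_a.items():
        patterns_a = diagnosis_patterns(codes_a)
        best_id, best_score = None, threshold
        for id_b, codes_b in records_b.items():
            score = jaccard(patterns_a, diagnosis_patterns(codes_b))
            if score >= best_score:
                best_id, best_score = id_b, score
        if best_id is not None:
            links[id_a] = best_id
    return links


# Toy longitudinal records (assumed data): record id -> time-ordered diagnosis codes.
site_a = {"a1": ["E11", "I10", "N18"], "a2": ["J45", "J20"]}
site_b = {"b1": ["J45", "J20", "J45"], "b2": ["E11", "I10", "N18", "I50"]}

print(link(site_a, site_b))
```

No name, address or other PHI enters the comparison; only the order of diagnoses does. A real system would also need to handle noisy codes and ties between candidates, which this sketch ignores.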


2020 ◽  
pp. 991-1010 ◽  
Author(s):  
Shweta Yadav ◽  
Asif Ekbal ◽  
Sriparna Saha ◽  
Parth S Pathak ◽  
Pushpak Bhattacharyya

With the rapid growth of clinical text, de-identification of patient Protected Health Information (PHI) has drawn significant attention in the recent past. The task aims at automatic identification and removal of patient PHI from medical records. This paper proposes a supervised machine learning technique for solving the problem of patient data de-identification. We provide an insight into the de-identification task, its major challenges, techniques to address those challenges, a detailed analysis of the results and directions for future improvement. We extract several features by studying the properties of the datasets and the domain. We build our model based on the 2014 i2b2 (Informatics for Integrating Biology and the Bedside) de-identification challenge. Experiments show that the proposed system is highly accurate in de-identifying medical records. The system achieves a final recall, precision and F-score of 95.69%, 99.31%, and 97.46%, respectively.
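As a quick consistency check, the reported F-score can be recomputed from the reported precision and recall. The sketch below assumes the balanced F1 definition (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for precision and recall given in percent."""
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract above.
reported_precision = 99.31
reported_recall = 95.69

f1 = f_score(reported_precision, reported_recall)
print(f"{f1:.2f}")
```

Under that assumption the recomputed value falls within about 0.01 percentage points of the reported 97.46%, a gap that rounding of the published precision and recall would explain.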


2020 ◽  
pp. 1502-1521
Author(s):  
Shweta Yadav ◽  
Asif Ekbal ◽  
Sriparna Saha ◽  
Parth S Pathak ◽  
Pushpak Bhattacharyya

With the rapid growth of clinical text, de-identification of patient Protected Health Information (PHI) has drawn significant attention in the recent past. The task aims at automatic identification and removal of patient PHI from medical records. This paper proposes a supervised machine learning technique for solving the problem of patient data de-identification. We provide an insight into the de-identification task, its major challenges, techniques to address those challenges, a detailed analysis of the results and directions for future improvement. We extract several features by studying the properties of the datasets and the domain. We build our model based on the 2014 i2b2 (Informatics for Integrating Biology and the Bedside) de-identification challenge. Experiments show that the proposed system is highly accurate in de-identifying medical records. The system achieves a final recall, precision and F-score of 95.69%, 99.31%, and 97.46%, respectively.


2018 ◽  
Vol 2 (6) ◽  
Author(s):  
Hoala Greevy

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy rule uses Protected Health Information (PHI) to define the type of patient information that’s protected by law.¹ PHI is an important factor for HIPAA compliance. PHI isn’t confined to medical records and test results. Any information distributed by a business associate that can identify a patient and is used or disclosed to a covered entity during the course of care is considered PHI. Even if that information doesn’t reveal a patient’s medical history, it is still considered PHI.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Beau Norgeot ◽  
Kathleen Muenzen ◽  
Thomas A. Peterson ◽  
Xuancheng Fan ◽  
Benjamin S. Glicksberg ◽  
...  

Author(s):  
Saman Hina ◽  
Raheela Asif ◽  
Syed Abbas Ali

In the medical domain, it is imperative that the protection of information does not overlook any individual. The research community encourages the use of real-time datasets for research purposes. These real-time datasets contain structured and unstructured (natural language free text) information that can be useful to researchers in various disciplines, including computational linguistics. On the other hand, these datasets cannot be distributed without anonymization of Protected Health Information (PHI). Disclosing PHI that can identify an individual (such as name, age, address, etc.) is unethical. Therefore, we present a rule-based Natural Language Processing (NLP) anonymization system, using a challenging corpus containing medical narratives and ICD-10 codes (medical codes). This anonymization module can be used for pre-processing a corpus containing identifiable information. The corpus used in this research contains 2534 PHIs in 1984 medical records in total. 15% of the labelled corpus was used to improve the guidelines for the identification and classification of PHI groups, and 85% was held out for evaluation. Our anonymization system follows a two-step process: (1) identification and cataloging of PHIs into four PHI categories ('Patients Name', 'Doctors Name', 'Other Name [Names other than patients and doctors]', 'Place Name'); (2) anonymization of PHIs by replacing identified PHIs with their respective PHI categories. Our method uses basic language processing, dictionaries, rules and heuristics to identify, classify and anonymize PHIs with PHI categories. We use standard metrics for evaluation. Measured against a human-annotated gold standard, our system achieves an F-measure of 100%, an increase of 39% over the baseline results, which supports the reliability of using the data for research.
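The two-step process described above (identify and categorize PHI, then replace each PHI with its category) can be sketched as follows. This is a minimal illustration, not the authors' system: the trigger-word rules, the small place dictionary and the sample note are all hypothetical, and the 'Other Name' category is omitted for brevity.

```python
import re

# Hypothetical dictionary; the lexicons used in the paper are not given in the abstract.
PLACE_NAMES = {"Karachi", "Lahore"}

# Hypothetical rules: a title or cue word followed by one or two capitalized tokens.
NAME = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)?"
RULES = [
    ("Doctors Name", re.compile(rf"\b(?:Dr\.?|Doctor)\s+({NAME})")),
    ("Patients Name", re.compile(rf"\b(?:Mr\.?|Mrs\.?|Ms\.?|[Pp]atient)\s+({NAME})")),
]


def identify(text):
    """Step 1: return sorted (start, end, category) spans found by rules and dictionaries."""
    spans = []
    for category, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(1), m.end(1), category))
    for place in PLACE_NAMES:
        for m in re.finditer(rf"\b{re.escape(place)}\b", text):
            spans.append((m.start(), m.end(), "Place Name"))
    return sorted(spans)


def anonymize(text):
    """Step 2: replace each identified span with its PHI category label."""
    out, cursor = [], 0
    for start, end, category in identify(text):
        if start < cursor:  # skip spans that overlap an earlier replacement
            continue
        out.append(text[cursor:start])
        out.append(f"[{category}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Fictional sample note (assumed data).
note = "Dr. Ahmed Khan examined patient Bilal Raza in Karachi."
print(anonymize(note))
```

Replacing a PHI with its category label rather than deleting it keeps the sentence readable for downstream linguistic analysis, which fits the abstract's stated goal of pre-processing a corpus for research use.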

