Deep-Confidentiality: An IoT-Enabled Privacy-Preserving Framework for Unstructured Big Biomedical Data

2022
Vol 22 (2)
pp. 1-21
Author(s):
Syed Atif Moqurrab
Adeel Anjum
Abid Khan
Mansoor Ahmed
Awais Ahmad
...

With the evolution of the Internet of Things and smart technologies, clinical data is growing exponentially. The resulting big biomedical data is confidential, as it contains patients' personal information and findings. Big biomedical data is usually stored in the cloud, which makes it convenient to access and share, and sharing it for research purposes can reveal useful, previously unexposed insights. Unfortunately, sharing such sensitive data also leads to privacy threats. Clinical data is generally available in textual format (e.g., perception reports), and many studies in natural language processing have addressed privacy breaches in textual clinical data. However, the current studies still have limitations and shortcomings that need to be addressed. In this article, a novel framework for textual medical data privacy, Deep-Confidentiality, is proposed. The framework improves Medical Entity Recognition (MER) using deep neural networks and sanitization, compared with current state-of-the-art techniques. Moreover, a new, generic utility metric is proposed that overcomes the shortcomings of the existing utility metric and gives a truer representation of sanitized documents relative to the originals. To assess its effectiveness, the framework is evaluated on the i2b2-2010 NLP challenge dataset, considered one of the most complex medical datasets for MER. The proposed framework improves MER by 7.8% recall, 7% precision, and 3.8% F1-score over existing deep learning models. It also improves the data utility of sanitized documents by up to 13.79% for k = 3.
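The pipeline described above (recognize medical entities, then sanitize them) can be illustrated with a deliberately tiny sketch. The regex "recognizer" below is a hypothetical stand-in for the deep MER model, and the token-overlap utility ratio is only a naive illustration, not the paper's proposed metric:

```python
import re

# Toy stand-in for the deep MER model: regex patterns per entity class.
# The real framework uses a deep neural network; these patterns and
# entity lists are illustrative only.
PATTERNS = {
    "PROBLEM": re.compile(r"\b(asthma|diabetes|hypertension)\b", re.I),
    "TREATMENT": re.compile(r"\b(insulin|albuterol)\b", re.I),
}

def sanitize(report: str) -> tuple[str, float]:
    """Replace recognized medical entities with their class label and
    return the sanitized text plus a naive token-level utility ratio
    (fraction of tokens left untouched)."""
    sanitized = report
    for label, pat in PATTERNS.items():
        sanitized = pat.sub(label, sanitized)
    orig = report.split()
    kept = sum(1 for a, b in zip(orig, sanitized.split()) if a == b)
    return sanitized, kept / len(orig)

clean, utility = sanitize("Patient with diabetes was prescribed insulin daily.")
print(clean)    # Patient with PROBLEM was prescribed TREATMENT daily.
print(utility)  # 5 of 7 tokens untouched
```

A real sanitizer would also generalize rather than merely suppress entities (e.g., replace a drug name with its drug class), which is where a utility metric earns its keep.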

2015
Vol 8 (2)
pp. 1-15
Author(s):
Aicha Ghoulam
Fatiha Barigou
Ghalem Belalem

Information Extraction (IE) is a natural language processing (NLP) task whose aim is to analyse texts written in natural language to extract structured, useful information such as named entities and the semantic relations between them. Information extraction is important in a diverse set of applications, including biomedical literature mining, customer care, community websites, and personal information management. In this paper, the authors focus solely on information extraction from clinical reports. The two most fundamental tasks in information extraction are discussed: named entity recognition and relation extraction. The authors detail the most widely used rule/pattern-based and machine learning techniques for each task, compare these techniques, and summarize the advantages and disadvantages of each.
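A minimal, purely illustrative sketch of the rule/pattern-based approach the survey discusses: dictionary lookup for named entities, plus a single "X for Y" template for relation extraction. The entity lists and the pattern are hypothetical, not drawn from the paper:

```python
import re

# Hypothetical gazetteer-style entity dictionaries.
DRUGS = {"aspirin", "metformin"}
CONDITIONS = {"headache", "diabetes"}

def extract_entities(text):
    """Dictionary-lookup NER: tag each known token with its class."""
    ents = []
    for tok in re.findall(r"[a-zA-Z]+", text.lower()):
        if tok in DRUGS:
            ents.append((tok, "DRUG"))
        elif tok in CONDITIONS:
            ents.append((tok, "CONDITION"))
    return ents

def extract_relations(text):
    """Template-based relation extraction: 'X for Y' => X TREATS Y."""
    rels = []
    for m in re.finditer(r"(\w+)\s+for\s+(\w+)", text.lower()):
        drug, cond = m.group(1), m.group(2)
        if drug in DRUGS and cond in CONDITIONS:
            rels.append((drug, "TREATS", cond))
    return rels

note = "Prescribed aspirin for headache; metformin for diabetes."
print(extract_entities(note))
print(extract_relations(note))
```

Machine learning approaches replace both the dictionaries and the templates with learned models, trading hand-crafted precision for coverage.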


2021
Author(s):
Syed Usama Khalid Bukhari
Anum Qureshi
Adeel Anjum
Munam Ali Shah

Privacy preservation of high-dimensional healthcare data is an emerging problem. Privacy breaches are becoming more common than before and affect thousands of people, and every individual has sensitive, personal information that needs protection and security. Uploading and storing data directly to the cloud without precautions can lead to serious privacy breaches, and publishing large amounts of sensitive data while minimizing privacy risk is a genuine struggle. This forces crucial decisions about the privacy of outsourced high-dimensional healthcare data. Many privacy-preservation techniques have been proposed to secure high-dimensional data while retaining both its utility and its privacy, but every technique has its pros and cons. In this paper, a novel privacy-preservation model for high-dimensional data, NRPP, is proposed. The model uses a privacy-preserving generative technique for releasing sensitive data that is differentially private. The contribution of this paper is twofold. First, a state-of-the-art anonymization model for high-dimensional healthcare data is proposed using a generative technique. Second, the achieved privacy is evaluated using the concept of differential privacy. Experiments show that the proposed model performs better in terms of utility.
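The abstract evaluates its generative model against differential privacy. As a self-contained illustration of that guarantee (not the NRPP model itself), the textbook Laplace mechanism makes a numeric query ε-differentially private by adding noise scaled to the query's sensitivity:

```python
import math
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return true_value plus Laplace(sensitivity/epsilon) noise,
    the standard way to make a numeric query epsilon-DP.
    Sampling uses the inverse-CDF method for the Laplace distribution."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    return true_value - scale * math.copysign(math.log(1 - 2 * abs(u)), u)

random.seed(0)
count = 42  # e.g., number of patients with a given diagnosis (hypothetical)
noisy_count = laplace_mechanism(count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```

Smaller ε means a larger noise scale and stronger privacy; the noise is unbiased, so averages over many noisy releases still converge to the true value.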


2014
Vol 8 (1)
pp. 13-21
Author(s):
Arkadiusz Liber

Introduction: Medical documentation must be protected against damage or loss, with its integrity and credibility preserved, permanent access guaranteed to authorized staff, and access by unauthorized persons prevented. Anonymization is one of the methods of safeguarding data against disclosure.
Aim of the study: The study analyzes methods of anonymization and of protecting anonymized data, and investigates a new type of privacy protection that enables the entity the data concerns to control its own sensitive data.
Material and methods: Analytical and algebraic methods were used.
Results: The study delivers material supporting the choice and analysis of methods for anonymizing medical data, and develops a new privacy-protection solution that enables control of sensitive data by the entities the data concerns.
Conclusions: The paper analyzes data-anonymization solutions used for protecting the privacy of medical data. The methods k-Anonymity, (X,y)-Anonymity, (a,k)-Anonymity, (k,e)-Anonymity, (X,y)-Privacy, LKC-Privacy, l-Diversity, (X,y)-Linkability, t-Closeness, Confidence Bounding and Personalized Privacy are described, explained and analyzed, as are solutions allowing owners to control their sensitive data. Beyond the existing anonymization methods, methods of protecting anonymized data are analyzed, in particular d-Presence, e-Differential Privacy, (d,g)-Privacy, (a,b)-Distributing Privacy and protection against (c,t)-Isolation. The author introduces a new solution for controlled privacy protection, based on marking a protected field and multi-key encryption of the sensitive value. The suggested field marking conforms to the XML standard. An (n,p) different-key cipher was selected for encryption: p of the n keys are required to decipher the content. The proposed solution enables brand-new methods for controlling the privacy of disclosed sensitive data.
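Of the surveyed models, k-Anonymity is the simplest to state: every record must be indistinguishable from at least k−1 others on its quasi-identifier attributes. A small checker sketch, with hypothetical attribute names and an already-generalized release:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values occurs
    in at least k records of the release."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical release: ages bucketed, ZIP codes truncated.
release = [
    {"age": "30-40", "zip": "532**", "diagnosis": "flu"},
    {"age": "30-40", "zip": "532**", "diagnosis": "asthma"},
    {"age": "30-40", "zip": "532**", "diagnosis": "flu"},
    {"age": "40-50", "zip": "533**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(release, ["age", "zip"], k=2))       # False: last record is unique
print(is_k_anonymous(release[:3], ["age", "zip"], k=3))   # True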


Data
2019
Vol 4 (2)
pp. 85
Author(s):
Daniela Gîfu
Diana Trandabăț
Kevin Cohen
Jingbo Xia

With the massive amounts of medical data made available online, language technologies have proven indispensable for processing biomedical and molecular biology literature, health data, and patient records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, and to specialized ontologies, could enable access to and discovery of structured clinical information, and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, "Curative Power of Medical Data" in Data, is to gather innovative approaches to exploiting biomedical data using semantic web technologies and linked data, by fostering community involvement in biomedical research. The Special Issue contains four surveys covering a wide range of topics, from analyzing the writing style of biomedical articles, to automatically generating tests from medical references, constructing a gold-standard biomedical corpus, and visualizing biomedical data.


2021
Vol 21 (1)
Author(s):
Fabio Giachelle
Ornella Irrera
Gianmaria Silvello

Background: Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets hinders the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents by physicians and experts is a costly and time-consuming task. To support, organize and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute.
Results: We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to manually annotate more than seven thousand clinical reports. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, weighing their pros and cons against those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use.
Conclusions: MedTAG has been designed according to five requirements (i.e., available, distributable, installable, workable and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 of the 22 criteria specified in the same study.


2020
Vol 17 (01)
Author(s):
Sumedha Sachar
Maïa Dakessian
Saina Beitari
Saishree Badrinarayanan

Artificial intelligence (AI) and machine learning (ML) have the potential to revolutionize the healthcare system through their capacity to diagnose disease, personalize treatments, and reduce physician burnout. These technologies are highly dependent on large datasets to learn from, and require data sharing across organizations for reliable and efficient predictive analysis. However, adoption of AI/ML technologies will require policy imperatives to address the challenges of data privacy, accountability, and bias. To form a regulatory framework, we propose that algorithms should be interpretable, and that companies that use black-box models for their algorithms be held accountable for the output of their ML systems. To further increase accountability and reduce bias, physicians can be educated about the bias that ML systems can generate. We also discuss the potential benefits and disadvantages of existing privacy standards, the Personal Information Protection and Electronic Documents Act (PIPEDA) and the General Data Protection Regulation (GDPR), at the federal, provincial and territorial levels. We emphasize responsible implementation of AI through ethics and skill-building, minimizing data privacy breaches while boosting innovation, accessibility, and interoperability across provinces.


Author(s):
Bing-Rong Lin
Dan Kifer

In statistical privacy, a privacy definition is regarded as a set of algorithms that are allowed to process sensitive data. It is often helpful to consider the complementary view that privacy definitions are also contracts that guide the behavior of algorithms that take in sensitive data and produce sanitized data. Historically, data privacy breaches have been the result of fundamental misunderstandings about what a particular privacy definition guarantees. Privacy definitions are often analyzed using a highly targeted approach: a specific attack strategy is evaluated to determine if a specific type of information can be inferred. If the attack works, one can conclude that the privacy definition is too weak. If it doesn't work, one often gains little information about its security (perhaps a slightly different attack would have worked?). Furthermore, these strategies will not identify cases where a privacy definition protects unnecessary pieces of information. On the other hand, technical results concerning generalizable and systematic analyses of privacy are few in number, but such results have significantly advanced our understanding of the design of privacy definitions. We add to this literature with a novel methodology for analyzing the Bayesian properties of a privacy definition. Its goal is to identify precisely the type of information being protected, hence making it easier to identify (and later remove) unnecessary data protections. Using privacy building blocks (which we refer to as axioms), we turn questions about semantics into mathematical problems -- the construction of a consistent normal form and the subsequent construction of the row cone (which is a geometric object that encapsulates Bayesian guarantees provided by a privacy definition). 
We apply these ideas to study randomized response, FRAPP/PRAM, and several algorithms that add integer-valued noise to their inputs; we show that their privacy properties can be stated in terms of the protection of various notions of parity of a dataset. Randomized response, in particular, provides unnecessarily strong protections for parity, and so we also show how our methodology can be used to relax privacy definitions.
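The randomized response mechanism analyzed above is easy to sketch: each respondent reports their true bit with probability e^ε/(e^ε+1) and the flipped bit otherwise, and the aggregate can be debiased afterwards. This is a minimal illustration of the mechanism itself, not of the paper's row-cone analysis:

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Binary randomized response: report the true bit with
    probability e^eps / (e^eps + 1), flip it otherwise.
    Satisfies epsilon-differential privacy for a single bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debiased_estimate(responses, epsilon):
    """Unbiased estimate of the true proportion of 1s from noisy responses."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(responses) / len(responses)
    # E[observed] = (1 - p) + true_rate * (2p - 1), so invert:
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(1)
truth = [1] * 300 + [0] * 700  # 30% hold the sensitive trait (hypothetical)
noisy = [randomized_response(b, epsilon=1.0) for b in truth]
print(debiased_estimate(noisy, epsilon=1.0))  # close to 0.3, up to sampling noise
```

The paper's point is subtler than this sketch: randomized response protects more than the individual bits (various parities of the dataset), which is why its protections can be relaxed.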


2020
Author(s):
Haeyun Lee
Young Jun Chai
Hyunjin Joo
Kyungsu Lee
Jae Youn Hwang
...

BACKGROUND Federated learning is a decentralized approach to machine learning; it is a training strategy that overcomes medical data privacy regulations and generalizes deep learning algorithms. Federated learning mitigates many systemic privacy risks by sharing only the model and parameters for training, without the need to export existing medical data sets. In this study, we performed ultrasound image analysis using federated learning to predict whether thyroid nodules were benign or malignant. OBJECTIVE The goal of this study was to evaluate whether the performance of federated learning was comparable with that of conventional deep learning. METHODS A total of 8457 (5375 malignant, 3082 benign) ultrasound images were collected from 6 institutions and used for federated learning and conventional deep learning. Five deep learning networks (VGG19, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) were used. Using stratified random sampling, we selected 20% (1075 malignant, 616 benign) of the total images for internal validation. For external validation, we used 100 ultrasound images (50 malignant, 50 benign) from another institution. RESULTS For internal validation, the area under the receiver operating characteristic curve (AUROC) for federated learning was between 78.88% and 87.56%, while the AUROC for conventional deep learning was between 82.61% and 91.57%. For external validation, the AUROC for federated learning was between 75.20% and 86.72%, while the AUROC for conventional deep learning was between 73.04% and 91.04%. CONCLUSIONS We demonstrated that the performance of federated learning using decentralized data was comparable to that of conventional deep learning using pooled data. Federated learning may be useful for analyzing medical images while protecting patients' personal information.


Author(s):
Qingyu Chen
Robert Leaman
Alexis Allot
Ling Luo
Chih-Hsuan Wei
...

The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP)—the branch of artificial intelligence that interprets human language—can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

