Secure Multi-keyword Similarity Search Over Encrypted Data With Security Improvement

Iraqi Journal for Electrical And Electronic Engineering ◽

10.37917/ijeee.17.2.1 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-10

Author(s):

Hussein Mohammed ◽

Ayad Abdulsada

Keyword(s):

Real World ◽

Similarity Search ◽

Keyword Search ◽

Bloom Filter ◽

Locality Sensitive Hashing ◽

Real World Data ◽

Encrypted Data ◽

Search Results ◽

Security Properties ◽

Cloud Servers

Searchable encryption (SE) enables clients to outsource their encrypted data to external cloud servers with unlimited storage and computing power while retaining the ability to search that data without decrypting it. Current SE solutions support only single-keyword search, which makes them impractical in real-world scenarios. In this paper, we design and implement a multi-keyword similarity search scheme over encrypted data using locality-sensitive hashing functions and Bloom filters. The proposed scheme tolerates common spelling mistakes and offers enhanced security properties, such as hiding the access and search patterns, at the cost of higher latency. To support similarity search, we use an efficient bi-gram-based method for keyword transformation, which improves the accuracy of the search results. Our scheme employs two non-colluding servers to break the correlation between search queries and search results. Experiments using real-world data illustrate that our scheme is practically efficient, secure, and retains high accuracy.
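The bi-gram transformation and Bloom-filter matching mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' scheme: it replaces the locality-sensitive hash functions with plain HMAC-based keyed hashes, uses a single server, and does not hide access or search patterns. The function names, the filter size `m`, and the hash count `k` are assumptions chosen for illustration.

```python
import hashlib
import hmac


def bigrams(word):
    """Split a keyword into its set of adjacent character pairs."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def bloom_positions(item, key, m=1024, k=3):
    """Map one item to at most k Bloom-filter positions using keyed hashes (assumed parameters)."""
    positions = set()
    for i in range(k):
        digest = hmac.new(key, f"{i}:{item}".encode(), hashlib.sha256).digest()
        positions.add(int.from_bytes(digest[:8], "big") % m)
    return positions


def encode(keywords, key, m=1024, k=3):
    """Build the set of Bloom-filter positions covering every bi-gram of every keyword."""
    bits = set()
    for word in keywords:
        for gram in bigrams(word):
            bits |= bloom_positions(gram, key, m, k)
    return bits


def match_score(query_bits, index_bits):
    """Fraction of the query's positions that are also set in the document index."""
    if not query_bits:
        return 0.0
    return len(query_bits & index_bits) / len(query_bits)


if __name__ == "__main__":
    secret = b"demo-key"  # hypothetical shared secret
    index = encode(["network", "security"], secret)
    print(match_score(encode(["network"], secret), index))  # exact keyword: 1.0
    print(match_score(encode(["netwrok"], secret), index))  # misspelling: partial overlap
```

Because a misspelled keyword still shares several bi-grams with the correct one, its score stays well above zero, which is the intuition behind tolerating spelling mistakes.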

Verification of statistical oncological endpoints on encrypted data: Confirming the feasibility of real-world data sharing without the need to reveal protected patient information.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e18725 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e18725-e18725

Author(s):

Ravit Geva ◽

Barliz Waissengrin ◽

Dan Mirelman ◽

Felix Bokstein ◽

Deborah T. Blumenthal ◽

...

Keyword(s):

Data Sharing ◽

Real World ◽

Clinical Decision Making ◽

Homomorphic Encryption ◽

Data Sets ◽

Real World Data ◽

Data Set ◽

Raw Data ◽

Encrypted Data ◽

World Data

e18725 Background: Healthcare data sharing is important for creating large and diverse data sets, supporting clinical decision making, and accelerating efficient research to improve patient outcomes. This is especially vital for real-world data analysis. However, stakeholders are reluctant to share their data without assurances about patients' privacy, the protection of their data sets, and the ways those data sets are used. Homomorphic encryption is a cryptographic capability that can address these issues by enabling computation on encrypted data without ever decrypting it, so that analytics results are obtained without revealing the raw data. The aim of this study is to demonstrate the accuracy of the analytics results and the practical efficiency of the technology. Methods: A real-world data set of colorectal cancer patients' survival data following two different treatment interventions, comprising 623 patients and 24 variables (14,952 items of data), was encrypted using leveled homomorphic encryption implemented in the PALISADE software library. Statistical analysis of key oncological endpoints was performed blindly on both the raw data and the homomorphically encrypted data using descriptive statistics and survival analysis with Kaplan-Meier curves. Results were then compared with an accuracy goal of two decimals. Results: For all variables analyzed, the difference between the results on the raw data and on the homomorphically encrypted data was within the pre-determined accuracy goal. These differences, together with the practical efficiency of the encrypted computation measured by run time, are presented in the table. Conclusions: This study demonstrates that data encrypted with homomorphic encryption can be statistically analyzed with a precision of at least two decimal places, allowing clinical conclusions to be drawn safely while preserving patients' privacy and protecting data owners' data assets. Homomorphic encryption allows efficient computation on encrypted data non-interactively and without requiring decryption at computation time. Using the technology will empower large-scale cross-institution and cross-stakeholder collaboration, allowing safe international collaborations. Clinical trial information: 0048-19-TLV. [Table: see text]
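The comparison described in this abstract can be sketched in plain Python: compute a Kaplan-Meier survival curve, then check whether two results agree to two decimals. This sketch performs no homomorphic encryption and does not use PALISADE. The toy durations are made up, and reading "two decimals" as an absolute difference of at most 0.005 is an assumption.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with at least one observed event."""
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= leaving
    return curve


def agree_to_decimals(a, b, decimals=2):
    """Assumed reading of the accuracy goal: absolute difference within half a unit of the last decimal."""
    return abs(a - b) <= 0.5 * 10 ** (-decimals)


if __name__ == "__main__":
    # Toy data (hypothetical): follow-up times and whether the event was observed.
    curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
    print(curve)
    print(agree_to_decimals(0.3750, 0.3751))
```

In a study like the one described, the second argument of `agree_to_decimals` would be the decrypted result of the same statistic computed under encryption.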

CO-CLUSTERING BIPARTITE WITH PATTERN PRESERVATION FOR TOPIC EXTRACTION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008003790 ◽

2008 ◽

Vol 17 (01) ◽

pp. 87-107 ◽

Cited By ~ 6

Author(s):

TIANMING HU ◽

CHEW LIM TAN ◽

YONG TANG ◽

SAM YUAN SUNG ◽

HUI XIONG ◽

...

Keyword(s):

Real World ◽

Data Sets ◽

Real World Data ◽

Topic Extraction ◽

World Data ◽

Search Results ◽

Word Clustering ◽

Two Sides ◽

Clustering Validation ◽

Single Cluster

The duality between document and word clustering naturally leads to the consideration of storing the document dataset in a bipartite graph. With documents and words modeled as vertices on the two sides respectively, partitioning such a graph yields a co-clustering of words and documents. The topic of each cluster can then be represented by the top words and documents that have the highest within-cluster degrees. However, such claims may fail if top words and documents are selected simply because they are very general and frequent. In addition, for those words and documents that span several topics, it may not be proper to assign them to a single cluster. In other words, to capture the cluster topic precisely, we need to identify those micro-sets of words/documents that are similar among themselves and, as a whole, representative of their respective topics. Along this line, in this paper, we use hyperclique patterns, that is, strongly affiliated words/documents, to define such micro-sets. We introduce a new bipartite formulation that incorporates both word hypercliques and document hypercliques as super vertices. Our experiments on real-world data sets show that, by co-preserving hyperclique patterns during the clustering process, better clustering results can be obtained in terms of various external clustering validation measures and the cluster topic can be identified more precisely. Also, the partitioned bipartite graph with co-preserved patterns naturally lends itself to different clustering-related functions in search engines. To that end, we illustrate such an application, returning clustered search results for keyword queries. We show that the topic of each cluster with respect to the current query can be identified more accurately with the words and documents from the patterns than with the top ones from the standard bipartite formulation.

Efficient multi-keyword similarity search over encrypted cloud documents

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i1.pp510-518 ◽

2021 ◽

Vol 23 (1) ◽

pp. 510

Author(s):

Ayad I. Abdulsada ◽

Dhafer G. Honi ◽

Salah Al-Darraji

Keyword(s):

Similarity Search ◽

Keyword Search ◽

Service Providers ◽

Hamming Distance ◽

Cloud Service ◽

Sensitive Data ◽

Encrypted Data ◽

High Level ◽

Do So ◽

Similarity Scores

Many organizations and individuals are attracted to outsourcing their data to remote cloud service providers. To ensure privacy, sensitive data should be encrypted before being hosted. However, encryption prevents the direct application of essential data management operations such as searching and indexing. Searchable encryption is a cryptographic tool that gives users the ability to search the data while it remains encrypted. However, the existing schemes either support a single exact search, which loses the ability to handle misspelled keywords, or support multi-keyword search but generate very long trapdoors. In this paper, we address the problem of designing a practical multi-keyword similarity scheme that provides short trapdoors and returns the correct results according to their similarity scores. To do so, each document is translated into a compressed trapdoor. Trapdoors are generated using key-based hash functions to ensure their privacy. Only authorized users can issue valid trapdoors. The similarity score of two textual documents is evaluated by computing the Hamming distance between their corresponding trapdoors. A robust security definition is provided together with its proof. Our experimental results illustrate that the proposed scheme improves the search efficiency compared to the existing schemes. Furthermore, it shows a high level of performance.
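The idea of comparing compressed, key-dependent trapdoors by Hamming distance can be sketched as follows. The construction here is a SimHash-style fingerprint built from HMAC outputs. It is an assumed stand-in, not the paper's construction, and the 64-bit length and function names are illustrative only.

```python
import hashlib
import hmac


def trapdoor(keywords, key, bits=64):
    """Compress a keyword set into a fixed-length bit string using keyed hashes (SimHash-style sketch)."""
    counts = [0] * bits
    for word in set(w.lower() for w in keywords):
        digest = hmac.new(key, word.encode(), hashlib.sha256).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each keyword votes +1 or -1 on every bit position.
            counts[i] += 1 if (value >> i) & 1 else -1
    signature = 0
    for i, vote in enumerate(counts):
        if vote > 0:
            signature |= 1 << i
    return signature


def hamming(a, b):
    """Number of bit positions in which two trapdoors differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    secret = b"demo-key"  # hypothetical user key
    doc = trapdoor(["cloud", "privacy", "search", "index"], secret)
    near = trapdoor(["cloud", "privacy", "search", "query"], secret)
    far = trapdoor(["oncology", "survival", "trial", "cohort"], secret)
    print(hamming(doc, near), hamming(doc, far))
```

A smaller distance indicates more shared keywords, so results can be ordered by ascending Hamming distance. Without the key, a party cannot produce a valid trapdoor for a chosen keyword set.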

Enhance manet usability for encrypted data retrieval from cloud computing

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i1.pp64-74 ◽

2020 ◽

Vol 18 (1) ◽

pp. 64

Author(s):

Fairouz Sher Ali ◽

Hadeel Noori Saad ◽

Falah Hassan Sarhan ◽

Bushra Naaeem

Keyword(s):

Cloud Computing ◽

Mobile Networks ◽

Resource Sharing ◽

Keyword Search ◽

Bloom Filter ◽

Data Retrieval ◽

Mobile Nodes ◽

Sensitive Data ◽

Privacy Concerns ◽

Cloud Servers

Cloud computing has become a revolutionary computing model that provides an economical and flexible strategy for resource sharing and data management. Due to privacy concerns, sensitive data has to be encrypted before being uploaded to the cloud servers. Over the last few years, several keyword searchable encryption works have been described in the literature. However, existing works mostly focus on secure searching using keywords and only retrieve Boolean results, which are not yet adequate. On the other hand, resource-poor mobile networks now play an important role in all application areas. Mobile nodes mostly act as the information retrieval end, which makes it important to address this problem. In this paper, we present a secure keyword search scheme based on the Bloom filter (SKS-BF), which enhances the system's usability by ranking the search results by relevance score and retrieving only the top most relevant files instead of all the files. Further, Bloom filters (BFs) can accelerate a search process involving a large number of keywords. Extensive experiments and network simulation confirm the efficiency of our proposed schemes.
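The difference between Boolean retrieval and ranked top-k retrieval can be shown with a short plaintext sketch. It involves no encryption or Bloom filter, and the relevance score used here (summed term frequency of the query keywords) is an assumption, since the abstract does not specify the scoring function.

```python
import heapq
from collections import Counter


def relevance(doc_words, query):
    """Assumed relevance score: how often the query keywords occur in the document."""
    counts = Counter(w.lower() for w in doc_words)
    return sum(counts[q.lower()] for q in query)


def top_k(docs, query, k):
    """Return the names of the k highest-scoring documents, skipping non-matching ones."""
    scored = ((relevance(words, query), name) for name, words in docs.items())
    return [name for score, name in heapq.nlargest(k, scored) if score > 0]


if __name__ == "__main__":
    # Hypothetical toy corpus.
    docs = {
        "a.txt": ["cloud", "cloud", "search"],
        "b.txt": ["cloud"],
        "c.txt": ["mobile"],
    }
    print(top_k(docs, ["cloud", "search"], 2))
```

Returning only the top k files reduces the amount of data a resource-constrained mobile node has to download.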

A Survey on Secure Ranked Keyword Search over Outsourced Encrypted Cloud Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35184 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 949-952

Author(s):

Akash Tidke

Keyword(s):

Comparative Study ◽

Keyword Search ◽

Encrypted Data ◽

Cloud Data ◽

Ranked Search ◽

Ranked Keyword Search ◽

Cloud Servers

In this paper, we present a survey of keyword-based searching algorithms. Various searching techniques are used for retrieving encrypted data from cloud servers. This survey involves a comparative study of these keyword-based searching algorithms. It concludes that, so far, the multi-keyword ranked search (MRSE) scheme is the best methodology for searching encrypted data.

Extracting Parallel Paragraphs from Common Crawl

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0003 ◽

2017 ◽

Vol 107 (1) ◽

pp. 39-56

Author(s):

Jakub Kúdela ◽

Irena Holubová ◽

Ondřej Bojar

Keyword(s):

Real World ◽

Web Sites ◽

Locality Sensitive Hashing ◽

Web Pages ◽

Large Set ◽

Real World Data ◽

Negligible Amount ◽

Parallel Data ◽

Parallel Texts ◽

The Web

Most of the current methods for mining parallel texts from the web assume that the web pages of a web site share the same structure across languages. We believe that there still exists a non-negligible amount of parallel data spread across sources that do not satisfy this assumption. We propose an approach based on a combination of bivec (a bilingual extension of word2vec) and locality-sensitive hashing, which allows us to efficiently identify pairs of parallel segments located anywhere on the pages of a given web domain, regardless of their structure. We validate our method on realigning segments from a large parallel corpus. Another experiment with real-world data provided by the Common Crawl Foundation confirms that our solution scales to web-crawled data sets hundreds of terabytes in size.
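Locality-sensitive hashing over segment vectors can be illustrated with a small random-hyperplane sketch. It assumes each segment has already been embedded as a numeric vector in a shared bilingual space (the role bivec plays in the abstract). The vectors, names, and parameters below are hypothetical, and this is not the authors' pipeline.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes; the fixed seed keeps signatures reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def candidate_pairs(source, target, planes):
    """Pair every source segment with the target segments that share its signature bucket."""
    buckets = defaultdict(list)
    for name, vec in target.items():
        buckets[signature(vec, planes)].append(name)
    return [
        (s_name, t_name)
        for s_name, vec in source.items()
        for t_name in buckets.get(signature(vec, planes), [])
    ]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=8)
    # Hypothetical embeddings: cs_1 points in the same direction as en_1.
    source = {"en_1": [1.0, 0.5, -0.2]}
    target = {"cs_1": [2.0, 1.0, -0.4], "cs_2": [-1.0, 0.3, 0.9]}
    print(candidate_pairs(source, target, planes))
```

Vectors pointing in similar directions tend to land in the same bucket, so only bucket-mates need to be compared exactly. That avoids comparing every source segment with every target segment, which is what makes the approach scale.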

MinHash-Based Fuzzy Keyword Search of Encrypted Data across Multiple Cloud Servers

Future Internet ◽

10.3390/fi10050038 ◽

2018 ◽

Vol 10 (5) ◽

pp. 38

Author(s):

Jingsha He ◽

Jianan Wu ◽

Nafei Zhu ◽

Muhammad Salman Pathan

Keyword(s):

Keyword Search ◽

Encrypted Data ◽

Fuzzy Keyword Search ◽

Cloud Servers

Abstract #946: Assessment of Acromegaly Patients with and Without Diabetes Treated with Lanreotide Depot: 2-Year Real World Data

Endocrine Practice ◽

10.1016/s1530-891x(20)45236-x ◽

2016 ◽

Vol 22 ◽

pp. 219

Author(s):

Roberto Salvatori ◽

Olga Gambetti ◽

Whitney Woodmansee ◽

David Cox ◽

Beloo Mirakhur ◽

...

Keyword(s):

Real World ◽

Real World Data ◽

World Data

Safety and efficacy of direct acting oral anticoagulants and vitamin K antagonists in nonvalvular atrial fibrillation – a network meta-analysis of real-world data

VASA ◽

10.1024/0301-1526/a000746 ◽

2019 ◽

Vol 48 (2) ◽

pp. 134-147 ◽

Cited By ~ 8

Author(s):

Mirko Hirschl ◽

Michael Kundi

Keyword(s):

Real World ◽

Meta Analysis ◽

Vitamin K Antagonists ◽

Oral Anticoagulants ◽

Efficacy And Safety ◽

Nonvalvular Atrial Fibrillation ◽

Real World Data ◽

Hazard Ratios ◽

Direct Acting ◽

Event Rates

Background: In randomized controlled trials (RCTs), direct acting oral anticoagulants (DOACs) showed a superior risk-benefit profile in comparison to vitamin K antagonists (VKAs) for patients with nonvalvular atrial fibrillation. Patients enrolled in such studies do not necessarily reflect the whole target population treated in real-world practice. Materials and methods: By a systematic literature search, 88 studies including 3,351,628 patients providing over 2.9 million patient-years of follow-up were identified. Hazard ratios and event rates for the main efficacy and safety outcomes were extracted, and the results for DOACs and VKAs were combined by network meta-analysis. In addition, meta-regression was performed to identify factors responsible for heterogeneity across studies. Results: For stroke and systemic embolism as well as for major bleeding and intracranial bleeding, real-world studies gave virtually the same result as RCTs, with higher efficacy and lower major bleeding risk (for dabigatran and apixaban) and lower risk of intracranial bleeding (all DOACs) compared to VKAs. Results for gastrointestinal bleeding were consistently better for DOACs, and hazard ratios of myocardial infarction were significantly lower in real-world studies for dabigatran and apixaban compared to RCTs. By a ranking analysis, we found that apixaban is the safest anticoagulant drug, while rivaroxaban, closely followed by dabigatran, is the most efficacious. Risk of bias and heterogeneity were assessed and had little impact on the overall results. Analysis of effect modification could guide the clinical decision, as no single DOAC was superior or inferior to the others under all conditions. Conclusions: DOACs were at least as efficacious as VKAs. In terms of safety endpoints, DOACs performed better under real-world conditions than in RCTs. The current real-world data showed that differences in efficacy and safety, despite generally low event rates, exist between DOACs. Knowledge about these differences in performance can contribute to a more personalized medicine.

Evaluating a Macrocognition Model of Team Collaboration using Real-world Data from the Haiti Relief Effort

PsycEXTRA Dataset ◽

10.1037/e578902012-052 ◽

2011 ◽

Author(s):

Susan G. Hutchins

Keyword(s):

Real World ◽

Real World Data ◽

Team Collaboration ◽

World Data ◽

Relief Effort
