Big Data Retrieval Using Locality-Sensitive Hashing with Document-Based NoSQL Database

In recent years, with the development of the Internet, the volume of data on the network has grown explosively. Big data mining aims to obtain useful information through data processing such as clustering and classifying. Clustering is an important branch of big data mining and is popular because of its simplicity. A new trend for clients who lack storage and computational resources is to outsource the data and the clustering task to public cloud platforms. However, because datasets used for clustering may contain sensitive information (e.g., identity information, health information), simply outsourcing them to cloud platforms cannot protect privacy. Clients therefore tend to encrypt their databases before uploading them to the cloud for clustering. In this paper, we focus on privacy protection and efficiency improvement for k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm based on locality-sensitive hashing (LSH). The algorithm uses the Paillier cryptosystem to encrypt databases and combines it with LSH to prune unnecessary computations during clustering; that is, it does not need to compute the Euclidean distance between every data record and every cluster center. Finally, theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering algorithms.
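The pruning idea described in this abstract can be illustrated with a minimal plaintext sketch. It is not the paper's algorithm: there is no Paillier encryption here, and the hash family (random projections bucketed by a fixed `width`), the names `make_hash` and `assign`, and the fall-back to all centers when no bucket matches are assumptions made for illustration only.

```python
import math
import random

def make_hash(dim, width, rng):
    """Return one random-projection LSH function h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

def assign(points, centers, hashes):
    """Assign each point to a center, comparing only against centers in its LSH bucket."""
    def signature(x):
        return tuple(h(x) for h in hashes)

    # Index the cluster centers by their hash signature.
    buckets = {}
    for idx, center in enumerate(centers):
        buckets.setdefault(signature(center), []).append(idx)

    labels, computed = [], 0
    for p in points:
        # Assumed fall-back: if no center shares the bucket, compare against all centers.
        candidates = buckets.get(signature(p)) or range(len(centers))
        computed += len(candidates)  # number of distance computations actually done
        labels.append(min(candidates, key=lambda i: math.dist(p, centers[i])))
    return labels, computed

rng = random.Random(0)
hashes = [make_hash(2, 4.0, rng) for _ in range(2)]
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 0.0), (10.0, 10.0), (0.3, -0.2)]
labels, computed = assign(points, centers, hashes)
```

Because LSH is probabilistic, the true nearest center can land in a different bucket, so the assignment is approximate; `computed` only counts how many distances were evaluated compared with the full `len(points) * len(centers)`.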

P-QALSH: Parallelizing Query Aware Locality-Sensitive Hashing for Big Data

10.1109/bigdata52589.2021.9671881 ◽

2021 ◽

Author(s):

Yikai Huang ◽

Zhili Yao ◽

Jianlin Feng

Keyword(s):

Big Data ◽

Locality Sensitive Hashing

Building OLAP Cubes From Columnar NoSQL Data Warehouses

Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5516-2.ch006 ◽

2019 ◽

pp. 129-157

Author(s):

Khaled Dehdouh

Keyword(s):

Big Data ◽

Database System ◽

Massive Data ◽

Data Warehouses ◽

Online Analysis ◽

Storage Model ◽

Data Cubes ◽

Nosql Database ◽

Oriented Approach

In the context of big data warehouses, a column-oriented NoSQL database system is considered a storage model well adapted to data warehouses and online analysis. Indeed, NoSQL models make it easy to scale data, and a columnar store is suitable for storing and managing massive data, especially for decision-support queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software such as Hive or Kylin, which has a CUBE operator to build data cubes. With that approach, the cube is built according to the row-oriented approach and does not fully obtain the benefits of a column-oriented approach. The main contribution of this chapter is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach, taking into account the non-relational and distributed aspects of how data warehouses are stored.
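What a CUBE operator computes can be shown with a small single-machine sketch over column-wise data. This is not MC-CUBE and involves no MapReduce or NoSQL store; the function name `cube`, the dict-of-lists column layout, and the sample `sales` data are hypothetical and chosen only to illustrate aggregating a measure over every subset of the dimensions.

```python
from collections import defaultdict
from itertools import combinations

def cube(columns, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, reading values column by column."""
    n = len(columns[measure])
    result = {}
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            totals = defaultdict(int)
            for i in range(n):
                # Only the columns named in `dims` are touched for this cuboid.
                group = tuple(columns[d][i] for d in dims)
                totals[group] += columns[measure][i]
            result[dims] = dict(totals)
    return result

# Hypothetical column-oriented fact table: one list per column.
sales = {
    "city": ["Lyon", "Lyon", "Paris"],
    "year": [2018, 2019, 2019],
    "amount": [10, 20, 5],
}
cuboids = cube(sales, ["city", "year"], "amount")
```

With two dimensions the result holds four cuboids: the grand total `()`, `("city",)`, `("year",)`, and `("city", "year")`.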

Architecture for Big Data Storage in Different Cloud Deployment Models

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch009 ◽

2021 ◽

pp. 178-208

Author(s):

Chandu Thota ◽

Gunasekaran Manogaran ◽

Daphne Lopez ◽

Revathi Sundarasekar

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Storage ◽

High Performance ◽

Data Services ◽

Big Data Applications ◽

Nosql Database ◽

Amazon Web Services ◽

Product Domains ◽

Scalable Database

Cloud Computing is a new computing model that distributes computation over a resource pool. The need for a scalable database capable of expanding to accommodate growth has increased with the growing volume of data on the web. Well-known Cloud Computing vendors such as Amazon Web Services, Microsoft, Google, IBM and Rackspace offer cloud-based Hadoop and NoSQL database platforms to process Big Data applications. A variety of services are available that run on top of cloud platforms, freeing users from the need to deploy their own systems. Nowadays, integrating Big Data with various cloud deployment models is a major concern for Internet companies, especially software and data services vendors that are just getting started themselves. This chapter proposes an efficient architecture for integration with comprehensive capabilities including real-time and bulk data movement, bi-directional replication, metadata management, high-performance transformation, data services and data quality for customer and product domains.

Advanced Metaheuristic Methods in Big Data Retrieval and Analytics

10.4018/978-1-5225-7338-8 ◽

2019 ◽

Keyword(s):

Big Data ◽

Data Retrieval ◽

Metaheuristic Methods

Toward Automatic Generation of Column-Oriented NoSQL Databases in Big Data Context

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v15i09.10433 ◽

2019 ◽

Vol 15 (09) ◽

pp. 4

Author(s):

Redouane Esbai ◽

Fouad Elotmani ◽

Fatima Zahra Belkadi

Keyword(s):

Big Data ◽

Relational Databases ◽

Automatic Generation ◽

Transformation Model ◽

Model Driven ◽

Transformation Rules ◽

New Concepts ◽

Nosql Database ◽

Meta Object Facility ◽

View Transformation

The growth of application architectures in all areas (e.g., Astrology, Meteorology, E-commerce, social networks) has resulted in an exponential increase in data volumes, now measured in petabytes. Managing these volumes of data has become a problem that relational databases are no longer able to handle because of their ACID properties. In response to this scaling up, new concepts such as NoSQL have emerged. In this paper, we show how to design and apply transformation rules to migrate from an SQL relational database to a Big Data solution within NoSQL. For this, we use the Model Driven Architecture (MDA) and transformation languages such as MOF 2.0 QVT (Meta-Object Facility 2.0 Query-View-Transformation) and Acceleo, which define the meta-models for developing the transformation model. The transformation rules defined in this work can generate, from the class diagram, CQL code for creating a column-oriented NoSQL database.
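The general shape of a class-to-CQL transformation can be sketched in a few lines. This is not the paper's QVT or Acceleo rule set: the type mapping in `CQL_TYPES`, the dict layout used to describe a class, and the function `class_to_cql` are assumptions made for illustration.

```python
# Assumed mapping from UML primitive types to CQL column types.
CQL_TYPES = {"Integer": "int", "String": "text", "Real": "double", "Boolean": "boolean"}

def class_to_cql(keyspace, cls):
    """Render one class description as a CQL CREATE TABLE statement."""
    columns = [f"{name} {CQL_TYPES[uml_type]}" for name, uml_type in cls["attributes"]]
    columns.append(f"PRIMARY KEY ({cls['key']})")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {keyspace}.{cls['name'].lower()} (\n  {body}\n);"

# Hypothetical class taken from a class diagram.
customer = {
    "name": "Customer",
    "key": "id",
    "attributes": [("id", "Integer"), ("name", "String")],
}
statement = class_to_cql("shop", customer)
```

A real model-driven pipeline would also have to decide how associations and inheritance map onto tables and partition keys, which this sketch leaves out.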

Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation

Applied Sciences ◽

10.3390/app8091514 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1514 ◽

Cited By ~ 1

Author(s):

Bao Chang ◽

Hsiu-Fen Tsai ◽

Yun-Da Lee

Keyword(s):

Big Data ◽

Open Source Software ◽

High Performance ◽

Low Cost ◽

Data Retrieval ◽

Fast Response ◽

Disk Cache ◽

Memory Cache ◽

Medium Enterprise ◽

Effectiveness And Efficiency

This paper first integrates three big data tools (Hive, Impala, and SparkSQL) that support SQL-like queries for rapid data retrieval in big data. These tools are not only suitable for business intelligence workloads that need high-performance data retrieval, but are also open-source, low-cost software solutions for small-to-medium enterprises. In practice, the proposed approach provides an in-memory cache and an in-disk cache to achieve a very fast response to a query when a cache hit occurs. Moreover, this paper develops a so-called platform selection mechanism that selects the appropriate tool to handle an input query effectively and efficiently. As a result, job execution with the proposed approach using platform selection is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
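The cache-then-select flow described in this abstract might look roughly like the sketch below. It is an illustration under assumptions, not the paper's implementation: `QueryCache`, `run_query`, the JSON-file disk cache, the stub engine, and the `select` callback are all hypothetical, and the abstract does not state the actual selection rule.

```python
import hashlib
import json
import os
import tempfile

class QueryCache:
    """Two-level result cache: an in-memory dict backed by JSON files on disk."""

    def __init__(self, directory):
        self.memory = {}
        self.directory = directory

    def _path(self, query):
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def get(self, query):
        if query in self.memory:  # memory hit
            return self.memory[query]
        path = self._path(query)
        if os.path.exists(path):  # disk hit: load and promote to memory
            with open(path, encoding="utf-8") as f:
                result = json.load(f)
            self.memory[query] = result
            return result
        return None

    def put(self, query, result):
        self.memory[query] = result
        with open(self._path(query), "w", encoding="utf-8") as f:
            json.dump(result, f)

def run_query(query, cache, engines, select):
    """Return a cached result, or dispatch to the engine that `select` names."""
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"
    name = select(query)  # placeholder for a platform selection rule
    result = engines[name](query)
    cache.put(query, result)
    return result, name

# Demo with a stub engine standing in for a real SQL-on-Hadoop tool.
calls = []
engines = {"stub": lambda q: (calls.append(q) or [[1, "a"]])}
with tempfile.TemporaryDirectory() as directory:
    first = run_query("SELECT 1", QueryCache(directory), engines, lambda q: "stub")
    # A fresh cache object has empty memory, so this hit comes from disk.
    second = run_query("SELECT 1", QueryCache(directory), engines, lambda q: "stub")
```

In the demo the engine runs once; the second call is served from the disk cache even though the in-memory dict starts empty.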

Big Data Retrieval Using Locality-Sensitive Hashing with Document-Based NoSQL Database

Big Data retrieval techniques based on Hash Indexing and MapReduce approach with NoSQL Database

Efficient Optimized Strategy of Big Data Retrieval

A Distributed Rough Set Theory Algorithm based on Locality Sensitive Hashing for an Efficient Big Data Pre-processing

Efficient and Privacy-Preserving Multi-User Outsourced K-Means Clustering

P-QALSH: Parallelizing Query Aware Locality-Sensitive Hashing for Big Data

Building OLAP Cubes From Columnar NoSQL Data Warehouses

Architecture for Big Data Storage in Different Cloud Deployment Models

Advanced Metaheuristic Methods in Big Data Retrieval and Analytics

Toward Automatic Generation of Column-Oriented NoSQL Databases in Big Data Context

Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation
