Big Data Retrieval Using Locality-Sensitive Hashing with Document-Based NoSQL Database

2021 ◽  
pp. 1-10
Author(s):  
N.R. Gayathiri ◽  
A.M. Natarajan
2021 ◽  
Vol 14 (2) ◽  
pp. 26
Author(s):  
Na Li ◽  
Lianguan Huang ◽  
Yanling Li ◽  
Meng Sun

In recent years, with the development of the Internet, the volume of data on the network has grown explosively. Big data mining aims to obtain useful information through data processing such as clustering and classification. Clustering is an important branch of big data mining and is popular because of its simplicity. A new trend for clients who lack storage and computational resources is to outsource the data and the clustering task to public cloud platforms. However, because datasets used for clustering may contain sensitive information (e.g., identity information, health information), simply outsourcing them to cloud platforms cannot protect privacy. Clients therefore tend to encrypt their databases before uploading them to the cloud for clustering. In this paper, we focus on privacy protection and efficiency improvement for k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm based on locality-sensitive hashing (LSH). The algorithm encrypts databases with the Paillier cryptosystem and uses LSH to prune unnecessary computations during clustering; that is, it does not need to compute the Euclidean distance between every data record and every cluster center. Finally, theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering algorithms.
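The pruning idea in this abstract can be illustrated on plaintext data. The following is a minimal sketch, not the authors' algorithm: it leaves out the Paillier encryption and the multi-user setting entirely, and it assumes a standard Gaussian random-projection LSH family for Euclidean distance. The names `make_hash`, `signature`, and `assign`, as well as the bucket width and the sample points, are hypothetical.

```python
import math
import random


def make_hash(dim, width, rng):
    """Build one LSH function h(x) = floor((a.x + b) / width) with Gaussian a."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def signature(x, hashes):
    """Concatenate several hash values into one bucket key."""
    return tuple(h(x) for h in hashes)


def assign(points, centers, hashes):
    """Assign each point to a center, comparing only within its LSH bucket.

    If no center shares the point's bucket, fall back to all centers
    (an assumption of this sketch). Returns the labels and the number of
    Euclidean distances that were actually computed.
    """
    center_sigs = [signature(c, hashes) for c in centers]
    labels, computed = [], 0
    for p in points:
        sig = signature(p, hashes)
        candidates = [i for i, cs in enumerate(center_sigs) if cs == sig]
        if not candidates:
            candidates = list(range(len(centers)))
        computed += len(candidates)
        labels.append(min(candidates, key=lambda i: math.dist(p, centers[i])))
    return labels, computed


rng = random.Random(0)
hashes = [make_hash(2, 4.0, rng) for _ in range(2)]
centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 5.0)]
points = [(0.5, -0.2), (9.0, 11.0), (-9.5, 4.0)]
labels, computed = assign(points, centers, hashes)
```

Because LSH is probabilistic, a point may land in a bucket that does not contain its true nearest center, so the assignment is approximate; `computed` never exceeds the number of point-center pairs that plain k-means would evaluate.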


Author(s):  
Khaled Dehdouh

In the context of big data warehouses, a column-oriented NoSQL database system is considered a storage model well suited to data warehouses and online analysis. NoSQL models scale easily, and columnar storage is suitable for storing and managing massive data, especially for decision-support queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common approach is to integrate other software such as Hive or Kylin, which provides a CUBE operator for building data cubes. With that approach, the cube is built in a row-oriented way and cannot fully obtain the benefits of a column-oriented approach. The main contribution of this chapter is a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed nature of how the data warehouses are stored.
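To make the semantics of a CUBE operator concrete, here is a small single-machine sketch over column-stored data. It is not MC-CUBE and involves no MapReduce or NoSQL store; it only shows that each grouping needs to read just the columns it uses. The function `cube` and the `sales` data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(columns, dims, measure):
    """Sum `measure` for every subset of `dims`.

    `columns` maps a column name to a list of values (a columnar layout).
    Returns {grouping_dims: {key_tuple: total}}.
    """
    values = columns[measure]
    result = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            # Only the columns named in this grouping are touched.
            needed = [columns[d] for d in group]
            totals = defaultdict(int)
            for i, value in enumerate(values):
                key = tuple(col[i] for col in needed)
                totals[key] += value
            result[group] = dict(totals)
    return result


sales = {
    "region": ["EU", "EU", "US", "US"],
    "product": ["A", "B", "A", "A"],
    "amount": [10, 20, 30, 40],
}
agg = cube(sales, ["region", "product"], "amount")
```

Here `agg[()]` holds the grand total, `agg[("region",)]` the per-region totals, and `agg[("region", "product")]` the finest-grained cells.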


Author(s):  
Chandu Thota ◽  
Gunasekaran Manogaran ◽  
Daphne Lopez ◽  
Revathi Sundarasekar

Cloud computing is a new computing model that distributes computation across a resource pool. As data on the web grows, so does the need for a scalable database that can expand to accommodate that growth. Well-known cloud computing vendors such as Amazon Web Services, Microsoft, Google, IBM, and Rackspace offer cloud-based Hadoop and NoSQL database platforms for processing big data applications. A variety of services run on top of cloud platforms, freeing users from the need to deploy their own systems. Integrating big data with the various cloud deployment models is now a major concern for Internet companies, especially software and data services vendors that are just getting started themselves. This chapter proposes an efficient architecture for integration with comprehensive capabilities, including real-time and bulk data movement, bi-directional replication, metadata management, high-performance transformation, data services, and data quality for customer and product domains.


Author(s):  
Redouane Esbai ◽  
Fouad Elotmani ◽  
Fatima Zahra Belkadi

The growth of application architectures in all areas (e.g., Astrology, Meteorology, E-commerce, social networks) has resulted in an exponential increase in data volumes, now measured in petabytes. Managing these volumes has become a problem that relational databases can no longer handle because of their ACID properties. In response to this scaling challenge, new concepts such as NoSQL have emerged. In this paper, we show how to design and apply transformation rules to migrate from an SQL relational database to a Big Data solution based on NoSQL. For this, we use Model Driven Architecture (MDA) and transformation languages such as MOF 2.0 QVT (Meta-Object Facility 2.0 Query-View-Transformation) and Acceleo, which define the meta-models for developing the transformation model. The transformation rules defined in this work can generate, from a class diagram, CQL code for creating a column-oriented NoSQL database.
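As a rough illustration of what a class-to-CQL transformation rule produces, the sketch below maps a simplified class description to a CQL `CREATE TABLE` statement. It is written in plain Python rather than QVT or Acceleo, and the type mapping, the `Attribute` structure, and the function `class_to_cql` are assumptions made for this example, not the rules defined in the paper.

```python
from dataclasses import dataclass

# Assumed mapping from UML primitive types to CQL column types.
CQL_TYPES = {
    "Integer": "int",
    "String": "text",
    "Real": "double",
    "Boolean": "boolean",
}


@dataclass
class Attribute:
    name: str
    uml_type: str
    is_key: bool = False


def class_to_cql(class_name, attributes):
    """Generate a CQL CREATE TABLE statement from a class description."""
    keys = [a.name for a in attributes if a.is_key]
    if not keys:
        raise ValueError("a CQL table needs at least one primary-key column")
    columns = [f"{a.name} {CQL_TYPES[a.uml_type]}" for a in attributes]
    columns.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {class_name.lower()} ({', '.join(columns)});"


statement = class_to_cql(
    "Customer",
    [Attribute("id", "Integer", is_key=True), Attribute("name", "String")],
)
```

A real migration would also have to handle associations, inheritance, and the choice of partition versus clustering keys, none of which this sketch attempts.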


2018 ◽  
Vol 8 (9) ◽  
pp. 1514 ◽  
Author(s):  
Bao Chang ◽  
Hsiu-Fen Tsai ◽  
Yun-Da Lee

This paper first integrates three big data tools (Hive, Impala, and SparkSQL) that support SQL-like queries for rapid data retrieval in big data. These tools are not only suitable for high-performance data retrieval in business intelligence, but are also low-cost open-source software solutions for small-to-medium enterprises. In practice, the proposed approach provides an in-memory cache and an in-disk cache to achieve a very fast response to a query when a cache hit occurs. Moreover, this paper develops a so-called platform selection mechanism that selects the appropriate tool to handle an input query effectively and efficiently. As a result, job execution with the proposed approach using platform selection is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
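The in-memory and in-disk caching described above can be sketched as a two-level lookup. This is a hypothetical illustration under simple assumptions (results are JSON-serializable, cache entries never expire, and the query engine is passed in as a callable); it does not reproduce the paper's implementation or its platform selection logic. The class `TwoLevelCache` and the `fake_backend` function are invented for the example.

```python
import hashlib
import json
import os
import tempfile


class TwoLevelCache:
    """Check memory first, then disk, and only then run the query."""

    def __init__(self, disk_dir):
        self.memory = {}
        self.disk_dir = disk_dir

    def _path(self, query):
        # Hash the query text to get a safe file name.
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return os.path.join(self.disk_dir, digest + ".json")

    def get(self, query, run_query):
        """Return (result, source), where source names the level that answered."""
        if query in self.memory:
            return self.memory[query], "memory"
        path = self._path(query)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                result = json.load(f)
            self.memory[query] = result  # promote to the faster level
            return result, "disk"
        result = run_query(query)
        self.memory[query] = result
        with open(path, "w", encoding="utf-8") as f:
            json.dump(result, f)
        return result, "backend"


def fake_backend(query):
    # Stand-in for a real engine such as Hive, Impala, or SparkSQL.
    return [["row", 1]]


disk_dir = tempfile.mkdtemp()
cache = TwoLevelCache(disk_dir)
first = cache.get("SELECT 1", fake_backend)
second = cache.get("SELECT 1", fake_backend)
restarted = TwoLevelCache(disk_dir).get("SELECT 1", fake_backend)
```

The first call reaches the backend, the second is served from memory, and a fresh cache object pointed at the same directory is served from disk, which mimics a restart.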

