Workload-Driven Horizontal Partitioning and Pruning for Large HTAP Systems

Author(s):  
Martin Boissier ◽  
Daniel Kurzynski
2021 ◽  
Vol 11 (12) ◽  
pp. 3164-3173
Author(s):  
R. Indhumathi ◽  
S. Sathiya Devi

Data sharing is essential in present-day biomedical research. A large quantity of medical information is gathered for different analysis and study objectives. Because of the scale of this collection, anonymity is essential; it is therefore important to preserve privacy and prevent leakage of patients' sensitive information. Anonymization methods such as generalisation, suppression and perturbation have been proposed to prevent such leakage, but they degrade the utility of the collected data: during data sanitization, utility is inevitably diminished. The main drawback faced by Privacy Preserving Data Publishing is maintaining the trade-off between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The bucketization technique is used in this paper in combination with a clustering method. The proposed work is divided into four stages: (i) vertical and horizontal partitioning, (ii) assigning a sensitivity index to the attributes in each cluster, (iii) verifying each cluster against the privacy threshold, and (iv) examining the Quasi-Identifier (QI) attributes for privacy breaches. To increase the utility of published data, the threshold value is determined from the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI elements, so that data utility is improved. Finally, the evaluation results validate the design and demonstrate that it is effective in improving data utility.


Author(s):  
Ladjel Bellatreche

Decision support applications require complex queries, e.g., multi-way joins defined over huge warehouses usually modelled as star schemas, i.e., a fact table and a set of dimension tables (Papadomanolakis & Ailamaki, 2004). Star schemas have an important property regarding join operations between dimension tables and the fact table (the fact table contains a foreign key for each dimension): there are no join operations between dimension tables. Joins in data warehouses (called star join queries) are particularly expensive because the fact table (by far the largest table in the warehouse) participates in every join, and multiple dimensions are likely to participate in each join. To speed up star join queries, many optimization structures have been proposed: redundant structures (materialized views and advanced index schemes) and non-redundant structures (data partitioning and parallel processing). Recently, data partitioning has been recognized as an important aspect of physical database design (Sanjay, Narasayya & Yang, 2004; Papadomanolakis & Ailamaki, 2004). Two types of data partitioning are available (Özsu & Valduriez, 1999): vertical and horizontal partitioning. Vertical partitioning decomposes tables into disjoint sets of columns. Horizontal partitioning decomposes tables, materialized views and indexes into disjoint sets of rows that are physically stored and usually accessed separately. Contrary to redundant structures, data partitioning does not replicate data, thereby reducing the storage requirement and minimizing maintenance overhead. In this paper, we concentrate only on horizontal data partitioning (HP). HP may positively affect (1) query performance, through partition elimination: if a query includes the partitioning key as a predicate in the WHERE clause, the query optimizer automatically routes the query only to the relevant partitions; and (2) database manageability: for instance, by allocating partitions to different machines or by splitting any access path (tables, materialized views, indexes, etc.). Most database systems offer three methods for performing HP via the PARTITION statement: RANGE, HASH and LIST (Sanjay, Narasayya & Yang, 2004). In range partitioning, an access path (table, view, or index) is split according to a range of values of a given set of columns. The hash mode decomposes the data according to a hash function (provided by the system) applied to the values of the partitioning columns. List partitioning splits a table according to the listed values of a column. These methods can be combined to generate composite partitioning. Oracle currently supports range-hash and range-list composite partitioning using the PARTITION - SUBPARTITION statement. The following SQL statement shows an example of fragmenting a table Student using range partitioning.
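A minimal sketch of such a statement, assuming Oracle-style syntax and a hypothetical Student table range-partitioned on an Enroll_year column (table layout, column names and range bounds are illustrative assumptions, not taken from the original text), is:

CREATE TABLE Student (
  Student_id   NUMBER PRIMARY KEY,
  Student_name VARCHAR2(50),
  Enroll_year  NUMBER
)
-- Range partitioning: each row goes to the first partition whose
-- upper bound (VALUES LESS THAN ...) exceeds its Enroll_year value.
PARTITION BY RANGE (Enroll_year) (
  PARTITION p_before_2000 VALUES LESS THAN (2000),
  PARTITION p_2000s       VALUES LESS THAN (2010),
  PARTITION p_recent      VALUES LESS THAN (MAXVALUE)
);

With such a scheme, a query like SELECT * FROM Student WHERE Enroll_year = 2005 would be routed by partition elimination to partition p_2000s only, which is the query-performance benefit described above.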


Author(s):  
Andrii Bermes

The geomorphological structure and morphometric features of the Kremenets Mountains are determined. The differences in geomorphic structure and morphometric parameters of individual sections of the study area are highlighted. The possibility of modelling morphometric parameters using GIS technologies is considered. Certain regularities in the distribution of morphometric parameters over the investigated area are revealed. Morphometric data processing and the construction of a series of morphometric maps using GIS analysis and spatial modelling are carried out for the Kremenets Mountains. A number of basic morphometric maps of the territory of the Kremenets Mountains are constructed, namely horizontal and vertical partitioning of the territory, steepness of slopes and slope exposure. Based on the constructed maps, certain regularities of the geomorphological features of the territory of the Kremenets Mountains and the morphological features of the relief components are detected. The values of the morphometric parameters could be used in a complex morphogenetic analysis of the study area. Key words: Kremenets Mountains, morphometric analysis, morphometric parameters, watershed, relict hills, GIS (geographic information systems), digital elevation models, horizontal partitioning, vertical partitioning, slopes, slope exposure.


2015 ◽  
Vol 5 (2) ◽  
pp. 36-52 ◽  
Author(s):  
Sikha Bagui ◽  
Loi Tang Nguyen

In this paper, the authors present an architecture and implementation of a distributed database system using sharding to provide high availability, fault-tolerance, and scalability of large databases in the cloud. Sharding, or horizontal partitioning, is used to disperse the data among the data nodes located on commodity servers for effective management of big data on the cloud.


Author(s):  
Ladjel Bellatreche ◽  
Kamel Boukhalfa ◽  
Pascal Richard

Horizontal partitioning has evolved significantly in recent years and is widely advocated by the academic and industrial communities. Horizontal partitioning positively affects query performance, database manageability and availability. Two types of horizontal partitioning are supported: primary and referential. Horizontal fragmentation in the context of relational data warehouses consists in partitioning the dimension tables by primary fragmentation and then fragmenting the fact table by referential fragmentation. This fragmentation can generate a very large number of fragments, which may make the maintenance task very complicated. In this paper, we first focus on the evolution of horizontal partitioning in commercial DBMSs motivated by decision support applications. Secondly, we formalize the referential fragmentation schema selection problem in the data warehouse and study the hardness of selecting an optimal solution. Due to its high complexity, we develop two algorithms, hill climbing and simulated annealing, with several variants to select a near-optimal partitioning schema. We present ParAdmin, an advisor tool that assists administrators in using primary and referential partitioning during the physical design of their data warehouses. Finally, extensive experimental studies are conducted using the data set of the APB-1 benchmark to compare the quality of the proposed algorithms using a mathematical cost model. Based on these experiments, some recommendations are given to ensure the proper use of horizontal partitioning.
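As a minimal sketch of this primary-then-referential scheme, assuming Oracle-style syntax (PARTITION BY REFERENCE) and hypothetical Customer (dimension) and Sales (fact) tables, the statements below are illustrative assumptions rather than the authors' own example:

CREATE TABLE Customer (
  Customer_id NUMBER PRIMARY KEY,
  City        VARCHAR2(30)
)
-- Primary fragmentation: the dimension table is partitioned on one of its own attributes.
PARTITION BY LIST (City) (
  PARTITION p_paris    VALUES ('Paris'),
  PARTITION p_poitiers VALUES ('Poitiers')
);

CREATE TABLE Sales (
  Sales_id    NUMBER PRIMARY KEY,
  Customer_id NUMBER NOT NULL,
  Amount      NUMBER,
  CONSTRAINT fk_sales_customer
    FOREIGN KEY (Customer_id) REFERENCES Customer (Customer_id)
)
-- Referential fragmentation: the fact table inherits the partitioning
-- of Customer through the foreign key constraint.
PARTITION BY REFERENCE (fk_sales_customer);

Each Sales fragment then joins only with its matching Customer fragment, which is what makes referential fragmentation attractive for star join queries; partitioning on several dimensions at once is what can produce the very large number of fragments discussed above.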

