Data Representation in Big Data via Succinct Data Structures

2017 ◽  
Vol 9 (02) ◽  
Author(s):  
Vinesh Kumar ◽  
Jayant Shekhar ◽  
Sunil Kumar

Data representation in memory is one of the core tasks in Big Data. It encompasses several types of tree data structures through which a system can access data accurately and efficiently. Succinct data structures can play an important role in data representation when Big Data is processed in main memory, where representation is a complex problem; we propose solutions to several of these representation problems. Data processing in Big Data can support decision making in data mining, and since the functions and rules for query processing are fixed, we must either change the processing method or change the way the data is represented. In this paper, different kinds of tree data structures are presented for representing Big Data in the main memory of a computer system using succinct data structures. We first compare all the data structures in a table; each method has different space and time complexity. Because Big Data information services are growing day by day, the space efficiency of succinct data structures is becoming increasingly important in practice.
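As a concrete illustration of the kind of succinct tree representation the paper discusses, the sketch below encodes a small rooted tree as a LOUDS (Level-Order Unary Degree Sequence) bitstring, a classic succinct tree layout using roughly 2n+1 bits for n nodes. The tree, node numbering, and helper names are illustrative assumptions, not the paper's own code.

```python
from collections import deque

def louds_bits(children):
    """Encode a rooted tree as a LOUDS bit sequence.

    `children` maps each node id to the list of its child ids.
    Each node, visited in level order, contributes one '1' per child
    followed by a terminating '0'; a super-root prefix "10" is added
    so that rank/select navigation works uniformly.
    """
    bits = [1, 0]                      # super-root pointing at the real root
    queue = deque([0])                 # assume node 0 is the root
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits.extend([1] * len(kids))   # one '1' per child
        bits.append(0)                 # degree terminator
        queue.extend(kids)
    return bits

# Example tree: 0 -> {1, 2}, 1 -> {3}
tree = {0: [1, 2], 1: [3]}
print("".join(map(str, louds_bits(tree))))   # "101101000": 9 bits for 4 nodes
```

Navigation (parent, first child, next sibling) on such a bitstring is then answered with rank/select queries instead of pointers, which is where the space savings over a pointer-based tree come from.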


2020 ◽  
Vol 31 (3) ◽  
pp. 67-82
Author(s):  
Akhilesh Bajaj ◽  
Wade Bick

Transaction processing systems are primarily based on the relational model of data and offer the advantages of decades of research and experience in enforcing data quality through integrity constraints, allowing concurrent access, and supporting recoverability. From a performance standpoint, they offer join-based query optimization and data structures that promote fast reads and writes, but they are usually only vertically scalable from a hardware standpoint. NoSQL (Not Only SQL) systems follow data representation formats other than relations, such as key-value pairs, graphs, documents, or column-families. They offer a flexible data representation format as well as horizontal hardware scalability so that Big Data can be processed in real time. In this review article, we survey recent research on each type of system and then discuss how the teaching of NoSQL may be incorporated into traditional undergraduate database courses in information systems curricula.
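To make the contrast between representation formats concrete, a minimal sketch follows: the same customer data expressed as normalized relational rows, as an opaque key-value entry, and as a denormalized document. The field names and values are made up purely for illustration.

```python
import json

# Relational view: normalized rows in two tables, joined on customer_id.
customers = [(1, "Ada Lovelace")]                        # (customer_id, name)
orders = [(101, 1, "widget", 3), (102, 1, "gadget", 1)]  # (order_id, customer_id, item, qty)

# Key-value view: an opaque value stored under a single key.
kv_store = {"customer:1": json.dumps({"name": "Ada Lovelace", "order_ids": [101, 102]})}

# Document view: the customer and all orders denormalized into one document,
# so a single read returns everything without a join.
document = {
    "customer_id": 1,
    "name": "Ada Lovelace",
    "orders": [
        {"order_id": 101, "item": "widget", "qty": 3},
        {"order_id": 102, "item": "gadget", "qty": 1},
    ],
}
print(json.dumps(document, indent=2))
```

The document form trades storage redundancy and weaker integrity enforcement for reads and writes that touch a single record, which is what makes horizontal partitioning straightforward.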


2019 ◽  
Vol 13 (2) ◽  
pp. 227-236
Author(s):  
Tetsuo Shibuya

Abstract A data structure is called succinct if its asymptotic space requirement matches the original data size. The development of succinct data structures is an important factor in dealing with explosively increasing big data. Moreover, wider variations of big data have been produced in various fields recently, and there is a substantial need for the development of more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three different fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.
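As a small example of the definition above, the sketch below answers rank queries (count of 1s in a prefix) over an n-bit array using only a small table of per-block counts on top of the bits themselves; the block size and helper names are illustrative, and real succinct bitvectors add a second index level to reach o(n) extra bits with O(1) query time.

```python
BLOCK = 64  # block size; real implementations tune this and add a second level

def build_rank_index(bits):
    """Precompute, for every block, the number of 1s strictly before it."""
    index, running = [], 0
    for i, b in enumerate(bits):
        if i % BLOCK == 0:
            index.append(running)
        running += b
    return index

def rank1(bits, index, i):
    """Number of 1s in bits[0:i]: one table lookup plus a short in-block scan."""
    block = i // BLOCK
    return index[block] + sum(bits[block * BLOCK:i])

bits = [1, 0, 1, 1, 0, 1] * 50           # a 300-bit toy bitvector
idx = build_rank_index(bits)
assert rank1(bits, idx, 300) == sum(bits)
print(rank1(bits, idx, 10))               # 1s among the first 10 bits
```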


2021 ◽  
pp. 121-147
Author(s):  
Kazuki Ishiyama ◽  
Kunihiko Sadakane

Abstract We first review existing space-efficient data structures for the orthogonal range search problem. Then, we propose two improved data structures, the first of which has better query time complexity than the existing structures and the second of which has better space complexity that matches the information-theoretic lower bound.
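For readers unfamiliar with the problem itself, here is a minimal sketch of 2-D orthogonal range search (report all points inside an axis-aligned rectangle). It uses plain sorting and binary search rather than the succinct structures proposed in the chapter, and the point set is made up for illustration.

```python
from bisect import bisect_left, bisect_right

def build(points):
    """Sort points by x so the x-range can be located by binary search."""
    return sorted(points)

def range_search(sorted_pts, x1, x2, y1, y2):
    """Report all points (x, y) with x1 <= x <= x2 and y1 <= y <= y2."""
    xs = [p[0] for p in sorted_pts]
    lo = bisect_left(xs, x1)
    hi = bisect_right(xs, x2)
    return [p for p in sorted_pts[lo:hi] if y1 <= p[1] <= y2]

pts = build([(2, 3), (5, 1), (7, 8), (4, 4), (9, 2)])
print(range_search(pts, 3, 8, 2, 9))   # [(4, 4), (7, 8)]
```

Structures such as wavelet trees answer the same queries in compressed space with polylogarithmic time, which is the direction the chapter's improved data structures take.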


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jeongmin Bae ◽  
Hajin Jeon ◽  
Min-Soo Kim

Abstract Background Design of valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful pipeline based on MapReduce that combines both primer design for target sequences and homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Due to the effectiveness of primers designed by MRPrimer in qPCR analysis, it has been widely used for developing many online design tools and building primer databases. However, the computational speed of MRPrimer is too slow to deal with sequence DBs whose sizes are growing exponentially, and thus must be improved. Results We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of a total of seven MapReduce steps, among which two steps are very time-consuming. GPrimer significantly improves the speed of those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access in the GPU and workload balancing among GPU threads, and copies the data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer achieves a speedup of 57 times over the entire pipeline and a speedup of 557 times for the most time-consuming step, using a single machine with 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.
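The pipeline's core task, enumerating candidate primers from target sequences and filtering them by simple constraints, can be sketched as below. The thresholds, the Wallace-rule melting-temperature estimate, and the helper names are illustrative assumptions and do not reproduce GPrimer's actual filtering rules, homology tests, or GPU kernels.

```python
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def melting_temp(seq):
    """Wallace rule, a rough Tm estimate for short oligos: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def candidate_primers(sequence, length=20, gc=(0.4, 0.6), tm=(55, 65)):
    """Slide a window over one target sequence and keep windows passing the filters."""
    out = []
    for i in range(len(sequence) - length + 1):
        p = sequence[i:i + length]
        if gc[0] <= gc_content(p) <= gc[1] and tm[0] <= melting_temp(p) <= tm[1]:
            out.append((i, p))
    return out

target = "ATGCGTACGTTAGCCGATCGATCGGATACGTTAGCATCGG"
for pos, primer in candidate_primers(target):
    print(pos, primer, f"GC={gc_content(primer):.2f}", f"Tm={melting_temp(primer)}")
```

The expensive part in practice is not this enumeration but checking every surviving candidate against all off-target sequences in the DB, which is the step GPrimer parallelizes on GPUs.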


2021 ◽  
pp. 1-13
Author(s):  
Yikai Zhang ◽  
Yong Peng ◽  
Hongyu Bian ◽  
Yuan Ge ◽  
Feiwei Qin ◽  
...  

Concept factorization (CF) is an effective matrix factorization model that has been widely used in many applications. In CF, a linear combination of data points serves as the dictionary, based on which CF can be performed both in the original feature space and in the reproducing kernel Hilbert space (RKHS). Conventional CF treats each dimension of the feature vector equally during the data reconstruction process, which conflicts with the common sense that different features have different discriminative abilities and therefore contribute differently to pattern recognition. In this paper, we introduce an auto-weighting variable into the conventional CF objective function to adaptively learn the contributions of different features, and propose a new model termed Auto-Weighted Concept Factorization (AWCF). In AWCF, on the one hand, feature importance can be quantitatively measured by the auto-weighting variable, with features of better discriminative ability assigned larger weights; on the other hand, we obtain a more efficient data representation that depicts the semantic information of the data. The detailed optimization procedure for the AWCF objective function is derived, and its complexity and convergence are analyzed. Experiments are conducted on both synthetic and representative benchmark data sets, and the clustering results demonstrate the effectiveness of AWCF in comparison with related models.
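To fix ideas, the sketch below implements the conventional CF model X ≈ X W Vᵀ with the standard multiplicative updates expressed through K = XᵀX. It does not include the auto-weighting variable that AWCF adds, and the dimensions, initialization, and stopping rule are illustrative assumptions.

```python
import numpy as np

def concept_factorization(X, k, iters=200, eps=1e-9):
    """Conventional CF: minimize ||X - X W V^T||_F^2 with nonnegative W, V.

    X: (m, n) nonnegative data matrix, one data point per column.
    Returns W (n, k) and V (n, k); cluster centers are the columns of X @ W.
    """
    n = X.shape[1]
    K = X.T @ X                          # only inner products of data points are needed
    rng = np.random.default_rng(0)
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(iters):
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)   # multiplicative update for W
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)   # multiplicative update for V
    return W, V

X = np.abs(np.random.default_rng(1).random((50, 30)))
W, V = concept_factorization(X, k=5)
print("reconstruction error:", np.linalg.norm(X - X @ W @ V.T))
```

Because the objective depends on the data only through K, the same updates carry over to the kernelized setting by replacing K with any kernel matrix; AWCF's auto-weighting would modify the objective so each feature dimension enters the reconstruction error with its own learned weight.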

