Data Representation in Big Data via Succinct Data Structures

2017 ◽  
Vol 9 (02) ◽  
Author(s):  
Vinesh Kumar ◽  
Jayant Shekhar ◽  
Sunil Kumar

Data representation in memory is one of the core tasks in Big Data. It encompasses several types of tree data structures through which a system can access data accurately and efficiently. Succinct data structures can play an important role in data representation when Big Data is processed in main memory, where representation is a complex problem; we propose solutions to several of these representation problems. Data processing in Big Data can support decision making in data mining, and since the functions and rules for query processing are fixed, we must either change the processing method or change the way the data is represented. In this paper, different kinds of tree data structures are presented for representing Big Data in the main memory of a computer system using succinct data structures. We first compare all the data structures in a table; each method has different space and time complexity. Because Big Data information services are growing day by day, the space efficiency of succinct data structures is becoming increasingly important in practice.
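As a concrete illustration of the kind of succinct tree representation the paper discusses, the sketch below encodes a small rooted tree as a LOUDS (Level-Order Unary Degree Sequence) bitstring, a classic succinct tree layout using roughly 2n+1 bits for n nodes. The tree, node numbering, and helper names are illustrative assumptions, not the paper's own code.

```python
from collections import deque

def louds_bits(children):
    """Encode a rooted tree as a LOUDS bit sequence.

    `children` maps each node id to the list of its child ids.
    Each node, visited in level order, contributes one '1' per child
    followed by a terminating '0'; a super-root prefix "10" is added
    so that rank/select navigation works uniformly.
    """
    bits = [1, 0]                      # super-root pointing at the real root
    queue = deque([0])                 # assume node 0 is the root
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits.extend([1] * len(kids))   # one '1' per child
        bits.append(0)                 # degree terminator
        queue.extend(kids)
    return bits

# Example tree: 0 -> {1, 2}, 1 -> {3}
tree = {0: [1, 2], 1: [3]}
print("".join(map(str, louds_bits(tree))))   # "101101000": 9 bits for 4 nodes
```

Navigation (parent, first child, next sibling) on such a bitstring is then answered with rank/select queries instead of pointers, which is where the space savings over a pointer-based tree come from.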


2020 ◽  
Vol 31 (3) ◽  
pp. 67-82
Author(s):  
Akhilesh Bajaj ◽  
Wade Bick

Transaction processing systems are primarily based on the relational model of data and offer the advantages of decades of research and experience in enforcing data quality through integrity constraints, allowing concurrent access, and supporting recoverability. From a performance standpoint, they offer join-based query optimization and data structures that promote fast reads and writes, but they are usually only vertically scalable from a hardware standpoint. NoSQL (Not Only SQL) systems follow data representation formats other than relations, such as key-value pairs, graphs, documents, or column-families. They offer a flexible data representation format as well as horizontal hardware scalability so that Big Data can be processed in real time. In this review article, we survey recent research on each type of system and then discuss how the teaching of NoSQL may be incorporated into traditional undergraduate database courses in information systems curricula.
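To make the contrast between representation formats concrete, a minimal sketch follows: the same customer data expressed as normalized relational rows, as an opaque key-value entry, and as a denormalized document. The field names and values are made up purely for illustration.

```python
import json

# Relational view: normalized rows in two tables, joined on customer_id.
customers = [(1, "Ada Lovelace")]                        # (customer_id, name)
orders = [(101, 1, "widget", 3), (102, 1, "gadget", 1)]  # (order_id, customer_id, item, qty)

# Key-value view: an opaque value stored under a single key.
kv_store = {"customer:1": json.dumps({"name": "Ada Lovelace", "order_ids": [101, 102]})}

# Document view: the customer and all orders denormalized into one document,
# so a single read returns everything without a join.
document = {
    "customer_id": 1,
    "name": "Ada Lovelace",
    "orders": [
        {"order_id": 101, "item": "widget", "qty": 3},
        {"order_id": 102, "item": "gadget", "qty": 1},
    ],
}
print(json.dumps(document, indent=2))
```

The document form trades storage redundancy and weaker integrity enforcement for reads and writes that touch a single record, which is what makes horizontal partitioning straightforward.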


2019 ◽  
Vol 13 (2) ◽  
pp. 227-236
Author(s):  
Tetsuo Shibuya

Abstract A data structure is called succinct if its asymptotic space requirement matches the original data size. The development of succinct data structures is an important factor in dealing with explosively increasing big data. Moreover, wider variations of big data have been produced in various fields recently, and there is a substantial need for the development of more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three different fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.
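As a small example of the definition above, the sketch below answers rank queries (count of 1s in a prefix) over an n-bit array using only a small table of per-block counts on top of the bits themselves; the block size and helper names are illustrative, and real succinct bitvectors add a second index level to reach o(n) extra bits with O(1) query time.

```python
BLOCK = 64  # block size; real implementations tune this and add a second level

def build_rank_index(bits):
    """Precompute, for every block, the number of 1s strictly before it."""
    index, running = [], 0
    for i, b in enumerate(bits):
        if i % BLOCK == 0:
            index.append(running)
        running += b
    return index

def rank1(bits, index, i):
    """Number of 1s in bits[0:i]: one table lookup plus a short in-block scan."""
    block = i // BLOCK
    return index[block] + sum(bits[block * BLOCK:i])

bits = [1, 0, 1, 1, 0, 1] * 50           # a 300-bit toy bitvector
idx = build_rank_index(bits)
assert rank1(bits, idx, 300) == sum(bits)
print(rank1(bits, idx, 10))               # 1s among the first 10 bits
```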


2021 ◽  
pp. 121-147
Author(s):  
Kazuki Ishiyama ◽  
Kunihiko Sadakane

Abstract We first review existing space-efficient data structures for the orthogonal range search problem. Then, we propose two improved data structures, the first of which has better query time complexity than the existing structures and the second of which has better space complexity that matches the information-theoretic lower bound.
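For readers unfamiliar with the problem itself, here is a minimal sketch of 2-D orthogonal range search (report all points inside an axis-aligned rectangle). It uses plain sorting and binary search rather than the succinct structures proposed in the chapter, and the point set is made up for illustration.

```python
from bisect import bisect_left, bisect_right

def build(points):
    """Sort points by x so the x-range can be located by binary search."""
    return sorted(points)

def range_search(sorted_pts, x1, x2, y1, y2):
    """Report all points (x, y) with x1 <= x <= x2 and y1 <= y <= y2."""
    xs = [p[0] for p in sorted_pts]
    lo = bisect_left(xs, x1)
    hi = bisect_right(xs, x2)
    return [p for p in sorted_pts[lo:hi] if y1 <= p[1] <= y2]

pts = build([(2, 3), (5, 1), (7, 8), (4, 4), (9, 2)])
print(range_search(pts, 3, 8, 2, 9))   # [(4, 4), (7, 8)]
```

Structures such as wavelet trees answer the same queries in compressed space with polylogarithmic time, which is the direction the chapter's improved data structures take.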


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jeongmin Bae ◽  
Hajin Jeon ◽  
Min-Soo Kim

Abstract Background Design of valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful pipeline based on MapReduce that combines both primer design for target sequences and homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Due to the effectiveness of primers designed by MRPrimer in qPCR analysis, it has been widely used for developing many online design tools and building primer databases. However, the computational speed of MRPrimer is too slow to deal with sequence DBs whose sizes are growing exponentially, and thus must be improved. Results We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of a total of seven MapReduce steps, among which two steps are very time-consuming. GPrimer significantly improves the speed of those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access in the GPU and workload balancing among GPU threads, and copies the data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer achieves a speedup of 57 times over the entire pipeline and a speedup of 557 times for the most time-consuming step, using a single machine with 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.
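The pipeline's core task, enumerating candidate primers from target sequences and filtering them by simple constraints, can be sketched as below. The thresholds, the Wallace-rule melting-temperature estimate, and the helper names are illustrative assumptions and do not reproduce GPrimer's actual filtering rules, homology tests, or GPU kernels.

```python
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def melting_temp(seq):
    """Wallace rule, a rough Tm estimate for short oligos: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def candidate_primers(sequence, length=20, gc=(0.4, 0.6), tm=(55, 65)):
    """Slide a window over one target sequence and keep windows passing the filters."""
    out = []
    for i in range(len(sequence) - length + 1):
        p = sequence[i:i + length]
        if gc[0] <= gc_content(p) <= gc[1] and tm[0] <= melting_temp(p) <= tm[1]:
            out.append((i, p))
    return out

target = "ATGCGTACGTTAGCCGATCGATCGGATACGTTAGCATCGG"
for pos, primer in candidate_primers(target):
    print(pos, primer, f"GC={gc_content(primer):.2f}", f"Tm={melting_temp(primer)}")
```

The expensive part in practice is not this enumeration but checking every surviving candidate against all off-target sequences in the DB, which is the step GPrimer parallelizes on GPUs.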


2021 ◽  
pp. 1-13
Author(s):  
Yikai Zhang ◽  
Yong Peng ◽  
Hongyu Bian ◽  
Yuan Ge ◽  
Feiwei Qin ◽  
...  

Concept factorization (CF) is an effective matrix factorization model that has been widely used in many applications. In CF, a linear combination of data points serves as the dictionary, based on which CF can be performed both in the original feature space and in the reproducing kernel Hilbert space (RKHS). Conventional CF treats each dimension of the feature vector equally during the data reconstruction process, which conflicts with the common sense that different features have different discriminative abilities and therefore contribute differently to pattern recognition. In this paper, we introduce an auto-weighting variable into the conventional CF objective function to adaptively learn the contributions of different features, and propose a new model termed Auto-Weighted Concept Factorization (AWCF). In AWCF, on the one hand, feature importance can be quantitatively measured by the auto-weighting variable, with features of better discriminative ability assigned larger weights; on the other hand, we obtain a more efficient data representation that depicts the semantic information of the data. The detailed optimization procedure for the AWCF objective function is derived, and its complexity and convergence are analyzed. Experiments are conducted on both synthetic and representative benchmark data sets, and the clustering results demonstrate the effectiveness of AWCF in comparison with related models.
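To fix ideas, the sketch below implements the conventional CF model X ≈ X W Vᵀ with the standard multiplicative updates expressed through K = XᵀX. It does not include the auto-weighting variable that AWCF adds, and the dimensions, initialization, and stopping rule are illustrative assumptions.

```python
import numpy as np

def concept_factorization(X, k, iters=200, eps=1e-9):
    """Conventional CF: minimize ||X - X W V^T||_F^2 with nonnegative W, V.

    X: (m, n) nonnegative data matrix, one data point per column.
    Returns W (n, k) and V (n, k); cluster centers are the columns of X @ W.
    """
    n = X.shape[1]
    K = X.T @ X                          # only inner products of data points are needed
    rng = np.random.default_rng(0)
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(iters):
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)   # multiplicative update for W
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)   # multiplicative update for V
    return W, V

X = np.abs(np.random.default_rng(1).random((50, 30)))
W, V = concept_factorization(X, k=5)
print("reconstruction error:", np.linalg.norm(X - X @ W @ V.T))
```

Because the objective depends on the data only through K, the same updates carry over to the kernelized setting by replacing K with any kernel matrix; AWCF's auto-weighting would modify the objective so each feature dimension enters the reconstruction error with its own learned weight.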

