Optimizing Joins in a Map-Reduce for Data Storage and Retrieval Performance Analysis of Query Processing in HDFS for Big Data

Author(s):  
Dr.Sudhakar S ◽  
2017 ◽  
Vol 10 (3) ◽  
pp. 597-602
Author(s):  
Jyotindra Tiwari ◽  
Dr. Mahesh Pawar ◽  
Dr. Anjajana Pandey

Big Data is defined by the 3Vs: variety, volume, and velocity. The volume of data is huge, the data exists in a variety of file types, and it grows very rapidly. Storing and processing big data has always been a major challenge, and it has become even more difficult in recent years. High-performance techniques have been introduced to handle big data, and several frameworks, such as Apache Hadoop, have been introduced to process it. Apache Hadoop provides map/reduce for processing big data, but map/reduce can be accelerated further. This paper surveys techniques for map/reduce acceleration and for fast, energy-efficient computation.


Author(s):  
Venkat Gudivada ◽  
Amy Apon ◽  
Dhana L. Rao

Special needs of Big Data applications have ushered in several new classes of systems for data storage and retrieval. Each class targets the needs of a category of Big Data application. These systems differ greatly in their data models and system architecture, approaches used for high availability and scalability, query languages and client interfaces provided. This chapter begins with a description of the emergence of Big Data and data management requirements of Big Data applications. Several new classes of database management systems have emerged recently to address the needs of Big Data applications. NoSQL is an umbrella term used to refer to these systems. Next, a taxonomy for NoSQL systems is developed and several NoSQL systems are classified under this taxonomy. Characteristics of representative systems in each class are also discussed. The chapter concludes by indicating the emerging trends of NoSQL systems and research issues.


Author(s):  
Nirav Bhatt ◽  
Amit Thakkar

In the era of big data, large amounts of data are generated in areas such as education, business, the stock market, and healthcare. Most of the data available from these areas is unstructured, large, and complex. As the healthcare industry shifts from a volume-based to a value-based model, specialized tools and methods are needed to handle this data. Traditional methods for data storage and retrieval can be used when data is structured. Big data analytics provides technologies to store large amounts of complex healthcare data. It is believed that there is an enormous opportunity to improve lives by applying big data in the healthcare industry. No industry counts more than healthcare, as it is a matter of life and death. Owing to the rapid development of big data tools and technologies, it is possible to improve disease diagnosis more efficiently than ever before, but security and privacy are two major issues when dealing with big data in the healthcare industry.


Author(s):  
VAIBHAV SUNEJA

Big data storage and retrieval is a cause for concern for the future. This paper proposes storing data by arranging the nucleotides adenine (A) and thymine (T) to represent binary 0s and 1s. Small fragments of high molecular DNA can be obtained by the chain termination method, and with the shotgun sequencing method we can select the fragments containing A's and T's in the order we want. The paper demonstrates a Python script that produces an A and T sequence from input data. The sequence can be used during shotgun sequencing to select or discard strands, and the script can also be used to query the DNA database by integrating it with a DNA sequencing method. The data is not written onto the nucleotides; instead, the nucleotide sequence is modified to represent the data. The paper further illustrates how a sequence of nucleotides can be arranged as a logical table with a proper header, and how the Python script can be used to save and retrieve data from this table. The storage and retrieval of 24 bits of data can be completed in 15-20 minutes under controlled lab conditions, including manual and mechanical effort. The paper shows how automation and robotic arms can reduce this delay further, and how to overcome the challenges of preserving DNA strands. This approach can lead to the development of a DNA data warehouse with infinite storage space, as DNA can be obtained virtually free of cost from any living thing.
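
The bit-to-nucleotide mapping described in this abstract can be sketched as follows. This is not the paper's script, which is not reproduced here. It is a minimal illustration that assumes A encodes 0 and T encodes 1 (the abstract does not state which base maps to which bit), and the function names `encode` and `decode` are hypothetical.

```python
# Assumed mapping (not confirmed by the abstract): A encodes 0, T encodes 1.
BIT_TO_BASE = {"0": "A", "1": "T"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/T bases, 8 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(strand: str) -> bytes:
    """Turn an A/T strand back into bytes; rejects partial bytes."""
    if len(strand) % 8 != 0:
        raise ValueError("strand length must be a multiple of 8 bases")
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"DNA")        # 3 bytes = 24 bits = 24 bases
    print(strand)
    print(decode(strand))
```

Three bytes of input yield a 24-base strand, which matches the 24-bit payload size mentioned in the abstract. A strand containing any base other than A or T raises a `KeyError` in `decode`.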


2021 ◽  
Vol 13 (16) ◽  
pp. 3208
Author(s):  
Yinyi Cheng ◽  
Kefa Zhou ◽  
Jinlin Wang ◽  
Philippe De Maeyer ◽  
Tim Van de Voorde ◽  
...  

The spatial calculation of vector data is crucial for geochemical analysis in geological big data. However, large volumes of geochemical data make for inefficient management. Therefore, this study proposed a shapefile storage method based on MongoDB in GeoJSON form (SSMG) and a shapefile storage method based on PostgreSQL with open location code (OLC) geocoding (SSPOG) to solve the problem of low efficiency of electronic form management. The SSMG method consists of a JSONification tier and a cloud storage tier, while the SSPOG method consists of a geocoding tier, an extension tier, and a storage tier. Using MongoDB and PostgreSQL as databases, this study achieved two different types of high-throughput and high-efficiency methods for geochemical data storage and retrieval. Xinjiang, the largest provincial-level region in China, was selected as the study area in which to test the proposed methods. Using geochemical data from shapefiles as the data source, several experiments were performed to improve geochemical data storage efficiency and achieve efficient retrieval. The SSMG and SSPOG methods can be applied to improve geochemical data storage using different architectures, so that geochemical data can be organized and managed efficiently, as measured by time consumed and data compression ratio (DCR), in order to better support geological big data. The purpose of this study was to build a storage method that improves the speed of geochemical data insertion and retrieval, using big data technology to efficiently solve the problem of geochemical data preprocessing and to support geochemical analysis.
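
As a rough illustration of what a JSONification step might produce, the sketch below builds a GeoJSON FeatureCollection from point records using only the Python standard library. It is not the SSMG implementation. The record layout, the field names `sample_id` and `cu_ppm`, the coordinates, and the helper names are all hypothetical, and no database call is made. Storing the resulting document in MongoDB would additionally require a driver.

```python
import json


def to_geojson_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature for one sample point.

    GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": dict(properties),
    }


def to_feature_collection(records: list) -> dict:
    """Wrap a list of point records in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            to_geojson_feature(r["lon"], r["lat"], r["attrs"]) for r in records
        ],
    }


# Hypothetical sample record; the values are illustrative only.
samples = [
    {"lon": 87.6, "lat": 43.8, "attrs": {"sample_id": "S001", "cu_ppm": 21.5}},
]
document = to_feature_collection(samples)

if __name__ == "__main__":
    print(json.dumps(document, indent=2))
```

Keeping the geometry in standard GeoJSON form is what would let a document store index it spatially. The attribute values travel alongside it in `properties`.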

