Comparative Evaluation of Large Data Model Representation Methods: The Analyst’s Perspective

Author(s):  
Daniel L. Moody
2010 ◽  
Vol 47 (4) ◽  
pp. 208-218 ◽  
Author(s):  
Robert M. Fuller ◽  
Uday Murthy ◽  
Brad A. Schafer

2012 ◽  
Vol 120 (4) ◽  
Author(s):  
D. Capko ◽  
A. Erdeljan ◽  
G. Svenda ◽  
M. Popovic

2018 ◽  
Vol 37 (3) ◽  
pp. 29-49
Author(s):  
Kumar Sharma ◽  
Ujjal Marjit ◽  
Utpal Biswas

Resource Description Framework (RDF) is a commonly used data model in the Semantic Web environment. Libraries and various other communities have been using the RDF data model to store valuable data after extracting it from traditional storage systems. However, because of the large volume of the data, processing and storing it is becoming a nightmare for traditional data-management tools. This challenge demands a scalable, distributed system that can manage the data in parallel. In this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data held in traditional storage systems. Apache Spark is used for parallel processing of large data sets, and a column-oriented schema is proposed for storing RDF data. The storage system is built on top of the Hadoop Distributed File System (HDFS) and uses the Apache Parquet format to store data in compressed form. SPARQL queries over the compressed data are processed using Spark SQL. The experimental evaluation showed that storage requirements were reduced significantly compared to Jena TDB, Sesame, RDF/XML, and N-Triples file formats, and that query response times were good and decreased significantly as the number of worker nodes increased.
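A minimal PySpark sketch of the general approach described above, i.e. loading triples, writing them in a column-oriented Parquet layout on HDFS, and querying the compressed data with Spark SQL. The paths, column names, parsing logic, and example query are illustrative assumptions, not the authors' actual schema:

```python
# Hypothetical sketch: RDF triples stored as Parquet and queried via Spark SQL.
# HDFS paths, column names, and the sample query are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdf-parquet-sketch").getOrCreate()

def parse_ntriple(line):
    # Naive N-Triples split into (subject, predicate, object); real parsing
    # would need to handle literals and trailing dots more carefully.
    parts = line.strip().rstrip(" .").split(" ", 2)
    return tuple(parts) if len(parts) == 3 else None

triples = (spark.sparkContext.textFile("hdfs:///data/library.nt")
           .map(parse_ntriple)
           .filter(lambda t: t is not None))

df = spark.createDataFrame(triples, ["subject", "predicate", "object"])

# Write the triples in compressed, column-oriented Parquet form.
df.write.mode("overwrite").parquet("hdfs:///data/library_parquet")

# Evaluate a simple SPARQL-like triple pattern over the compressed data.
spark.read.parquet("hdfs:///data/library_parquet").createOrReplaceTempView("triples")
titles = spark.sql("""
    SELECT subject, object AS title
    FROM triples
    WHERE predicate = '<http://purl.org/dc/terms/title>'
""")
titles.show()
```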


2014 ◽  
Vol 687-691 ◽  
pp. 2776-2779
Author(s):  
Zhong Kan Xiong ◽  
Pei Zhen Wan ◽  
Jiu Ping Cai

Big data is one of the important development directions of modern information technology: sharing and analyzing large data sets brings immeasurable economic value and also plays a tremendous role in advancing society. In the age of big data, unified data representation and the processing, querying, analysis, and visualization of large data sets are key problems that urgently need to be solved. To provide a standardized framework for constructing a large data service platform, this paper designs a user-experience-oriented architecture for large data services. With respect to the data model, an unstructured data model based on subject behavior is designed in order to provide high-quality data services for unstructured data. For the large data service model, an algebraic model of large data services and their composition is established using process algebra. For large data service applications, retrieval, process analysis, and visualization services are described in detail, and high-quality data service optimization is achieved through measures that improve both the accuracy and the efficiency of retrieval.
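The abstract does not give the structure of the subject-behavior data model; the following is only a speculative sketch of what such a record might look like, with all field names invented for illustration:

```python
# Hypothetical illustration of a subject-behavior record for unstructured data;
# every field name here is an assumption, not taken from the paper.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BehaviorRecord:
    subject_id: str       # who or what produced the data
    behavior: str         # the action that generated it, e.g. "upload", "comment"
    timestamp: datetime   # when the behavior occurred
    payload_uri: str      # pointer to the unstructured content (file, blob, document)
    metadata: dict = field(default_factory=dict)  # free-form attributes for retrieval

record = BehaviorRecord("user-42", "upload", datetime.now(),
                        "hdfs:///raw/video/0001.mp4", {"format": "mp4"})
```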


2006 ◽  
Vol 10 (2) ◽  
pp. 220-230 ◽  
Author(s):  
J. Kennedy ◽  
R. Hyam ◽  
R. Kukla ◽  
T. Paterson

2013 ◽  
Vol 12 (06) ◽  
pp. 1223-1259 ◽  
Author(s):  
YASSER HACHAICHI ◽  
JAMEL FEKI

A data warehouse (DW) is a large data repository system designed for decision-making purposes. Its design relies on a specific model called the multidimensional model, which supports analyses of huge volumes of data tracing the enterprise's activities over time. Several design methods have been proposed to build multidimensional schemas from either the relational data model or the entity-relationship data model. Almost all proposals that treat the object-oriented data model assume the presence of the data source's UML class diagram. In practice, however, such a diagram either does not exist or is obsolete owing to multiple changes and evolutions of the information system. Furthermore, these few proposals require intensive manual intervention by the designer, which demands high expertise in both the DW domain and the object database domain. To overcome these disadvantages, this work proposes an automatic DW schema design method that starts from an object database (its schema and its instances). The method applies a set of extraction rules to identify multidimensional concepts and to generate star schemas. It is defined for the standard ODMG model and can therefore be adapted with slight changes to other object database models. In addition, its extraction rules have the merit of being independent of the domain semantics. Furthermore, they automatically generate schemas classified according to their analytical potential; this classification helps the DW designer select the most relevant schemas among those generated. Finally, being automatic, the method is supported by a toolset that also prepares for the automatic generation of the Extract, Transform, and Load (ETL) procedures used to load the DW.
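To make the idea of an extraction rule concrete, here is a toy sketch (not the paper's actual rules): classes whose attributes include numeric measures and that reference other classes are treated as fact candidates, and the referenced classes become dimensions. The schema description format and the rule itself are assumptions for illustration:

```python
# Toy extraction rule: a class with numeric attributes and references to other
# classes becomes a fact candidate; referenced classes become its dimensions.
# The schema format and the rule are illustrative assumptions only.
object_schema = {
    "Order":    {"attributes": {"amount": "float", "quantity": "int"},
                 "references": ["Customer", "Product"]},
    "Customer": {"attributes": {"name": "string", "city": "string"},
                 "references": []},
    "Product":  {"attributes": {"label": "string", "price": "float"},
                 "references": []},
}

NUMERIC_TYPES = {"int", "float", "decimal"}

def extract_star_schemas(schema):
    stars = []
    for cls, desc in schema.items():
        measures = [a for a, t in desc["attributes"].items() if t in NUMERIC_TYPES]
        if measures and desc["references"]:
            dimensions = {ref: list(schema[ref]["attributes"]) for ref in desc["references"]}
            stars.append({"fact": cls, "measures": measures, "dimensions": dimensions})
    return stars

for star in extract_star_schemas(object_schema):
    print(star["fact"], "->", star)
```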


Author(s):  
Fredi Palominos ◽  
Hernan Díaz ◽  
Felisa Córdova ◽  
Lucio Cañete ◽  
Claudia Durán

The proliferation and popularization of new instruments for measuring different types of electrophysiological variables have generated the need to store huge volumes of information, corresponding to the records obtained by applying these instruments to experimental subjects. To this must be added the data derived from analysis and cleaning processes. Moreover, each of the stages involved in processing the data is associated with one or more specific methods related to the area of research and to the treatment to which the base (RAW) information is subjected. As a result, and with the passage of time, various problems occur; the most obvious is that the data and metadata derived from the treatment and analysis processes can end up accumulating and requiring more storage space than the base data. In addition, as the enormous amount of information grows over time, the link between the processed data, the treatment methods used, and the analyses performed can be lost, so that eventually everything becomes simply a huge repository of biometric data, devoid of meaning and sense. This paper presents an approach founded on a data model that can adequately handle different types of chronologies of physiological and emotional information, ensuring confidentiality of the information according to the experimental protocols and the relevant ethical requirements, and linking the information with the treatment methods used and the technical and scientific documents derived from the analysis. The need for a specific data model is justified by the fact that the tools currently associated with the storage of large volumes of information are not able to capture the semantic elements that make up the metadata and the information relating to the analysis of base records of physiological information. This work is an extension of our paper [25].
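As a rough illustration of the kind of linkage the abstract describes, the following sketch lays out a minimal relational structure that keeps raw recordings, the processing methods applied to them, and the derived datasets and documents connected. Table and column names are assumptions, not the authors' schema:

```python
# Minimal sketch, assuming a relational layout: raw recordings, processing methods,
# derived datasets, and analysis documents stay linked instead of drifting apart.
# All table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_recording (
    id INTEGER PRIMARY KEY,
    subject_code TEXT,          -- anonymized subject identifier (confidentiality)
    signal_type TEXT,           -- e.g. EEG, ECG, GSR
    recorded_at TEXT,
    storage_uri TEXT            -- pointer to the bulk RAW data kept outside the DB
);
CREATE TABLE processing_method (
    id INTEGER PRIMARY KEY,
    name TEXT,
    description TEXT
);
CREATE TABLE derived_dataset (
    id INTEGER PRIMARY KEY,
    raw_id INTEGER REFERENCES raw_recording(id),
    method_id INTEGER REFERENCES processing_method(id),
    produced_at TEXT,
    storage_uri TEXT
);
CREATE TABLE analysis_document (
    id INTEGER PRIMARY KEY,
    derived_id INTEGER REFERENCES derived_dataset(id),
    title TEXT,
    document_uri TEXT
);
""")
```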

