Bootstrapping a Blockchain Based Ecosystem for Big Data Exchange

Author(s):  
Jinchuan Chen ◽  
Yunzhi Xue
Keyword(s):  
Big Data ◽  
Author(s):  
Mohammadhossein Barkhordari ◽  
Mahdi Niamanesh

Because of the high rate of data growth and the need for data analysis, data warehouse management for big data is an important issue. Single-node solutions cannot manage such large amounts of information, so data must be distributed over multiple hardware nodes. However, distributing data over nodes means that each node may need data from other nodes to execute a query. Data exchange among nodes creates problems such as joins between data segments that reside on different nodes, network congestion, and nodes waiting for data reception. In this paper, the Aras method is proposed. Aras is a MapReduce-based method that provides each mapper with its own data set. With this method, each mapper node can execute its query independently, without needing to exchange data with other nodes. This node independence solves the aforementioned data distribution problems. The proposed method has been compared with prominent data warehouses for big data, and the Aras query execution time was much lower than that of the other methods.
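The core idea, each mapper already holding every row it needs so a query (including its joins) runs entirely locally, can be sketched as below. This is a minimal illustration of map-side-only execution, not the Aras implementation; the table layout, column names, and the partition/run_mapper helpers are assumptions made for the example.

```python
from collections import defaultdict

# Illustrative sketch of map-side-only query execution: both tables are
# partitioned by the same join key, so every mapper can join, filter, and
# aggregate its own partition without exchanging rows with other nodes.

def partition(rows, key, n_mappers):
    """Hash-partition rows so rows sharing a join key land on the same mapper."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n_mappers].append(row)
    return parts

def run_mapper(fact_part, dim_part):
    """Local join + aggregation; no data leaves the mapper node."""
    dim_index = {d["cust_id"]: d for d in dim_part}
    total = 0.0
    for f in fact_part:
        d = dim_index.get(f["cust_id"])
        if d and d["region"] == "EU":   # local filter
            total += f["amount"]        # local aggregate
    return total

if __name__ == "__main__":
    facts = [{"cust_id": i % 4, "amount": float(i)} for i in range(20)]
    dims = [{"cust_id": i, "region": "EU" if i % 2 else "US"} for i in range(4)]
    n = 3
    fact_parts = partition(facts, "cust_id", n)
    dim_parts = partition(dims, "cust_id", n)
    # Each mapper's partial result can simply be summed by the reducer.
    print(sum(run_mapper(fact_parts[m], dim_parts[m]) for m in range(n)))
```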


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2772 ◽  
Author(s):  
Aguinaldo Bezerra ◽  
Ivanovitch Silva ◽  
Luiz Affonso Guedes ◽  
Diego Silva ◽  
Gustavo Leitão ◽  
...  

Alarm and event logs are an immense but latent source of knowledge that is commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency, and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such data misuse and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or, more precisely, Industrial Big Data) have been enabling novel approaches to data analysis that can be great allies in extracting hitherto hidden information from plant operation data. To address this, this work proposes Exploratory Data Analysis (EDA) as a promising data-driven approach to industrial alarm and event analysis. This approach proved able to increase industrial insight by extracting valuable information from real-world industrial data without making prior assumptions.
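As a minimal sketch of what such an EDA pass over an alarm log might look like, the snippet below computes two standard indicators, the top alarm contributors and the alarm rate per hour, using pandas. The column names ("timestamp", "tag", "priority") and the toy data are assumptions for illustration, not a schema from the paper.

```python
import pandas as pd

# Minimal EDA sketch for an alarm/event log (assumed schema, toy data).
log = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-06-01 08:00", "2019-06-01 08:02", "2019-06-01 08:02",
         "2019-06-01 09:15", "2019-06-01 09:16"]),
    "tag": ["PT-101", "PT-101", "FT-203", "PT-101", "TT-310"],
    "priority": ["high", "high", "low", "high", "medium"],
})

# Top alarm contributors ("bad actors"): a handful of tags usually
# dominate the alarm load, which simple frequency counts expose.
print(log["tag"].value_counts())

# Alarm rate per hour, a common indicator of alarm-flood periods.
print(log.set_index("timestamp").resample("60min").size())
```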


2019 ◽  
Vol 11 (11) ◽  
pp. 225 ◽  
Author(s):  
Yuling Chen ◽  
Jinyi Guo ◽  
Changlou Li ◽  
Wei Ren

In the big data era, data are envisioned as critical resources with various values, e.g., business intelligence, management efficiency, and financial evaluations. Data sharing is always mandatory for value exchange and profit promotion. Currently, certain big data markets have been created to facilitate data dissemination and coordinate data transactions, but we have to assume that such centralized management of data sharing is trustworthy with respect to data privacy and sharing fairness, which very likely imposes limitations such as admission requirements, reduced sharing efficiency, and extra costly commissions. To avoid these weaknesses, in this paper we propose a blockchain-based fair data exchange scheme, called FaDe. FaDe enables decentralized data sharing in an autonomous manner, guaranteeing in particular trade fairness, sharing efficiency, data privacy, and exchange automation. A fairness protocol based on bit commitment is proposed. An algorithm based on a blockchain script architecture for a smart contract, e.g., on a bitcoin virtual machine, is also proposed and implemented. Extensive analysis shows that the proposed scheme can guarantee fair, efficient, and automatic data exchange without a trusted third party.
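The fairness protocol is stated to build on bit commitment. The sketch below shows a generic hash-based commit/reveal primitive to make that building block concrete; it illustrates the primitive only and is not the exact construction or script architecture used by FaDe.

```python
import hashlib
import os

# Hash-based bit commitment (generic primitive, not FaDe's exact protocol).

def commit(bit: int):
    """Commit to a bit; the digest binds the sender, the secret nonce hides the bit."""
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + bytes([bit])).digest()
    return digest, nonce          # publish digest now, keep nonce secret

def reveal(digest: bytes, bit: int, nonce: bytes) -> bool:
    """Later, anyone can check that the opened bit matches the commitment."""
    return hashlib.sha256(nonce + bytes([bit])).digest() == digest

if __name__ == "__main__":
    c, n = commit(1)
    assert reveal(c, 1, n)        # honest opening verifies
    assert not reveal(c, 0, n)    # trying to flip the committed bit fails
```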


2021 ◽  
Vol 544 ◽  
pp. 469-484
Author(s):  
Tiantian Li ◽  
Wei Ren ◽  
Yuexin Xiang ◽  
Xianghan Zheng ◽  
Tianqing Zhu ◽  
...  

Author(s):  
Alberto Traverso ◽  
Frank J. W. M. Dankers ◽  
Leonard Wee ◽  
Sander M. J. van Kuijk

Abstract. Prerequisites to better understand the chapter: basic knowledge of the major sources of clinical data. Logical position of the chapter with respect to the previous chapter: in the previous chapter, you learned what the major sources of clinical data are. In this chapter, we dive into the main characteristics of the presented data sources; in particular, we will learn how to distinguish and classify data according to its scale. Learning objectives: you will learn the major differences between the data sources presented in previous chapters and how clinical data can be classified according to its scale. You will become familiar with the concept of 'big' clinical data and learn the major concerns limiting 'big' data exchange.
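As a concrete anchor for "classifying data according to its scale", the snippet below tags a few illustrative clinical variables with the classical measurement scales (nominal, ordinal, interval, ratio). The variables are examples chosen here for illustration, not taken from the chapter.

```python
# Illustrative only: example clinical variables mapped to measurement scales.
scales = {
    "blood_type": "nominal",            # categories with no inherent order
    "tumour_stage": "ordinal",          # ordered categories (I < II < III < IV)
    "body_temperature_C": "interval",   # meaningful differences, arbitrary zero
    "tumour_volume_ml": "ratio",        # true zero, ratios are meaningful
}
for variable, scale in scales.items():
    print(f"{variable}: {scale}")
```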


2018 ◽  
Vol 19 (3) ◽  
pp. 223-244
Author(s):  
Sonia Ikken ◽  
Eric Renault ◽  
Abdelkamel Tari ◽  
Tahar Kechadi

Several big data-driven applications are currently carried out collaboratively on distributed infrastructure. These data-driven applications usually involve experiments at massive scale. The data generated by such experiments are huge and stored at multiple geographic locations for reuse. Workflow systems, composed of jobs using collaborative task-based models, present new dependency and data exchange needs. This raises new issues when selecting distributed data and storage resources so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach considers two types of data: splittable and unsplittable intermediate data. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort of transferring, storing, and moving the data according to the applications' needs. We first propose an exact algorithm that takes into account intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that, even under divergent conditions, the cost ratio of the heuristic approach is close to the optimal solution.
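To make the greedy placement idea concrete, the sketch below assigns each unsplittable intermediate dataset to the data centre that minimises a simple transfer-plus-storage cost with a dependency penalty. The cost model, the data-centre names, and the place helper are simplified assumptions for illustration, not the paper's exact formulation or its network-flow framework.

```python
# Simplified greedy placement of unsplittable intermediate datasets.

TRANSFER_COST = {  # cost per GB between data centres (symmetric, illustrative)
    ("dc1", "dc1"): 0.0, ("dc1", "dc2"): 0.6, ("dc1", "dc3"): 1.1,
    ("dc2", "dc2"): 0.0, ("dc2", "dc3"): 0.5, ("dc3", "dc3"): 0.0,
}
STORAGE_COST = {"dc1": 0.10, "dc2": 0.08, "dc3": 0.05}  # cost per GB stored

def link(a, b):
    """Transfer cost per GB between two data centres (symmetric lookup)."""
    return TRANSFER_COST.get((a, b), TRANSFER_COST.get((b, a), 0.0))

def place(datasets, placed):
    """Greedily place each dataset where transfer + storage + dependency cost is lowest."""
    for name, (size_gb, source_dc, deps) in datasets.items():
        best_dc, best_cost = None, float("inf")
        for dc in STORAGE_COST:
            cost = size_gb * (link(source_dc, dc) + STORAGE_COST[dc])
            # Dependency term: keep co-used datasets close to each other.
            cost += sum(size_gb * link(dc, placed[d]) for d in deps if d in placed)
            if cost < best_cost:
                best_dc, best_cost = dc, cost
        placed[name] = best_dc
    return placed

if __name__ == "__main__":
    datasets = {  # name: (size in GB, source data centre, dependencies)
        "d1": (200, "dc1", []),
        "d2": (50, "dc2", ["d1"]),
        "d3": (120, "dc3", ["d1", "d2"]),
    }
    print(place(datasets, {}))
```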


Service-Oriented Architecture (SOA) is an approach to designing, developing, and organizing systems that represent reusable business functionality. The objective of this study is to identify the critical success factors needed to implement SOA in BIG DATA systems. Our study also aims to classify erroneous practices in the implementation of SOA. The adoption of SOA has motivated researchers to study its requirements and applications. The analysed results would be very useful for researchers who would like to implement SOA with BIG DATA systems. SOA brings numerous advantages, such as added flexibility and better alignment among processes, as well as reduced cost of integration and maintenance. Generally, BIG DATA concerns large-volume, complex, growing data sets with multiple, autonomous sources. It applies where data collection has grown tremendously and is beyond the ability of commonly used software tools to capture, manage, and process within a reasonable time [1]. The most essential task for BIG DATA applications is to explore the large volumes of data and extract useful information for future actions. The main purpose of this study is to identify the important factors needed to implement SOA in BIG DATA systems. Zhang and Yang suggest a reengineering approach that restructures legacy systems towards SOA by considering the needs of an organization. This paper also describes various challenges of SOA and identifies the problems whose resolution would improve SOA-based services for data exchange in BIG DATA systems.


In the EDA industry, the GDSII format is the de facto standard for IC layout data exchange. The designs developed may require up to a terabyte (1 TB) of disk data. This huge amount of Big Data not only slows down the runtimes from design to physical verification but also increases the time to get a design to market. On the other hand, the OASIS stream format, a replacement for GDSII, is relatively new and emerging in the industry. The OASIS stream format significantly reduces the data volume, and tools are therefore likely to run faster. However, there has not been significant work on enhancing the robustness of its usage, owing to a lack of in-house test cases. This paper presents an approach to developing a debug utility for OASIS parser validation to increase its robustness. The debug utility is implemented using a Singleton design pattern. The utility enables us to compare the data associated with both stream formats and highlights the differences. The effective memory utilization of the proposed design is negligible since all structures are dynamically created and destroyed after use. Iterative and unit testing were performed on the utility, and the proposed design was tested with real test cases to verify its robustness.
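A minimal sketch of a Singleton-style comparator in the spirit described above is shown below; the record format (cell name mapped to a geometry count) and the LayoutComparator class are hypothetical, not taken from the paper or from any EDA tool's API.

```python
# Hypothetical sketch: a Singleton comparator that diffs two parsed layouts.

class LayoutComparator:
    _instance = None

    def __new__(cls):
        # Singleton: a single shared comparator instance per process.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def diff(self, gdsii_cells: dict, oasis_cells: dict) -> list:
        """Report cells present in only one format or whose geometry counts differ."""
        differences = []
        for name in sorted(set(gdsii_cells) | set(oasis_cells)):
            g, o = gdsii_cells.get(name), oasis_cells.get(name)
            if g != o:
                differences.append((name, g, o))
        return differences

if __name__ == "__main__":
    gdsii = {"TOP": 120, "RAM": 45}            # cell name -> polygon count (toy data)
    oasis = {"TOP": 120, "RAM": 44, "IO": 7}
    print(LayoutComparator() is LayoutComparator())   # True: same instance reused
    print(LayoutComparator().diff(gdsii, oasis))
```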

