NoSQL schema evolution and big data migration at scale

Author(s):  
Meike Klettke ◽  
Uta Störl ◽  
Manuel Shenavai ◽  
Stefanie Scherzinger
Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4508
Author(s):  
Xin Li ◽  
Liangyuan Wang ◽  
Jemal H. Abawajy ◽  
Xiaolin Qin ◽  
Giovanni Pau ◽  
...  

Efficient big data analysis is critical to support applications and services in Internet of Things (IoT) systems, especially time-intensive services. A data center may therefore host heterogeneous big data analysis tasks for multiple IoT systems. Scheduling is challenging because data centers usually need to place a large number of periodic or online tasks in a short time. In this paper, we investigate the heterogeneous task scheduling problem with the goal of reducing global task execution time, which is also an effective way to reduce the energy consumption of data centers. We model task execution for heterogeneous tasks based on data locality, which also captures the relationships among tasks, data blocks, and servers. We propose a heterogeneous task scheduling algorithm with data migration. The core idea of the algorithm is to maximize efficiency by comparing the cost of remote task execution against the cost of data migration; migrating data when it is cheaper improves data locality and reduces task execution time. We conduct extensive simulations, and the experimental results show that our algorithm outperforms traditional methods and that data migration indeed reduces the overall task execution time. The algorithm also achieves acceptable fairness across heterogeneous tasks.
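For intuition, the placement rule described in this abstract can be sketched as a simple cost comparison. This is a minimal illustration, not the paper's algorithm: the cost model and all parameters (`net_delay`, `bandwidth`, `block_size`, `accesses`) are assumptions introduced here for the example.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    compute_time: float  # execution time once its data block is local

# Hypothetical cost model (the paper's actual model is more detailed):
# remote execution pays a per-access network penalty, while migration
# pays a one-off block transfer but makes every later access local.
def remote_cost(task: Task, accesses: int, net_delay: float) -> float:
    return task.compute_time + accesses * net_delay

def migration_cost(task: Task, block_size: float, bandwidth: float) -> float:
    return task.compute_time + block_size / bandwidth

def place_task(task, block_server, target_server, *,
               accesses=1, net_delay=0.5, block_size=1024.0, bandwidth=100.0):
    """Decide between remote execution and data migration by comparing costs."""
    if block_server == target_server:
        return "local", task.compute_time  # data locality: no extra cost
    r = remote_cost(task, accesses, net_delay)
    m = migration_cost(task, block_size, bandwidth)
    return ("migrate", m) if m < r else ("remote", r)

print(place_task(Task("t1", 2.0), "s1", "s2", accesses=30))
# -> ('migrate', 12.24): migration wins once remote accesses get expensive
```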


Big data testing services deliver end-to-end testing methodologies that address big data challenges. The testing module includes two types of functionality: functional testing and non-functional testing. Functional testing should be performed at every stage of big data processing; it comprises extraction testing of big data sources, data migration testing, and testing of the big data ecosystem, and it completes the ETL test strategy, MapReduce job validation, multi-source data integration validation, and data duplication checks. Non-functional testing, in turn, ensures that there are no quality defects in the data and no performance-related issues; it covers security testing and performance testing, which address monitoring and the identification of bottlenecks.
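As a concrete example of the functional side, data migration testing often reduces to comparing source and target tables. The sketch below is a generic illustration of such checks (row counts, content checksums, duplicate detection), not part of any specific testing service; all names are hypothetical.

```python
import hashlib

def row_hash(row) -> int:
    """Hash one row by joining its fields into a canonical string."""
    return int(hashlib.sha256("|".join(map(str, row)).encode()).hexdigest(), 16)

def table_checksum(rows) -> int:
    # XOR-folding row hashes gives an order-independent table fingerprint;
    # duplicate rows cancel under XOR, so pair this with a duplicate check.
    digest = 0
    for row in rows:
        digest ^= row_hash(row)
    return digest

def validate_migration(source_rows, target_rows) -> dict:
    """Functional checks typical of data migration testing."""
    return {
        "row_count_matches": len(source_rows) == len(target_rows),
        "content_matches": table_checksum(source_rows) == table_checksum(target_rows),
        "no_duplicates": len(set(map(tuple, target_rows))) == len(target_rows),
    }

src = [(1, "Ada"), (2, "Grace")]
dst = [(2, "Grace"), (1, "Ada")]  # same content, different order
print(validate_migration(src, dst))  # all three checks pass
```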


2020 ◽  
Vol 7 (3) ◽  
pp. 1857-1871 ◽  
Author(s):  
Maria Kanwal ◽  
Asad Waqar Malik ◽  
Anis Ur Rahman ◽  
Imran Mahmood ◽  
Muhammad Shahzad

ETRI Journal ◽  
2014 ◽  
Vol 36 (6) ◽  
pp. 988-998 ◽  
Author(s):  
Hai Thanh Mai ◽  
Kyoung Hyun Park ◽  
Hun Soon Lee ◽  
Chang Soo Kim ◽  
Miyoung Lee ◽  
...  

2021 ◽  
Vol 13 (3) ◽  
pp. 1-15
Author(s):  
Rada Chirkova ◽  
Jon Doyle ◽  
Juan Reutter

Assessing and improving the quality of data are fundamental challenges in Big-Data applications. These challenges have given rise to numerous solutions targeting transformation, integration, and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between standalone tools in these areas. In this article, we focus on the problem of determining whether the available data-transforming procedures can be used together to bring about the desired quality characteristics of the data in business or analytics processes. For example, to help an organization avoid building a data-quality solution from scratch when facing a new analytics task, we ask whether the data quality can be improved by reusing the tools that are already available, and if so, which tools to apply, and in which order, all without presuming knowledge of the internals of the tools, which may be external or proprietary. Toward addressing this problem, we conduct a formal study in which individual data cleaning, data migration, or other data-transforming tools are abstracted as black-box procedures with only some of the properties exposed, such as their applicability requirements, the parts of the data that the procedure modifies, and the conditions that the data satisfy once the procedure has been applied. As a proof of concept, we provide foundational results on sequential applications of procedures abstracted in this way, to achieve prespecified data-quality objectives, for the use case of relational data and for procedures described by standard relational constraints. We show that, while reasoning in this framework may be computationally infeasible in general, there exist well-behaved cases in which these foundational results can be applied in practice for achieving desired data-quality results on Big Data.
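To make the black-box abstraction concrete, the sketch below models each tool by exactly the properties the abstract names: its applicability requirements, what it invalidates, and the conditions guaranteed after it runs, then searches for a sequence of tools that establishes a data-quality goal. This is a toy over opaque condition labels, not the authors' formalism based on relational constraints; all identifiers are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Procedure:
    name: str
    requires: frozenset                 # applicability conditions on the data
    ensures: frozenset                  # conditions guaranteed after application
    destroys: frozenset = frozenset()   # conditions no longer guaranteed afterwards

def plan(initial, goal, procedures, max_len=6):
    """Breadth-first search for a sequence of black-box tools whose
    combined effect establishes every goal condition."""
    start = frozenset(initial)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, seq = queue.popleft()
        if goal <= state:               # all goal conditions hold
            return seq
        if len(seq) == max_len:
            continue
        for p in procedures:
            if p.requires <= state:     # tool is applicable in this state
                nxt = (state - p.destroys) | p.ensures
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, seq + [p.name]))
    return None                         # goal unreachable with these tools

tools = [
    Procedure("standardize_dates", frozenset({"ingested"}), frozenset({"dates_iso8601"})),
    Procedure("dedup", frozenset({"ingested"}), frozenset({"unique_keys"})),
    Procedure("ingest", frozenset(), frozenset({"ingested"})),
]
print(plan(frozenset(), frozenset({"unique_keys", "dates_iso8601"}), tools))
# -> ['ingest', 'standardize_dates', 'dedup']
```

As the abstract notes, reasoning of this kind is computationally infeasible in general; the bounded breadth-first search here only illustrates the well-behaved case where the condition vocabulary is small and finite.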


