A large data processing algorithm for energy efficiency in a heterogeneous cluster

2018 ◽  
Vol 17 ◽  
pp. 03023
Author(s):  
Lei Wang ◽  
Weichun Ge ◽  
Zhao Li ◽  
Zhenjiang Lei ◽  
Shuo Chen

It is reported that the electricity cost of operating a cluster may well exceed its acquisition cost, and processing big data requires large-scale clusters running for long periods. Energy-efficient processing of big data is therefore essential for both data owners and users. In this paper, we propose a novel two-step algorithm, MinBalance, for processing I/O-intensive big data tasks energy-efficiently in heterogeneous clusters. In the first step, four greedy policies are used to select suitable nodes, taking the heterogeneity of the cluster into account. In the second step, the workloads of the selected nodes are balanced to avoid the energy wasted by idle waiting. MinBalance is a universal algorithm and is not affected by the underlying data storage strategy. Experimental results indicate that MinBalance can achieve over 60% energy reduction on large data sets compared with the traditional approach of powering down some of the nodes.
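As a rough illustration of the two-step structure described above, the sketch below selects nodes greedily and then splits the load in proportion to node throughput. The node attributes, the example greedy policy, and the proportional split are assumptions for illustration, not the paper's actual policies.

```python
# Illustrative two-step MinBalance-style scheduler (assumed details:
# the greedy scoring and the balancing rule are placeholders).

def select_nodes(nodes, workload, policy):
    """Step 1: greedily pick heterogeneous nodes until capacity covers the workload."""
    chosen, capacity = [], 0.0
    for node in sorted(nodes, key=policy):  # one of the (assumed) greedy policies
        chosen.append(node)
        capacity += node["throughput"]
        if capacity >= workload:
            break
    return chosen

def balance(chosen, workload):
    """Step 2: split the workload in proportion to node throughput so all
    selected nodes finish at roughly the same time and none idles while powered on."""
    total = sum(n["throughput"] for n in chosen)
    return {n["name"]: workload * n["throughput"] / total for n in chosen}

nodes = [
    {"name": "n1", "throughput": 120.0, "watts": 100.0},
    {"name": "n2", "throughput": 80.0,  "watts": 50.0},
    {"name": "n3", "throughput": 200.0, "watts": 180.0},
]
# Example greedy policy: most work per watt first.
chosen = select_nodes(nodes, workload=300.0,
                      policy=lambda n: -n["throughput"] / n["watts"])
print(balance(chosen, 300.0))
```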

Big Data processing, which includes data storage, aggregation, transmission, and evaluation, has attracted growing research attention because of the enormous volume of data produced by the sensing nodes of large-scale Wireless Sensor Networks (WSNs). To address the energy-efficiency and privacy-preservation needs of WSNs in big data aggregation and processing, this paper develops a novel model called Multilevel Clustering based Energy Efficient Privacy-preserving Big Data Aggregation (MCEEP-BDA). First, the sensor nodes are organized into clusters based on a pre-defined gradient topology. The sensed information collected by each sensor node is then perturbed according to the privacy-preserving model obtained from its corresponding sink. An energy model is defined to capture the energy consumed in the overall big data aggregation process in the WSN. A cluster-head rotation process is incorporated to reduce communication overhead and computational cost, and a Least-BDA-Tree algorithm is framed for aggregating the big sensor data through the selected cluster heads. Simulation results show that the MCEEP-BDA model is scalable and energy efficient, and that the privacy-preserving model performs Big Data Aggregation (BDA) securely and with reduced resource utilization, satisfying the security concerns of emerging application-oriented needs.
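The cluster-head rotation idea can be illustrated with a toy model. The residual-energy election rule and the energy costs below are assumed placeholders; the paper's actual criteria are not reproduced here.

```python
# Toy cluster-head rotation (assumed rule: hand the role to the member
# with the most residual energy; the paper's actual criterion may differ).

class Cluster:
    def __init__(self, members):
        # members: node id -> residual energy (arbitrary units)
        self.members = members
        self.head = self.elect()

    def elect(self):
        return max(self.members, key=self.members.get)

    def aggregate(self, readings, cost_per_report=0.5, cost_aggregate=2.0):
        """Members send readings to the head; the head pays an extra aggregation cost."""
        for node in readings:
            self.members[node] -= cost_per_report
        self.members[self.head] -= cost_aggregate
        self.head = self.elect()  # rotate to spread the energy drain across nodes
        return sum(readings.values()) / len(readings)

cluster = Cluster({"s1": 10.0, "s2": 9.5, "s3": 9.8})
print(cluster.aggregate({"s1": 21.3, "s2": 20.9, "s3": 21.1}), cluster.head)
```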


Sensors ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 134 ◽  
Author(s):  
Bunrong Leang ◽  
Sokchomrern Ean ◽  
Ga-Ae Ryu ◽  
Kwan-Hee Yoo

The volume of programmable logic controller (PLC) sensing data has increased rapidly in the manufacturing environment, so a large data store is necessary for Big Data platforms. In this paper, we propose a Hadoop ecosystem to support many features of the manufacturing industry. In this ecosystem, Apache Hadoop and HBase serve as Big Data storage and handle large-scale data. Apache Kafka is used as the data streaming pipeline; its many configurations and properties, such as offsets and partitions, are used to build a well-designed, reliable system and to scale the programs. Apache Spark works closely with the Kafka consumers to process and analyze the data in real time. Meanwhile, data security is applied in the transmission phase between the Kafka producers and consumers. Public-key cryptography is used as the security method: the public key is held by the Kafka producer, and the private key is stored at the Kafka consumer. The integration of these technologies enhances the performance and accuracy of data storage, processing, and security in the manufacturing environment.
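A minimal sketch of the producer-side security step described above, assuming RSA-OAEP (via the `cryptography` package) and the kafka-python client, with an illustrative topic name and broker address:

```python
# Sketch of producer-side encryption: the public key lives with the Kafka
# producer, the private key with the consumer. Assumed details: topic name,
# broker address, and RSA-OAEP as the encryption scheme.
from kafka import KafkaProducer  # pip install kafka-python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# In practice the key pair is generated once, out of band;
# only the public half is deployed to the producer.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

producer = KafkaProducer(bootstrap_servers="localhost:9092")

reading = b'{"plc_id": 7, "temperature": 81.4}'
ciphertext = public_key.encrypt(
    reading,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)
producer.send("plc-sensing", ciphertext)  # consumer decrypts with the private key
producer.flush()
```

Note that RSA can only encrypt payloads smaller than the key size minus the padding overhead, so production systems typically use a hybrid scheme in which a symmetric session key is RSA-encrypted and the payload itself is encrypted symmetrically.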


2019 ◽  
Vol 3 (1) ◽  
pp. 30-37
Author(s):  
Bayu Prasetyo ◽  
Faiz Syaikhoni Aziz ◽  
Kamil Faqih ◽  
Wahyu Primadi ◽  
Roni Herdianto ◽  
...  

Technology is developing more rapidly and more diversely every year, and many systems in everyday life are now designed around technologies that require large-scale data storage. Big Data technology has been developed to accommodate very large data volumes, rapid data change, and great variety. Developing countries are increasingly using Big Data to develop their systems in fields such as healthcare, agriculture, building, and transportation. This paper describes the development of Big Data as applied to these sectors in developing countries, as well as the challenges developing countries face in the process of developing their systems.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed-system technologies now allow the integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Resource sharing is often viewed as the key goal of distributed systems, and in this context the sharing of stored data is the most important aspect of distributed resource sharing. Scientific applications were the first to take advantage of such environments, since current and future high-performance computing experiments issue ever higher volumes of data to be stored and managed. While these new environments create huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges which need to be addressed: supporting persistent storage of data on behalf of users, distributing up-to-date data consistently, replicating fast-changing datasets reliably, and managing large data transfers efficiently, among others. In this chapter we discuss how far the existing distributed computing infrastructure is adequate for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss recent research efforts addressing data retrieval, replication, and fast data transfers. The interaction of data management with other data-sensitive emerging technologies, such as workflow management, is also addressed.


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of the data. Big Data analytics is the process of analyzing large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, new technologies and algorithms are needed to handle it. The main objective of this paper is to provide an overview of the research challenges of Big Data analytics. The paper briefly surveys the various types of Big Data analytics, describing the process steps and tools for each type and giving a banking application for each. Some of the research challenges of Big Data analytics, and possible solutions to them, are also discussed.


2018 ◽  
Vol 7 (4.6) ◽  
pp. 13
Author(s):  
Mekala Sandhya ◽  
Ashish Ladda ◽  
Dr. Uma N Dulhare ◽  
...

In this generation of the Internet, information and data are growing continuously across various Internet services and applications; hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information, and at the same time more difficulty in discovering useful knowledge within it. Cloud computing can provide the infrastructure for large data. It has two significant characteristics of distributed computing: scalability and high availability. Scalability means the platform can seamlessly extend to large-scale clusters; high availability means that cloud computing can tolerate node errors, so node failures do not prevent a program from running correctly. Cloud computing combined with data mining enables significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and become an effective solution to distributed storage and efficient computation in data mining.


2005 ◽  
Vol 44 (02) ◽  
pp. 149-153 ◽  
Author(s):  
F. Estrella ◽  
C. del Frate ◽  
T. Hauer ◽  
M. Odeh ◽  
D. Rogulin ◽  
...  

Summary Objectives: The past decade has witnessed order-of-magnitude increases in computing power, data storage capacity, and network speed, giving birth to applications which may handle large data volumes of increased complexity, distributed over the internet. Methods: Medical image analysis is one of the areas for which this unique opportunity likely brings revolutionary advances, both for the scientist's research study and the clinician's everyday work. Grid computing [1] promises to resolve many of the difficulties in facilitating medical image analysis, allowing radiologists to collaborate without having to co-locate. Results: The EU-funded MammoGrid project [2] aims to investigate the feasibility of developing a Grid-enabled European database of mammograms and to provide an information infrastructure which federates multiple mammogram databases. This will enable clinicians to develop new common, collaborative, and co-operative approaches to the analysis of mammographic data. Conclusion: This paper focuses on one of the key requirements for large-scale distributed mammogram analysis: resolving queries across a grid-connected federation of images.
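A toy sketch of the federated query pattern the paper targets: fan the query out to each site's local database and merge the results. The site names, records, and query interface below are hypothetical, not MammoGrid's actual API.

```python
# Illustrative federated query resolution: submit the same predicate to each
# site in parallel and merge the per-site results into one answer set.
from concurrent.futures import ThreadPoolExecutor

SITES = {
    "hospital_a": [{"patient": "a1", "birads": 4}, {"patient": "a2", "birads": 2}],
    "hospital_b": [{"patient": "b1", "birads": 5}],
}

def query_site(site, predicate):
    # Stand-in for a remote call to one federated mammogram database.
    return [{**rec, "site": site} for rec in SITES[site] if predicate(rec)]

def federated_query(predicate):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_site, s, predicate) for s in SITES]
        return [rec for f in futures for rec in f.result()]

print(federated_query(lambda rec: rec["birads"] >= 4))
```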


2017 ◽  
Vol 41 (3) ◽  
pp. 129-132 ◽  
Author(s):  
Peter Schofield

Summary: Advances in information technology and data storage, so-called 'big data', have the potential to dramatically change the way we do research. We are presented with the possibility of whole-population data, collected over multiple time points and including detailed demographic information usually only available in expensive and labour-intensive surveys, but at a fraction of the cost and effort. Typical accounts highlight the sheer volume of data available, in terms of terabytes (10¹² bytes) and petabytes (10¹⁵ bytes), while charting the exponential growth in computing power we can use to make sense of it. Presented with resources of such dizzying magnitude, it is easy to lose sight of the potential limitations when the amount of data itself appears unlimited. In this short account I look at some recent advances in electronic health data that are relevant for mental health research, while highlighting some of the potential pitfalls.


Author(s):  
Jamie Farnes ◽  
Ben Mort ◽  
Fred Dulwich ◽  
Stef Salvini ◽  
Wes Armour

The Square Kilometre Array (SKA) will be both the largest radio telescope ever constructed and the largest Big Data project in the known Universe. The first phase of the project will generate on the order of 5 zettabytes of data per year. A critical task for the SKA will be its ability to process data for science, which will need to be conducted by science pipelines. Using polarization data from the LOFAR Multifrequency Snapshot Sky Survey (MSSS), we have been developing a realistic SKA-like science pipeline that can handle the large data volumes generated by LOFAR at 150 MHz. The pipeline uses task-based parallelism to image the data, detect sources, and perform Faraday tomography across the entire LOFAR sky. The project thereby provides a unique opportunity to contribute to the technological development of the SKA telescope, while simultaneously enabling cutting-edge scientific results. In this paper, we provide an update on current efforts to develop a science pipeline that can enable tight constraints on the magnetised large-scale structure of the Universe.
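As a minimal illustration of the task-based parallelism pattern (not the actual SKA/LOFAR pipeline code), each sky field can flow through imaging, source detection, and Faraday tomography as an independent task chain; the function names and executor choice below are assumptions.

```python
# Sketch of per-field task-based parallelism: the three pipeline stages run
# as one task chain per sky field, and fields are processed in parallel.
from concurrent.futures import ProcessPoolExecutor

def image_field(field_id):
    return f"image({field_id})"          # placeholder for imaging

def detect_sources(image):
    return f"sources({image})"           # placeholder for source finding

def faraday_tomography(sources):
    return f"rm_cube({sources})"         # placeholder for Faraday tomography

def process_field(field_id):
    # One task chain: image -> detect sources -> Faraday tomography.
    return faraday_tomography(detect_sources(image_field(field_id)))

if __name__ == "__main__":
    fields = range(8)  # stand-in for LOFAR MSSS sky fields
    with ProcessPoolExecutor() as pool:
        for result in pool.map(process_field, fields):
            print(result)
```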

