DV-DVFS: merging data variety and DVFS technique to manage the energy consumption of big data processing

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature causes high variation in the consumption of processing resources such as the CPU, an issue that previous work has overlooked. To address it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadlines as our constraints. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that the proposed approach outperforms the other scenarios in processing real data sets: DV-DVFS achieves up to a 15% improvement in energy consumption.
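The step described in the abstract, estimating the processing time and then choosing a frequency that still meets the deadline, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-block cycle estimates, the frequency list, and the capacitance value are all hypothetical, and the energy estimate uses the textbook dynamic CMOS power model P = C · V² · f rather than anything stated in the paper.

```python
def pick_frequency(cycles, deadline_s, freqs_hz):
    """Lowest available frequency (Hz) that finishes `cycles` within the deadline.

    Falls back to the highest frequency when no setting meets the deadline.
    """
    for f in sorted(freqs_hz):
        if cycles / f <= deadline_s:
            return f
    return max(freqs_hz)


def dynamic_energy(cycles, freq_hz, volts, capacitance=1e-9):
    """Dynamic CMOS energy in joules: P = C * V^2 * f, E = P * (cycles / f)."""
    power_w = capacitance * volts ** 2 * freq_hz
    return power_w * (cycles / freq_hz)


# Hypothetical per-block cycle estimates; they differ because of data variety.
blocks = {"block_a": 1.0e9, "block_b": 2.4e9}
freqs = [1.0e9, 1.5e9, 2.0e9, 2.5e9]  # assumed available CPU frequencies

# One-second deadline per block (an assumed constraint for illustration).
plan = {name: pick_frequency(c, 1.0, freqs) for name, c in blocks.items()}
```

In this sketch the light block can run at the lowest frequency while the heavy block needs the highest one. Under the assumed model, energy saving comes from the lower supply voltage that usually accompanies a lower frequency.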

DV-DVFS: Merging Data Variety and DVFS Technique to Manage the Energy Consumption of Big Data Processing

10.21203/rs.3.rs-45414/v4 ◽

2021 ◽

Author(s):

Hossein Ahmadvand ◽

Fouzhan Foroutan ◽

Mahmood Fathy

Keyword(s):

Big Data ◽

Energy Consumption ◽

Processing Time ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Multiple Sources ◽

Evaluation Phase ◽

Dynamic Voltage ◽

Processing Resources

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature causes high variation in the consumption of processing resources such as the CPU, an issue that previous work has overlooked. To address it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadlines as our constraints. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that the proposed approach outperforms the other scenarios in processing real data sets: DV-DVFS achieves up to a 15% improvement in energy consumption.

DV-DVFS: Merging Data variety and DVFS Technique to Manage the Energy Consumption of Big Data Processing

10.21203/rs.3.rs-45414/v2 ◽

2020 ◽

Author(s):

Hossein Ahmadvand ◽

Fouzhan Foroutan ◽

Mahmood Fathy

Keyword(s):

Big Data ◽

Energy Consumption ◽

Processing Time ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Multiple Sources ◽

Evaluation Phase ◽

Dynamic Voltage ◽

Processing Resources

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature causes high variation in the consumption of processing resources such as the CPU, an issue that previous work has overlooked. To address it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadlines as our constraints. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that the proposed approach outperforms the other scenarios in processing real data sets: DV-DVFS achieves up to a 15% improvement in energy consumption.

DV-DVFS: Merging Data variety and DVFS Technique to Manage the Energy Consumption of Big Data Processing

10.21203/rs.3.rs-45414/v3 ◽

2020 ◽

Author(s):

Hossein Ahmadvand ◽

Fouzhan Foroutan ◽

Mahmood Fathy

Keyword(s):

Big Data ◽

Energy Consumption ◽

Processing Time ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Multiple Sources ◽

Evaluation Phase ◽

Dynamic Voltage ◽

Processing Resources

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature causes high variation in the consumption of processing resources such as the CPU, an issue that previous work has overlooked. To address it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadlines as our constraints. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that the proposed approach outperforms the other scenarios in processing real data sets: DV-DVFS achieves up to a 15% improvement in energy consumption.

DV-DVFS: Merging Data variety and DVFS Technique to Manage the Energy Consumption of Big Data Processing

10.21203/rs.3.rs-45414/v1 ◽

2020 ◽

Author(s):

Hossein Ahmadvand ◽

Fouzhan Foroutan ◽

Mahmood Fathy

Keyword(s):

Big Data ◽

Energy Consumption ◽

Processing Time ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Multiple Sources ◽

Evaluation Phase ◽

Dynamic Voltage ◽

Processing Resources

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature causes high variation in the consumption of processing resources such as the CPU. In this paper, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we take a deadline as our constraint and, before applying the DVFS technique to computer nodes, estimate the processing time and the frequency needed to meet the deadline. We use a set of data sets and applications in the evaluation phase. The experimental results show that the proposed approach outperforms the other scenarios in processing real data sets: DV-DVFS achieves up to a 15% improvement in energy consumption.

Space-Time Analytics for Spatial Dynamics

Data Mining ◽

10.4018/978-1-4666-2455-9.ch108 ◽

2013 ◽

pp. 2117-2131

Author(s):

May Yuan ◽

James Bothwell

Keyword(s):

Big Data ◽

Temperature Change ◽

Spatial Dynamics ◽

Space Time ◽

Data Sets ◽

Spatial Processes ◽

Multiple Sources ◽

Time Concepts ◽

New Thinking ◽

New Space

The so-called Big Data Challenge poses issues not only with massive volumes of data, but also with the continuing data streams from multiple sources that monitor environmental processes or record social activities. Many statistical tools and data mining methods have been developed to reveal embedded patterns in large data sets. While patterns are critical to data analysis, deep insights will remain buried unless we develop means to associate spatiotemporal patterns with the dynamics of the spatial processes that essentially drive the formation of patterns in the data. This chapter reviews the literature on the conceptual foundation for space-time analytics dealing with spatial processes, discusses the types of dynamics that have and have not been addressed in the literature, and identifies needs for new thinking that can systematically advance space-time analytics to reveal the dynamics of spatial processes. The discussion is facilitated by an example that highlights potential means of space-time analytics in response to the Big Data Challenge. The example shows the development of new space-time concepts and tools to analyze data from two common General Circulation Models for climate change predictions. Common approaches compare temperature changes at locations from the NCAR CCSM3 and from the CNRM CM3, or animate time series of temperature layers to visualize the climate prediction. Instead, the new space-time analytics methods shown here can decipher the differences in spatial dynamics of the predicted temperature change in the model outputs, and apply the concepts of change and movement to reveal warming, cooling, convergence, and divergence in temperature change across the globe.

Geophysics introduces new section on multiphysics and joint inversion

The Leading Edge ◽

10.1190/tle39100753.1 ◽

2020 ◽

Vol 39 (10) ◽

pp. 753-754

Author(s):

Jiajia Sun ◽

Daniele Colombo ◽

Yaoguo Li ◽

Jeffrey Shragge

Keyword(s):

Big Data ◽

Joint Inversion ◽

Effective Means ◽

Added Value ◽

Data Sets ◽

Computational Power ◽

Sources Of Information ◽

Multiple Sources ◽

Data Types ◽

Geoscientific Data

Geophysicists seek to extract useful and potentially actionable information about the subsurface by interpreting various types of geophysical data together with prior geologic information. It is well recognized that reliable imaging, characterization, and monitoring of subsurface systems require integration of multiple sources of information from a multitude of geoscientific data sets. With increasing data volumes and computational power, new data types, constant development of inversion algorithms, and the advent of the big data era, Geophysics editors see multiphysics integration as an effective means of meeting some of the challenges arising from imaging subsurface systems with higher resolution and reliability as well as exploring geologically more complicated areas. To advance the field of multiphysics integration and to showcase its added value, Geophysics will introduce a new section “Multiphysics and Joint Inversion” in 2021. Submissions are accepted now.

DEVELOPING A PARALLEL CLASSIFIER FOR MINING IN BIG DATA SETS

IIUM Engineering Journal ◽

10.31436/iiumej.v22i2.1541 ◽

2021 ◽

Vol 22 (2) ◽

pp. 119-134

Author(s):

Ahad Shamseen ◽

Morteza Mohammadi Zanjireh ◽

Mahdi Bahaghighat ◽

Qin Xin

Keyword(s):

Data Mining ◽

Big Data ◽

Decision Tree ◽

Main Memory ◽

Experimental Results ◽

Primary Data ◽

Data Sets ◽

Decision Tree Classifier ◽

Vast Amount ◽

Tree Classifier

Data mining is the extraction of information and its roles from a vast amount of data, and it is one of the most important topics today. Massive amounts of data are generated and stored each day. These data contain useful information in different fields, which attracts the attention of programmers and engineers. One of the primary classification algorithms in data mining is the decision tree. Decision tree techniques have several advantages but also drawbacks. One of the main drawbacks is that the data must reside in main memory. SPRINT is a decision tree classifier that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on the SPRINT results. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed classifier can be implemented in serial and parallel environments and can deal with big data.

MapReduce-Based D_ELT Framework to Address the Challenges of Geospatial Big Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8110475 ◽

2019 ◽

Vol 8 (11) ◽

pp. 475

Author(s):

Junghee Jo ◽

Kang-Woo Lee

Keyword(s):

Big Data ◽

Internet Of Things ◽

Single Machine ◽

Experimental Results ◽

The Other ◽

Data Preparation ◽

Overall Performance

The conventional extracting–transforming–loading (ETL) system is typically operated on a single machine not capable of handling huge volumes of geospatial big data. To deal with the considerable amount of big data in the ETL process, we propose D_ELT (delayed extracting–loading–transforming) by utilizing MapReduce-based parallelization. Among various kinds of big data, we concentrate on geospatial big data generated via sensors using Internet of Things (IoT) technology. In the IoT environment, update latency for sensor big data is typically short and old data are not worth further analysis, so the speed of data preparation is even more significant. We conducted several experiments measuring the overall performance of D_ELT and compared it with both traditional ETL and extracting–loading–transforming (ELT) systems, using different sizes of data and complexity levels for analysis. The experimental results show that D_ELT outperforms the other two approaches, ETL and ELT. In addition, the larger the amount of data or the higher the complexity of the analysis, the greater the parallelization effect of transform in D_ELT, leading to better performance over the traditional ETL and ELT approaches.

Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution

Journal of Artificial Intelligence Research ◽

10.1613/jair.3120 ◽

2011 ◽

Vol 40 ◽

pp. 469-521 ◽

Cited By ~ 14

Author(s):

A. Rahman ◽

V. Ng

Keyword(s):

Experimental Results ◽

The Other ◽

Superior Performance ◽

Traditional Learning ◽

Data Sets ◽

Coreference Resolution ◽

Pair Model ◽

Ranking Model ◽

Cluster Ranking

Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers to competing approaches as well as the effectiveness of our two extensions.

Semi-Supervised Outlier Detection with Only Positive and Unlabeled Data Based on Fuzzy Clustering

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500037 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550003 ◽

Cited By ~ 1

Author(s):

Armin Daneshpazhouh ◽

Ashkan Sami

Keyword(s):

Intrusion Detection ◽

Outlier Detection ◽

Fuzzy Clustering ◽

Real World ◽

State Of The Art ◽

Real Data ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Real World Applications

The task of semi-supervised outlier detection is to find the instances that deviate from the rest of the data, using some labeled examples. The task is especially important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
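The first step of the method, extracting reliable negatives with a kNN-based algorithm, is not specified in the abstract. The sketch below shows one plausible reading under stated assumptions: each unlabeled point is scored by its mean distance to its k nearest labeled positives (the known outliers), and the points farthest from the positives are kept as reliable negatives. The function name, the scoring rule, and the parameters `k` and `n_keep` are hypothetical and may differ from the paper's actual algorithm.

```python
import math


def reliable_negatives(positives, unlabeled, k=2, n_keep=2):
    """Return the n_keep unlabeled points that lie farthest from the positives.

    Assumed scoring rule: mean Euclidean distance to the k nearest positives.
    """
    def knn_score(point):
        nearest = sorted(math.dist(point, q) for q in positives)[:k]
        return sum(nearest) / len(nearest)

    # Highest score first: far from every known outlier, so likely normal.
    return sorted(unlabeled, key=knn_score, reverse=True)[:n_keep]


# Toy data: two labeled outliers and four unlabeled points.
positives = [(10, 10), (11, 10)]
unlabeled = [(0, 0), (1, 0), (10, 11), (0, 1)]
negatives = reliable_negatives(positives, unlabeled)
```

The selected negatives, together with the labeled positives, would then seed the fuzzy clustering step that the abstract describes.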
