Structure Splitting for Elbrus Processor Compiler

2021 ◽  
Vol 12 (2) ◽  
pp. 82-88
Author(s):  
V. E. Shamparov ◽  
A. L. Markin

This report presents a new version of the Structure Splitting optimization, implemented in the compiler for Elbrus and SPARC processors. Structure Splitting improves data locality by splitting arrays of structures into arrays of smaller structures, which lowers the probability of cache misses and thereby reduces execution time. The optimization was generalized to handle an array of structures nested within another structure and the possibility of its reallocation. The execution speed of two tests from SPEC CPU2000 and SPEC CPU2006 increased by 19% and 12%, respectively.
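As a rough illustration of the technique described above (a minimal sketch in C, not the Elbrus compiler's transformation; the structure and its fields are hypothetical), splitting rarely used "cold" fields out of a structure lets loops over the "hot" fields traverse a dense array and touch far fewer cache lines:

```c
#include <stdlib.h>

/* Original layout: one array of large structures ("array of structures").
   Every traversal of the hot field also drags the cold payload through the cache. */
struct particle {
    double position;     /* accessed every iteration (hot) */
    double history[14];  /* accessed rarely (cold)         */
};

/* Split layout: two arrays of smaller structures.  A loop that only reads
   positions now walks a dense array, so far fewer cache lines are loaded. */
struct particle_hot  { double position; };
struct particle_cold { double history[14]; };

double sum_positions_split(const struct particle_hot *hot, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += hot[i].position;   /* contiguous accesses, good spatial locality */
    return sum;
}
```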

Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4508
Author(s):  
Xin Li ◽  
Liangyuan Wang ◽  
Jemal H. Abawajy ◽  
Xiaolin Qin ◽  
Giovanni Pau ◽  
...  

Efficient big data analysis is critical to support applications or services in Internet of Things (IoT) systems, especially for time-intensive services. Hence, a data center may host heterogeneous big data analysis tasks for multiple IoT systems. This is a challenging problem, since data centers usually need to schedule a large number of periodic or online tasks in a short time. In this paper, we investigate the heterogeneous task scheduling problem to reduce the global task execution time, which is also an efficient way to reduce energy consumption in data centers. We model task execution for the heterogeneous tasks based on the data locality feature, which also captures the relationships among tasks, data blocks and servers. We propose a heterogeneous task scheduling algorithm with data migration. The core idea of the algorithm is to maximize efficiency by comparing the cost of remote task execution with that of data migration, which can improve data locality and reduce task execution time. We conduct extensive simulations, and the experimental results show that our algorithm performs better than traditional methods and that data migration does reduce the overall task execution time. The algorithm also shows acceptable fairness for the heterogeneous tasks.
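The core cost comparison can be sketched as follows (a hypothetical model with made-up parameters, not the paper's exact formulation): migration pays off once a block will be read remotely more often than it costs to copy it once.

```c
/* Hypothetical placement decision: compare repeated remote reads of a block
 * against migrating it once and reading locally afterwards. */
#include <stdio.h>

typedef struct {
    double block_size_mb;   /* size of the input data block                  */
    double bandwidth_mbps;  /* network bandwidth between the two servers     */
    int    expected_runs;   /* how many times the task will touch the block  */
} task_t;

/* Returns 1 if migrating the block is cheaper than repeated remote reads. */
int should_migrate(const task_t *t)
{
    double remote_cost  = t->expected_runs * (t->block_size_mb / t->bandwidth_mbps);
    double migrate_cost = t->block_size_mb / t->bandwidth_mbps;  /* copy once, then local reads */
    return migrate_cost < remote_cost;
}

int main(void)
{
    task_t periodic = { .block_size_mb = 512, .bandwidth_mbps = 100, .expected_runs = 10 };
    printf("migrate: %s\n", should_migrate(&periodic) ? "yes" : "no");
    return 0;
}
```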


2019 ◽  
Vol 29 (1) ◽  
pp. 1523-1534 ◽  
Author(s):  
Ahmed Ghorbel ◽  
Walid Aydi ◽  
Imen Tajouri ◽  
Nouri Masmoudi

This paper proposes a new face recognition system based on combining two feature extraction techniques: the Vander Lugt correlator (VLC) and Gabor ordinal measures (GOM). The proposed system relies on the execution speed of VLC and the robustness of GOM. In this system, we applied the Tan and Triggs and retina modeling enhancement techniques, which are well suited for VLC and GOM, respectively. We evaluated our system on the standard FERET probe data sets and on the extended Yale B database. The obtained results exhibited better face recognition rates in a shorter execution time compared to the GOM technique.


1999 ◽  
Vol 7 (1) ◽  
pp. 21-37
Author(s):  
Balaram Sinharoy

Over the last decade processor speed has increased dramatically, whereas the speed of the memory subsystem has improved at a modest rate. Due to the increase in cache miss latency (in terms of processor cycles), processors stall on cache misses for a significant portion of their execution time. Multithreaded processors have been proposed in the literature to reduce the processor stall time due to cache misses. Although multithreading improves processor utilization, it may also increase cache miss rates, because in a multithreaded processor multiple threads share the same cache, which effectively reduces the cache size available to each individual thread. Increased processor utilization and the increase in the cache miss rate demand higher memory bandwidth. This paper presents a novel compiler optimization method that improves data locality for each thread and enhances data sharing among the threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The created threads exhibit a high level of intra‐thread and inter‐thread data locality, which effectively reduces both the data cache miss rates and the total execution time of numerically intensive computations running on a multithreaded processor.
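A textbook example of the kind of loop transformation the paper builds on (this is generic loop blocking, not the paper's specific algorithm) is tiling a large traversal so that each working set fits in cache:

```c
/* Blocking (tiling) a matrix transpose: all accesses inside a tile stay within
 * a cache-resident block, improving both spatial and temporal locality. */
#define N    1024
#define TILE   64

void transpose_tiled(double dst[N][N], const double src[N][N])
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            /* work on one TILE x TILE block at a time */
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}
```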


2019 ◽  
Vol 8 (3) ◽  
pp. 4005-4012

One of the major factors that affect the performance of adaptive filters such as the Particle Filter (PF), the Marginalized Particle Filter (MPF) and the Adaptive Marginalized Particle Filter (AMPF) is sample degeneracy. Sample degeneracy occurs when the weights associated with particles converge to zero, making the particles useless in state estimation. Resampling is the most common method used to avoid the sample degeneracy problem; a new set of particles is generated and new weights are assigned. The performance and execution time of these filters depend heavily on the type of resampling technique employed. AMPF is a modified version of MPF that is typically faster than both PF and MPF. The main aim of this paper is to find the effect of different types of resampling on the performance and execution time of AMPF. For this, a typical target tracking problem is simulated using MATLAB. AMPF with different types of resampling techniques is used for state estimation for the above-mentioned problem, and the best technique in terms of performance and execution speed is identified. The simulations show that AMPF with systematic resampling is the best in terms of execution speed and performance, i.e., minimum root mean square error.
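For reference, the systematic resampling step that the paper finds fastest is conventionally implemented as below (a standard formulation from the particle filtering literature, not the authors' MATLAB code):

```c
/* Systematic resampling: given n normalised weights, draw one uniform offset
 * and select particles with evenly spaced pointers through the cumulative
 * weight distribution.  indices[j] is the index of the particle replicated
 * into slot j of the new particle set. */
#include <stdlib.h>

void systematic_resample(const double *weights, int *indices, int n)
{
    double u0 = ((double)rand() / RAND_MAX) / n;  /* single random offset in [0, 1/n) */
    double cumulative = weights[0];
    int i = 0;
    for (int j = 0; j < n; j++) {
        double u = u0 + (double)j / n;            /* evenly spaced pointers */
        while (u > cumulative && i < n - 1)
            cumulative += weights[++i];
        indices[j] = i;
    }
}
```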


2018 ◽  
Vol 9 (3) ◽  
pp. 15-30 ◽  
Author(s):  
S. Vengadeswaran ◽  
S. R. Balasundaram

This article describes how the time taken to execute a query and return the results increases exponentially as the data size increases, leading to longer waiting times for the user. Hadoop, with its distributed processing capability, is considered an efficient solution for processing such large data. Hadoop's Default Data Placement Strategy (HDDPS) allocates the data blocks randomly across the cluster of nodes without considering any of the execution parameters. As a result, the blocks required for execution are often not available on the local machine, so the data has to be transferred across the network for execution, leading to a data locality issue. Also, it is commonly observed that most data-intensive applications show grouping semantics, so during query execution only a part of the Big-Data set is utilized. Since such execution parameters and grouping behavior are not considered, the default placement does not perform well, resulting in several shortcomings such as decreased local map task execution, increased query execution time, query latency, etc. In order to overcome such issues, an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. Initially, the user history log is dynamically analyzed to identify the access pattern, which is depicted as a graph. Markov clustering, a graph clustering algorithm, is applied to identify groupings among the dataset. Then, an Optimal Data Placement Algorithm (ODPA) is proposed based on the statistical measures estimated from the clustered graph. This in turn re-organizes the default data layouts in HDFS to achieve improved performance for Big-Data sets in a heterogeneous distributed environment. Our proposed strategy is tested in a 15-node cluster placed in a single-rack topology. The results prove it to be more efficient for massive datasets, reducing query execution time by 26% and significantly improving data locality by 38% compared to HDDPS.
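A greatly simplified sketch of the grouping idea follows: co-access counts between data blocks are accumulated from the query log, and strongly co-accessed blocks are co-located. The actual ODPA applies Markov clustering to the access graph; this greedy pairing and its parameters are only a hypothetical stand-in.

```c
/* Toy grouping-based placement: blocks that frequently appear in the same
 * query are pulled onto the same node. */
#define NBLOCKS 8

static int coaccess[NBLOCKS][NBLOCKS];  /* co-access counts between block pairs */

/* Record one query that touched the given set of blocks. */
void record_query(const int *blocks, int count)
{
    for (int a = 0; a < count; a++)
        for (int b = a + 1; b < count; b++) {
            coaccess[blocks[a]][blocks[b]]++;
            coaccess[blocks[b]][blocks[a]]++;
        }
}

/* Assign blocks to nodes: start from a round-robin layout, then co-locate
 * block pairs whose co-access count exceeds the threshold. */
void place_blocks(int node_of[NBLOCKS], int nnodes, int threshold)
{
    for (int b = 0; b < NBLOCKS; b++)
        node_of[b] = b % nnodes;
    for (int a = 0; a < NBLOCKS; a++)
        for (int b = a + 1; b < NBLOCKS; b++)
            if (coaccess[a][b] >= threshold)
                node_of[b] = node_of[a];
}
```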


Author(s):  
Mohamed Merabet ◽  
Sidi mohamed Benslimane ◽  
Mahmoud Barhamgi ◽  
Christine Bonnet

This article describes how data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters, because network bisection bandwidth becomes a bottleneck. The task scheduler assigns the most appropriate map tasks to nodes. If map tasks are scheduled to nodes that do not hold their input data, these tasks issue remote I/O operations to copy the data to the local nodes, which increases the execution time of map tasks. In that case, a prefetching mechanism can be useful to preload the needed input data before the tasks are launched. Therefore, the key challenge is to accurately predict the execution time of map tasks so that data prefetching can be used effectively without any data access delay. This article proposes a Predictive Map Task Scheduler that assigns the most suitable map tasks to nodes ahead of time. A linear regression model is used for prediction, and a data-locality-based algorithm is used for task scheduling. The experimental results show that the method can greatly improve both the data locality and the execution time of map tasks.
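The prediction step could look roughly like the following sketch: a one-variable least-squares fit of map task runtime against input split size, used to estimate completion times far enough ahead to start prefetching. The single feature and the names are illustrative assumptions, not the article's exact model.

```c
/* Fit runtime ~ slope * split_size + intercept from completed map tasks,
 * then predict the runtime of upcoming tasks to schedule prefetches. */
#include <stddef.h>

typedef struct { double slope, intercept; } lr_model;

lr_model fit_runtime_model(const double *split_mb, const double *runtime_s, size_t n)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx  += split_mb[i];
        sy  += runtime_s[i];
        sxx += split_mb[i] * split_mb[i];
        sxy += split_mb[i] * runtime_s[i];
    }
    lr_model m;
    m.slope     = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    m.intercept = (sy - m.slope * sx) / n;
    return m;
}

double predict_runtime(lr_model m, double split_mb)
{
    return m.slope * split_mb + m.intercept;
}
```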


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8212
Author(s):  
Andrei-Alin Corodescu ◽  
Nikolay Nikolov ◽  
Akif Quddus Khan ◽  
Ahmet Soylu ◽  
Mihhail Matskin ◽  
...  

The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution.


In MapReduce applications with multiple jobs, the jobs may depend on each other; for example, the iterative Page View application [2] performs the required operation in several iterations before generating the result, and each iteration is considered a single job. Conventional Hadoop MapReduce schedules the jobs sequentially and is not customized to handle multi-job applications. It also does not execute dependent jobs in parallel, which prolongs the time needed to complete all the jobs. Therefore, a new scheduler, the DAG–CPM Scheduler, uses the critical path job scheduling model to identify the jobs on the critical path. Critical path job scheduling is optimized to support multi-job applications; the critical path is a series of jobs such that, if the execution of any job on it is delayed, the time required to execute all jobs is prolonged. The DAG–CPM Scheduler schedules multiple jobs by dynamically constructing the job dependency DAG for the currently running jobs based on each job's input and output. The DAG represents the dependencies among the jobs; this dependency graph is used to pipeline the output of one job into the map tasks of another job and to execute the dependent jobs in parallel, which results in a substantial reduction in the execution time of an application. Experimental analysis of the proposed approach has been carried out with the Page View application on academic and research web server log files (NASA and rnsit.ac.in) forming a 10 GB data set; PigMix2 is executed on an 8 GB data set. Experimental results reveal that the average execution time of the Page View application is decreased by 41% compared to Hadoop, execution is 37.7% faster compared to Pig, and the DAG–CPM Scheduler runs 24.3% faster than the DAG–CPM Scheduler without pipelining.
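The critical-path computation at the heart of such a scheduler is the standard CPM longest-path recurrence over a topologically ordered job DAG, sketched below (an illustrative sketch, not the DAG–CPM Scheduler's actual implementation):

```c
/* Earliest-finish-time recurrence: a job can start only after its slowest
 * predecessor finishes; the maximum finish time is the critical-path length,
 * and jobs whose finish time equals it lie on the critical path. */
#define MAXJOBS 16

/* Jobs are assumed to be numbered in topological order (dependencies first). */
double critical_path(int njobs, const double duration[],
                     const int dep[][MAXJOBS], const int ndeps[],
                     double finish[])
{
    double longest = 0.0;
    for (int j = 0; j < njobs; j++) {
        double start = 0.0;
        for (int k = 0; k < ndeps[j]; k++)       /* wait for slowest predecessor */
            if (finish[dep[j][k]] > start)
                start = finish[dep[j][k]];
        finish[j] = start + duration[j];
        if (finish[j] > longest)
            longest = finish[j];
    }
    return longest;   /* length of the critical path */
}
```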


Author(s):  
Ibrahim Al-Kharusi ◽  
David W Walker

Application performance on graphical processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. It is expected that enhancing data locality in molecular dynamic simulations will lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics for row-major, Hilbert and Morton orderings and the impact these have on the performance of molecular dynamics simulations. A simple cache model is presented, and this is found to give results that are consistent with the timing results for the particle force computation obtained on NVidia GeForce GTX960 and Tesla P100 GPUs. Further analysis of the observed memory use, in terms of cache hits and the number of memory transactions, provides a more detailed explanation of execution behaviour for the different orderings. To the best of our knowledge, this is the first study to investigate memory analysis and data locality issues for molecular dynamics simulations of Lennard-Jones fluids on NVidia’s Maxwell and Tesla architectures.
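The Morton ordering analysed in the article interleaves the bits of a particle's cell coordinates so that spatial neighbours tend to be memory neighbours. The standard 2-D bit-interleaving construction is shown below purely as an illustration of the ordering, not as the authors' simulation code:

```c
/* Morton (Z-order) index for a 2-D grid: interleave the bits of the x and y
 * cell coordinates, giving a locality-preserving linear ordering. */
#include <stdint.h>

static uint32_t part1by1(uint32_t x)      /* spread the low 16 bits of x apart */
{
    x &= 0x0000ffff;
    x = (x | (x << 8)) & 0x00ff00ff;
    x = (x | (x << 4)) & 0x0f0f0f0f;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

uint32_t morton2d(uint32_t cell_x, uint32_t cell_y)
{
    return part1by1(cell_x) | (part1by1(cell_y) << 1);  /* interleave x and y bits */
}
```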


Webology ◽  
2020 ◽  
Vol 17 (2) ◽  
pp. 430-444
Author(s):  
Sachin Arun Thanekar ◽  
K. Subrahmanyam ◽  
A.B. Bagwan

Existing Hadoop treats every job as independent and destroys the metadata of preceding jobs. Because every job is treated as independent, it has to read data from all Data Nodes again and again. Moreover, relationships between specific jobs are not checked. The lack of specific user identity creation, group formation and user credential management are further weaknesses of HDFS. As a result, the overall performance of Hadoop becomes very poor. There is therefore a need to improve Hadoop performance through metadata reuse, better space management, better task execution via deduplication checks, and securing data with access rights specifications. In our proposed system, a task deduplication technique is used that checks the similarity between jobs by comparing block IDs. Job metadata and data locality details are stored on the Name Node, which results in better job execution. The metadata of executed jobs is preserved; thus, recomputation time can be saved. Experimental results show improved job execution time and reduced storage space, thereby improving Hadoop performance.
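A toy sketch of the deduplication check might look as follows: a job's input is summarised by an order-independent hash of its block IDs, and a match against the preserved metadata of earlier jobs signals that recomputation can be skipped. The hash choice and names are illustrative assumptions only.

```c
/* Summarise a job's input by hashing its sorted block-ID list, then look the
 * signature up in the history of previously executed jobs. */
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* FNV-1a style hash over the sorted block IDs gives an order-independent
 * signature; the block_ids array is sorted in place. */
uint64_t job_signature(uint64_t *block_ids, size_t n)
{
    qsort(block_ids, n, sizeof *block_ids, cmp_u64);
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= block_ids[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Returns the index of a previous job with the same input signature, or -1. */
int find_duplicate(uint64_t sig, const uint64_t *history, int nhistory)
{
    for (int i = 0; i < nhistory; i++)
        if (history[i] == sig)
            return i;
    return -1;
}
```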

