Performance analysis on multicore system using PAPI

Author(s):  
Esteban Hernandez Barragan ◽  
Josep Jorva Steves
2017 ◽  
Vol 2017 ◽  
pp. 1-8
Author(s):  
Cem Bozkus ◽  
Basilio B. Fraguela

In recent years, vast amounts of data of many kinds have been generated, from the pictures and videos taken with our cameras to the software logs produced by sensor networks and Internet routers operating day and night. This has led to new big data problems, which require new algorithms that can handle such volumes of data and are therefore very computationally demanding. In this paper, we parallelize one of these algorithms, HyperLogLog, which estimates the number of distinct items in a large data set with minimal memory usage, lowering the typical memory usage of this type of calculation from O(n) to O(1). We have implemented parallelizations based on OpenMP and OpenCL and evaluated them on a standard multicore system, an Intel Xeon Phi, and two GPUs from different vendors. The results of our experiments, in which we reach a speedup of 88.6 with respect to an optimized sequential implementation, are very positive, particularly given the need to run this kind of algorithm on large amounts of data.
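
The abstract above does not show the authors' OpenMP or OpenCL code, so the following is only a minimal Python sketch of the general HyperLogLog idea, included to illustrate why the algorithm needs a fixed amount of memory and why it lends itself to parallelization. The class name `HyperLogLog`, the helper `estimate_distinct`, the use of SHA-1 as the hash, and the default of 2^12 registers are all assumptions made for this illustration, not details taken from the paper.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with m = 2**p registers (assumes p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m  # fixed size, independent of input length

    def add(self, item):
        # Derive a 64-bit hash; SHA-1 is an arbitrary choice for this sketch.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining bits feed the rank
        # Rank = position of the leftmost 1-bit in the remaining bits
        # (width + 1 when all of them are zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Sketches built on different chunks combine by register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


def estimate_distinct(chunks, p=12):
    """Build one sketch per chunk, then merge them (hypothetical helper)."""
    total = HyperLogLog(p)
    for chunk in chunks:
        partial = HyperLogLog(p)
        for item in chunk:
            partial.add(item)
        total.merge(partial)
    return total.estimate()
```

Because merging is a register-wise maximum, the per-chunk sketches in `estimate_distinct` are independent of one another and could each be built by a separate thread or device before a final reduction. The sketch runs them sequentially to keep the example short.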


Author(s):  
Arun Kumar Sundar Rajan ◽  
Shriram K Vasudevan ◽  
Nirmala Devi M

<p>As the functionality of real-time embedded systems becomes more complex, demand has grown for higher computational capability, exploitation of parallelism, and effective use of resources. Furthermore, the technological limitations of uniprocessors, namely power consumption, saturating instruction-level parallelism, and memory access delays, have driven the emergence of multicore processors. Multicore design has its own challenges. The increase in the number of cores has raised the demand for proper load distribution, parallelization of existing sequential code, and effective communication and synchronization between cores, memory, and I/O devices. This paper highlights the need for effective load distribution, analyzing and discussing the task allocation techniques and algorithms associated with decentralized task scheduling for multicore systems. It also addresses multithreaded architectures, in which parallel tasks are formulated from sequential code blocks, and finally the techniques used to parallelize sequential code blocks.</p>

