Microarchitectural Characterization on a Mobile Workload

Geekbench is one of the most referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic but some of them aim to simulate real-world behavior. In the mobile world, its microarchitectural behavior has been reported rarely since the hardware profiling features are limited to the public. As a popular mobile performance workload, it is hard to find Geekbench’s microarchitecture characteristics in mobile devices. In this paper, a thorough experimental study of Geekbench performance characterization is reported with detailed performance metrics. This study also identifies mobile system on chip (SoC) microarchitecture impacts, such as the cache subsystem, instruction-level parallelism, and branch performance. After the study, we could understand the bottleneck of workloads, especially in the cache sub-system. This means that the change of data set size directly impacts performance score significantly in some systems and will ruin the fairness of the CPU benchmark. In the experiment, Samsung’s Exynos9820-based platform was used as the tested device with Android Native Development Kit (NDK) built binaries. The Exynos9820 is a superscalar processor capable of dual issuing some instructions. To help performance analysis, we enable the capability to collect performance events with performance monitoring unit (PMU) registers. The PMU is a set of hardware performance counters which are built into microprocessors to store the counts of hardware-related activities. Throughout the experiment, functional and microarchitectural performance profiles were fully studied. This paper describes the details of the mobile performance studies above. In our experiment, the ARM DS5 tool was used for collecting runtime PMU profiles including OS-level performance data. After the comparative study is completed, users will understand more about the mobile architecture behavior, and this will help to evaluate which benchmark is preferable for fair performance comparison.

Download Full-text

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

AbstractCoarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC) in detail. Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.

Download Full-text

Improving ILP via Fused In-Order Superscalar and VLIW Instruction Dispatch Methods

Journal of Circuits System and Computers ◽

10.1142/s0218126619500208 ◽

2018 ◽

Vol 28 (02) ◽

pp. 1950020 ◽

Cited By ~ 1

Author(s):

Yumin Hou ◽

Xu Wang ◽

Jiawei Fu ◽

Junping Ma ◽

Hu He ◽

...

Keyword(s):

Prediction Method ◽

Digital Signal ◽

General Purpose ◽

Performance Comparison ◽

Instruction Level Parallelism ◽

Superscalar Processor ◽

Performance Improvements ◽

General Purpose Processor ◽

Evaluation Board ◽

Level Parallelism

In order to expand the computation capability of digital signal processing on a General Purpose Processor (GPP), we propose a fused microarchitecture that improves Instruction Level Parallelism (ILP) by supporting both in-order superscalar and very long instruction word (VLIW) dispatch methods in a single pipeline. This design is based on ARMv7-A&R Instruction Set Architecture (ISA). To provide a performance comparison, we first design an in-order superscalar processor, considering that ARM GPPs always adopt superscalar approaches. And then we expand VLIW dispatch method based on this processor, to realize the fused microarchitecture. The two designs are both evaluated on the Xilinx 7-series FPGA (XC7K325T-2FFG900C), using Xilinx Vivado design suite. The results show that, compared with the superscalar processor, the processor working under VLIW mode can improve the performance by 15% and 8%, respectively, when running EEMBC and DSPstone benchmarks. We also run the two benchmarks on ARM Cortex-A9 processor, which is integrated in the Zynq-7000 AP SoC device on Xilinx ZC706 evaluation board. The processor in VLIW mode shows 44% and 30% performance improvements than ARM Cortex-A9. The fused microarchitecture adopts a combined bimodal and PAp branch prediction method. This method achieves 93.7% prediction accuracy with limited hardware overhead.

Download Full-text

Case Study using Probe Vehicle Speeds to Assess Roadway Safety in Georgia

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198120948858 ◽

2020 ◽

Vol 2674 (11) ◽

pp. 554-562

Author(s):

David J. Ederer ◽

Michael O. Rodgers ◽

Michael P. Hunter ◽

Kari E. Watkins

Keyword(s):

Performance Monitoring ◽

Negative Binomial ◽

Performance Metrics ◽

Large Body ◽

Vehicle Speed ◽

Crash Frequency ◽

Data Set ◽

Probe Vehicle ◽

Performance Metric ◽

The Relationship

Speed is a primary risk factor for road crashes and injuries. Previous research has attempted to ascertain the relationship between individual vehicle speeds, aggregated speeds, and crash frequency on roadways. Although there is a large body of research linking vehicle speeds to safety outcomes, there is not a widely applied performance metric for safety based on regularly reported speeds. With the increasingly widespread availability of probe vehicle speed data, there is an opportunity to develop network-level safety performance metrics. This analysis examined the relationship between percentile speeds and crashes on a principal arterial in Metropolitan Atlanta. This study used data from the National Performance Metric Research Data Set (NPMRDS), the Georgia Electronic Accident Reporting System, and the Highway Performance Monitoring System. Negative binomial regression models were used to analyze the relationship between speed percentiles, and speed differences to crash frequency on roadway sections. Results suggested that differences in speed percentiles, a measure of speed dispersion, are related to the frequency of crashes. Based on the models, the difference in the 85th percentile and median speed is proposed as a performance metric. This difference is easily measured using NPMRDS probe vehicle speeds, and provides a practical performance metric for assessing safety on roadways.

Download Full-text

An efficient technique for CT scan images classification of COVID-19

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201985 ◽

2020 ◽

pp. 1-14

Author(s):

Esraa Hassan ◽

Noha A. Hikal ◽

Samir Elmuogy

Keyword(s):

Neural Network ◽

Diagnostic Tool ◽

Deep Neural Network ◽

Data Augmentation ◽

Performance Metrics ◽

Classification Model ◽

Data Set ◽

Training Models ◽

The Earth ◽

Diagnostic Time

Nowadays, Coronavirus (COVID-19) considered one of the most critical pandemics in the earth. This is due its ability to spread rapidly between humans as well as animals. COVID_19 expected to outbreak around the world, around 70 % of the earth population might infected with COVID-19 in the incoming years. Therefore, an accurate and efficient diagnostic tool is highly required, which the main objective of our study. Manual classification was mainly used to detect different diseases, but it took too much time in addition to the probability of human errors. Automatic image classification reduces doctors diagnostic time, which could save human’s life. We propose an automatic classification architecture based on deep neural network called Worried Deep Neural Network (WDNN) model with transfer learning. Comparative analysis reveals that the proposed WDNN model outperforms by using three pre-training models: InceptionV3, ResNet50, and VGG19 in terms of various performance metrics. Due to the shortage of COVID-19 data set, data augmentation was used to increase the number of images in the positive class, then normalization used to make all images have the same size. Experimentation is done on COVID-19 dataset collected from different cases with total 2623 where (1573 training,524 validation,524 test). Our proposed model achieved 99,046, 98,684, 99,119, 98,90 In terms of Accuracy, precision, Recall, F-score, respectively. The results are compared with both the traditional machine learning methods and those using Convolutional Neural Networks (CNNs). The results demonstrate the ability of our classification model to use as an alternative of the current diagnostic tool.

Download Full-text

Performance comparison of cu/low-k, carbon nanotube, and optics for on-chip and off-chip interconnects

Proceedings of the 11th international workshop on System level interconnect prediction - SLIP '09 ◽

10.1145/1572471.1572493 ◽

2009 ◽

Author(s):

Krishna C. Saraswat

Keyword(s):

Carbon Nanotube ◽

Performance Comparison ◽

On Chip ◽

Low K

Download Full-text

Performance comparison of selected wired and wireless networks on chip architectures

2015 17th Conference of Open Innovations Association (FRUCT) ◽

10.1109/fruct.2015.7117974 ◽

2015 ◽

Author(s):

Maria Komar

Keyword(s):

Wireless Networks ◽

Performance Comparison ◽

Networks On Chip ◽

On Chip

Download Full-text

Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance

International Journal of Quality Statistics and Reliability ◽

10.1155/2009/126283 ◽

2009 ◽

Vol 2009 ◽

pp. 1-8 ◽

Cited By ~ 4

Author(s):

Janet Myhre ◽

Daniel R. Jeske ◽

Michael Rennie ◽

Yingtao Bi

Keyword(s):

Linear Regression ◽

Least Squares ◽

Performance Metrics ◽

Weighted Least Squares ◽

Ordinary Least Squares ◽

Automated System ◽

Least Squares Regression ◽

Data Set ◽

Tolerance Intervals ◽

Power Of The Test

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherited motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns.

Download Full-text

Available instruction-level parallelism for superscalar and superpipelined machines

Proceedings of the third international conference on Architectural support for programming languages and operating systems - ASPLOS-III ◽

10.1145/70082.68207 ◽

1989 ◽

Cited By ~ 165

Author(s):

N. P. Jouppi ◽

D. W. Wall

Keyword(s):

Instruction Level Parallelism ◽

Level Parallelism

Download Full-text

Topic 8 Parallel Computer Architecture and Instruction-Level Parallelism

Euro-Par 2003 Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-540-45209-6_78 ◽

2003 ◽

pp. 541-542

Author(s):

Stamatis Vassiliadis ◽

Nikitas Dimopoulos ◽

Jean-Francois Collard ◽

Arndt Bode

Keyword(s):

Computer Architecture ◽

Parallel Computer ◽

Instruction Level Parallelism ◽

Level Parallelism

Download Full-text

Distribution of Data Dissemination Model using DTN Routing Protocols

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5828.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 3014-3019

Keyword(s):

Routing Protocol ◽

Routing Protocols ◽

Buffer Capacity ◽

Performance Metrics ◽

Data Dissemination ◽

Real Life ◽

Research Paper ◽

Performance Comparison ◽

Dtn Routing ◽

The Impact

In this research paper compare the protocol’s performance together with the experimental results of optimal routing using real-life scenarios of vehicles and pedestrians roaming in a city. In this research paper, conduct several simulation comparison experiments(in the NS2 Software) to show the impact of changing buffer capacity, packet lifetime, packet generation rate, and number of nodes on the performance metrics. This research paper is concluded by providing guidelines to develop an efficient DTN routing protocol. To the best of researcher(Parameswari et al.,) knowledge, this work is the first to provide a detailed performance comparison among the diverse collection of DTN routing protocols.

Download Full-text