HIGH LEVEL SIMULATION OF SVP MANY-CORE SYSTEMS

2011 ◽  
Vol 21 (04) ◽  
pp. 413-438 ◽  
Author(s):  
M. IRFAN UDDIN ◽  
MICHIEL W. VAN TOL ◽  
CHRIS R. JESSHOPE

The Microgrid is a many-core architecture comprising multiple clusters of fine-grained multi-threaded cores. The SVP API supported by the cores allows for the asynchronous delegation of work to different clusters of cores, which can be acquired dynamically. We want to explore the execution of complex applications and their interaction with dynamically allocated resources. To date, any evaluation of the Microgrid has used a detailed emulation with a cycle-accurate simulation of the execution time. Although the emulator can be used to evaluate small program kernels, it executes at a rate of only 100K instructions per second, divided over the number of emulated cores. This makes it inefficient for evaluating a complex application that executes on many cores with dynamically allocated clusters. To obtain a more efficient evaluation, we have developed a co-simulation environment that executes high-level SVP control code but abstracts the scheduling of the low-level threads using two different techniques. The co-simulation is evaluated for both performance and simulation accuracy.
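A minimal sketch of the delegation pattern this abstract describes, not the SVP API itself: work is delegated asynchronously to a dynamically acquired cluster of cores and synchronized on later. The names `acquire_cluster`, `delegate`, and `kernel` are hypothetical illustrations, with Python's process pool standing in for a cluster.

```python
from concurrent.futures import ProcessPoolExecutor

def acquire_cluster(num_cores):
    """Stand-in for dynamically acquiring a cluster of cores."""
    return ProcessPoolExecutor(max_workers=num_cores)

def kernel(chunk):
    # Stand-in for the computational work delegated to the cluster.
    return sum(x * x for x in chunk)

def delegate(cluster, work_items):
    # Asynchronous delegation: returns futures immediately, loosely
    # analogous to an SVP 'create' later matched by a 'sync'.
    return [cluster.submit(kernel, item) for item in work_items]

if __name__ == "__main__":
    data = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
    with acquire_cluster(num_cores=4) as cluster:
        futures = delegate(cluster, data)           # 'create': fire and forget
        results = [f.result() for f in futures]     # 'sync': wait for completion
    print(results)
```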

Author(s):  
Irfan Uddin

The microthreaded many-core architecture comprises multiple clusters of fine-grained multi-threaded cores. The management of concurrency is supported in the instruction set architecture of the cores, and computational work in an application is asynchronously delegated to different clusters of cores, where the cluster is allocated dynamically. Computer architects are interested in analyzing the complex interactions amongst dynamically allocated resources. Generally, a detailed simulation with cycle-accurate timing of the execution is used. However, the cycle-accurate simulator for the microthreaded architecture executes at a rate of 100,000 instructions per second, divided over the number of simulated cores. This means that the evaluation of a complex application executing on a contemporary multi-core machine can be very slow. To enable efficient design-space exploration, we present a co-simulation environment in which the detailed execution of instructions in the pipelines of microthreaded cores, and the interactions amongst hardware components, are abstracted. We evaluate the high-level simulation framework against the cycle-accurate simulation framework. The results show that the high-level simulator is faster and less complicated than the cycle-accurate simulator, at the cost of some accuracy.
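A minimal sketch, under assumed parameters, of the speed/accuracy trade-off this abstract describes: a cycle-accurate model steps through per-instruction pipeline events, while a high-level model replaces that detail with an analytic cost estimate. The pipeline depth and CPI constants below are illustrative assumptions, not measured values from the paper.

```python
PIPELINE_STAGES = 6    # assumed pipeline depth
CPI_ESTIMATE = 1.2     # assumed average cycles per instruction

def cycle_accurate_time(instructions, cores):
    """Iterate per instruction (slow but detailed)."""
    cycles = 0
    for _ in range(instructions // cores):
        cycles += PIPELINE_STAGES  # stand-in for per-stage event simulation
    return cycles

def high_level_time(instructions, cores):
    """Abstract the pipeline into one analytic estimate (fast)."""
    return int(instructions / cores * CPI_ESTIMATE)

print(cycle_accurate_time(1_000_000, 64))  # cost scales with every instruction
print(high_level_time(1_000_000, 64))      # constant-time estimate
```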


Author(s):  
Weichun Liu ◽  
Xiaoan Tang ◽  
Chenglin Zhao

Recently, deep trackers based on siamese networks have enjoyed increasing popularity in the tracking community. Generally, those trackers learn a high-level semantic embedding space for feature representation but lose low-level fine-grained details. Meanwhile, the learned high-level semantic features are not updated during online tracking, which results in tracking drift in the presence of target appearance variation and similar distractors. In this paper, we present a novel end-to-end trainable Convolutional Neural Network (CNN) based on the siamese network for distractor-aware tracking. It enhances target appearance representation in both the offline training stage and the online tracking stage. In the offline training stage, the network learns both low-level fine-grained details and high-level coarse-grained semantics simultaneously in a multi-task learning framework. The low-level features, with better resolution, are complementary to the semantic features and able to distinguish the foreground target from background distractors. In the online stage, the learned low-level features are fed into a correlation filter layer and updated in an interpolated manner to encode target appearance variation adaptively, while the learned high-level features are fed into a cross-correlation layer without online update. The proposed tracker therefore benefits from both the adaptability of the fine-grained correlation filter and the generalization capability of the semantic embedding. Extensive experiments are conducted on the public OTB100 and UAV123 benchmark datasets. Our tracker achieves state-of-the-art performance while running at a real-time frame rate.
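A minimal sketch of the interpolated model update the abstract describes: the low-level correlation-filter template is blended with the newest per-frame estimate so it adapts to appearance change, while the high-level semantic template stays fixed. The learning rate and template sizes are assumed values, not ones reported by the paper.

```python
import numpy as np

LEARNING_RATE = 0.01  # assumed; controls how fast the filter adapts

def update_filter(current_filter, new_estimate, lr=LEARNING_RATE):
    """Linear interpolation: keep most of the old model, fold in the new."""
    return (1.0 - lr) * current_filter + lr * new_estimate

# Stand-in low-level filter and per-frame estimate (illustrative sizes).
template = np.random.rand(17, 17)
new_obs = np.random.rand(17, 17)
template = update_filter(template, new_obs)  # updated every frame

# The high-level semantic template, by contrast, is fixed after offline
# training and used for cross-correlation without any online update.
```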


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Ilia Lebedev ◽  
Christopher Fletcher ◽  
Shaoyi Cheng ◽  
James Martin ◽  
Austin Doupnik ◽  
...  

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.
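A minimal sketch, with invented parameter names, of the per-application template customization the abstract describes: the same many-core template is specialized through a few high-level knobs rather than redesigned from scratch.

```python
from dataclasses import dataclass

@dataclass
class TemplateConfig:
    num_pes: int            # number of processing elements
    interconnect: str       # e.g. "ring", "mesh", "crossbar"
    pe_datapath_bits: int   # bit-level resource control per PE
    threads_per_pe: int     # coarse-grained multithreading depth

# Hypothetical instantiation for a Bayesian graph inference accelerator;
# the values are illustrative, not the paper's design points.
bayes_cfg = TemplateConfig(num_pes=32, interconnect="mesh",
                           pe_datapath_bits=24, threads_per_pe=4)
print(bayes_cfg)
```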


Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 204-216
Author(s):  
Xinyue Ye ◽  
Lian Duan ◽  
Qiong Peng

Spatiotemporal prediction of crime is crucial for public safety and the operation of smart cities. As crime incidents are distributed sparsely across space and time, existing deep-learning methods, constrained by coarse spatial scales, offer only limited value in predicting crime density. This paper proposes the use of deep inception-residual networks (DIRNet) to conduct fine-grained, theft-related crime prediction based on non-emergency service request data (311 events). Specifically, it outlines the use of inception units comprising asymmetrical convolution layers to extract low-level spatiotemporal dependencies hidden in crime events and complaint records in the 311 dataset. It then details how residual units can be applied to capture high-level spatiotemporal features from these low-level dependencies for the final prediction. The effectiveness of the proposed DIRNet is evaluated on theft-related crime data and 311 data for New York City from 2010 to 2015. The results confirm that DIRNet obtains an average F1 of 71%, outperforming other prediction models.
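A minimal sketch, with assumed channel and grid sizes, of the two building blocks the abstract names: an inception unit with asymmetric (1×3 / 3×1) convolutions for low-level spatiotemporal dependencies, and a residual unit stacked on top for high-level features. This is not the authors' exact DIRNet configuration.

```python
import torch
import torch.nn as nn

class AsymmetricInception(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two asymmetric branches, fused back to the input width by a 1x1 conv.
        self.branch_h = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.branch_v = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.fuse(torch.cat([self.branch_h(x), self.branch_v(x)], dim=1)))

class ResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Skip connection lets the unit refine rather than replace features.
        return torch.relu(x + self.conv2(torch.relu(self.conv1(x))))

# One stack of past crime/311 intensity grids in, one feature grid out.
x = torch.randn(1, 16, 32, 32)   # (batch, channels, height, width), assumed sizes
y = ResidualUnit(16)(AsymmetricInception(16)(x))
print(y.shape)
```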


2021 ◽  
Author(s):  
Jack Voldemars Purvis

Live coding focuses on improvising content by coding in textual interfaces, but this reliance on low-level text editing impairs usability by not allowing for high-level manipulation of content. VJing focuses on remixing existing content with graphical user interfaces and hardware controllers, but this focus on high-level manipulation does not allow for fine-grained control where content can be improvised from scratch or manipulated at a low level. This thesis proposes the code jockey practice (CJing), a new hybrid practice that combines aspects of live coding and VJing. In CJing, a performer known as a code jockey (CJ) interacts with code, graphical user interfaces and hardware controllers to create or manipulate real-time visuals. CJing harnesses the strengths of live coding and VJing to enable flexible performances where content can be controlled at both low and high levels: live coding provides fine-grained control where content can be improvised from scratch or manipulated at a low level, while VJing provides high-level manipulation where content can be organised, remixed and interacted with. To illustrate CJing, this thesis contributes Visor, a new environment for live visual performance that embodies the practice. Visor's design is based on the key ideas of CJing and a study of live coders and VJs in practice. To evaluate CJing and Visor, this thesis reflects on the usage of Visor in live performances and on feedback gathered from creative coders, live coders, and VJs who experimented with the environment.
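A minimal sketch, with invented names, of the hybrid control CJing describes: a parameter is defined in live-coded text (low-level control) but also exposed to a GUI slider or hardware knob (high-level control). This Python sketch only illustrates the concept and does not reproduce Visor's actual API.

```python
class Param:
    def __init__(self, name, value, lo=0.0, hi=1.0):
        self.name, self.value, self.lo, self.hi = name, value, lo, hi

    def set_from_controller(self, normalized):
        # High-level control: a GUI slider or MIDI knob sends 0.0-1.0.
        self.value = self.lo + normalized * (self.hi - self.lo)

# Low-level control: the performer can redefine this line mid-performance.
speed = Param("rotation_speed", 0.25)

def draw(t):
    # Both code edits and knob turns take effect in the running visual.
    return t * speed.value

speed.set_from_controller(0.8)  # e.g. a VJ-style hardware knob
print(draw(10.0))
```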


2021 ◽  
Vol 18 (4) ◽  
pp. 1-25
Author(s):  
Paul Metzger ◽  
Volker Seeker ◽  
Christian Fensch ◽  
Murray Cole

Existing OS techniques for homogeneous many-core systems make it simple for single- and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU and from CPU to GPU. To achieve this, we subdivide iteration spaces into slices and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical of current systems and makes perfect kernel-to-device scheduling decisions but cannot migrate kernels mid-execution. Models show that up to 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but migration-incapable scheduler when kernels are migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate. Lastly, our programming model reduces code size by at least 88% compared to manual implementations of migratable kernels.
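A minimal sketch, with invented names, of the slice-by-slice migration the abstract describes: the iteration space is subdivided, and between slices the scheduler may move the remaining work to another device. In the paper the slice size is learned offline per kernel; here it is an assumed constant, and the device methods are stand-ins rather than real OpenMP/OpenCL/CUDA calls.

```python
SLICE_SIZE = 4096  # assumed constant; the paper learns slice sizes offline

class Device:
    def __init__(self, name, speed):
        self.name, self.speed = name, speed
    def execute(self, work):
        pass  # stand-in for launching one slice on this device
    def sync_memory_from(self, other):
        pass  # stand-in for a migration-aware memory transfer

def pick_device(devices):
    # Stand-in scheduling policy: always pick the fastest device.
    return max(devices, key=lambda d: d.speed)

def run_kernel(iterations, devices):
    device = pick_device(devices)                   # initial placement
    for start in range(0, iterations, SLICE_SIZE):
        device.execute(range(start, min(start + SLICE_SIZE, iterations)))
        nxt = pick_device(devices)                  # re-decide between slices
        if nxt is not device:
            nxt.sync_memory_from(device)            # migrate mid-kernel
            device = nxt

run_kernel(1_000_000, [Device("cpu", 1.0), Device("gpu", 4.0)])
```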


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important dietary intake for meeting nutritional needs, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a field of computer science that is widely used in research; one of its techniques is clustering, a method of grouping data that becomes more effective as more data is used. The data used here are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study clusters provinces into two milk-producing groups, namely high-producing and low-producing regions. From the 27 records of fresh milk production in Indonesia, two provinces emerge at the high level, namely West Java and East Java, while the remaining provinces, including seven that could not be included in the K-Means clustering calculation, fall into the low-level cluster.
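A minimal sketch of the k=2 K-Means grouping the abstract describes. The production figures below are illustrative placeholders only, not the study's data, which comes from Indonesia's Central Statistics Agency (2000 to 2017).

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative per-province fresh-milk production values (not real data).
production = np.array([[842.0], [912.5], [31.2], [18.7], [25.4], [12.1]])

# Two clusters: high producers vs. low producers.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(production)

# Provinces sharing a label form one cluster.
print(labels)
```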


Author(s):  
Margarita Khomyakova

The author analyzes definitions of the concept of determinants of crime given by various scholars and offers her own definition. In this study, determinants of crime are understood as a set of its causes, the circumstances that contribute to its commission, as well as the dynamics of crime. It is noted that the Russian legislator, in Article 244 of the Criminal Code, defines the object of this criminal assault as public morality. Despite the use of evaluative concepts both in the disposition of this norm and in determining the specific object of the given crime, the position of criminologists is unequivocal: crimes of this kind are immoral and are in irreconcilable conflict with generally accepted moral and legal norms. The paper also considers some views on making value judgments that could hardly apply to legal norms. According to the author, the reasons for abuse of the bodies of the dead include the economic problems of the subject of the crime and a low level of culture and legal awareness; this list is not exhaustive. The main circumstances that contribute to the abuse of the bodies of the dead and their burial places are the following: low income and unemployment, a low level of criminological prevention, and the poor maintenance and protection of medical institutions and cemeteries due to the underperformance of state and municipal bodies. This list of circumstances is also open-ended. Due to several factors, including a high level of latency, it is not possible to reflect the dynamics of such crimes objectively. At the same time, identifying the determinants of abuse of the bodies of the dead will reduce the number of such crimes.


2021 ◽  
pp. 002224372199837
Author(s):  
Walter Herzog ◽  
Johannes D. Hattula ◽  
Darren W. Dahl

This research explores how marketing managers can avoid the so-called false consensus effect—the egocentric tendency to project personal preferences onto consumers. Two pilot studies were conducted to provide evidence for the managerial importance of this research question and to explore how marketing managers attempt to avoid false consensus effects in practice. The results suggest that the debiasing tactic most frequently used by marketers is to suppress their personal preferences when predicting consumer preferences. Four subsequent studies show that, ironically, this debiasing tactic can backfire and increase managers’ susceptibility to the false consensus effect. Specifically, the results suggest that these backfire effects are most likely to occur for managers with a low level of preference certainty. In contrast, the results imply that preference suppression does not backfire but instead decreases false consensus effects for managers with a high level of preference certainty. Finally, the studies explore the mechanism behind these results and show how managers can ultimately avoid false consensus effects—regardless of their level of preference certainty and without risking backfire effects.

