Abstract
The next two chapters provide a deeper look into how to manage data. There are two different approaches that complement each other: Unified Shared Memory (USM) and buffers. USM exposes memory at a lower level of abstraction than buffers: USM is based on pointers, while buffers are a higher-level interface. This chapter focuses on USM; the next chapter focuses on buffers.
Abstract
Chapter 2 introduced us to the mechanisms that direct work to a particular device, controlling where code executes. In this chapter, we explore how to adapt to the devices that are present at runtime.
Abstract
Kernel programming originally became popular as a way to program GPUs. As kernel programming is generalized, it is important to understand how our style of programming affects the mapping of our code to a CPU.
Abstract
Vectors are collections of data. These can be useful because parallelism in our computers comes from collections of compute hardware, and data is often processed in related groupings (e.g., the color channels in an RGB pixel). Sound like a marriage made in heaven? It is so important that we'll spend a chapter discussing the merits of vector types and how to utilize them. We will not dive into vectorization in this chapter, since it varies by device type and implementation. Vectorization is covered in Chapters 15 and 16.
Abstract
This chapter is home to a number of pieces of useful information, practical tips, advice, and techniques that have proven useful when programming SYCL and using DPC++. None of these topics are covered exhaustively, so the intent is to raise awareness and encourage learning more as needed.
Abstract
We need to discuss our role as the concert master for our parallel programs. The proper orchestration of a parallel program is a thing of beauty—code running full speed without waiting for data, because we have arranged for all data to arrive and depart at the proper times. Code well-decomposed to keep the hardware maximally busy. It is the thing that dreams are made of!
Abstract
In Chapter 4, we discussed ways to express parallelism, either using basic data-parallel kernels, explicit ND-range kernels, or hierarchical parallel kernels. We discussed how basic data-parallel kernels apply the same operation to every piece of data independently. We also discussed how explicit ND-range kernels and hierarchical parallel kernels divide the execution range into work-groups of work-items.
Abstract
We have spent the entire book promoting the art of writing our own code. Now we finally acknowledge that some great programmers have already written code that we can just use. Libraries are the best way to get our work done. This is not a case of being lazy—it is a case of having better things to do than reinvent the work of others. This is a puzzle piece worth having.
Abstract
When we are at our best as programmers, we recognize patterns in our work and apply techniques that have proven over time to be the best solution. Parallel programming is no different, and it would be a serious mistake not to study the patterns that have proven useful in this space. Consider the MapReduce frameworks adopted for Big Data applications; their success stems largely from being based on two simple yet effective parallel patterns: map and reduce.