This chapter is a self-contained tutorial on getting started with parallel programming and on designing and implementing parallel algorithms in a structured way using supersteps. It introduces the bulk synchronous parallel (BSP) computer, a simple target architecture for designing parallel algorithms. Using the inner product of two vectors as an example, the chapter shows how an algorithm is designed hand in hand with its cost analysis. The inner-product algorithm is implemented in a short program that demonstrates the most important primitives of the BSPlib communication library. A benchmarking program is also given for measuring the BSP parameters of a parallel computer, and its use is demonstrated on a desktop computer and a supercomputer. Finally, a parallel regular sampling sort algorithm is presented, implemented, and tested.
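
To give a first impression of the superstep structure behind the inner-product example, the following minimal sketch simulates it sequentially in plain C. It is not the chapter's program and makes no BSPlib calls. The function name `simulated_inner_product`, the cyclic distribution of the vector components, and the shared `partial` array that stands in for communication are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sequential simulation of a two-superstep inner product on p virtual
   processors. Hypothetical helper for illustration only; it does not
   use BSPlib and runs on a single core. */
double simulated_inner_product(const double *x, const double *y,
                               size_t n, size_t p)
{
    assert(p > 0);

    /* partial[s] holds the local inner product of virtual processor s. */
    double *partial = calloc(p, sizeof *partial);
    if (partial == NULL) {
        abort(); /* out of memory */
    }

    /* Superstep 0 (computation): virtual processor s handles the
       components i = s, s+p, s+2p, ... (cyclic distribution, assumed). */
    for (size_t s = 0; s < p; s++) {
        for (size_t i = s; i < n; i += p) {
            partial[s] += x[i] * y[i];
        }
    }

    /* Communication and synchronization: on a real parallel machine the
       processors would now exchange their partial sums. In this
       simulation the array `partial` is simply visible to everyone. */

    /* Superstep 1 (computation): every processor adds the p partial
       sums. All would obtain the same value, so it is computed once. */
    double alpha = 0.0;
    for (size_t t = 0; t < p; t++) {
        alpha += partial[t];
    }

    free(partial);
    return alpha;
}
```

Calling `simulated_inner_product(x, y, n, p)` with different values of `p` should return the same inner product up to floating-point rounding, since only the order of the additions changes. In a real BSP program the loop over `s` disappears: each processor runs the body for its own identifier, and the exchange of partial sums becomes explicit communication followed by a synchronization.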