A 18.4M Triangles/s 122.6 mW Tile Co-Processor for Embedded GPU Systems
This paper presents an efficient and accurate tile co-processor architecture for tile-based rendering systems. The design comprises two key components: the vertex processing unit and the triangle tiling unit. The former transforms, clips, and projects the vertices to generate the list of triangles located within the view frustum, while the latter reads in the triangle data and determines the tile list, which indicates the tiles that each triangle covers. A modified bounding box (BBOX) test pipeline and a mask screening technique for different overlap types are proposed and employed in the design to achieve faster triangle binning with lower power consumption. The proposed architecture operates at 270 MHz and tiles 18.4 M triangles/s with a power consumption of less than 122.6 mW. The chip is implemented in 0.13 µm CMOS technology and occupies a total area of 2.5 × 2.5 mm².
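To make the triangle binning step concrete, the following is a minimal software sketch of a plain bounding-box tile test: it lists every tile touched by a triangle's screen-space bounding box. It is not the modified BBOX pipeline or the mask screening technique proposed here, whose details the abstract does not give. The function name `bbox_tiles`, the 16-pixel tile size, and the screen dimensions are assumptions chosen only for illustration.

```python
TILE_SIZE = 16  # assumed tile edge in pixels (hypothetical, not taken from the paper)


def bbox_tiles(triangle, screen_w, screen_h, tile=TILE_SIZE):
    """Return the (tx, ty) indices of all tiles overlapped by the
    triangle's axis-aligned bounding box, in row-major order.

    `triangle` is a sequence of three (x, y) screen-space vertices.
    A plain bounding-box test is conservative: it may list tiles that
    the triangle itself does not actually cover.
    """
    xs = [v[0] for v in triangle]
    ys = [v[1] for v in triangle]

    # Clamp the bounding box to the visible screen area.
    x_min = max(min(xs), 0)
    x_max = min(max(xs), screen_w - 1)
    y_min = max(min(ys), 0)
    y_max = min(max(ys), screen_h - 1)

    # A bounding box that lies entirely off-screen covers no tiles.
    if x_min > x_max or y_min > y_max:
        return []

    # Convert pixel extents to inclusive tile index ranges.
    tx0, tx1 = int(x_min) // tile, int(x_max) // tile
    ty0, ty1 = int(y_min) // tile, int(y_max) // tile

    return [(tx, ty)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


if __name__ == "__main__":
    # A triangle spanning the four top-left tiles of a 64 x 64 screen.
    print(bbox_tiles([(2, 2), (20, 4), (5, 30)], 64, 64))
```

Because the bounding box over-approximates the triangle, thin diagonal triangles are assigned to tiles they never touch; reducing that kind of redundant work is the usual motivation for refining a basic BBOX test.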