In this work, we present a novel high-performance bitsliced Viterbi algorithm suited to high-throughput, data-intensive communication. The proposed decoder couples a new column-major data representation with a bitsliced architecture, which enables maximum utilization of the parallel processing units in modern parallel accelerators. With this data layout, each processing unit operates on 32-bit chunks of data rather than performing conventional bit-by-bit operations, so a single bitsliced parallel Viterbi decoder decodes 32 different chunks of data simultaneously. The Viterbi Add-Compare-Select procedure is implemented with the proposed bitslicing technique, and we show that the bitsliced operations for the decoder's internal functions are efficient in both performance and complexity. This level of parallelism is achieved while maintaining an acceptable bit error rate. Our hard- and soft-decision Viterbi decoder implementations on GPU platforms outperform the fastest previously proposed works by
$4.3\times$ and $2.3\times$, achieving 21.41 and 8.24 Gbps on a Tesla V100, respectively.
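To make the bitslicing idea concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation evaluated in this work. It assumes 32 independent lanes packed into one 32-bit word per bit position (a column-major layout) and a ripple-carry, unsigned, fixed-width path metric. All function names (`to_bitslices`, `bs_add`, `bs_less_than`, `bs_select`, `bs_acs`) are hypothetical and chosen for illustration only.

```python
MASK32 = 0xFFFFFFFF  # one bit per lane, 32 lanes per word
LANES = 32


def to_bitslices(words, width):
    """Column-major transpose: slice k holds bit k of every lane's value,
    with lane i stored at bit position i of the slice."""
    assert len(words) <= LANES
    slices = [0] * width
    for lane, w in enumerate(words):
        for k in range(width):
            slices[k] |= ((w >> k) & 1) << lane
    return slices


def from_bitslices(slices, lanes=LANES):
    """Inverse transpose: recover one integer per lane."""
    return [sum(((s >> lane) & 1) << k for k, s in enumerate(slices))
            for lane in range(lanes)]


def bs_add(a, b):
    """Ripple-carry addition of two bitsliced numbers (LSB first).
    The final carry is dropped, so results wrap modulo 2**width."""
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append((x ^ y ^ carry) & MASK32)
        carry = ((x & y) | (carry & (x ^ y))) & MASK32
    return out


def bs_less_than(a, b):
    """Return a mask whose bit i is 1 where lane i satisfies a < b (unsigned)."""
    lt = 0
    for x, y in zip(a, b):  # LSB to MSB: a higher differing bit overrides
        diff = x ^ y
        lt = (~x & y) | (lt & ~diff)
    return lt & MASK32


def bs_select(mask, a, b):
    """Per lane: take a where the mask bit is 1, otherwise b."""
    return [((x & mask) | (y & ~mask)) & MASK32 for x, y in zip(a, b)]


def bs_acs(pm0, bm0, pm1, bm1):
    """Add-Compare-Select for one trellis state, for all lanes at once.
    pm*/bm* are bitsliced path and branch metrics of the two incoming paths."""
    cand0 = bs_add(pm0, bm0)
    cand1 = bs_add(pm1, bm1)
    decision = bs_less_than(cand1, cand0)  # bit i = 1: path 1 survives in lane i
    return bs_select(decision, cand1, cand0), decision
```

In this sketch every bitwise operation updates all 32 lanes at once, which is the property the abstract attributes to the bitsliced decoder. The cost is that arithmetic must be expressed as Boolean logic over bit planes, so the metric width directly determines the number of operations per ACS step.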