As quantum computers become more affordable and commonplace, existing security systems that are based on classical cryptographic primitives, such as RSA and
Elliptic Curve Cryptography
(
ECC
), will no longer be secure. Hence, there has been interest in designing
post-quantum cryptographic
(
PQC
) schemes, such as those based on
lattice-based cryptography
(
LBC
). The potential of LBC schemes is evidenced by the number of such schemes passing the selection of NIST PQC Standardization Process Round-3. One such scheme is the Crystals-Dilithium signature scheme, which is based on the hard module-lattice problem. However, there is no efficient implementation of the Crystals-Dilithium signature scheme. Hence, in this article, we present a compact hardware architecture containing elaborate modular multiplication units using the Karatsuba algorithm along with smart generators of address sequence and twiddle factors for NTT, which can complete polynomial addition/multiplication with the parameter setting of Dilithium in a short clock period. Also, we propose a fast software/hardware co-design implementation on
Field Programmable Gate Array
(
FPGA
) for the Dilithium scheme with a tradeoff between speed and resource utilization. Our co-design implementation outperforms a pure C implementation on a Nios-II processor of the platform Altera DE2-115, in the sense that our implementation is 11.2 and 7.4 times faster for signature and verification, respectively. In addition, we also achieve approximately 51% and 31% speed improvement for signature and verification, in comparison to the pure C implementation on processor ARM Cortex-A9 of ZYNQ-7020 platform.