Fifth-generation (5G) mobile networks use millimeter waves (mmWaves) to offer gigabit data rates. However, unlike microwave links, mmWave links are sensitive to user and topographic dynamics. They are easily blocked, which gives 5G cells irregular coverage patterns. This in turn causes handoffs (HOs) that are too early, too late, or directed to the wrong cell. To mitigate these HO challenges, sustain connectivity, and avert unnecessary HOs, we propose a HO scheme based on a Jump Markov Linear System (JMLS) and Deep Reinforcement Learning (DRL). JMLS is widely used to model abrupt changes in system dynamics, while DRL is an artificial intelligence technique for learning high-dimensional and time-varying behaviors. We combine the two techniques to account for time-varying, abrupt, and irregular changes in mmWave link behavior by predicting the likely deterioration patterns of target links. The prediction is optimized by meta-training techniques that also reduce the training sample size. The resulting JMLS-DRL platform thus formulates intelligent and versatile HO policies for 5G. Results show that the proposed scheme predicts target link behavior after a HO with high reliability. The scheme also averts unnecessary HOs and thereby supports longer dwell time.
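To illustrate what a jump Markov linear system looks like in this setting, the following minimal sketch simulates a scalar link-quality state whose linear dynamics switch between two modes according to a Markov chain. It is only an illustration under stated assumptions, not the model used in this work: the two modes (unobstructed and blocked), the transition matrix `P`, the coefficients `a`, `b`, and `sigma`, and the function name `simulate_jmls` are all hypothetical values chosen for the example.

```python
import numpy as np

def simulate_jmls(steps, seed=0):
    """Simulate a scalar two-mode jump Markov linear system (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical mode transition matrix: row = current mode, column = next mode.
    # Mode 0 = unobstructed link, mode 1 = blocked link (assumed labels).
    P = np.array([[0.95, 0.05],
                  [0.20, 0.80]])

    # Assumed per-mode linear dynamics: x[t] = a[m] * x[t-1] + b[m] + noise.
    # With a = 0.9 the noise-free steady state b / (1 - a) is 20 in mode 0
    # and -10 in mode 1 (arbitrary link-quality units).
    a = np.array([0.9, 0.9])
    b = np.array([2.0, -1.0])
    sigma = np.array([0.5, 1.0])  # per-mode noise standard deviation

    mode = 0      # start unobstructed
    x = 20.0      # start at the mode-0 steady state
    modes = np.empty(steps, dtype=int)
    quality = np.empty(steps)

    for t in range(steps):
        # The discrete mode jumps according to the Markov chain ...
        mode = rng.choice(2, p=P[mode])
        # ... and the continuous state follows the linear model of that mode.
        x = a[mode] * x + b[mode] + sigma[mode] * rng.standard_normal()
        modes[t] = mode
        quality[t] = x

    return modes, quality

if __name__ == "__main__":
    modes, quality = simulate_jmls(200)
    print("fraction of time blocked:", modes.mean())
    print("min / max link quality:", quality.min(), quality.max())
```

A mode switch changes the drift of the linear model at once, which is how a JMLS can represent a sudden blockage; a predictor for target link deterioration would have to infer the hidden mode from observed link measurements.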