Quantitative systems pharmacology (QsP) may need to change in order to accommodate machine learning (ML), but ML may also need to change to work for QsP. Here we investigate the use of neural network surrogates of stiff QsP models. This technique reduces and accelerates QsP models by training ML approximations on model simulations. We describe how common neural network methodologies, such as residual neural networks, recurrent neural networks, and physics/biologically-informed neural networks, are fundamentally related to explicit solvers of ordinary differential equations (ODEs). Just as explicit ODE solvers are unstable on stiff QsP models, we demonstrate that these ML architectures exhibit similar training instabilities. To address this issue, we present methods from scientific machine learning (SciML), which combine techniques from mechanistic modeling with traditional deep learning. We describe the continuous-time echo state network (CTESN) as the implicit analogue of these ML architectures and show that it trains and predicts accurately on these stiff models where other methods fail. We demonstrate the CTESN's ability to build surrogates of a production QsP model, a chemical reaction system of more than 1,000 ODEs from the SBML BioModels repository, and a reaction-diffusion partial differential equation. The CTESN accelerates QsP simulations by up to 56x relative to the optimized DifferentialEquations.jl solvers while achieving <5% relative error in all of the examples. This shows how incorporating the numerical properties of QsP methods into ML can improve the intersection of the two fields, and thus offers a potential method for accelerating repeated calculations such as global sensitivity analysis and virtual population studies.
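A residual block computes `x + g(x)`, which is exactly the form of a forward (explicit) Euler step `y + h*f(y)`. The Python sketch below is a minimal illustration of that correspondence and of why stiffness breaks explicit updates. It assumes the scalar linear test equation y' = λy with λ = -1000 and a step size h = 0.01; both values are chosen here for illustration and are not taken from the QsP models. The sketch shows only the stability argument; it is not an implementation of the CTESN or of any model from this work.

```python
# Hypothetical stiff scalar test problem: y' = LAM * y (illustrative values only).
LAM = -1000.0


def f(y):
    """Right-hand side of the linear test ODE."""
    return LAM * y


def residual_block(x, g):
    """ResNet-style update: output = input + correction g(input)."""
    return x + g(x)


def explicit_euler_step(y, h):
    """Forward Euler written as a residual block with g(y) = h * f(y)."""
    return residual_block(y, lambda z: h * f(z))


def implicit_euler_step(y, h):
    """Backward Euler: solve y_next = y + h * f(y_next).

    For the linear f used here the implicit equation has the closed form
    y_next = y / (1 - h * LAM); nonlinear models would need a Newton solve.
    """
    return y / (1.0 - h * LAM)


def integrate(step, y0, h, n):
    """Apply a one-step method n times starting from y0."""
    y = y0
    for _ in range(n):
        y = step(y, h)
    return y


h = 0.01  # h * |LAM| = 10, far outside forward Euler's region |1 + h*LAM| <= 1
y_exp = integrate(explicit_euler_step, 1.0, h, 50)
y_imp = integrate(implicit_euler_step, 1.0, h, 50)
print(f"explicit Euler after 50 steps: {y_exp:.3e}")
print(f"implicit Euler after 50 steps: {y_imp:.3e}")
```

Each forward Euler step multiplies the state by 1 + hλ = -9, so the iterates grow without bound even though the exact solution decays to zero. Each backward Euler step multiplies by 1/(1 - hλ) = 1/11, so the iterates decay as the true solution does. This is the same instability that the explicit-solver analogy carries over to residual-style architectures trained on stiff models.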