Automatic Differentiation of Adaptive Solvers Does Not Always Converge to the Correct Derivatives
Automatic differentiation (AD) of differential equation solvers has become essential for scientific machine learning, enabling gradient-based optimization of neural differential equations and physics-informed models [1]. When differentiating through adaptive time-stepping integrators, AD generates an extended system of ordinary differential equations comprising both the original state variables and sensitivity equations. A critical but under-appreciated issue arises: the adaptive algorithm controls local truncation error only on the original ODE variables, not on the sensitivity variables propagated alongside them. We demonstrate this error control mismatch using JAX's automatic differentiation of adaptive ODE solvers on a simple linear ODE with a known closed-form sensitivity solution. Even with tolerances set to $10^{-14}$, the computed derivatives exhibit 60% relative error, and we show this error does not decrease as tolerances approach zero. We provide a mathematical derivation explaining why solver tolerances fail to control derivative error, fully characterizing this behavior. This is not a software defect but a fundamental limitation of naive discrete adjoint approaches applied to adaptive algorithms [2]. We then showcase how automatic differentiation in Julia's DifferentialEquations.jl framework [3] has been modified to avoid this behavior through error-controlled sensitivity propagation and continuous adjoint methods, restoring correct convergence of computed gradients. Using open-source implementations and educational materials, we illustrate these subtleties for scientific computing courses. Understanding when standard AD pipelines silently produce incorrect gradients is essential for reliable deployment of differentiable simulation.
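The error-control mismatch can be illustrated with a minimal, self-contained sketch in plain Python. It is not the JAX experiment described in the abstract and does not reproduce its numbers. It assumes a toy problem chosen for illustration, u' = -u with u(0) = 0, whose sensitivity s = du/du(0) obeys s' = -s with s(0) = 1 and closed form s(T) = exp(-T). It also assumes a hand-written embedded Heun/Euler pair, and it propagates the tangent by hand through the same steps as a stand-in for forward-mode AD, with step sizes treated as constants. The names `heun_euler_step`, `integrate`, and `sensitivity_rel_error` are hypothetical helpers, not part of any library.

```python
import math


def heun_euler_step(f, t, y, h):
    """One embedded Heun(2)/Euler(1) step.

    Returns the second-order solution and a per-component error
    estimate |Heun - Euler|.
    """
    k1 = f(t, y)
    k2 = f(t + h, [yi + h * ki for yi, ki in zip(y, k1)])
    y_new = [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]
    err = [0.5 * h * abs(b - a) for a, b in zip(k1, k2)]
    return y_new, err


def integrate(f, y0, t_end, tol, controlled, h0=0.1):
    """Adaptive integration from 0 to t_end with absolute tolerance tol.

    Only the components listed in `controlled` enter the step-size
    decision; the others are carried along with whatever step is chosen.
    """
    t, y, h = 0.0, list(y0), h0
    while t < t_end:
        last = h >= t_end - t
        if last:
            h = t_end - t
        y_new, err = heun_euler_step(f, t, y, h)
        est = max(err[i] for i in controlled)
        if est <= tol:  # accept the step
            t = t_end if last else t + h
            y = y_new
        # Step-size update for a first-order error estimator,
        # with growth and shrink factors clipped to [0.2, 5].
        if est == 0.0:
            factor = 5.0
        else:
            factor = min(5.0, max(0.2, 0.9 * math.sqrt(tol / est)))
        h *= factor
    return y


def rhs(t, y):
    """Extended system: y[0] = u with u' = -u, y[1] = s with s' = -s."""
    return [-y[0], -y[1]]


def sensitivity_rel_error(tol, controlled, t_end=2.0):
    """Relative error of the computed du(T)/du(0) against exp(-T)."""
    _, s = integrate(rhs, [0.0, 1.0], t_end, tol, controlled)
    exact = math.exp(-t_end)
    return abs(s - exact) / exact


if __name__ == "__main__":
    for tol in (1e-4, 1e-6, 1e-8):
        state_only = sensitivity_rel_error(tol, controlled=(0,))
        joint = sensitivity_rel_error(tol, controlled=(0, 1))
        print(f"tol={tol:.0e}  state-only: {state_only:.3e}  joint: {joint:.3e}")
```

In this sketch the state u stays identically zero, so its error estimate is zero and the controller grows the step as fast as it is allowed to, whatever the tolerance. The sensitivity is then integrated on a coarse grid, and its error does not change when the tolerance is tightened. Including the sensitivity component in the controlled set, which is the idea behind error-controlled sensitivity propagation, makes the derivative error shrink with the tolerance.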
