International Journal of Control, Vol. 88, No. 9, 1702-1711, 2015
On Pantoja's problem allegedly showing a distinction between differential dynamic programming and stagewise Newton methods
In this journal, Pantoja has described a deterministic optimal control problem in which his stagewise Newton procedure yields an exact optimal solution whereas differential dynamic programming (DDP) does not. This problem is also quoted by Coleman and Liao (in another journal) as a valid example, with some emphasis on the advantage of Pantoja's procedure over DDP. Pantoja argues that the problem involves nonlinear dynamics in his terminal-cost problem formulation, and therefore DDP and stagewise Newton methods are different. The purpose of this paper is to show that, while DDP and Pantoja's method do differ for a general nonlinear optimal control problem, his problem has a special structure that makes it a false example of this claim. The reason is twofold. First, he made an obvious algebraic error in his computation. Second, his example is equivalent to a problem of linear dynamics and quadratic criterion (LQ for short). It is true that when a general LQ problem involving quadratic stage costs is transformed into a terminal-cost problem, each quadratic stage cost gives rise to nonlinear (quadratic) state dynamics. Yet the LQ-solution procedure remains the same, i.e., it uses the same discrete (Riccati) recurrence equations that can be derived by classical dynamic programming. This means that DDP obtains the exact minimum point of the transformed terminal-cost criterion, just as the Newton method does. Using a standard LQ problem of general type, we formally prove this equivalence in its terminal-cost version even with nonlinear state dynamics.
Keywords: stagewise Newton; problems of linear dynamics and quadratic criterion (LQ problem); differential dynamic programming (DDP)
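
The terminal-cost transformation mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's proof and not Pantoja's example: it assumes a small, arbitrarily chosen LQ problem, and all function names, matrices, and the horizon are hypothetical. The stage costs are folded into an extra state `y` with quadratic (hence nonlinear) dynamics, so that the criterion becomes a pure terminal cost. The sketch then checks that the controls from the discrete Riccati recursion coincide with those from a single Newton step on the stacked control vector, which is exact here because the criterion is quadratic in the controls.

```python
import numpy as np

def rollout_terminal_cost(A, B, Q, R, Qf, x0, U):
    """Evaluate the criterion in terminal-cost form.

    The augmented state is (x, y), where y accumulates the stage costs:
    y_{k+1} = y_k + 0.5 x_k'Q x_k + 0.5 u_k'R u_k (quadratic dynamics).
    The criterion is then the terminal cost y_N + 0.5 x_N'Qf x_N.
    """
    x, y = x0.copy(), 0.0
    for u in U:
        y = y + 0.5 * x @ Q @ x + 0.5 * u @ R @ u
        x = A @ x + B @ u
    return y + 0.5 * x @ Qf @ x

def riccati_controls(A, B, Q, R, Qf, x0, N):
    """Optimal controls from the backward Riccati recursion (classical DP)."""
    P, gains = Qf, []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains were computed from stage N-1 down to stage 0
    x, U = x0.copy(), []
    for K in gains:
        u = -K @ x
        U.append(u)
        x = A @ x + B @ u
    return np.array(U)

def newton_controls(A, B, Q, R, Qf, x0, N):
    """One Newton step from U = 0 on the stacked controls (u_0, ..., u_{N-1})."""
    n, m = B.shape
    # Stacked states X = (x_0, ..., x_N) satisfy X = F x0 + G U.
    F = np.vstack([np.linalg.matrix_power(A, k) for k in range(N + 1)])
    G = np.zeros(((N + 1) * n, N * m))
    for k in range(1, N + 1):
        for j in range(k):
            G[k * n:(k + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, k - 1 - j) @ B
            )
    Qbar = np.kron(np.eye(N + 1), Q)
    Qbar[-n:, -n:] = Qf  # terminal weight on x_N
    Rbar = np.kron(np.eye(N), R)
    H = G.T @ Qbar @ G + Rbar        # Hessian of the criterion in U
    g = G.T @ Qbar @ (F @ x0)        # gradient of the criterion at U = 0
    return np.linalg.solve(H, -g).reshape(N, m)

# Arbitrary illustrative data (assumed, not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
x0, N = np.array([1.0, 0.0]), 5

U_dp = riccati_controls(A, B, Q, R, Qf, x0, N)
U_newton = newton_controls(A, B, Q, R, Qf, x0, N)
print(np.allclose(U_dp, U_newton))
print(rollout_terminal_cost(A, B, Q, R, Qf, x0, U_dp))
```

In this sketch, the nonlinearity introduced by the accumulated-cost state `y` does not change the minimizer: both routines return the same control sequence, which is consistent with the abstract's claim that the Riccati-based procedure reaches the exact minimum of the transformed terminal-cost criterion. It does not implement a full DDP iteration on the augmented system.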