Learning built on dynamic programming principles

Date & Place

August 1--3, 2005, 4:30 -- 6:00 pm.
Department of Mechanical Engineering, Tokyo Metropolitan University
(Organized and arranged by Professor Naoyuki Kobota).


Eiji Mizutani
Department of Computer Science, Tsing Hua University, Taiwan.


Dynamic Programming (DP) is an optimization procedure that is particularly applicable to problems requiring a sequence of interrelated decisions. In some contexts, DP may be introduced as just a memory-based algorithm to avoid computational overheads for subproblem solutions; e.g., speed up the process of computing Fibonacci numbers, but this should not be called a DP algorithm. DP attacks optimization problems, in which one state (vector) is transformed by a decision (vector) into another at each step. Two ``classical'' DP books are encouraged to consult (i.e., get back to first principles) for a wide variety of DP examples: ``Applied Dynamic Programming (Richard Bellman & Stuart Dreyfus, 1962)'' and ``The Art and Theory of Dynamic Programming (Stuart Dreyfus & Averill Law, 1977).'' As the second book title suggests, one might immediately realize why DP is said to be more art than science.

This tutorial will provide an introduction to DP principles, showing DP-formulations to several well-discussed textbook examples for concreteness, and then advance to two major topics below:

First, a widely-employed first-order backpropagation for multilayer perceptron neural-network learning will be derived by using a so-called ``invariant imbedding'' (with non-optimal cost-to-go value functions) in general Bolza-type optimal-control problems. Furthermore, learning with weight decay and hidden node teaching will be described in the same framework. (This is based on joint work with Stuart Dreyfus; please refer to the lecture note.)

Second, DTW (dynamic time warping) algorithms will be highlighted. In computational biology, various ``sequence alignment'' methods have been developed to measure similarities (or distances between given signal sequences) for pattern analysis on DNA and protein structure. Many of those methods are virtually a ``forward'' DP algorithm with the minimum cost-so-far (optimal value) functions; so is DTW. For discussion purposes, other related topics may be briefly addressed, such as temporal-difference reinforcement learning with approximate value & policy functions (also known as neuro-DP), K-means clustering from a policy-iteration perspective, and A*-search (a popular artificial intelligence search with cost-so-far and cost-to-go functions combined) as well as its variants for robot path-planning.

Lecture notes & related papers

Lecture notes

First-Order BP

Second-Order BP


To Students:

Students are encouraged to read in advance the lecture note (see above), and bring a code (just one-page print-out) of multilayer-perceptron (MLP) backpropagation (the backward-pass process only), which may be written in C/C++, Matlab, or any other computer languages. Nowadays, many sample codes are available on the web; hence, easy to find!