Learning to Move Autonomously
 
L. Ikemoto, O. Arikan, D. Forsyth
Learning to Move Autonomously in a Hostile World
SIGGRAPH 2005 Technical Sketch
Berkeley Technical Report
 
 
Abstract
This paper describes a framework for controlling autonomous agents. We estimate an optimal controller using a novel reinforcement learning method based on stochastic optimization. The agent's skeletal configurations are taken from a motion graph that contains seamless transitions, guaranteeing smooth, natural-looking motion. The controller learns a parametric value function for choosing transitions at the branch points in the motion graph. Because evaluating this function at a branch point is fast, synthesis is performed online in real time. We couple the local controller with a global path planner to create a system that produces realistic motion even in a rapidly changing environment.
 
A shorter version of this paper appeared as a technical sketch at SIGGRAPH 2005.
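
The transition choice described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the linear form of the value function, the two features (progress toward the goal and clearance from an obstacle), the hand-set weights, and all names such as `choose_transition` are assumptions made for illustration. In the paper the parameters are learned; here they are fixed so that the selection step is easy to follow.

```python
import math

def features(position, goal, obstacle):
    """Hypothetical features of the position a candidate transition leads to."""
    dist_goal = math.dist(position, goal)
    clearance = math.dist(position, obstacle)
    # Closer to the goal is better; clearance stops mattering beyond 2 units.
    return [-dist_goal, min(clearance, 2.0)]

def value(weights, feats):
    """Assumed linear parametric value function: a weighted sum of features."""
    return sum(w * f for w, f in zip(weights, feats))

def choose_transition(candidates, weights, goal, obstacle):
    """Pick the outgoing transition at a branch point with the highest value.

    candidates maps a transition name to the (x, y) position it leads to.
    """
    return max(
        candidates,
        key=lambda name: value(weights, features(candidates[name], goal, obstacle)),
    )

# Hand-set weights stand in for learned parameters (an assumption).
weights = [1.0, 3.0]
goal = (10.0, 0.0)
obstacle = (2.0, 0.0)
candidates = {
    "run_straight": (1.5, 0.0),
    "sidestep_left": (1.0, 1.5),
    "slow_down": (0.5, 0.0),
}
best = choose_transition(candidates, weights, goal, obstacle)
print(best)
```

Each query only scores the handful of transitions leaving the current branch point, which is why a lookup of this kind is cheap enough to run at every branch during online synthesis.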
Examples
This sequence was recorded from a live, interactive demo in which the user controls the crate and can move it anywhere at any time. A virtual agent is tasked with traveling from the left of the scene to the target on the right. While the agent is running toward the target, the user moves the crate into the position shown on the left, blocking the agent's intended path. The local motion planner selects frames that avoid hitting the object but still make progress toward the goal (A). The user again moves the object into the agent's path, and again the system copes successfully (B). Altogether, the user tries to block the agent by moving the crate three times, but each time the agent dodges it, and the agent arrives at the target position without any visible break in its motion.
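
One way to picture how a system can cope when the crate is moved is to recompute a coarse route whenever the obstacle changes and let the local controller follow it. The sketch below is only an illustration under stated assumptions: the source does not say which global planner is used, so breadth-first search on a small occupancy grid is a stand-in, and the grid size, cell coordinates, and the name `plan_path` are hypothetical.

```python
from collections import deque

def plan_path(width, height, start, goal, blocked):
    """Breadth-first search on a 4-connected grid.

    Returns the list of cells from start to goal, or None if no route exists.
    """
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None

start, goal = (0, 1), (4, 1)

# Before the user moves the crate: the straight route is free.
path = plan_path(5, 3, start, goal, blocked=set())

# The crate is dropped onto the route, so the plan is recomputed around it.
replanned = plan_path(5, 3, start, goal, blocked={(2, 1)})

print(path)
print(replanned)
```

In this toy setting the replanned route is two cells longer than the original and never enters the blocked cell; a wall of crates spanning the whole grid would make `plan_path` return `None`.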
In this scenario, the agent must hide from an enemy, a skull, that frightens him. On the left, the agent begins walking toward the goal but notices the skull around the corner. He hides just behind the crate (inset) until the enemy disappears. The controller then leads the agent to his goal position (right). This sequence demonstrates that the controller learned on its own to use obstacles to hide from enemies; the behavior is emergent and is not explicitly encoded. Note that this is the same controller that produced the first example above.