TD-Gammon

Artificial Intelligence Accomplishment | 1990s

IBM researchers: Gerald Tesauro

Where the work was done: T.J. Watson Research Center

What we accomplished: Gerald Tesauro (pictured) developed an innovative combination of nonlinear function approximation and reinforcement learning (RL) techniques, and showed that it could succeed on large, complex decision-making problems. The approach was tested in a self-teaching backgammon program called TD-Gammon. Starting from a random initial strategy and learning almost entirely from self-play, TD-Gammon achieved a remarkable level of performance. Without any lookahead search, it demonstrated a highly sophisticated sense of positional judgement rivaling that of human masters. When its positional evaluation was augmented by very shallow (2-ply or 3-ply) search, the program matched and ultimately surpassed the playing ability of world-champion human players. This achievement has been highly influential in the AI and computer gaming communities, and has inspired numerous real-world applications of similar RL techniques.
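
The general idea of pairing a nonlinear function approximator with temporal-difference learning can be illustrated on a much smaller problem. The sketch below is not Tesauro's code and does not play backgammon. It assumes a toy five-state random walk, a hypothetical `TinyValueNet` class with eight hidden units, and illustrative values for the step size `alpha` and the trace decay `lam`. The network learns to predict the probability of exiting on the right, using only the difference between successive predictions as its training signal.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyValueNet:
    """Hypothetical one-hidden-layer network: features -> value in (0, 1)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        v = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)))
        return v, h

    def gradients(self, x, v, h):
        """Gradient of the output v with respect to every weight."""
        dv = v * (1.0 - v)
        g2 = [dv * hj for hj in h]
        g1 = [[dv * self.w2[j] * h[j] * (1.0 - h[j]) * xi for xi in x]
              for j in range(len(h))]
        return g1, g2


def train(n_states=5, n_hidden=8, episodes=3000, alpha=0.2, lam=0.7, seed=0):
    """TD(lambda) with eligibility traces on a toy random walk (assumed task)."""
    rng = random.Random(seed)
    net = TinyValueNet(n_states, n_hidden, rng)

    def one_hot(s):
        return [1.0 if i == s else 0.0 for i in range(n_states)]

    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        e1 = [[0.0] * n_states for _ in range(n_hidden)]
        e2 = [0.0] * n_hidden
        done = False
        while not done:
            x = one_hot(s)
            v, h = net.forward(x)
            g1, g2 = net.gradients(x, v, h)
            # Decay the eligibility traces and add the current gradient.
            e1 = [[lam * e + g for e, g in zip(er, gr)]
                  for er, gr in zip(e1, g1)]
            e2 = [lam * e + g for e, g in zip(e2, g2)]

            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target, done = 0.0, True   # exited on the left
            elif s_next >= n_states:
                target, done = 1.0, True   # exited on the right
            else:
                target = net.forward(one_hot(s_next))[0]

            # TD error: next prediction (or final outcome) minus current one.
            delta = target - v
            for j in range(n_hidden):
                net.w2[j] += alpha * delta * e2[j]
                for i in range(n_states):
                    net.w1[j][i] += alpha * delta * e1[j][i]
            s = s_next

    return [net.forward(one_hot(s))[0] for s in range(n_states)]


values = train()
for state, value in enumerate(values):
    print(f"state {state}: estimated P(exit right) = {value:.3f}")
```

In this toy setting the learned estimates should rise from the leftmost state to the rightmost one. A game-playing program would replace the one-hot state with an encoding of the board and would generate its training episodes through self-play; those parts are outside the scope of this sketch.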

Related links: "Temporal Difference Learning and TD-Gammon," Communications of the ACM, March 1995.

Image credit: IBM Think Magazine, December 1992
