Genetic Programming Maze Marchers
(By Forrest Sondahl, 2005)
The applet requires Java 1.4.1 or higher. It will not run on Windows 95 or Mac OS 8 or 9. Mac users must have OS X 10.2.6 or higher and use a browser that supports Java 1.4. (Safari works, IE does not. Mac OS X comes with Safari. Open Safari and set it as your default web browser under Safari/Preferences/General.) On other operating systems, you may obtain the latest Java plugin from Sun's Java site.
Created with NetLogo
* View/download model file: Genetic Programming Maze Marchers.nlogo
* View Analysis for this Model
* Back to Forrest's Final Project Main Page
This model demonstrates the use of genetic programming to evolve
walking instructions to solve a particular maze. Genetic programming
(sometimes abbreviated GP) is a technique in computer science that uses
the concepts of natural selection, genetics, and evolution to generate
computer programs to solve a particular problem. In this case, the
problem is navigating mazes. (This model is similar to Genetic
Programming Maze Rules, but instead of evolving movement rules for the
agents, this model evolves the step-by-step commands an agent should
execute to solve a particular maze.)
In the world of biology, species evolve by means of natural
selection and the interactions of DNA. Genetic Programming takes these
ideas, and applies them in the field of computer science. Given a
problem, the goal is to evolve a computer program that can solve the
problem. It starts with a population of randomly generated computer
programs. (The ingredients of these randomly generated computer
programs are chosen based on the problem that is to be solved.) Each
program is run, and its performance is measured by a "fitness
function", which reflects how good each program is at solving the
problem. Then programs are chosen from the population to "reproduce".
The programs are chosen randomly, but with a weighting mechanism that
makes it more likely that the more "fit" programs are chosen.
(Analogously, in the biological world, defective organisms can get lucky
and pass on their DNA, but the more fit organisms have a better chance
of doing so.) There are three forms of reproduction that occur:
* Cloning: an exact copy of the chosen program is placed in the new population.
* Mutation: a copy is made, but with a randomly chosen piece of its code replaced by newly generated random code.
* Crossover: two programs are chosen, and the offspring combines pieces of code from both parents.
After a full new population of programs is formed, the old
population is discarded, and each of the new programs is run and their
fitness is measured. Reproduction occurs, and the cycle continues,
until a program with a certain level of fitness is found -- namely, a
program that is good enough to solve the problem given. The word
"until" implies that such a state will be reached. This isn't
necessarily true. For one thing, the problem posed might be an
impossible one (e.g. "What is the answer to life the universe and
everything?"). Or it could be solvable, but not with the ingredients
that the programs are made from (e.g. "What is the solution to a
quadratic equation?" with programs made only of "+" and "-" operators.)
But even if the solution is within the realm of possible programs
generated by the genetic programming process, success is by no means
guaranteed. Genetic Programming is a stochastic, rather than
deterministic, approach to solving problems. Currently there are, in
fact, no proofs that genetic programming works -- merely empirical
evidence showing that in some situations it does. The success of the
genetic programming process is highly dependent on the choice of
ingredients for the programs and a well designed "fitness function"
that "leads" the population in the right direction toward the goal.
Other important parameters include the size of the population, the
length of each program's code, and the relative probability of each of
the reproduction mechanisms. For more information on genetic
programming, see these web sites:
http://www.geneticprogramming.com/Tutorial/index.html
http://www.genetic-programming.com/gpanimatedtutorial.html
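The generational cycle described above can be sketched in a few lines. The following is a hypothetical Python illustration of the general technique, not the model's NetLogo code: `random_program`, `fitness`, and `make_offspring` are stand-ins for a model's own procedures, and lower fitness is treated as better, as in the maze model.

```python
import random

def evolve(random_program, fitness, make_offspring,
           pop_size=50, max_generations=100, target_fitness=0.0):
    # Generic GP loop; lower fitness is better (as in the maze model,
    # where fitness is distance to the goal).
    population = [random_program() for _ in range(pop_size)]
    for _ in range(max_generations):
        scores = [fitness(p) for p in population]
        best = min(scores)
        if best <= target_fitness:
            return population[scores.index(best)]
        # Weighted selection: fitter (lower-score) programs are more
        # likely to be chosen as parents, but any program can get lucky.
        weights = [max(scores) - s + 1e-9 for s in scores]
        population = [make_offspring(random.choices(population, weights)[0],
                                     random.choices(population, weights)[0])
                      for _ in range(pop_size)]
    return min(population, key=fitness)
```

Note that the old population is discarded wholesale each generation, and the loop may run out of generations without ever reaching the target fitness, exactly as discussed above.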
In many NetLogo models, agents are given predetermined rules, and
then the emergent behaviors that form through the interactions of these
agents are studied. In contrast, this model starts with a desired
behavior (solving a maze) and works on trying to discover the code that
the agents should execute to achieve this behavior, through use of
Genetic Programming, as described in the section above.
In this model, the maze-solving programs are represented by
"codeturtles". Codeturtles each have a piece of NetLogo code assigned
to them, and it's their job to perform it. Codeturtles will then be
chosen, based on their fitness, to reproduce and create another
generation of codeturtles.
The ingredients that make up the code are extremely simple -- only
three commands:
* maze-forward
* maze-turn-left
* maze-turn-right
Thus, a small example program might look like:
maze-turn-right maze-forward maze-turn-left maze-turn-right maze-forward
maze-forward maze-turn-left maze-forward maze-turn-left
(The internal representation of the program is actually a tree
structure, since this has been often found to produce better results
for genetic programming. The code trees are then compiled into the form
you see above, to be run as NetLogo code.)
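Because the programs are nested trees, crossover works by swapping subtrees. The sketch below is a hypothetical Python analogue of that operation on nested lists (the model itself does this in NetLogo); `all_paths`, `get_at`, and `set_at` are illustrative helper names, not names from the model.

```python
import copy
import random

def all_paths(tree, path=()):
    # Every position in a nested-list code tree, as a tuple of child indices.
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree):
            yield from all_paths(child, path + (i,))

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_at(tree, path, subtree):
    for i in path[:-1]:
        tree = tree[i]
    tree[path[-1]] = subtree

def crossover(parent_a, parent_b):
    # Replace one randomly chosen subtree of a copy of parent_a with a
    # randomly chosen subtree of parent_b; both parents are left intact.
    child = copy.deepcopy(parent_a)
    site = random.choice([p for p in all_paths(child) if p])  # skip the root
    graft = get_at(parent_b, random.choice(list(all_paths(parent_b))))
    set_at(child, site, copy.deepcopy(graft))
    return child
```

Swapping subtrees rather than flat substrings is what keeps every offspring a syntactically valid program.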
The fitness function, which measures each codeturtle's progress, is a
simple one: "What is the geometric distance to the goal square?" The
lower this number is, the more fit a turtle is. It is easy, of course,
to create mazes where this is clearly not the case (e.g. load
"maze4.txt") but for many mazes, this distance measurement serves as a
decent heuristic. A better (in fact, perfect) fitness function would
count the minimum number of open path squares that the codeturtle would
have to cross to reach the goal. However, if we had such a fitness
function, then we would already have some algorithm for computing the
solution to our maze! And if we had such an algorithm, then why would
we be trying to evolve codeturtles to solve it? Using the solution to
find the solution seems like a cheap trick that turns this model
entirely into a toy problem. Also, fitness is computed for each
codeturtle after each step it takes -- not just at the end of the run.
Since fitness is calculated so often, the efficiency of computing
fitness is important, and this is another advantage for geometric
distance fitness, as opposed to true walking distance to goal fitness.
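As a sketch (in Python rather than the model's NetLogo), the geometric-distance fitness amounts to a single calculation, and that cheapness is exactly why it can be recomputed after every step:

```python
import math

def geometric_fitness(turtle_xy, goal_xy):
    # Straight-line distance from the codeturtle to the goal square;
    # lower values mean a fitter turtle.
    (tx, ty), (gx, gy) = turtle_xy, goal_xy
    return math.hypot(gx - tx, gy - ty)
```

A turtle standing on the goal scores 0; one separated from the goal by a wall may still score well, which is the weakness discussed above.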
1. Set the MAZE-FILE chooser to the maze file you want to use.
2. Adjust the slider parameters (see below), or use the default
settings.
3. Press the SETUP button.
4A. Press the GO button to begin the simulation.
4B. Press the STEP button to go through one generation at a time.
5. Watch the View, to see the codeturtles attempt to solve the maze
with their given code DNA.
6. Watch the FITNESS plot, to see how the population is doing.
7. If a codeturtle successfully gets to the end of the maze, then the
simulation will stop. To stop it earlier, press the GO button again.
(It may take some time for the current generation to finish.)
8. Press the SHOW-BEST button to see the code for the current best
(most fit) codeturtle.
Parameters:
MAZE-FILE: The maze file to be loaded.
POPULATION-SIZE: The number of codeturtles in each generation
INITIAL-CODE-MAX-DEPTH: The maximum depth of the randomly generated code
trees that codeturtles start with. (It also affects the size of
mutations that occur.) A larger number generally means longer programs
are created.
BRANCH-CHANCE: Controls the amount of branching in the generation of
random code trees. Again, a larger number generally means longer
programs are created.
CLONE-CHANCE, MUTATE-CHANCE, CROSSOVER-CHANCE: These three sliders
control the relative probability of each genetic operation occurring,
with respect to the others.
(Examples: If the sliders are set to 10, 10, 10, then there is a 1/3
chance of each genetic operation happening. If the sliders are set to
10, 10, 20, then there is a 25% chance of cloning, 25% chance of
mutation, and 50% chance of crossover. If the sum of these three
sliders is 100, then each slider represents the percent chance of that
genetic operation being chosen to create an offspring for the new
generation.)
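The slider arithmetic above can be sketched as a weighted choice. This is a hypothetical Python stand-in for the model's NetLogo selection logic; only the ratio of the three values matters:

```python
import random

def pick_operation(clone_chance, mutate_chance, crossover_chance):
    # Probability of each operation is its slider value divided by the
    # sum of all three; e.g. 10, 10, 20 gives 25%, 25%, 50%.
    return random.choices(["clone", "mutate", "crossover"],
                          weights=[clone_chance, mutate_chance,
                                   crossover_chance])[0]
```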
FIX-RANDOM-SEED?: If true, then RANDOMSEED is used to seed the random
number generator at the start of the process. This allows a particular
run to be reproduced exactly, and thus examined more closely (provided
that the parameters are the same). If false, then RANDOMSEED is not
used.
RANDOMSEED: This is the number used to seed the random number
generator, if FIX-RANDOM-SEED? is true, to allow for reproducible
results.
Notes:
Because of the way their code is executed, you cannot see the
codeturtles while they are moving, even if you slow down the speed
slider; what you see in the View is their positions after they have
moved for that generation. They leave a colored trail behind them, which
is not cleared between generations, chronicling the progress of the
turtles.
The best-fit codeturtle (or codeturtles, when there is a tie) in each
generation is shown as a neutral face. If a codeturtle finds the goal,
then it is shown as a smiley face.
For humans, some mazes are easier to solve than others. Likewise,
some mazes are easier for this model to solve than others. Which of the
five included mazes are easiest for this model, and which are hardest?
Why might "maze4.txt" be difficult for this model, even though it
appears to be a very trivial maze to our eyes? Think about the fitness
function, as well as other factors.
The average and best fitness values shown in the plot sometimes go up
and sometimes go down. Why do you think this is? Does genetic
programming always find a solution?
Sometimes you will see the best-fitness plot flatten out for a long
time, because the best codeturtle keeps going to the same dead-end
place which is fairly close to the goal (e.g. use RANDOMSEED=1 on
"maze1.txt" for 40 generations). This is called a "local minimum." This
is because the codeturtles have gotten as close as possible in their
current neighborhood. It is not a "global minimum", since it is in fact
possible for the codeturtles to get all the way to the solution -- but
in order to get there, they first have to move to places that are
farther away from the goal before they can get closer. The concepts of
local and global minima are applicable to many problems in computer
science and other fields. While some algorithms have no way to escape
from local minima, genetic programming often manages to overcome them
(e.g. keep watching RANDOMSEED=1 on "maze1.txt" until generation 112 --
you will see it escape the first local minimum, and find another, until
it eventually finds the goal.) What is it about genetic programming
that allows it to escape? What parameters in the model could be changed
to better overcome local minima?
The colors of the codeturtles have some meaning. They are initially
randomly colored, but:
* When a codeturtle results from cloning, it has the same color as its
parent.
* When a codeturtle results from crossover, it has a color averaging
its two parents.
* When a codeturtle is mutated, it has a random color.
Try changing the POPULATION-SIZE slider. With a small population,
each generation goes by more quickly, but it generally takes more
generations to find a solution. Also, small populations mean increased
inbreeding. What effect does this have on the process? How low can the
population go such that the process still finds a solution on a fairly
complicated maze such as "maze0.txt" for instance? What does the plot
look like when you have a population size of just 1 or 2?
Try changing the INITIAL-CODE-MAX-DEPTH and the BRANCH-CHANCE. Note
that if the codeturtles' code gets long, then the codeturtles run more
slowly. Try starting with a very short code length (e.g.
INITIAL-CODE-MAX-DEPTH 4, BRANCH-CHANCE .80). After one generation, the
best-fit codeturtle's code may be only 6 lines long. It will take many
more commands than this simply to walk the distance to the goal. Does
it take longer for genetic programming to find a solution when starting
with code fragments that are far too short to reach the goal? How does
the code grow in size?
Crossover is usually the driving force of genetic programming. Try
moving the genetic operations sliders around, and run the model. What
happens if you only allow cloning, and no mutation or crossover? Only
mutation? Only crossover?
There is a model called Genetic Programming Maze Maker, which can be
used to create maze files. Create several interesting maze files, and
add them to the MAZE-FILE chooser. What types of mazes does this model
do well with? What types of mazes are hard for the model to handle? Are
there mazes that humans can solve that this model can't?
Sometimes the codeturtle with the best fitness in a given generation
doesn't make it into the next generation. The probability is in its
favor, but nothing is guaranteed. Some implementations of Genetic
Programming DO guarantee that the best program so far is always cloned
to the new generation. This would mean that the "best fitness" plot
would be strictly non-increasing. Think of a situation where this could
lead to poor results. Make changes to the GP Library framework, such
that the best codeturtle is always cloned. (Or better yet, make it so
that it's a switch that can be set.) Use BehaviorSpace to decide if it
improves results or not.
This maze is built on a rectangular grid. Instead of thinking of it as
a grid, imagine that each open floor square is a small closed room,
connected to other rooms by teleport devices. In the current model, the
teleport devices can only connect to adjacent squares. But suppose
instead, that any room could be connected to any other room in the maze
by teleportation. Enough preamble -- try to build a generalized version
of this model in which codeturtles are traveling along the edges of a
network, trying to find a goal node, instead of a goal square.
The NetLogo feature on which this whole model stands is the ability
to take a string of text, and run it as NetLogo code. This is achieved
through the "run" primitive.
Extensive use of recursion and lists has been employed, especially to
deal with the tree structures which codeturtles use to store code.
Since trees are not natively supported in NetLogo, they have been
implemented as nested lists.
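A hypothetical Python sketch of the "compile" step mentioned above: flattening a nested-list code tree into the command string that, in the NetLogo model, would be handed to the `run` primitive.

```python
def compile_tree(tree):
    # Recursively flatten a nested-list code tree into one
    # space-separated command string.
    if isinstance(tree, list):
        return " ".join(compile_tree(child) for child in tree)
    return tree
```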
It is also interesting to note that this model is built from two parts.
The first part is the "GP Library for NetLogo", which consists of a
framework of procedures that are useful for any model that is using
genetic programming. The second part consists of procedures that are
specific to this model. Since NetLogo doesn't support any formal
concept of code libraries, this separation is largely achieved through
positioning of the code, naming conventions, and comments.
"Genetic Programming Maze Rules" - The brother model to this one.
"Genetic Programming Maze Maker" - A tool for loading/saving maze files
this model uses.
"Genetic Programming Demo" - A simple model demonstrating how to use
the GP Library.
Start here if you want to build your own genetic programming
model.
There are also several models out there that work with Genetic
Algorithms, which are closely related to Genetic Programming. See:
"Echo" under Biology
"ARS-Genetics" and "BinaryGA" by Thomas Hills, in the User Community
Models.
Author: Forrest Sondahl
Date: November 28, 2005
Project Web Page: http://cs.northwestern.edu/~fjs750/netlogo/final/
Part of the Genetic Programming Library for NetLogo project, which
consists of a library of code that makes it easier to write genetic
programming models, as well as several sample models that demonstrate
the use of the library.
Created for the course CS 460 Multi-Agent Modeling, at Northwestern
University.