Genetic
Programming Maze Rules
(By
Forrest
Sondahl, 2005)
The applet requires Java 1.4.1 or higher. It will
not run on Windows 95 or Mac OS 8 or 9. Mac users must have OS X
10.2.6 or higher and use a browser that supports Java 1.4. (Safari
works, IE does not. Mac OS X comes with Safari. Open Safari and set
it as your default web browser under Safari/Preferences/General.) On
other operating systems, you may obtain the latest Java plugin from
Sun's
Java site.
Created with NetLogo
* View/download model file:
Genetic
Programming Maze Rules.nlogo
View Analysis for this
Model * Back to Forrest's Final Project
Main Page.
WHAT IS IT?
This model demonstrates the use of genetic programming to
evolve
movement rules that agents can use to solve mazes. Genetic programming
(sometimes abbreviated GP) is a technique in Computer Science that uses
the concepts of natural selection, genetics, and evolution to generate
computer programs to solve a particular problem. In this case, the
problem is navigating mazes. (This model is similar to Genetic
Programming Maze Marchers, but instead of evolving the steps an agent
should take to solve a particular maze, this model evolve movement
rules that are applicable to solving mazes in general.)
WHAT IS GENETIC PROGRAMMING?
In the world of biology, species evolve by means of natural selection
and the interactions of DNA. Genetic Programming takes these ideas, and
applies them in the field of computer science. Given a problem, the
goal is to evolve a computer program that can solve the problem. It
starts with a population of randomly generated computer programs. (The
ingredients of these randomly generated computer programs are chosen
based on the problem that is to be solved.) Each program is run, and
its performance is measured by a "fitness function", which reflects how
good each program is at solving the problem. Then programs are chosen
from the population to "reproduce". The programs are chosen randomly,
but with a weighting mechanism that makes it more likely that the more
"fit" programs are chosen. (Analogously in the biological world,
defective organisms can get lucky and pass on their DNA, but the more
fit organisms have a better chance of doing so.) There are three forms
of reproduction that occur:
- Cloning (the child program is identical to its parent)
- Mutation (the child program has some of its code replaced by
randomly
generated code)
- Crossover (two parent programs are chosen from the population,
and
the child program consists of a mixture of code from the two parents)
After a full new population of programs is formed, the old population
is discarded, and each of the new programs is run and their fitness is
measured. Reproduction occurs, and the cycle continues, until a program
with a certain level of fitness is found -- namely, a program that is
good enough to solve the problem given. The word "until" implies that
such a state will be reached. This isn't necessarily true. For one
thing, the problem posed might be an impossible one (e.g. "What is the
answer to life the universe and everything?"). Or it could be solvable,
but not with the ingredients that the programs are made from (e.g.
"What is the solution to a quadratic equation?" with programs made only
of "+" and "-" operators.) But even if the solution is within the realm
of possible programs generated by the genetic programming process,
success is by no means guaranteed. Genetic Programming is a stochastic,
rather than deterministic, approach to solving problems. Currently
there are, in fact, no proofs that genetic programming works -- merely
empirical evidence showing that in some situations it does. The success
of the genetic programming process is highly dependent on the choice of
ingredients for the programs and a well designed "fitness function"
that "leads" the population in the right direction toward the goal.
Other important parameters include the size of the population, the
length of each program's code, and the relative probability of each of
the reproduction mechanisms. For more information on genetic
programming, see these web sites:
http://www.geneticprogramming.com/Tutorial/index.html
http://www.genetic-programming.com/gpanimatedtutorial.html
HOW IT WORKS
In many NetLogo models, agents are given predetermined rules, and then
the emergent behaviors that form through the interactions of these
agents are studied. In contrast, this model is starts with a desired
behavior (solving a maze) and works on trying to discover the agent's
rules, through use of Genetic Programming, as described in the section
above.
In this model, the maze-solving programs are represented by
"codeturtles". Codeturtles each have a piece of NetLogo code assigned
to them, and it's their job to perform it. Codeturtles will then be
chosen, based on their fitness, to reproduce and create another
generation of codeturtles.
The ingredients from which the code is built are fairly simple:
Four commands:
- maze-turn-right
(think "rt 90")
- maze-turn-left
(think "lt 90")
- ifelse control
structure
- " " blank command (does nothing)
Three reporters:
- maze-wall-ahead? (Is
there a wall in front of me?)
- maze-wall-right? (Is
there a wall to my right?)
- maze-wall-left? (Is
there a wall to my left?)
Thus, a small example program might look like:
maze-turn-right
ifelse maze-wall-ahead? [
maze-turn-left
] [
maze-turn-right
ifelse maze-wall-right? [
maze-turn-left
maze-turn-left
] [
maze-turn-left
]
maze-turn-right
]
|
(The internal representation of the program is actually a tree
structure, since this has been often found to produce better results
for genetic programming. The code trees are then compiled into the form
you see above, to be run as NetLogo code.)
You may be wondering about the blank command. Since it does nothing,
what purpose could it possibly have in the program ingredients?
Basically, it provides a placeholder -- for instance, you can have an
"if" without an "else", simply by having the else block be made up of
blank commands.
You may also be wondering how it is the codeturtles move, since they
have no "forward" command. This is because each codeturtle's program is
just the rules to decide where to go in each given step. Codeturtles
have a lifespan of 480 steps (enough to get them to the goal in most
mazes, if they have a decent movement strategy). During each step, the
codeturtles execute their code, and then move forward one square in the
direction they are pointing (unless a wall blocks their path).
The fitness function, which measures each codeturtles progress, is a
simple one: "What is the geometric distance to the goal square?" The
lower this number is, the more fit a turtle is. It is easy, of course,
to create mazes where this is clearly not the case (e.g. load
"maze4.txt") but for many mazes, this distance measurement serves as a
decent heuristic. A better (in fact, perfect) fitness function would
count the minimum number of open path squares that the codeturtle would
have to cross to reach the goal. However, if we had such a fitness
function, then we would already have some algorithm for computing the
solution to our maze! And if we had such an algorithm, then why would
we be trying to evolve codeturtles to solve it? Using the solution to
find the solution seems like a cheap trick that turns this model
entirely into a toy problem. Also, fitness is computed for each
codeturtle after each step it takes -- not just at the end of the run.
Since fitness is calculated so often, the efficiency of computing
fitness is important, and this is another advantage for geometric
distance fitness, as opposed to true walking distance to goal fitness.
HOW TO USE IT
1. Set the MAZE-FILE chooser to the maze file you want to use.
2. Adjust the slider parameters (see below), or use the default
settings.
3. Press the SETUP button.
4A. Press the GO button to begin the simulation
4B. Press the STEP button to go through one generation at a time.
5. Watch the View, to see the codeturtles attempt to solve the maze
with their given code DNA.
6. Watch the FITNESS plot, to see how the population is doing.
7. If a codeturtle successfully gets to the end of the maze, then the
simulation will stop. To stop it earlier, press the GO button again.
(It may take some time for the current generation to finish.)
8. Press the REPLAY-STEP button to watch the last generation run
through the maze again.
9. Press the SHOW-BEST button to see the code for the current best
(most fit) codeturtle.
Parameters:
MAZE-FILE: The maze file to be loaded.
POPULATION-SIZE: The number of codeturtles in each generation
INITIAL-CODE-MAX-DEPTH: The maximum depth of randomly generated code
trees, that codeturtles start with. (It also affects the size of
mutations that occur). In general, a larger number generally means
longer programs are created.
BRANCH-CHANCE: Controls the amount of branching in the generation of
random code trees. Again, a larger number generally means longer
programs are created.
CLONE-CHANCE, MUTATE-CHANCE, CROSSOVER-CHANCE: These three sliders
control the relative probability of each genetic operation occurring,
with respect to the others.
(Examples: If the sliders are set to 10, 10, 10, then there is a 1/3
chance of each genetic operation happening. If the sliders are set to
10, 10, 20, then there is a 25% chance of cloning, 25% chance of
mutation, and 50% chance of crossover. If the sum of these three
sliders is 100, then each slider represents the percent chance of that
genetic operation being chosen to create an offspring for the new
generation.)
FIX-RANDOM-SEED?: If true, then RANDOMSEED is used to start the
process. This allows a particular run to be reproduced exactly, and
thus examined more closely, (provided that the parameters are the
same). If false, then RANDOMSEED is not used.
RANDOMSEED: This is the number used to seed the random number
generator, if FIX-RANDOM-SEED? is true, to allow for reproducible
results.
THINGS TO NOTICE
For humans, some mazes are easier to solve than others. Likewise, some
mazes are easier for this model to solve than others. Which of the five
included mazes are easiest for this model, and which are hardest? Why
might maze0.txt be easier than maze3.txt? Think about the fitness
function, as well as other factors.
The average and best fitness values shown in the plot sometimes go up
and sometimes go down. Why do you think this is? Does genetic
programming always find a solution?
Usually the best fitness value makes a sudden jump down to the solution
at the end. Why is this? Why aren't there codeturtles that get within 2
or 3 squares of the goal, but don't actually make it all the way?
Occasionally, a codeturtle that was in a very early generation, maybe
even Generation 0, finds the solution. What do you think this says
about the difficulty of the problem? Do you think that genetic
programming is a good choice for solving this problem?
You may notice that during a generation, after a certain number of
steps, many of the codeturtles have turned to sad faces, while the
turtles that are still moving will speed up. There is a reason for
this. Because the genetic programming process runs quite slowly,
especially with large populations, some heuristics are applied in this
model to stop codeturtles that are looking hopeless. For instance, if a
codeturtle doesn't move or change its heading in a given step, then it
is stuck, (because it will do the same thing next turn) and we do not
need to keep running it. Weeding out bad turtles helps speed up the
model.
The colors of the codeturtles have some meaning. They are initially
randomly colored, but:
* When a codeturtle results from cloning, it has the same color as its
parent.
* When a codeturtle results from crossover, it has a color averaging
its two parents.
* When a codeturtle is mutated, it has a random color.
THINGS TO TRY
Try changing the POPULATION-SIZE slider. With a very small population,
each generation moves by much more quickly, but it generally takes more
generations to find a solution. Also, small populations mean increased
inbreeding. What affect does this have on the process? How low can the
population go such that the process still works?
Try changing the INITIAL-CODE-MAX-DEPTH and the BRANCH-CHANCE. Note
that if INITIAL-CODE-MAX-DEPTH <= 3, then IFELSE statements can't
form, meaning that the codeturtles are doomed. Note also that if the
codeturtles' code gets long, then the codeturtles run very slowly.
Crossover is usually the driving force of genetic programming. Try
moving the genetic operations sliders around, and run the model. What
happens if you only cloning, and no mutation or crossover? Only
mutation? Only crossover?
EXTENDING THE MODEL
There is a model called Genetic Programming Maze Maker, which can be
used to create maze files. Create several interesting maze files, and
add them to the MAZE-FILE chooser. What types of mazes does the model
do well with? What types of mazes are hard for the model to handle?
What types are impossible?
Sometimes over the generations, the code trees expand and get very
large. (In crossover, a new codeturtle can be made up of the larger
part of its two parents, and basically double in size. In mutation, a
single node can be replaced by a medium-sized subtree.) If one of these
large-tree codeturtles is highly fit, the largeness can quickly spread
to most of the population. This can result in some incredibly slow
performance for the model when this happens. Thus, a nice extension to
the GP Library would be to have a "maximum-tree-size" parameter, and
when trees that are too large get created, they should be trimmed off
somehow.
Right now, these turtles have no memory of where they've been. They can
only decide which direction to move based on which squares around them
have adjacent walls. This really only gives them one solution they can
find -- the well known "right hand rule" (or the "left hand rule", of
course). (It is worth noting that this rule only works on a certain
class of mazes.) In any case, it would be interesting to give the
codeturtles more information to base their decisions on. Consider two
reporters called "maze-already-traveled-ahead?" and
"maze-already-traveled-here" which report true if the turtle has
already traveled on the square they see in front of them, or the square
they are currently on. Would these additions be useful? Would it be
possible to give the turtles primitives such that they could learn to
do a depth first search? Or come up with your own codeturtle
primitives, and see whether they help or hurt the efficiency of finding
a solution.
NETLOGO FEATURES
The NetLogo feature on which this whole model stands is the ability to
take a string of text, and run it as NetLogo code. This is achieved
through the "run" primitive.
Extensive use of recursion and lists has been employed, especially to
deal with the tree structures which codeturtles use to store code.
Since trees are not natively supported in NetLogo, they have been
implemented as nested lists.
It is also interesting to note that this model is built from two parts.
The first part is the "GP Library for NetLogo", which consists of a
framework of procedures that are useful for any model that is using
genetic programming. The second part consists of procedures that are
specific to this model. Since NetLogo doesn't support any formal
concept of code libraries, this separation is largely achieved through
positioning of the code, naming conventions, and comments.
RELATED MODELS
"Genetic Programming Maze Marchers" - The brother model to this one.
"Genetic Programming Maze Maker" - A tool for loading/saving maze files
this model uses.
"Genetic Programming Demo" - A simple model demonstrating how to use
the GP Library.
Start here if you want to build your own genetic programming
model.
There are several models out there that work with Genetic Algorithms,
which are closely related to Genetic Programming. See:
"Echo" under Biology
"ARS-Genetics" and "BinaryGA" by Thomas Hills, in the User Community
Models.
CREDITS AND REFERENCES
Author: Forrest Sondahl
Date: November 28, 2005
Project Web Page: http://cs.northwestern.edu/~fjs750/netlogo/final/
Part of the Genetic Programming Library for NetLogo project, which
consists of a library of code that makes it easier to write genetic
programming models, as well as several sample models that demonstrate
the use of the library.
Created for the course CS 460 Multi-Agent Modeling, at Northwestern
University.
Back to Forrest's Final Project Page.