Lithos is a stack based evolutionary computation system. Unlike most EC systems, its representation language is computationally complete, while also being faster and more compact than the S-expressions used in genetic programming. The version presented here applies the system to the game of Go, but can be changed to other problems by simply plugging in a different evaluation function. Source code and Windows executable are provided. Let me know by email if you have any queries or interesting results or modifications.
This software is in the public domain.
THIS SOFTWARE IS PROVIDED STRICTLY AS IS. THE AUTHOR MAKES NO REPRESENTATION OR WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. UNDER NO CIRCUMSTANCES SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT OR INDIRECT EXPENSES OR DAMAGES AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THIS SOFTWARE.
Like other evolutionary computation systems, Lithos evolves a population of individuals. In each generation, every individual is tested for fitness at a task, then the less fit ones are replaced by offspring of the more fit ones, using crossover and mutation to achieve variety.
In this case, the individuals are programs, each represented as a sequence of instructions. In testing fitness, each program is run on some input and the output is evaluated according to some measure of quality. In the version presented here, the input is the current state of the board in the game of Go, and the output is interpreted as a decision about the next move to make.
The system is controlled by the following parameters:
Population is the number of programs. This doesn't change over time; offspring of more fit programs replace less fit ones.
Max Size is the maximum allowed size of a program. (At the start of a run, the entire population is initialized to empty programs. Crossover and mutation may produce longer or shorter programs as generations go by, up to Max Size instructions.)
Max Memory is the number of words of memory available to each program at run time. (This does not include the memory to store the program code.) Must be at least large enough to hold the program's input data; for the game of Go, this means at least 383 words.
Max Time is the number of instructions can execute during its "thinking time" in each move of the game (or step of the task or whatever). If this is exceeded, the program times out and whatever it has left on the stack at that point is taken as its output.
Max Moves is the maximum number of moves the game is allowed to continue for. (In practice this will rarely be reached.)
Overselection Rate is the number of individuals in each generation guaranteed to be transmitted unchanged to the next. With the default value of 2, the best 2 individuals are copied to the next generation before tournament selection takes place for the rest of the population slots.
Tournament Size is the number of individuals involved in each selection tournament.
Crossover Rate and Mutation Rate control how often these two operators will be applied when producing offspring programs for the next generation. The default values of 50 and 50 mean they are equally likely to be applied; 90 and 10, for example, would apply crossover 90% of the time and mutation only 10%. At least one of these values must be nonzero.
Autosave Frequency controls how often the current population is automatically saved to the data
file in case of power failure or other external interruption. The default value of 100 saves every 100 generations. A value of 1 would save every generation, 0 would turn off autosave altogether.
Log Frequency controls how often entries are written to the log
file. The default value of 1 writes a log entry every generation. A value of 10 would log every 10 generations (suitable for runs with a small population for very many generations), 0 would turn off logging.
The parameter values are read from a file called params
if it is present, otherwise the following defaults are used:
[Population] 100 [Max Size] 1000 [Max Memory] 1000 [Max Time] 10000 [Max Moves] 1000 [Overselection Rate] 2 [Tournament Size] 4 [Crossover Rate] 50 [Mutation Rate] 50 [Autosave Frequency] 100 [Log Frequency] 1
This is the format used, so creating a params
file and inserting the above into it will keep the defaults; individual entries can then be changed. Note that if the file is present at all, it must have all the entries in this exact order.
When all the individuals in a generation have been evaluated for fitness, the program chooses some of them to be transmitted to the next generation. Taking the default values of Population = 100, Overselection Rate = 2 and Tournament Size = 4, the procedure is as follows:
To fill a slot:
To choose a parent by tournament selection:
To mutate an individual:
To cross over two individuals:
To choose a break point:
When all the slots in the population have been filled, the process of evaluating all the individuals for fitness begins again.
The log
file has columns for the following values:
Generation: The current generation number.
Complexity: The size of the most fit individual.
Diversity: The number of "subspecies" in the population, where two programs belong to the same subspecies if their first and last instructions are equal. (Zero length programs are ignored in this count, thus the maximum possible value is 30 * 30 = 900. A crude measure, but simple to define and cheap to compute.)
Score: A measure of the performance of the most fit individual at the task. For games, it's the number of points scored by that individual when playing against the second most fit one.
The columns are tab separated so the file can easily be loaded into a spreadsheet for plotting graphs.
Lithos uses a stack based virtual machine. A program within it consists of a sequence of instructions, normally executed one after the other; most instructions take their inputs from the stack and push their outputs onto it.
The virtual memory is an array of words. Taking the default value of Max Memory = 1000, memory addresses go from 0 to 999. Inputs are placed at the bottom of memory - for the game of Go, words 0 to 382 are initialized to the current state of the game, with the rest being zero.
Each word contains an integer, 32 bits on typical hardware. (Or in general, whatever the C compiler decides an int
is. Note that Population * Max Size cannot exceed the range of an int
; in practice, this means that the total size of all the VM programs cannot exceed 2 gigabytes, which is a restriction that 32 bit machines would impose anyway.) Thus, VM programs can directly handle, on typical systems, numbers in the range of approximately -2 billion to +2 billion.
The stack grows from the top down and is initially empty. It wraps around on overflow or underflow. So if the first instruction is CONST1
, the value 1 will be placed in location 999. If the first instruction is ADD
, the stack will underflow so the input words in locations 0 and 1 will be added together and the result will be placed in location 1. (Evolved programs quite often exploit this.)
Stack words are left unchanged unless explicitly overwritten, so in the above ADD
example, location 0 will retain its original contents.
When a program terminates (either by executing its last instruction or by staying in a loop and timing out after Max Time instructions), whatever it has left on top of the stack is taken as its output. (Because of wraparound, this means that a null program will effectively echo its input - the first words of the input will be "on top" of the stack.)
While a single memory space is used for input, data stack and subroutine return addresses, it is not used for program code. This is in a separate memory area, and programs have no way to directly access their own code.
There are 30 instructions, each consisting of an opcode only (no immediate operands). In the table below, the inputs are taken from the stack, pushed in the order given (i.e. the last operand is on top of the stack) and the outputs are pushed onto the stack.
Instruction | Inputs | Outputs | Description |
---|---|---|---|
Numbers | |||
CONST0 |
0 |
Constant 0 | |
CONST1 |
1 |
Constant 1 | |
RANDOM |
random(0..1) |
Random number, either 0 or 1 | |
Arithmetic | |||
INC |
x |
x + 1 |
Increment |
DEC |
x |
x - 1 |
Decrement |
ADD |
x, y |
x + y |
Addition |
SUB |
x, y |
x - y |
Subtraction |
MUL |
x, y |
x * y |
Multiplication |
DIV |
x, y |
x / y |
Division |
Comparison | |||
EQ |
x, y |
x == y |
Equal |
NE |
x, y |
x != y |
Not equal |
LT |
x, y |
x < y |
Less than |
GT |
x, y |
x > y |
Greater than |
LE |
x, y |
x <= y |
Less than or equal |
GE |
x, y |
x >= y |
Greater than or equal |
Logic | |||
AND |
x, y |
x and y |
And |
OR |
x, y |
x or y |
Or |
NOT |
x |
not x |
Not |
Control flow | |||
LABEL0 |
Label 0 | ||
LABEL1 |
Label 1 | ||
JMP |
Unconditional jump | ||
JT |
x |
Jump if true | |
JF |
x |
Jump if false | |
CALL |
addr |
Call subroutine | |
RET |
addr |
Return from subroutine | |
Stack manipulation | |||
DUP |
x |
x, x |
Duplicate topmost item |
SWAP |
x, y |
y, x |
Swap topmost items |
POP |
x |
Discard topmost item | |
Memory access | |||
LOAD |
addr |
mem[addr] |
Load a word from memory |
STORE |
addr, x |
Store a word to memory |
The generator used by the RANDOM
instruction is the same as that used by the mutation and crossover operators, and is seeded from the system clock when the program starts up.
Division by zero, or dividing INT_MIN
by -1, is treated as division by 1. Other arithmetic operations wrap around on overflow as normal for machine arithmetic.
The conditional branch instructions take 0 as false and any nonzero value as true. The comparison operators return 0 for false and -1 for true. This is because the logical operators work on each bit of their inputs (they are like the &
, |
and ~
operators in C). So not(1 == 1)
reduces to not(-1)
which reduces to 0, the correct result.
Label instructions perform no operation when executed. Attempts to jump to a null or nonexistent label also perform no operation, except that inputs if any are still consumed, and a CALL
instruction still pushes the return address on the stack.
LOAD
s and STORE
s outside the bounds of memory perform no operation, except that inputs are still consumed.
The idea for the control flow mechanism comes from Thomas Ray's Tierra system. In most real microprocessors, branches have an immediate offset operand which is added to a base address (the code segment or current instruction pointer) to give the absolute address to jump to. This is unlikely to work well in evolutionary computation because it is too brittle. A single insertion or deletion would change the address of every instruction after that point, thereby typically changing the meaning of the entire program.
Instead, a template matching system is used. Consider the following code fragment:
JMP LABEL0 LABEL1 LABEL0 LABEL1
On encountering the JMP
, the interpreter looks at the sequence of label instructions after it - in this case 0101. It then scans back and forth through the code looking for occurrences of the same sequence of labels, in other words for the fragment
LABEL0 LABEL1 LABEL0 LABEL1
not immediately preceded or followed by any other labels. The nearest such found (measured by distance of the starting label from the jump instruction; in case of a tie, backward jumps are preferred) is taken as the destination, and control is transferred to the instruction immediately following that label sequence.
(In practice, for efficiency reasons the interpreter performs the scans once up front rather than every time a jump instruction is encountered.)
A subroutine calling mechanism is also provided, where the CALL
instruction finds its target by the template system and pushes the return address onto the stack (in the form of an integer from 1 to Max Size, being the address of the instruction following the CALL
). The RET
instruction takes the return address from the stack (irrespective of whether it was put there by a CALL
). Programs evolved thus far have rarely used this mechanism for its envisioned purpose, but somewhat to the author's surprise, have often used CALL
to obtain constant numbers and RET
as an indirect jump.
Two versions of the factorial function are shown below as examples. (These were hand coded, not evolved. Comments are included in square brackets.) The initial instruction sequence pushes the number 5 on the stack; when the routine finishes, this has been replaced with 120.
[n = 5] CONST0 INC INC INC INC INC [factorial(n)] CALL LABEL0 [goto end] JMP LABEL0 LABEL0 [separator between labels] NOT [factorial:] LABEL0 SWAP [if n <= 1] DUP CONST1 LE [goto done] JT LABEL1 [n * factorial(n - 1)] DUP DEC CALL LABEL0 MUL [return] SWAP RET [done:] LABEL1 [1] POP CONST1 [return] SWAP RET [end:] LABEL0 LABEL0
[n = 5] CONST0 INC INC INC INC INC [result = 1] CONST0 CONST1 STORE [loop:] LABEL0 [if n <= 1] DUP CONST1 LE [goto done] JT LABEL1 [result *= n] DUP CONST0 LOAD MUL CONST0 SWAP STORE [n--] DEC [goto loop] JMP LABEL0 [separator between labels] NOT [done:] LABEL1 [return result] CONST0 LOAD
The current version of Lithos evolves programs to play the game of Go. It should be emphasized that the objective of this is not to produce something that can defeat a skilled human player - evolved programs under current hardware play very poorly even compared to hand-written programs. (Unsurprisingly, considering that they consist of a few hundred to a few thousand bytes of code, versus tens to hundreds of kilobytes for hand-written programs.) The objective is to experiment with artificial evolution on a problem that - unlike most of those studied, but like biological evolution - is effectively open-ended.
The following points should be noted concerning the rules as applied here:
In each generation, every program plays two games against every other, one time taking black and the other time taking white. Thus, programs co-evolve in an environment of other evolving programs; no preexisting hand-written opponent is required.
(Note that this means computation time per generation varies as the square of Population. This algorithm is used anyway because much of the advantage of having a large population in this context is a more accurate fitness function: each individual's fitness is better assessed in play against a more statistically significant number of different opponents.)
The procedure for playing two programs against each other is as follows:
After all games have been played, if two programs have scored the same number of wins but are of different sizes then size is used as a tie breaker; the shorter program has higher fitness. (Even a single extra win counts for more than any advantage in compactness, though.)
To allow a program to make a move:
(As it happens, a null program will place three stones in the top left corner of the board by effectively echoing its input. Against another null program, this claims the entire board as territory in the absence of any opposing stones. The Score column of the log
file typically shows scores of 381 points in the first few generations, then dropping to very low levels as wins start being contested, and slowly climbing as skill improves.)
To give visual feedback, the evaluation function displays each generation the game between the most fit (taking black) and second most fit (taking white) individuals of the previous generation. This is shown as two boards side by side, the first being the final state of the board (.
= empty square, *
= black stone, O
= white stone) and the second being the resulting territory claims.
The code for all of this is in the eval.c
file. To apply Lithos to other tasks, simply replace the contents of this file with an appropriate evaluation function.
The code was developed under Microsoft C++ version 4.2, but is written in vanilla ANSI C so should work substantially unmodified on any compiler. The included makefile is for GNU Make.
To use the program, simply download it into any directory and run it. The data
and log
files will be placed in the current directory. (The params
file should also be placed there if you want to change the parameters.)
If an existing data
file is found, the program will load it and continue from where it left off, otherwise it will start with a fresh population. Therefore if you want to end the current run and begin a new one, simply move, rename or delete the data
file before restarting the program. When doing this you should do the same with the log
file so that the program will create a fresh one.
To pause the program, press the space bar; press again to restart. Any other key will save the current population to the data
file and exit. (With larger populations, there may be a few seconds delay while the program finishes the current evaluation.)