Lithos

Lithos is a stack based evolutionary computation system. Unlike most EC systems, its representation language is computationally complete, while also being faster and more compact than the S-expressions used in genetic programming. The version presented here applies the system to the game of Go, but can be changed to other problems by simply plugging in a different evaluation function. Source code and Windows executable are provided. Let me know by email if you have any queries or interesting results or modifications.

This software is in the public domain.

THIS SOFTWARE IS PROVIDED STRICTLY AS IS. THE AUTHOR MAKES NO REPRESENTATION OR WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. UNDER NO CIRCUMSTANCES SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT OR INDIRECT EXPENSES OR DAMAGES AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THIS SOFTWARE.

Contents

The genetic algorithm

Like other evolutionary computation systems, Lithos evolves a population of individuals. In each generation, every individual is tested for fitness at a task, then the less fit ones are replaced by offspring of the more fit ones, using crossover and mutation to achieve variety.

In this case, the individuals are programs, each represented as a sequence of instructions. In testing fitness, each program is run on some input and the output is evaluated according to some measure of quality. In the version presented here, the input is the current state of the board in the game of Go, and the output is interpreted as a decision about the next move to make.

Parameters

The system is controlled by the following parameters:

Population is the number of programs. This doesn't change over time; offspring of more fit programs replace less fit ones.

Max Size is the maximum allowed size of a program. (At the start of a run, the entire population is initialized to empty programs. Crossover and mutation may produce longer or shorter programs as generations go by, up to Max Size instructions.)

Max Memory is the number of words of memory available to each program at run time. (This does not include the memory to store the program code.) Must be at least large enough to hold the program's input data; for the game of Go, this means at least 383 words.

Max Time is the number of instructions can execute during its "thinking time" in each move of the game (or step of the task or whatever). If this is exceeded, the program times out and whatever it has left on the stack at that point is taken as its output.

Max Moves is the maximum number of moves the game is allowed to continue for. (In practice this will rarely be reached.)

Overselection Rate is the number of individuals in each generation guaranteed to be transmitted unchanged to the next. With the default value of 2, the best 2 individuals are copied to the next generation before tournament selection takes place for the rest of the population slots.

Tournament Size is the number of individuals involved in each selection tournament.

Crossover Rate and Mutation Rate control how often these two operators will be applied when producing offspring programs for the next generation. The default values of 50 and 50 mean they are equally likely to be applied; 90 and 10, for example, would apply crossover 90% of the time and mutation only 10%. At least one of these values must be nonzero.

Autosave Frequency controls how often the current population is automatically saved to the data file in case of power failure or other external interruption. The default value of 100 saves every 100 generations. A value of 1 would save every generation, 0 would turn off autosave altogether.

Log Frequency controls how often entries are written to the log file. The default value of 1 writes a log entry every generation. A value of 10 would log every 10 generations (suitable for runs with a small population for very many generations), 0 would turn off logging.

The parameter values are read from a file called params if it is present, otherwise the following defaults are used:

[Population]            100
[Max Size]              1000
[Max Memory]            1000
[Max Time]              10000
[Max Moves]             1000
[Overselection Rate]    2
[Tournament Size]       4
[Crossover Rate]        50
[Mutation Rate]         50
[Autosave Frequency]    100
[Log Frequency]         1

This is the format used, so creating a params file and inserting the above into it will keep the defaults; individual entries can then be changed. Note that if the file is present at all, it must have all the entries in this exact order.

The next generation

When all the individuals in a generation have been evaluated for fitness, the program chooses some of them to be transmitted to the next generation. Taking the default values of Population = 100, Overselection Rate = 2 and Tournament Size = 4, the procedure is as follows:

  1. Fill the first 2 slots of the next generation with the 2 best individuals of this one.
  2. Fill each of the other 98 slots with an offspring of one or two current individuals.

To fill a slot:

  1. Choose a parent by tournament selection.
  2. Decide whether to use crossover or mutation according to the Crossover Rate and Mutation Rate parameters.
  3. If mutation, fill the slot with a mutated version of the parent.
  4. If crossover, choose another parent by tournament selection, and fill the slot with an offspring based on a combination of the parents' codes.

To choose a parent by tournament selection:

  1. Choose 4 individuals at random.
  2. Select the most fit of those 4.

To mutate an individual:

  1. Choose at random whether to perform insertion, substitution or deletion (equal probabilities, except if the individual is of zero length then only insertion can be performed; if of length equal to Max Size then insertion cannot be performed).
  2. If insertion, choose an insertion point (between any two adjacent instructions, or at the start or end, equal probability for each point) and insert a random instruction there.
  3. If substitution, choose an instruction and change it at random. (1 in 30 chance of changing it to the same instruction, i.e. leaving it unchanged.)
  4. If deletion, choose an instruction and delete it.
  5. Flip a coin; if it comes up heads, go back to step 1. (So usually only one or two instructions will be mutated, but in theory any number could be.)

To cross over two individuals:

  1. Choose a break point in each individual.
  2. Copy the first individual's code up to its break point, then copy the second individual's code from its break point on.

To choose a break point:

  1. The start and end are valid break points.
  2. Every point between a label and a nonlabel instruction is also valid. (Labels are explained under Control flow; the theory is that code should get spliced at reasonable breaks between functional units. Informal eyeballing of debugging output suggests this works fairly often.)
  3. Choose a valid break point at random.

When all the slots in the population have been filled, the process of evaluating all the individuals for fitness begins again.

The log file

The log file has columns for the following values:

Generation: The current generation number.

Complexity: The size of the most fit individual.

Diversity: The number of "subspecies" in the population, where two programs belong to the same subspecies if their first and last instructions are equal. (Zero length programs are ignored in this count, thus the maximum possible value is 30 * 30 = 900. A crude measure, but simple to define and cheap to compute.)

Score: A measure of the performance of the most fit individual at the task. For games, it's the number of points scored by that individual when playing against the second most fit one.

The columns are tab separated so the file can easily be loaded into a spreadsheet for plotting graphs.

The virtual machine

Lithos uses a stack based virtual machine. A program within it consists of a sequence of instructions, normally executed one after the other; most instructions take their inputs from the stack and push their outputs onto it.

The virtual memory is an array of words. Taking the default value of Max Memory = 1000, memory addresses go from 0 to 999. Inputs are placed at the bottom of memory - for the game of Go, words 0 to 382 are initialized to the current state of the game, with the rest being zero.

Each word contains an integer, 32 bits on typical hardware. (Or in general, whatever the C compiler decides an int is. Note that Population * Max Size cannot exceed the range of an int; in practice, this means that the total size of all the VM programs cannot exceed 2 gigabytes, which is a restriction that 32 bit machines would impose anyway.) Thus, VM programs can directly handle, on typical systems, numbers in the range of approximately -2 billion to +2 billion.

The stack grows from the top down and is initially empty. It wraps around on overflow or underflow. So if the first instruction is CONST1, the value 1 will be placed in location 999. If the first instruction is ADD, the stack will underflow so the input words in locations 0 and 1 will be added together and the result will be placed in location 1. (Evolved programs quite often exploit this.)

Stack words are left unchanged unless explicitly overwritten, so in the above ADD example, location 0 will retain its original contents.

When a program terminates (either by executing its last instruction or by staying in a loop and timing out after Max Time instructions), whatever it has left on top of the stack is taken as its output. (Because of wraparound, this means that a null program will effectively echo its input - the first words of the input will be "on top" of the stack.)

While a single memory space is used for input, data stack and subroutine return addresses, it is not used for program code. This is in a separate memory area, and programs have no way to directly access their own code.

The instruction set

There are 30 instructions, each consisting of an opcode only (no immediate operands). In the table below, the inputs are taken from the stack, pushed in the order given (i.e. the last operand is on top of the stack) and the outputs are pushed onto the stack.

Instruction Inputs Outputs Description
Numbers
CONST0 0 Constant 0
CONST1 1 Constant 1
RANDOM random(0..1) Random number, either 0 or 1
Arithmetic
INC x x + 1 Increment
DEC x x - 1 Decrement
ADD x, y x + y Addition
SUB x, y x - y Subtraction
MUL x, y x * y Multiplication
DIV x, y x / y Division
Comparison
EQ x, y x == y Equal
NE x, y x != y Not equal
LT x, y x < y Less than
GT x, y x > y Greater than
LE x, y x <= y Less than or equal
GE x, y x >= y Greater than or equal
Logic
AND x, y x and y And
OR x, y x or y Or
NOT x not x Not
Control flow
LABEL0 Label 0
LABEL1 Label 1
JMP Unconditional jump
JT x Jump if true
JF x Jump if false
CALL addr Call subroutine
RET addr Return from subroutine
Stack manipulation
DUP x x, x Duplicate topmost item
SWAP x, y y, x Swap topmost items
POP x Discard topmost item
Memory access
LOAD addr mem[addr] Load a word from memory
STORE addr, x Store a word to memory

The generator used by the RANDOM instruction is the same as that used by the mutation and crossover operators, and is seeded from the system clock when the program starts up.

Division by zero, or dividing INT_MIN by -1, is treated as division by 1. Other arithmetic operations wrap around on overflow as normal for machine arithmetic.

The conditional branch instructions take 0 as false and any nonzero value as true. The comparison operators return 0 for false and -1 for true. This is because the logical operators work on each bit of their inputs (they are like the &, | and ~ operators in C). So not(1 == 1) reduces to not(-1) which reduces to 0, the correct result.

Label instructions perform no operation when executed. Attempts to jump to a null or nonexistent label also perform no operation, except that inputs if any are still consumed, and a CALL instruction still pushes the return address on the stack.

LOADs and STOREs outside the bounds of memory perform no operation, except that inputs are still consumed.

Control flow

The idea for the control flow mechanism comes from Thomas Ray's Tierra system. In most real microprocessors, branches have an immediate offset operand which is added to a base address (the code segment or current instruction pointer) to give the absolute address to jump to. This is unlikely to work well in evolutionary computation because it is too brittle. A single insertion or deletion would change the address of every instruction after that point, thereby typically changing the meaning of the entire program.

Instead, a template matching system is used. Consider the following code fragment:

	JMP
	LABEL0
	LABEL1
	LABEL0
	LABEL1

On encountering the JMP, the interpreter looks at the sequence of label instructions after it - in this case 0101. It then scans back and forth through the code looking for occurrences of the same sequence of labels, in other words for the fragment

	LABEL0
	LABEL1
	LABEL0
	LABEL1

not immediately preceded or followed by any other labels. The nearest such found (measured by distance of the starting label from the jump instruction; in case of a tie, backward jumps are preferred) is taken as the destination, and control is transferred to the instruction immediately following that label sequence.

(In practice, for efficiency reasons the interpreter performs the scans once up front rather than every time a jump instruction is encountered.)

A subroutine calling mechanism is also provided, where the CALL instruction finds its target by the template system and pushes the return address onto the stack (in the form of an integer from 1 to Max Size, being the address of the instruction following the CALL). The RET instruction takes the return address from the stack (irrespective of whether it was put there by a CALL). Programs evolved thus far have rarely used this mechanism for its envisioned purpose, but somewhat to the author's surprise, have often used CALL to obtain constant numbers and RET as an indirect jump.

Example programs

Two versions of the factorial function are shown below as examples. (These were hand coded, not evolved. Comments are included in square brackets.) The initial instruction sequence pushes the number 5 on the stack; when the routine finishes, this has been replaced with 120.

Factorial(n) recursive style

	[n = 5]
	CONST0
	INC
	INC
	INC
	INC
	INC

	[factorial(n)]
	CALL
	LABEL0

	[goto end]
	JMP
	LABEL0
	LABEL0

	[separator between labels]
	NOT

	[factorial:]
	LABEL0
	SWAP

	[if n <= 1]
	DUP
	CONST1
	LE
	[goto done]
	JT
	LABEL1

	[n * factorial(n - 1)]
	DUP
	DEC
	CALL
	LABEL0
	MUL

	[return]
	SWAP
	RET

	[done:]
	LABEL1

	[1]
	POP
	CONST1

	[return]
	SWAP
	RET

	[end:]
	LABEL0
	LABEL0

Factorial(n) iterative style

	[n = 5]
	CONST0
	INC
	INC
	INC
	INC
	INC

	[result = 1]
	CONST0
	CONST1
	STORE

	[loop:]
	LABEL0

	[if n <= 1]
	DUP
	CONST1
	LE
	[goto done]
	JT
	LABEL1

	[result *= n]
	DUP
	CONST0
	LOAD
	MUL
	CONST0
	SWAP
	STORE

	[n--]
	DEC

	[goto loop]
	JMP
	LABEL0

	[separator between labels]
	NOT

	[done:]
	LABEL1

	[return result]
	CONST0
	LOAD

The game of Go

The current version of Lithos evolves programs to play the game of Go. It should be emphasized that the objective of this is not to produce something that can defeat a skilled human player - evolved programs under current hardware play very poorly even compared to hand-written programs. (Unsurprisingly, considering that they consist of a few hundred to a few thousand bytes of code, versus tens to hundreds of kilobytes for hand-written programs.) The objective is to experiment with artificial evolution on a problem that - unlike most of those studied, but like biological evolution - is effectively open-ended.

The following points should be noted concerning the rules as applied here:

In each generation, every program plays two games against every other, one time taking black and the other time taking white. Thus, programs co-evolve in an environment of other evolving programs; no preexisting hand-written opponent is required.

(Note that this means computation time per generation varies as the square of Population. This algorithm is used anyway because much of the advantage of having a large population in this context is a more accurate fitness function: each individual's fitness is better assessed in play against a more statistically significant number of different opponents.)

The procedure for playing two programs against each other is as follows:

  1. Allocate an empty 19 by 19 board.
  2. Allocate a memory space of Max Memory words for each program, initialized to all zeros.
  3. Present black with the current state of the board and allow it to make its move. An invalid move is taken as a pass.
  4. Do the same for white.
  5. Repeat steps 3 and 4 until both players have passed in a row or both have made Max Moves moves.
  6. Record the winner of the game.

After all games have been played, if two programs have scored the same number of wins but are of different sizes then size is used as a tie breaker; the shorter program has higher fitness. (Even a single extra win counts for more than any advantage in compactness, though.)

To allow a program to make a move:

  1. Copy the board state row by row into words 0 to 380 of the program's memory space. 0 = empty square, 1 = own stone, -1 = opponent's stone.
  2. Set word 381 to the number of prisoners the player has taken and word 382 to the negative of the number the opponent has taken.
  3. Leave the rest of the memory space unchanged so that the program can store information between moves if it so chooses.
  4. Run the program until it ends or times out after Max Time instructions.
  5. Take the top two words of the stack as the row and column of the desired move (assuming pushed in row, column order).
  6. If either is outside the range 0 to 18 or the move is otherwise illegal, count this as a pass.

(As it happens, a null program will place three stones in the top left corner of the board by effectively echoing its input. Against another null program, this claims the entire board as territory in the absence of any opposing stones. The Score column of the log file typically shows scores of 381 points in the first few generations, then dropping to very low levels as wins start being contested, and slowly climbing as skill improves.)

To give visual feedback, the evaluation function displays each generation the game between the most fit (taking black) and second most fit (taking white) individuals of the previous generation. This is shown as two boards side by side, the first being the final state of the board (. = empty square, * = black stone, O = white stone) and the second being the resulting territory claims.

The code for all of this is in the eval.c file. To apply Lithos to other tasks, simply replace the contents of this file with an appropriate evaluation function.

Download

C source code

The code was developed under Microsoft C++ version 4.2, but is written in vanilla ANSI C so should work substantially unmodified on any compiler. The included makefile is for GNU Make.

Windows executable

To use the program, simply download it into any directory and run it. The data and log files will be placed in the current directory. (The params file should also be placed there if you want to change the parameters.)

If an existing data file is found, the program will load it and continue from where it left off, otherwise it will start with a fresh population. Therefore if you want to end the current run and begin a new one, simply move, rename or delete the data file before restarting the program. When doing this you should do the same with the log file so that the program will create a fresh one.

To pause the program, press the space bar; press again to restart. Any other key will save the current population to the data file and exit. (With larger populations, there may be a few seconds delay while the program finishes the current evaluation.)

Back to home page


FastCounter by bCentral