 Post subject: Re: useful or interesting facts
Post #21 Posted: Wed Feb 14, 2024 4:59 am 
Gosei

Posts: 1352
Liked others: 202
Was liked: 203
AlphaZero.jl — a generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
https://github.com/jonathan-laurent/AlphaZero.jl

 Post subject: Re: useful or interesting facts
Post #22 Posted: Wed Feb 14, 2024 5:22 am 
Lives with ko

Posts: 146
Liked others: 134
Was liked: 23
And wrote:
AlphaZero.jl — a generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
https://github.com/jonathan-laurent/AlphaZero.jl


Interesting. There's a note that:

Quote:
This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA9550-16-1-0288 and FA8750-18-C-0092. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.


Is there any more information on the link to DARPA?

 Post subject: Re: useful or interesting facts
Post #23 Posted: Sun Apr 21, 2024 4:46 am 
Gosei

Posts: 1352
Liked others: 202
Was liked: 203
ALL SYSTEMS GO
BY DAVID A. MECHNER
The Sciences, Volume 38, Number 1
© 1998 David A. Mechner
This past July, the annual conference of the American Association for Artificial Intelligence was marked by an oddly heroic event. Throughout the conference, held in Providence, Rhode Island, world-class players of chess, checkers and other games pitted their skills against state-of-the-art computer adversaries. For the most part the humans either lost or barely eked out victories. Then a twenty-seven-year-old native of New Mexico named Janice Kim challenged a computer to the oldest, most complex game of all: go.
When a video image of the match flickered onto one of the oversize monitors in the "Hall of Champions," audience members laughed and shook their heads in disbelief: the computer was being granted a twenty-five-stone handicap. Kim is an exceptional player--the only female go professional the West has ever produced--but this seemed ridiculous: the first stone she placed looked like a lone paratrooper dropped into an enemy garrison.

I was the commentator for the match, yet I had not seen a handicap like that in a long time. When I started playing go as a high school sophomore living in Chappaqua, New York, my father, then an experienced amateur, started me out at twenty-three stones. I did not stay at that handicap for long. The rules of the game are easy to learn--much simpler than those of chess--but playing go well requires a combination of restraint and aggression, of quick intuition and cold calculation that appealed to me. And I liked to win. I devoured all the go books I could find and spent every weekend at the go club in New York City. By 1988, still a high school senior, I was the U.S. representative to the world go youth championship in Paris, and had made arrangements to apprentice myself to a Japanese go master in the fall.

Despite its prestige and history in Japan, its huge popularity in Korea, and its cachet among European intellectuals, go is all but unknown in the United States. As a result, we go players are used to explaining ourselves. We can intone an (inevitably stilted) litany in our sleep: Go is an ancient game of strategy that originated in China some 4,000 years ago. The board is a nineteen-line grid; the stones are black and white, all alike; and the subtleties that develop from those simple elements make it the deepest game in the world.

Go is a game of construction--the board starts out empty, and the opponents, one playing black, the other white, take turns placing their stones on the intersections, trying to surround and control unclaimed territories. Once played, the stones do not move. Go is a game of construction, but it is also a game of war: once a group of stones is completely surrounded, it is removed from the board. As conflicts arise over territory, each player struggles to surround and capture his opponent's stones while protecting his own. The game ends by agreement, when neither player can improve his position.
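
To make the capture rule concrete, here is a minimal Python sketch of the underlying mechanic: a flood fill that finds a connected group and its "liberties," the empty points adjacent to it (a group with no liberties is captured). The board representation and all names are illustrative, not taken from any real go program.

```python
# A group with no liberties (adjacent empty points) is captured.
# Flood-fill from one stone to collect its group and liberties.
def group_and_liberties(board, row, col):
    """Return (group, liberties) for the stone at (row, col)."""
    color = board[row][col]
    size = len(board)
    group, liberties, frontier = set(), set(), [(row, col)]
    while frontier:
        r, c = frontier.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == '.':
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    frontier.append((nr, nc))
    return group, liberties

# The white stone below has one liberty left; a black stone at
# (0, 2) would capture it.
board = [list(row) for row in ["BW...",
                               ".B...",
                               ".....",
                               ".....",
                               "....."]]
group, libs = group_and_liberties(board, 0, 1)
print(len(group), libs)   # 1 {(0, 2)}
```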

But that only begins to explain the true character of the game. If chess is a battle, go is an entire campaign. It is a long game in which a number of battles may rage simultaneously, none of them decisive, and a single move can affect the entire board. There is a saying (mostly among go aficionados, I have to admit) that go is to chess as poetry is to double-entry accounting.

Unfortunately, a twenty-five-stone handicap gives the game's true depth and beauty little chance to emerge. After a dozen moves, Kim's stones were adrift without a base. It looked as if Handtalk, the best go program in the world, would maintain control of the entire board and deal her a crushing defeat. A few people in the audience suggested that the game be abandoned, that the handicap was too great for even a professional to overcome.

As commentator, I was torn. When I arrived at my go master's house in Tokyo, tradition dictated that the apprentice paint a slogan in Japanese calligraphy and hang it on the walls of the playing room. My slogan, blobbed in a script more awkward than any Japanese first grader's, was katsu, or "win." To begin training for go at the professional level, you need only love the game; to survive for any length of time, though, you have to hate to lose. My sympathies, in other words, were entirely with Kim, but I was tempted to agree with the skeptics.

My concern turned out to be unwarranted. Just as the spectators began to lose heart, the computer made a bad move, and as Kim told us after the game, she heard a little voice in her head saying, "Just put one foot in front of the other, keep taking little baby steps across the board, and you will win." And win she did. The loss with a twenty-five-stone handicap gives the world's best go program a ranking of around seventeen kyu--that of a beginner who has been playing for a couple of months. In stark contrast, last year the world's best chess program, Deep Blue, handily defeated the Russian world champion, Garry Kasparov, with no handicap at all. Kasparov seemed to feel that his was an epochal defense of humanity's intellectual supremacy, yet the affair afforded go programmers little more than a rueful chuckle: the day a go program can stump even an average amateur is still a long way off.

If computers can be programmed to master chess, why are they so weak at go? The answer, in a nutshell, is that Deep Blue's approach simply will not work with go. Go is too complex, too subtle a game to yield to exhaustive, relatively simpleminded computer searches. To master it, a program has to think more like a person. That fact represents a high hurdle to programmers--and an exciting opportunity. Since the 1950s chess has posed a grand challenge to the field of artificial intelligence, or AI. But the more chess programs have improved, the more they have sidestepped the issues that once made them seem worth designing. Go sends investigators back to the basics--to the study of learning, of knowledge representation, of pattern recognition and strategic planning. It forces us to find new models for how we think, or at least for how we think we think.

Artificial intelligence has always joined mundane goals with lofty ones. On one level the field has been about making clever devices: computers that understand English, systems that diagnose diseases, toys that play games. On another level, though, such devices are supposed to serve as windows on the mind. AI investigators have long maintained that intelligence can be understood strictly in terms of computation--and apart from any physical systems that happen to exhibit it. By dissecting and characterizing intelligence, then building its components into devices that can do some of the things that people do, engineers and programmers hope to succeed where philosophers and psychologists have failed: they hope to develop a scientific understanding of intelligence.

In the late 1950s, when computer programs began, however tentatively, to approximate elements of human reasoning, AI investigators made some bold predictions. Within a decade, many said, computers would be as intelligent as humans: they would communicate with people in English and discover important mathematical proofs. But distilling the calculus of cognition proved much harder than most people expected.

Intelligence demands knowledge, of course, but most of what people know is so deeply ingrained that it hardly seems like knowledge at all: that a cup of water, tipped over, will spread onto a table; that painting a wall will not change the shape of the wall or cause a flash flood in Texas. But those bits of common sense are embedded in a nervous system that has been optimized over millions of years of evolution. Reproducing such knowledge in a computer program is still impossible. The result, for AI, is that programs designed to deal with some part of the real world "intelligently" can do so only by ignoring everything else. A program called MYCIN, for instance, can correctly diagnose a patient with meningitis if certain symptoms are described to it. But if you describe the rust spots on your car, MYCIN will likely diagnose your car with meningitis as well.

To skirt the shortcomings of their machines, AI investigators have had to focus on "toy domains"--highly simplified and idealized worlds in which the rules governing reality can be expressed by a handful of axioms. In the 1950s chess seemed an excellent choice. The game is formally simple--played as it is with an eight-by-eight-square checkerboard, thirty-two pieces and a few basic rules--yet mastering it is a hallmark of human intellectual achievement. At the time, machines that could evaluate billions of positions a minute were practically inconceivable. To succeed at chess, it seemed, computers would have to beat human players at their own game. Investigators therefore began by focusing on planning, on the optimum ways of representing tactical and strategic knowledge of the game, and on sophisticated positional judgment, as well as the mechanics of "search."

The term search, in this context, refers to a simple algorithm known as "minimax," used in all successful modern chess programs. When choosing from among several moves, the logic goes, explore as many branches of the "game tree" as possible: first try all your currently possible moves, then try all your opponent's possible responses to each of your moves, then try your responses to those responses, and so on, until you run out of time to think. Finally, pick the move that leads to the best possible position, assuming your opponent will offer the strongest possible resistance.
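
As a sketch, the algorithm just described fits in a few lines. Everything here is illustrative, with an explicit toy game tree standing in for a real move generator and evaluator:

```python
# Bare-bones minimax: explore every line of play to a fixed depth,
# score the leaves, and back the scores up, assuming the opponent
# always picks the reply that is worst for us.
def minimax(state, depth, maximizing, moves, evaluate):
    """moves(state) -> successor states; evaluate(state) -> score
    from the maximizing player's point of view."""
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)
    values = (minimax(s, depth - 1, not maximizing, moves, evaluate)
              for s in successors)
    return max(values) if maximizing else min(values)

# Toy game tree: labels for states, payoffs at the leaves.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf = {"a1": 3, "a2": 5, "b1": -2, "b2": 9}
best = minimax("root", 2, True,
               lambda s: tree.get(s, []), lambda s: leaf.get(s, 0))
print(best)   # 3 -- "a" guarantees 3; "b" risks -2 against best play
```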

By the 1980s, chess-playing programs that focused on identifying high-level goals and ways to achieve them were making only halting progress. Programs that concentrated on search, meanwhile, were fast becoming serious contenders. The story of computer chess had begun to sound like the children's book Fortunately, by Remy Charlip: Fortunately, the further computers could think ahead, the better their moves were. Unfortunately, deeper searches took longer. Fortunately, computer hardware doubled in speed every couple of years. Unfortunately, that speed encouraged investigators to abandon high-level processing and focus on search alone.

Today, forty-seven years after the minimax algorithm was first proposed as an approach to computer chess, Deep Blue can examine a staggering 200 million positions per second. That raw power, combined with clever refinements in the algorithm, enables Deep Blue to think fourteen or fifteen moves ahead--and as deep as forty or forty-five moves when necessary. As a result, the program has beaten Kasparov, arguably the strongest chess player ever. The grand challenge of computer chess has been met. Unfortunately, it has left many of the questions that inspired it unanswered.

Over the years, AI investigators have gotten used to swallowing big, nasty doses of reality. Many of them, therefore, are greeting Deep Blue's success as a refreshing change of pace. Still, others are left cold. They deride Deep Blue for using "brute force"--for having nothing whatever to do with human intelligence. If the purpose of AI is to learn enough about intelligence to create intelligent computer programs, as the pioneering investigator John McCarthy, professor of computer science at Stanford University, suggests, then computer chess has little to do with AI, and Deep Blue is just a triumph of engineering. When human players look ahead, they examine just a few moves at various choice points--or, more often, just one. Instead of considering billions of positions, they consider dozens. When a reporter asked the former world chess champion José Capablanca how many moves ahead he looked while playing, Capablanca replied: "Only one, but it's always the right one."

Other students of AI argue, however, that criticisms of brute force are misguided, and that computer chess shows that the end justifies the means. Since computers match human performance in mechanistic ways, they say, people who are uncomfortable calling computers intelligent will simply have to redefine what they mean by intelligence. Not only is it unnecessary for computers to emulate human thought, they argue, it is a bad idea. People have parallel "hardware"--the brain--in which the basic computational events occur slowly (on the order of milliseconds), but simultaneously, by the billions. Digital computers have serial hardware, which carries out fundamental operations one after the other, but in a matter of nanoseconds. People are good at recognizing patterns; computers are good at crunching numbers. Trying to get computers to solve problems the way people do, such AI advocates conclude, makes as much sense as trying to build an airplane that has feathers and flaps its wings.

That sounds reasonable, but it still smacks of retreat: if a task seems too hard, Deep Blue fans seem to be suggesting, just redefine it. Fortunately or unfortunately, depending on your perspective, the game of go will not let programmers off that easy. The most practical way for a computer to play go, it seems, just happens to be the most interesting way.

A cursory analysis might suggest that the difficulty of computer go has to do with combinatorics. The go board has a much larger grid than a chessboard, so the number of possible moves is much larger. On average, at any given moment in a chess game, a player has twenty-five legal moves to choose from. Various refinements enable the minimax algorithm, in effect, to consider only the five or six best choices, pruning away the rest. In go, on average, a player can choose from 250 legal moves per turn. Even if a go algorithm could prune that number down to fifteen or sixteen, and consider 200 million positions per second, it would need seventy years to explore as deeply as Deep Blue does in three minutes.
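
The arithmetic is easy to reproduce, at least roughly. In this sketch the search rate and branching factors come from the figures above, while the fifteen-ply depth is an assumption chosen to land near the "three minutes" and "seventy years" figures:

```python
# Rough reproduction of the search-cost arithmetic. The 200 million
# positions per second and the branching factors are from the text;
# the 15-ply depth is an assumption chosen to match its figures.
rate = 200_000_000          # positions per second
depth = 15                  # plies searched

chess = 5 ** depth / rate   # ~5 choices per turn after pruning
go = 15 ** depth / rate     # ~15 choices per turn after pruning

print(f"chess: {chess / 60:.1f} minutes")          # ~2.5 minutes
print(f"go: {go / (3600 * 24 * 365):.0f} years")   # ~69 years
```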

Very well, one might say; computers will get faster soon enough. But the problem goes deeper than that. Go programs not only fail to evaluate 200 million go positions a second; they cannot accurately evaluate a single one.

Remember that Deep Blue must not only follow those billions of branches of the game tree; it must also calculate the value of the final position in each branch, so that a choice of moves can be made. In chess, positions are relatively easy to evaluate. The identity of each piece is overwhelmingly important: being down a pawn is acceptable, being down a queen is not. Throw in a few positional factors, such as king safety and the number of squares the pieces on each side can attack, and you have a definition of positional value that, combined with massive search, is good enough to beat a grand master.
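
A toy version of such an evaluation function might look like the following; the weights are illustrative guesses, not the actual terms used by Deep Blue or any other program:

```python
# Toy chess-style evaluation: material dominates, with small
# positional corrections for mobility and king safety.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(my_pieces, their_pieces,
             my_mobility, their_mobility,
             my_king_safety, their_king_safety):
    material = (sum(PIECE_VALUES[p] for p in my_pieces)
                - sum(PIECE_VALUES[p] for p in their_pieces))
    mobility = 0.1 * (my_mobility - their_mobility)
    king = 0.5 * (my_king_safety - their_king_safety)
    return material + mobility + king

# Down a pawn, but with more active pieces and the safer king:
me, them = "QRRBBNN" + "P" * 7, "QRRBBNN" + "P" * 8
print(evaluate(me, them, 40, 28, 1, 0))   # ~0.7: activity offsets the pawn
```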

In go, such a strategy will not work. Each stone is worth the same amount, and there is no king to protect. The amount of territory the players control determines who is ahead. The trick is deciding just who controls what.

A group of stones can be surrounded and captured, but it can also assume a defensive formation, called "making life," that protects it from capture for the rest of the game. In order to make life, a group has to claim two separate pieces of territory, or "eyes." But to ensure a group's safety a player need not always waste moves forming both eyes: he only has to know that he can do so if attacked--like a castle guard who keeps the drawbridge down until invaders are sighted. Similarly, groups of stones are "dead" when they have no hope of making life. The opponent can capture and remove a dead group at any time, but he may not bother to do so. Why waste lives storming castle walls when the defenders have nearly run out of food?

Now take a particular group of black stones resting on the go board: how should the computer count it? Is the group dead? If so, the space it occupies and the territory it surrounds belong to white. Or is the group alive? Then the territory it surrounds belongs to black. The answers to such questions can be passing subtle. Is the group defensible or indefensible? Does the group need to spend moves defending itself, or can it become the aggressor? Hundreds of points, at any given moment, can hinge on the status of groups that are neither clearly alive nor clearly dead. How well a player analyzes the signs helps determine how well he plays the game.

With experience, people can learn to judge a group's status quickly and skillfully. But how can those judgments be turned into numbers a computer can use? People are astonishingly good at pattern matching: they can recognize a face at various angles, in shadows or in light, smiling or frowning; they can recognize a voice across a squealing subway car or a crackling telephone line. But people cannot easily evaluate go positions at a glance. They must learn to combine their pattern-matching abilities with logic and knowledge in order to make deep but highly restricted searches of the game tree. Go programs, on the other hand, depend largely on rough heuristics, or rules of thumb, for deciding whether groups are alive or dead. That tactic works in simple situations, but when used to evaluate even slightly complicated positions it can fail disastrously.

Kim's game against Handtalk demonstrated the problem perfectly. Early on, she trapped two of Handtalk's stones so that she could capture them at any time. Handtalk recognized that those stones were lost--that a rescue attempt would be pointless. But Kim's configuration posed another, hidden threat: her stones could not only capture Handtalk's stones, they could make life as well, thereby threatening all the stones around them. Handtalk failed to notice the dual nature of the attack. It wrote off the battle as an isolated loss of two stones, and never bothered to defend the others at risk.

I can sympathize with Handtalk in a strange way: I have had to learn the difficulties of go twice, once as an aspiring player, and once as a programmer trying to teach the game to a computer. When I moved to Tokyo at the age of eighteen, I felt like a kind of retarded older brother to my master's four other pupils, who ranged in age from eleven to seventeen. Every weekend we joined some fifty other aspirants at the playing hall of the Japanese Go Association for the ongoing tournament that would, at the end of the year, determine who was promoted to professional status and who was left to struggle on for another year.

At the time, I was the only Westerner in the league. Most of the young people I played with were hugely talented and had started playing by the age of eight. Many of them had dropped out of high school to concentrate on the game. Yet even among these professional students, our house was considered the most serious.

My teacher's house was perhaps the last still following an ancient Japanese instructional tradition of austerity. Our days were spent kneeling on tatami mats in front of our go boards, playing through masters' games and mulling over our victories and defeats of the preceding weekend. Sometimes we would cross the street to play ball in the park or visit the corner store, but we were implicitly forbidden to have outside friends or interests.

When I got back to the United States and started college, I tried playing go against the existing computer programs, but I was appalled at their shortcomings. I was sure I could do better. After writing down some ideas, I teamed up with an old friend, Tim Klinger, now a doctoral candidate in computer science at New York University, and we set about creating our own go program.

As an experienced player, I knew that certain configurations of stones come up again and again. Tim and I decided to incorporate those patterns into our program as a number of if-then rules: if pattern X appears on the board, then Y is a good move to achieve goal Z. Because I think in terms of groups of stones, our program also recognizes and operates on groups. That general approach--explicitly seeking to model human knowledge and decision-making processes--might be called the traditional AI approach to computer go. All the best go programs have used it.
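
In miniature, such a rule system might be encoded like this sketch, where a rule pairs a small local pattern with a suggested move and the goal it serves. The single rule, its pattern, and its goal are placeholders invented for illustration, not patterns from any real program:

```python
# If-pattern-then-move rules: a pattern is a small grid of cells,
# where '?' matches anything; a rule suggests a move offset and
# names the goal that move serves.
RULES = [
    ((("B", "W"),
      ("?", ".")),  (1, 1),  "press the white stone"),
]

def matches(board, r, c, pattern):
    return all(want in ("?", board[r + dr][c + dc])
               for dr, row in enumerate(pattern)
               for dc, want in enumerate(row))

def suggest(board):
    """Return (move, goal) for every rule that fires anywhere."""
    found = []
    for pattern, (dr, dc), goal in RULES:
        h, w = len(pattern), len(pattern[0])
        for r in range(len(board) - h + 1):
            for c in range(len(board[0]) - w + 1):
                if matches(board, r, c, pattern):
                    found.append(((r + dr, c + dc), goal))
    return found

board = [list(row) for row in ["BW..",
                               "....",
                               "....",
                               "...."]]
print(suggest(board))   # [((1, 1), 'press the white stone')]
```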

Because "life" and "death" are critical aspects of go, we have concentrated mostly on them. Our life-and-death engine performs deep, narrow search. Instead of assuming that all moves are admissible, as a chess program does, it begins by assuming that none are. First, the algorithm establishes a goal for the search--"capture stone A," for instance. Then it decomposes that goal into subgoals, sub-subgoals and so on. Once a "goal tree" has been established, the program can draw on its various if-then rules to determine which moves are most likely to achieve each minor goal in turn. Once all those subsets of moves have been determined and compared, the program is left with a few legal moves most likely to achieve the ultimate goal.

The goal tree enables the program to recognize futile positions quickly and abandon bad lines of play. Suppose, for instance, that the program has to achieve subgoals A and B in order to achieve ultimate goal C. If it can find no moves to achieve A, it can stop; there is no need to waste time determining how to pursue B.
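
That short-circuiting is simple to express. Here is a minimal sketch in which goals form an AND tree; all goal names, moves, and the exact semantics are invented for illustration:

```python
# A goal is abandoned the moment any subgoal has no candidate
# moves; its remaining siblings are never examined.
def achievable(goal, subgoals, candidate_moves):
    subs = subgoals.get(goal)
    if subs:                              # AND node: need them all
        for sub in subs:
            if not achievable(sub, subgoals, candidate_moves):
                return False              # stop: skip the siblings
        return True
    return bool(candidate_moves.get(goal))   # leaf: a rule must fire

subgoals = {"C: capture the group": ["A: destroy the eye",
                                     "B: block the escape"]}
candidate_moves = {"A: destroy the eye": [],       # no rule fires
                   "B: block the escape": [(3, 4)]}
print(achievable("C: capture the group", subgoals, candidate_moves))  # False
```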

Our life-and-death engine is almost complete. Although the knowledge base we have given it is made up of only about 200 patterns, the system is solving problems successfully. Over the next six months we hope to incorporate the engine into our go program and flesh out the knowledge base. If all goes well, and if life and death in go are as important as we think they are, the resultant program should easily best every other in the field.

In the meantime, we have a number of problems to overcome. The traditional AI approach relies on introspection: How do I evaluate a position? How do I encode that method into software? But our powers of introspection are extremely limited; if you doubt it, try explaining exactly how you catch a ball. (You may be tempted to say, "I just raise my hand and grab the ball as it flies by." But that only begins to explain it. First you have to calculate the distance and trajectory of the ball from the flat image projected onto your retina; then you must use that trajectory to calculate how much the muscles in your hand, shoulder, arm, back and legs need to flex in order to move your limbs and bend your joints in just the right way at just the right moment. As experts in robotics have found out, the whole process is almost impossible to state as an equation, let alone to solve in real time.) Perverse as it sounds, writing a go program is a constant battle against common sense. At every turn, your brain tries to convince you that what it is doing is complete and coherent--but it never is.

Then there are the software engineering problems common to any large programming project. Our program handles a lot of complex knowledge, and while it is designed to do so as efficiently as possible, unwanted interactions between its various parts are hard to avoid. Countless parameters and pieces of code can interfere with performance, and we spend a dismaying amount of time tracking down problems caused by seemingly innocuous changes.

We are increasingly tempted to write a program that simply teaches itself--either by analyzing the games of strong players or by playing against itself. That approach may sound too simpleminded to work, but a couple of remarkable programs have already taught themselves to play other games. TD-Gammon, the best backgammon program in the world, written by the physicist Gerald Tesauro of the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, started out by making random moves. Then, after playing millions of practice games against itself, the program bootstrapped itself into a nearly perfect player. The "neural net" that encodes TD-Gammon's expertise takes a long time to evaluate positions by the standards of computer chess: TD-Gammon can search only three moves deep in 15 seconds. But the positional knowledge it encodes is so sophisticated that its decisions are nearly always right.
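
The core of that bootstrapping is the temporal-difference update: after each game, move every position's value estimate toward the value of the position that followed it, and the last position toward the game's result. The following is a schematic, table-based TD(0) sketch; TD-Gammon itself stored its values in a neural network, and the constants here are guesses:

```python
# One TD(0) pass over a finished self-play game.
def td_update(values, episode, result, alpha=0.1, default=0.5):
    """episode: positions visited in order; result: 1 win, 0 loss."""
    targets = [values.get(s, default) for s in episode[1:]] + [result]
    for state, target in zip(episode, targets):
        v = values.get(state, default)
        values[state] = v + alpha * (target - v)

values = {}
td_update(values, ["opening", "middle", "endgame"], result=1.0)
print(values)
# After one game only the final position moves toward the win;
# over millions of games the credit seeps back to the opening.
```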

A few programmers have tried to apply the same techniques to go, but their success has been limited. Even the most sophisticated self-taught programs play like rank beginners; a bright newcomer to the game of go might beat any of those programs in the very first game. Yet machine learning remains a highly promising avenue of research, and I am beginning to pursue it now, with the help of the electrical engineers Donald C. Wunsch and Raonak Zaman of Texas Tech University in Lubbock. Learning to play go from scratch may be too much to ask of a neural net. But if such a net were added to our already sophisticated program, it might gradually teach itself some of the more intuitive points of go strategy--how to evaluate the strength of a group or the value of a wall of stones, for instance.

No matter how you approach it, go is hard to program. Fortunately, its difficulties happen to be the right kind of difficulties: they force programmers to be innovative, to unify existing techniques, and thereby to foster basic advances in artificial intelligence.

And yet, for the moment, computer go remains the campaign-finance reform of AI: everyone agrees it is a good idea, but no one wants to do it alone. It may be that computer go is simply too daunting, too prone to disappointment, to attract the kind of concerted effort that led to Deep Blue's triumph. Although a few tens of thousands of dollars in prizes are given out every year in computer go competitions, they are not enough to support even one programmer. The Ing Foundation in Taiwan is one of the most generous supporters of computer go, and offers a tantalizing $1.6 million prize for a program that can beat a professional. Unfortunately, the offer ends in the year 2000, and the goal is still so remote that the offer might as well expire tomorrow.

As it stands, the field of computer go is dominated by dedicated hobbyists. Handtalk was not written by a team of computer scientists and engineers funded by a giant multinational corporation, nor by a government-funded team of academics at a top research institute. It was written by a retired chemistry professor from Guangzhou, China, on his ancient PC, and in assembly language because he did not have a C compiler. The bottleneck in computer go has to do not with processor power, but with ideas.

Nevertheless, following gingerly in the footsteps of AI optimists before me, I predict that thirty years from now a computer will play go, without a handicap, against someone of Kim's caliber. And the computer will win. Whether the essence of intelligence can be captured in manipulated symbols, whether massive search will be of any use outside of toy domains, whether self-taught machines can ever solve real-world problems: those are all open questions. But whatever the answers, computer go is poised to be an integral part of the process of discovery.

David A. Mechner is a doctoral fellow at New York University's Center for Neural Science.
