Before reporting the results, let me thank Fred Gitelman and Tom Throop for being kind enough to make copies of their programs (Bridge Master and Bridge Baron respectively) available to me. Let me also remind folks that both BM and BB are in a far more advanced state of development than GIB is.
The BM test was run under Microsoft Windows on a 200 MHz Pentium Pro. BB was given 10 seconds to select each play, and GIB was given 100 seconds per play. These numbers were selected to equalize the computational resources used by the two programs; BB almost always used its full 10-second allotment once it entered its double-dummy (DD) mode around trick 7 (it uses DD analysis near the end of the deal, much as GIB does throughout). GIB very rarely used its full allotment; when it did, it was only for the first trick or two, after which play became extremely quick. On average, GIB seemed to take about 70 seconds to play a full deal, although there were a handful of deals on which GIB spent more like 400 seconds in total. The additional computational resources seemed to make very little difference, although I do know of at least one hand on which they were needed for GIB to find the winning line.
GIB was given the meanings of the auctions leading to the contracts in question; I typed them in by hand but didn't give GIB any fancy inferences. So an opening pass simply showed 0-12 HCP, and so on. GIB was also given similarly obvious information about the opening lead (e.g., leading a J against 6NT presumably shows the 10 and denies the Q). The information presented was always consistent with the opponents' actual holdings; GIB currently simply ignores information that is known to be inconsistent with the observed holdings, instead of allowing it to degrade gracefully. (For example, an opening bidder who cannot hold 12 HCP is assumed by GIB to hold any number whatsoever, although 11 HCP is probably a much better guess.) BB was not given this information because I was unable to find a way to select BB's final contract while allowing the opponents to bid their hands.
Of the 180 deals in BM, BB was successful on 33. At least one of these was a case where BB took the wrong line but BM did not punish it for doing so.
GIB was successful on 116 of the deals. Here are more detailed results:
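| Program | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Total | Success rate |
|---------|---------|---------|---------|---------|---------|-------|--------------|
| BB      | 16      | 8       | 2       | 1       | 4       | 33    | 18.3%        |
| GIB     | 33      | 22      | 18      | 19      | 24      | 116   | 64.4%        |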
Each entry is the number of deals that were played successfully by the program in question.
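The overall success rates follow directly from the totals and the 180 deals in BM. A quick check, using only the totals reported above (the variable names are illustrative):

```python
# Totals reported in the text: 180 deals in BM, 33 solved by BB, 116 by GIB.
TOTAL_DEALS = 180
solved = {"BB": 33, "GIB": 116}

# Success rate as a percentage of all 180 deals, rounded to one decimal place.
success_rate = {
    program: round(100 * count / TOTAL_DEALS, 1)
    for program, count in solved.items()
}

for program, rate in success_rate.items():
    print(f"{program}: {rate}%")
```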
GIB's mistakes are illuminating. Some of them are of the sort that people have cited before when suggesting that it would not be possible to develop an effective card player using the DD approach: failing to gather information to avoid a guess, for example.
In general, however, GIB's mistakes are of another sort. GIB is very good (nearly optimal, in fact) at identifying specific possibilities that will allow a contract to be made or defeated. What it is weak at is combining such possibilities. As an example, suppose that you are playing a hand and you can take finesse (a) or finesse (b); if the finesse you choose succeeds, you make the contract, and if it fails, you go down. Or you can do something clever (c) that allows you to make the contract if either finesse is on. Finally, you can do something random (d) that simply defers the guess.
GIB will choose randomly between (c) and (d) in this situation, assuming that if it can defer the guess, it will make it correctly in the future! (And on a DD basis, it would.) This pattern accounts for virtually all of GIB's mistakes; as BM's deals get more difficult, they more often involve combining a variety of possibly winning options and that is why GIB's performance falls off at levels 2 and 3.
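The effect can be illustrated with a toy model. The sketch below is not GIB's code; it assumes, purely for illustration, that the two finesses are independent and each works half the time, and that a declarer who defers the guess ends up committing to finesse a. It scores each line twice: once as a double-dummy evaluator would (all cards visible after the play) and once as it actually fares at the table.

```python
from itertools import product

# Each world is (finesse_a_works, finesse_b_works).
# Assumption for illustration: the four worlds are equally likely.
WORLDS = list(product([True, False], repeat=2))


def actual_success(line, world):
    """Does the line make the contract when declarer cannot see the cards?"""
    a_on, b_on = world
    if line == "a":
        return a_on
    if line == "b":
        return b_on
    if line == "c":
        # The clever line succeeds if either finesse is on.
        return a_on or b_on
    if line == "d":
        # Deferring changes nothing: declarer must still commit to one
        # finesse blind. Modeled here (an assumption) as choosing finesse a.
        return a_on
    raise ValueError(f"unknown line: {line}")


def double_dummy_success(line, world):
    """Same lines, scored as if all later decisions were made seeing all cards."""
    a_on, b_on = world
    if line == "d":
        # After deferring, a double-dummy player always picks the working finesse.
        return a_on or b_on
    return actual_success(line, world)


def score(evaluator, line):
    """Fraction of worlds in which the line makes the contract."""
    return sum(evaluator(line, w) for w in WORLDS) / len(WORLDS)


dd_scores = {line: score(double_dummy_success, line) for line in "abcd"}
real_scores = {line: score(actual_success, line) for line in "abcd"}

print("double-dummy estimate:", dd_scores)
print("actual success rate:  ", real_scores)
```

Under these assumptions the double-dummy estimate rates (c) and (d) equally at 0.75, while the true success rate of (d) is only 0.5, which is why choosing randomly between them costs contracts.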
At still higher levels, however, BM typically involves the successful development of complex end positions, and GIB's performance rebounds. This appeared to happen to BB as well, although to a much lesser extent. It was quite gratifying for me to see GIB discover entry shifting and guard squeezes among a wide variety of other end positions. Of course, the BM deals were developed based on the end positions that the bridge community itself has discovered; GIB may well find possibilities that people have yet to stumble upon.
Relative to the 1997 computer bridge championship in Albuquerque, GIB is running on identical hardware, under the same operating system, and with identical time constraints. A large number of bugs that were present in Albuquerque have been fixed. The Windows version of GIB does not bid well because of inefficient code produced by the Microsoft C compiler; it is for this reason that the Linux version was used.
The 200 MHz Pentium Pro is substantially faster than the machine made available to Bridge Baron in Albuquerque. The Bridge Baron team in Albuquerque, however, used a development version of their program as opposed to the commercial release. This development version is probably about 1/3 IMP/board better than the commercial release, but I have heard reports that the version running in Albuquerque took approximately 90 seconds per play, which is obviously beyond tournament limits. I have offered to repeat the match using whatever software Bridge Baron would like to provide me.
The 1/3 IMP/board calculation is based on a technical article by Stephen Smith and Dana Nau in which they report that over 1000 deals, the new version outplays the old on 254, and is outplayed by it on 202. Assuming that the net difference of 52 deals is divided evenly among games going down vs. making, part scores going down vs. making, and overtricks, and that the result is vulnerable half the time, we find that there is an average pickup of about 6 IMPs when the new version does better, hence 312 IMPs over 1000 boards.
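The arithmetic behind the estimate can be written out as follows. The 6 IMP average pickup is taken as given from the assumptions stated above rather than derived from the IMP scale, and the variable names are illustrative.

```python
# Figures reported in the Smith and Nau comparison, as quoted in the text.
deals = 1000
new_version_better = 254
old_version_better = 202

# Approximate average swing per deal won, as assumed in the text.
avg_pickup_imps = 6

net_deals = new_version_better - old_version_better  # net deals won by the new version
net_imps = net_deals * avg_pickup_imps               # total IMPs over all boards
imps_per_board = net_imps / deals                    # roughly 1/3 IMP per board

print(net_deals, net_imps, imps_per_board)
```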
The match consisted of 32 randomly dealt boards, and GIB won by a score of 93-47. Most of BB's IMPs came on two deals: one where GIB bid a good vulnerable slam that went down (basically either a finesse or a finesse and a split, 13 IMPs), and another where GIB had disastrous results at both tables, going for 1100 against a vulnerable game at one and languishing in a part score at the other (14 IMPs). BB's other pickups were all 5 IMPs or fewer and involved part score battles.
In general, however, GIB won the part score battles. It seemed to be more effective in all aspects of the game: competitive and constructive auctions in addition to card play. The 1.4 IMP/board margin is almost exactly what was predicted by a group of experts during the San Francisco nationals, and is obviously well in excess of the difference between the commercial and development versions of BB.
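For reference, the per-board margin and its comparison with the estimated edge of the development version work out as below. Subtracting the two figures assumes the effects are simply additive, which is a rough approximation for illustration only.

```python
# Match result reported in the text.
boards = 32
gib_imps = 93
bb_imps = 47

# GIB's margin per board over the commercial release of BB.
margin_per_board = (gib_imps - bb_imps) / boards

# Estimated edge of BB's development version over its commercial release.
dev_version_edge = 1 / 3

# Naive estimate of the margin GIB would retain against the development
# version, assuming the two figures can simply be subtracted.
remaining_margin = margin_per_board - dev_version_edge

print(f"margin: {margin_per_board:.2f} IMP/board")
print(f"margin after allowing for the development version: {remaining_margin:.2f} IMP/board")
```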