Texas Hold 'em Series (2): Poker Hands Dataset
2019-03-30
In this post, I’ll walk through the whole process of downloading, cleaning and browsing one of the world’s largest poker hand history datasets, the IRC Poker Database[1], which is somewhat dated but well known for its huge size. This work is preparation for further analysis and model training.
History
Before the advent of real-money online poker servers, there was the Internet Relay Chat (IRC) poker server. The server was programmed by Todd Mummert, with support code by Greg Reynolds and other Usenet rec.gambling.poker enthusiasts. The participants in these games were mostly computer geeks with a passion for poker. Many were serious students of the game, armed with the analytical skills needed to understand the mathematics and all other aspects of advanced poker strategy.
Michael Maurer wrote a program called Observer that sat in on IRC poker channels and quietly logged the details of every game it witnessed. This resulted in the collection of the more than 10 million complete hands of poker (from 1995-2001) that constitute the IRC Poker Database.
Sadly, the IRC games are now gone (but might be resurrected one day).[2]
Notation
We’ll use the same shorthand notation as in the last post. For bet actions, we define
-: no action
B: blind bet
f: fold
k: check
b: bet
c: call
r: raise
A: all-in
Q: quit
K: kicked out
As for rounds, we denote
p: pre-flop
f: flop
t: turn
r: river
s: showdown
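The two code tables above can be written down as lookup dictionaries. This is only a minimal sketch: the names `ACTIONS`, `ROUNDS` and `describe` are made up for illustration and are not taken from the scripts in the repository.

```python
# Action codes, exactly as listed above.
ACTIONS = {
    "-": "no action",
    "B": "blind bet",
    "f": "fold",
    "k": "check",
    "b": "bet",
    "c": "call",
    "r": "raise",
    "A": "all-in",
    "Q": "quit",
    "K": "kicked out",
}

# Round codes, exactly as listed above.
ROUNDS = {
    "p": "pre-flop",
    "f": "flop",
    "t": "turn",
    "r": "river",
    "s": "showdown",
}


def describe(stage, actions):
    """Turn one round's codes into a readable string (hypothetical helper)."""
    names = ", ".join(ACTIONS[a] for a in actions)
    return f"{ROUNDS[stage]}: {names}"


print(describe("p", ["B", "r"]))  # pre-flop: blind bet, raise
```

The two tables are kept separate because some letters are reused: `f` means fold as an action but flop as a round, and `r` means raise as an action but river as a round.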
Data Preparation
I’ve written several scripts[3] for the data preparation steps, and the code can be found in my GitHub repository. After entering the repo, run the following commands in order:
wget http://poker.cs.ualberta.ca/IRC/IRCdata.tgz # download the database (-> IRCdata.tgz)
tar -xvf IRCdata.tgz # unzip the tgz file (-> IRCdata)
python3 extract.py # extract data (-> hands.json)
python3 clean.py # drop invalid hand data (-> hands_valid.json)
In the end there are $10{,}233{,}955$ hands in hands.json and $437{,}862$ in hands_valid.json after cleaning.
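To put the two counts in perspective, here is a quick back-of-the-envelope calculation of how much of the data survives cleaning. It uses only the two numbers quoted above; the variable names are illustrative.

```python
# Hand counts quoted above.
total_hands = 10_233_955   # hands.json
valid_hands = 437_862      # hands_valid.json

kept = valid_hands / total_hands
dropped = total_hands - valid_hands

print(f"{kept:.2%} kept, {dropped:,} dropped")  # 4.28% kept, 9,796,093 dropped
```

In other words, only about one extracted hand in 23 passes the validity checks.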
Data Inspection
You can run the following command to inspect hands in their original order. Whenever you’d like to stop browsing, just press Ctrl+C to interrupt the process.
python3 browse.py # print hands in a formatted way
The script prints the extracted hand histories in the format below.
############################################################
time : 199612 id : 2093
board : ['Qd', '6s', 'Td', 'Qc', 'Jh']
pots : [(2, 60), (2, 60), (2, 60), (2, 60)]
players :
Tiger (#1)
{'action': 30,
'bankroll': 2922,
'bets': [{'actions': ['B', 'r'], 'stage': 'p'},
{'actions': ['k'], 'stage': 'f'},
{'actions': ['k'], 'stage': 't'},
{'actions': ['k'], 'stage': 'r'}],
'pocket_cards': ['9s', 'Ac'],
'winnings': 30}
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
jvegas2 (#2)
{'action': 30,
'bankroll': 139401,
'bets': [{'actions': ['B', 'c'], 'stage': 'p'},
{'actions': ['k'], 'stage': 'f'},
{'actions': ['k'], 'stage': 't'},
{'actions': ['k'], 'stage': 'r'}],
'pocket_cards': ['9c', 'As'],
'winnings': 30}
############################################################
This output describes hand #2093, which was played in December 1996. There were two players at the table: Tiger (the SB) and jvegas2 (the BB). The hand started with Tiger, who held 9♠ A♣ and had a bankroll of 2,922 USD, posting 5 USD, and jvegas2, who held 9♣ A♠ and had a bankroll of 139,401 USD, posting 10 USD. Tiger then raised to 30 USD (3 BB) and jvegas2 called, so the pre-flop pot was 60 USD with two players remaining. The flop was Q♦, 6♠ and 10♦. Neither hand had improved so far, so both players checked. The turn was Q♣, and both checked again. The river was J♥, nothing special, and both checked once more. Both players stayed in until the showdown, which ended in a tie, so they split the 60 USD pot.
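The bookkeeping in this hand can be checked with a few lines of Python. The sketch below rests on several assumptions: the `hand` dictionary is hand-transcribed from the printed output (the actual layout of hands.json may differ), each entry of `pots` is read as (players remaining, pot size), which is inferred from the walkthrough above, and the helper names `parse_time` and `is_consistent` are hypothetical.

```python
# Hand #2093, transcribed by hand from the browse.py output above.
# Assumption: the real records in hands.json may be laid out differently.
hand = {
    "time": "199612",
    "id": 2093,
    "board": ["Qd", "6s", "Td", "Qc", "Jh"],
    # Assumption: one (players remaining, pot size) pair per betting round.
    "pots": [(2, 60), (2, 60), (2, 60), (2, 60)],
    "players": {
        "Tiger": {"action": 30, "bankroll": 2922, "winnings": 30},
        "jvegas2": {"action": 30, "bankroll": 139401, "winnings": 30},
    },
}


def parse_time(stamp):
    """Split a 'YYYYMM' stamp into (year, month)."""
    return int(stamp[:4]), int(stamp[4:])


def is_consistent(hand):
    """Check that money put in == final pot == money paid out."""
    put_in = sum(p["action"] for p in hand["players"].values())
    paid_out = sum(p["winnings"] for p in hand["players"].values())
    _, final_pot = hand["pots"][-1]
    return put_in == final_pot == paid_out


print(parse_time(hand["time"]))  # (1996, 12)
print(is_consistent(hand))       # True
```

For this hand each player put in 30 USD and got 30 USD back, which matches the split pot described above.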