Building a Game-Theory Optimal Poker Bot
Poker is an imperfect information game. Unlike chess (where you can see the whole board), your opponent's cards are hidden. The mathematically optimal strategy isn't "always do X," it's a mixed strategy: bet 60% of the time, check 40%, with the exact frequencies depending on your cards, the board, the pot size, your stack depth, and the action history. Computing these frequencies requires solving a game tree with millions of nodes.
I built a bot that does this in real-time. It reads cards from the browser DOM via SVG parsing, solves postflop game trees with TexasSolver (a C++ poker solver), and executes actions through Playwright with human-like timing. It plays 6-max No-Limit Hold'em on a popular online casino at $0.25/$0.50 stakes, running 100-200 hands per hour with minimal supervision.
Vision: Reading Cards from SVGs
The casino renders cards as base64-encoded SVG images embedded in CSS `background-image` properties. Each card face is an `<svg>` element with paths for the rank symbol and suit symbol, colored by `fill` attributes.
The card recognition pipeline:
```mermaid
graph LR
    A[DOM Query] --> B[Base64 Decode]
    B --> C{MD5 Hash Lookup}
    C -->|Hit 99%| D[Return rank, suit]
    C -->|Miss| E[Parse SVG Paths]
    E --> F[Color: Red/Black]
    E --> G[Rank: Path Hash]
    E --> H[Suit: Curve Analysis]
    F --> I[Card Identified]
    G --> I
    H --> I
    I --> J[Update Hash Cache]
```
Hash cache is the fast path. Every SVG the bot has seen before is stored as an MD5 hash mapping to (rank, suit) in a JSON file. On startup, ~300 known cards load into memory. Cache hits take 0.1ms.
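The fast path is simple enough to sketch in a few lines. This is a minimal illustration of the idea, not the production class; the file name, class name, and JSON layout (hash mapped to a `[rank, suit]` pair) are assumptions:

```python
import hashlib
import json
from pathlib import Path

class CardHashCache:
    """MD5 fast path: raw SVG bytes -> (rank, suit). Names are illustrative."""

    def __init__(self, path="card_hashes.json"):
        self.path = Path(path)
        # Load the known hashes (around 300 in practice) into memory at startup.
        self.cache = json.loads(self.path.read_text()) if self.path.exists() else {}

    def lookup(self, svg_bytes: bytes):
        key = hashlib.md5(svg_bytes).hexdigest()
        hit = self.cache.get(key)
        return tuple(hit) if hit else None  # None -> fall back to SVG parsing

    def learn(self, svg_bytes: bytes, rank: str, suit: str):
        # Called after manual labeling; persisted so the next run starts warm.
        self.cache[hashlib.md5(svg_bytes).hexdigest()] = [rank, suit]
        self.path.write_text(json.dumps(self.cache))
```

A dictionary lookup on a 32-character hex digest is why cache hits cost a fraction of a millisecond; all the expensive work lives on the miss path.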
SVG parsing handles cache misses (new card art after site updates). The parser:
- Extracts the second `<path>` element's `fill` color. Red (`#E9113C`) means hearts or diamonds. Black (`#071824`) means spades or clubs.
- Hashes the rank path's `d` attribute (the actual SVG path data) against a rank hash table. Each rank (A, K, Q, ..., 2) has a unique path shape.
- Identifies the suit from the large clipped symbol group (16x16 at a specific position). The detection is geometric:
  - Diamond: only `M` and `L` commands (straight lines forming a rhombus), no curves
  - Heart: two paths with bezier curves, moderate path length, low starting Y coordinate
  - Spade: two paths with curves, high starting Y (points upward)
  - Club: multiple bezier curves (`C` commands), longest total path length
This works because each suit has a fundamentally different shape. Diamonds are the only suit with no curves. Hearts and spades both have curves but differ in vertical orientation. Clubs have the most complex outline.
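The geometric rules compress into a short classifier. This is a simplified sketch of the logic, not the production parser: it takes the suit symbol's `d`-attribute strings plus the color result, and the path-length threshold separating spades from clubs is an illustrative assumption:

```python
def classify_suit(paths, is_red):
    """Classify a suit from SVG path 'd' strings using the geometric rules above.
    `paths` is the list of d-attributes for the clipped suit symbol group.
    Simplified sketch; the length threshold below is an assumption."""
    joined = " ".join(paths)
    has_curves = "C" in joined or "c" in joined
    if not has_curves:
        return "d"  # only M/L commands: straight-line rhombus = diamond
    if is_red:
        return "h"  # red with curves must be hearts (diamonds have none)
    # Black suits: clubs have the longest, most curve-heavy outline, so total
    # path-data length separates them from spades (threshold illustrative).
    total_len = sum(len(p) for p in paths)
    return "c" if total_len > 400 else "s"
```

The real parser also uses the starting Y coordinate to distinguish hearts from spades, but color already resolves that pair here.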
Unknown card learning: When the site pushes a UI update with new card art, the bot can't parse it. It stores the raw SVG and sends it to the dashboard via WebSocket. I label it manually ("Ac", "Kh", etc.) and the hash cache updates. After labeling 50-ish cards, the bot handles the new art set autonomously.
The whole vision system reads full table state (hole cards, board, pot, all player stacks and actions) in 50-100ms via concurrent DOM queries.
The Decision Engine
Three tiers, from cheapest to most expensive:
Preflop: Hand Charts
Preflop decisions use static range tables. For each position (UTG through BTN) and each facing situation (unopened, facing raise, facing 3-bet), there's a lookup table of playable hands.
```python
# Simplified example
RFI_RANGES = {
    "UTG": "77+, ATs+, KQs, AQo+",
    "CO": "22+, A2s+, K9s+, Q9s+, J9s+, ATo+, KJo+",
    "BTN": "22+, A2s+, K2s+, Q5s+, J7s+, T7s+, 97s+, A2o+, K8o+, Q9o+",
}
```

The ranges get wider as position improves (later position means more information before you act). UTG opens ~12% of hands; BTN opens ~45%.
Short-stack mode kicks in below 8 big blinds. The decision simplifies to push or fold. Hand strength is scored by a formula: pairs get rank * 2 + 14, non-pairs get high_card + low_card + suited_bonus. Position thresholds determine the cutoff (UTG needs 24+, BTN needs 16+).
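The scoring formula fits in a few lines. A minimal sketch using the scores and the UTG/BTN thresholds stated above; the other position thresholds and the size of the suited bonus are illustrative assumptions, and ranks are encoded 2–14 with ace high:

```python
# Thresholds: UTG and BTN come from the text; the rest are assumptions.
PUSH_THRESHOLDS = {"UTG": 24, "MP": 22, "CO": 19, "BTN": 16, "SB": 15}

def push_fold(rank1: int, rank2: int, suited: bool, position: str, bb_stack: float) -> str:
    if bb_stack >= 8:            # short-stack mode only applies below 8 big blinds
        return "normal_strategy"
    if rank1 == rank2:           # pairs: rank * 2 + 14
        score = rank1 * 2 + 14
    else:                        # non-pairs: high card + low card + suited bonus
        score = max(rank1, rank2) + min(rank1, rank2) + (2 if suited else 0)
    return "push" if score >= PUSH_THRESHOLDS[position] else "fold"
```

So AA (score 42) pushes everywhere, while 72o (score 9) folds even on the button.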
BB isolation raises handle limpers. If 3+ players limp in, the big blind raises to 4x BB + 1x BB per limper with a tighter range. This exploits the wide, weak ranges of players who limp rather than raise.
Postflop: GTO Solver
Postflop is where it gets interesting. The decision space is too large for static tables. Each unique combination of board cards, hole cards, pot size, stack depth, and action history produces a different optimal strategy. The only way to compute it is to solve the game tree.
TexasSolver is a C++ poker solver that computes Nash equilibrium strategies. The bot generates a text input file describing the game state:
```
set_pot 67
set_effective_stack 2000
set_board Qs,Jh,2h
set_range_ip JJ+,AKs,AQs,AJs,...
set_range_oop 22+,AK,AQ,...
set_bet_sizes ip,flop,bet,67
set_bet_sizes ip,turn,bet,67
set_bet_sizes ip,river,bet,75
build_tree
start_solve
```
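Generating this file is straightforward string assembly. A sketch of a generator for the command format shown above; the function name and the `sizes` parameter shape are my own, and the real integration also emits accuracy, thread-count, and OOP sizing commands omitted here:

```python
def solver_config(pot, eff_stack, board, range_ip, range_oop, sizes):
    """Build a TexasSolver input file like the one above. `sizes` maps
    street name -> bet size in percent of pot. Simplified: only IP bet
    sizes are emitted; other solver settings are left out."""
    lines = [
        f"set_pot {pot}",
        f"set_effective_stack {eff_stack}",
        f"set_board {','.join(board)}",
        f"set_range_ip {range_ip}",
        f"set_range_oop {range_oop}",
    ]
    for street in ("flop", "turn", "river"):
        lines.append(f"set_bet_sizes ip,{street},bet,{sizes[street]}")
    lines += ["build_tree", "start_solve"]
    return "\n".join(lines)
```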
The solver runs as a subprocess with 16 threads, converging to 0.5% accuracy. For a flop decision (3 streets to solve), this takes 10-12 seconds. Turn decisions (2 streets) take 5-8 seconds. River decisions (1 street) take 2-3 seconds.
The output is a JSON tree where each node contains the strategy for every possible hand:
```json
{
  "AsKh": {"bet_67": 0.6, "check": 0.4},
  "7d7c": {"bet_67": 0.1, "check": 0.9}
}
```

AKo on this board bets 60% of the time (it has overcards and a gutshot). Pocket sevens check 90% (a weak hand on a high board). The bot selects an action by weighted random sampling: 60% of the time it bets with AK, 40% it checks. This mixed strategy is what makes it unexploitable: if it always bet AK, opponents could adjust, but with mixed frequencies there is no counter-strategy.
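The weighted random selection is a one-liner over the strategy node. A minimal sketch (the function name is mine):

```python
import random

def pick_action(strategy: dict, rng=random) -> str:
    """Sample an action from a solver strategy node, e.g.
    {"bet_67": 0.6, "check": 0.4} yields "bet_67" about 60% of the time."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Over a session this reproduces the solver's frequencies exactly, which is the whole point: any single action is unpredictable, but the long-run mix matches equilibrium.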
Three-Tier Cache
Solving every hand live would mean 10-second decisions on every street. The cache hierarchy fixes this:
Tier 1: Memory-mapped file (solver_mega.bin). I pre-solved ~50,000 common boards offline and packed the results into a single binary file with an index. The bot memory-maps this file (zero-copy, no deserialization) and looks up results by hashing the board + ranges + pot + stack. Hit rate in real play: ~95%.
Tier 2: LRU cache (128 entries). Live-solved hands get cached here. If the same board + action situation comes up again in the session, it's instant.
Tier 3: Live solver. Cache miss. Run TexasSolver as a subprocess. 10-15 second timeout, result stored in Tier 2.
Suit isomorphism is the key optimization that makes the cache practical. The flop Qs Jh 2h and Qh Js 2s are strategically identical (swap hearts and spades). By canonicalizing both to the same key, a single cached result covers all suit permutations. This reduces the number of unique boards by up to 24x (4! suit permutations, minus those already equivalent).
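Canonicalization can be done by relabeling suits in order of first appearance on the board. A simplified sketch: it ignores the suits inside the hole-card ranges, which the real cache key must canonicalize with the same mapping, and the placeholder suit labels are arbitrary:

```python
def canonical_board(board):
    """Map suits to canonical labels (order of first appearance) so that
    strategically identical boards share one cache key."""
    mapping, labels = {}, "wxyz"  # placeholder canonical suit labels
    out = []
    for card in board:            # e.g. ["Qs", "Jh", "2h"]
        rank, suit = card[0], card[1]
        if suit not in mapping:
            mapping[suit] = labels[len(mapping)]
        out.append(rank + mapping[suit])
    return tuple(out)
```

`Qs Jh 2h` and `Qh Js 2s` both canonicalize to the same tuple, so one pre-solved entry serves every suit permutation of that texture.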
Bet Sizing
Keeping the game tree tractable requires limiting bet sizes. Each additional bet size exponentially grows the tree (every node branches for each possible size). The bot uses:
- Flop/Turn: 67% pot (one size)
- River: 75% pot (one size)
- Donk bets: 50-60% pot
- Raises: 60% of previous bet
- All-in: always available
This is a deliberate tradeoff. Real GTO uses continuous sizing (any amount), but discrete sizes with one option per street keep solving under 12 seconds. The accuracy loss from simplified sizing is much smaller than the accuracy gain from actually solving (vs. using heuristics).
Action Execution
Playwright controls a Chromium browser. The executor:
- Waits with human-like timing. The action delay follows a normal distribution (mean 600ms, std 200ms). Folds are slightly faster (0.7x multiplier), raises slightly slower (1.1x). This avoids the bot-like pattern of instant actions.
- Sets the bet amount by finding the input element, using a native property setter (not simulated keystrokes, which React intercepts), and dispatching `input` + `change` events. If the amount is below the minimum, it catches the validation dialog and adjusts.
- Clicks the action button by dispatching the full React event chain: `pointerdown`, `pointerup`, `click`, `mousedown`, `mouseup`. React's synthetic event system requires these in sequence; miss any one and the click silently fails.
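The timing model is small enough to show. A sketch using the distribution and multipliers above; the lower clamp is an assumption (an unclamped Gaussian can go negative):

```python
import random

def action_delay(action: str) -> float:
    """Human-like delay in seconds: normal(600ms, 200ms) scaled by the
    per-action multipliers described above. The 150ms floor is an assumption."""
    base_ms = random.gauss(600, 200)
    multiplier = {"fold": 0.7, "raise": 1.1}.get(action, 1.0)
    return max(0.15, base_ms * multiplier / 1000.0)
```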
Safety net: The _align_action_to_available() function maps the decision engine's output to the actual available UI buttons. If the solver says "raise" but only "call" and "fold" are available (because the UI hasn't updated yet), it falls back to the appropriate legal action. It never folds when check is free. It never calls when check is available.
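The safety rules reduce to a short priority ladder. A simplified sketch of what `_align_action_to_available()` does, under the assumption that actions are plain strings and the raise-to-call fallback is the appropriate downgrade:

```python
def align_action(decision: str, available: set) -> str:
    """Map the engine's decision onto the buttons actually present,
    following the safety rules above. Simplified sketch."""
    if decision in available:
        return decision
    if decision in ("raise", "bet") and "call" in available:
        return "call"      # can't raise: downgrade to the nearest legal action
    if "check" in available:
        return "check"     # never fold (or call) when checking is free
    return "fold" if "fold" in available else next(iter(available))
```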
Villain Profiling
The bot tracks opponent statistics in SQLite:
- VPIP (Voluntarily Put In Pot): How often they play hands
- PFR (Preflop Raise): How often they raise preflop
- AF (Aggression Factor): (bets + raises) / calls
- WTSD (Went to Showdown): How often they see showdown
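Deriving the four stats from raw counters is simple arithmetic. A sketch assuming the SQLite rows hold per-player counts with the names below (the schema is illustrative):

```python
def compute_stats(hands, vpip_n, pfr_n, bets, raises, calls, showdowns):
    """Derive VPIP/PFR/AF/WTSD from raw counters. Percent stats are per
    hand dealt; AF is (bets + raises) / calls as defined above."""
    pct = lambda n: 100.0 * n / hands if hands else 0.0
    return {
        "vpip": pct(vpip_n),
        "pfr": pct(pfr_n),
        "af": (bets + raises) / calls if calls else float("inf"),
        "wtsd": pct(showdowns),
    }
```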
From these stats, it classifies player types:
```python
def player_type(vpip, pfr):
    if vpip >= 45 or (vpip >= 30 and pfr <= 12):
        return "fish"  # Loose-passive: plays too many hands, rarely raises
    if vpip < 17 and pfr < 12:
        return "nit"   # Tight-passive: only plays premium hands
    if vpip >= 28 and pfr >= 20:
        return "lag"   # Loose-aggressive: plays many hands aggressively
    if 17 <= vpip <= 28 and 12 <= pfr <= 22:
        return "tag"   # Tight-aggressive: the standard solid style
    return "unknown"
```

Against fish (loose-passive), the bot tightens its calling range because fish tend to overvalue weak hands. Against nits, it widens its postflop ranges to exploit their tightness. Against LAGs, it plays standard GTO (aggressive opponents are hard to exploit, so the equilibrium strategy is safest).
The Dashboard
A FastAPI server runs alongside the bot, serving a WebSocket-powered dashboard on port 8420. The SharedHub class acts as an in-process event bus (no serialization overhead, just Python objects shared between the bot loop and the web server).
The dashboard shows:
- Current hand state (hole cards, board, pot, player positions)
- Decision log with reasoning codes (`solver_mmap`, `preflop_chart`, `heuristic`, `push_fold`)
- Strategy frequencies when available ("bet 67%: 0.6, check: 0.4")
- Villain stats per player at the table
- Session profit tracking with peak/drawdown metrics
- Unknown card labeling interface
Recovery
The bot handles disconnections, table closures, and seat losses with exponential backoff:
- Detect seat loss (hero stack disappears from DOM)
- Wait 3 cycles, retry sit-in
- If that fails, wait 6 cycles, retry
- Double the wait each time up to 10 retries
- After 10 failures, hard reset: full page reload, navigate back to lobby, find a new table, sit down
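The retry schedule above can be sketched as a tiny helper. The function name and the `None`-means-hard-reset convention are mine:

```python
def backoff_cycles(attempt: int, base: int = 3, max_retries: int = 10):
    """Wait cycles before sit-in retry number `attempt` (0-indexed):
    3, 6, 12, ... doubling each time. Returns None once retries are
    exhausted, signaling the hard reset (reload, new table)."""
    if attempt >= max_retries:
        return None
    return base * (2 ** attempt)
```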
Session-aware profit tracking detects table switches by checking player overlap. If more than 70% of players changed, it archives the current session and starts a new one. Rebuys are tracked separately so the P&L stays accurate across table switches.
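The overlap check is one set operation. A minimal sketch of the 70% rule, assuming player names are stable identifiers within a session:

```python
def is_table_switch(prev_players: set, curr_players: set, threshold: float = 0.7) -> bool:
    """True when more than `threshold` of previously seen players are gone,
    which we treat as a table switch and archive the session."""
    if not prev_players:
        return False
    changed = len(prev_players - curr_players) / len(prev_players)
    return changed > threshold
```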
Post-Processing Guards
The solver produces GTO strategies, but GTO has blind spots in multi-way pots (the solver computes heads-up equilibria). Several post-processing guards adjust:
Multiway tightening: If 3+ players see the flop and the solver says "call" (computed for heads-up), the bot checks hand strength against a tighter threshold. Weak draws that are profitable heads-up become unprofitable against multiple opponents.
Naked draw guard: The solver sometimes calls with bare gutshots (4 outs, ~8% equity). In practice, these are marginal at best and lose money against the player pool. The bot folds them.
Facing-bet reroute: If the solver's action path assumed we'd act first (check or bet) but we're actually facing a bet (opponent bet before us), the solver's output is for the wrong game tree. The bot falls back to a heuristic based on hand strength rather than using mismatched solver output.
SPR guard: With a small stack-to-pot ratio (less than 3), speculative hands (small pairs, suited connectors) don't have enough implied odds to continue. The bot folds them preflop even if the chart says call.
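The SPR guard is the simplest of the four. A sketch under the assumption that hand classification (small pair, suited connector) happens upstream and arrives as a string label:

```python
SPECULATIVE = {"small_pair", "suited_connector"}  # hand classes, illustrative

def spr_guard(effective_stack: float, pot: float, hand_class: str, chart_action: str) -> str:
    """Override a chart 'call' with speculative hands when the
    stack-to-pot ratio is below 3, per the guard described above."""
    spr = effective_stack / pot if pot else float("inf")
    if chart_action == "call" and spr < 3 and hand_class in SPECULATIVE:
        return "fold"
    return chart_action
```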
What I'd Do Differently
The mmap cache should update online. Right now, live-solved hands go into the LRU cache but not the mmap file. Over time, the most commonly encountered board textures at my stake should feed back into the persistent cache. I'd need a background process that periodically flushes the LRU cache to the mmap file.
Card recognition should use template matching, not SVG parsing. The SVG approach works for this specific casino but wouldn't transfer to a site that uses raster images (PNG/JPEG) for cards. A small CNN or template matching system would be more portable, though the hash cache approach would still work as the fast path.
The solver integration should support multi-way solving. TexasSolver computes heads-up equilibria. For 3+ player pots, the bot approximates by solving heads-up against the last aggressor and then applying multiway tightening heuristics. A true 3-way solver would produce better strategies but the computational cost is orders of magnitude higher (the game tree grows exponentially with player count).
I should track solver accuracy vs. actual outcomes. The bot logs every decision and its reasoning, but it doesn't correlate decisions with final hand results. Building a feedback loop that measures "when the solver said bet 67% and I bet, what was my win rate?" would help identify spots where the solver's assumptions (bet sizes, ranges) are miscalibrated for the player pool.