Building a bot to play Pokémon games

Most current "AI plays Pokémon" projects try to make a single agent reason through the entire game from first principles. I wanted something closer to a programmable automation framework: deterministic tools for solved subproblems, with an agent orchestrating higher-level goals.

Making a bot

I've always been interested in bots that can play video games. As a kid, I was fascinated by bots like JMacro and RSBot, which could seemingly play for hours without being detected, but I had no idea how they worked. (I now know that they used scripted macros to walk the player through a loop of predetermined states, monitoring game state with a combination of colour recognition and direct client memory reads.)

Getting bots to play the Pokémon games is a pretty classic experiment nowadays. The Game Boy Advance game Pokémon FireRed is a good environment: it has a limited, discrete action space (keypresses), a long-horizon challenge chunked into clear goals (gyms), and reasonably understandable reward signals (enemy HP, Pokémon levels, Pokédex count).

FireRed automated playthrough: my bot playing through the first two gyms of Pokémon FireRed

Unless the bot is a fully deterministic macro, it needs some kind of decision engine that decides what to do based on the current environment state. Existing projects have used a pre-trained LLM (Claude or GPT), an RL model, or even a Twitch hivemind as that decision engine.

The trend is to go fully agentic: harness an LLM and let it reason through the entire game from first principles. I wanted something more macro-based instead. I could either kick off a full run and let it go without any additional prompts, or have a central control LLM draw on a collection of scripts to complete different types of playthrough. It feels like a good ratchet: once a capability works reliably, freeze it into repeatable code instead of paying tokens to rediscover it every run. For example, why should I have an LLM decide how to play through every battle if I can 'solve' battles and apply an optimal fight policy every time? I don't necessarily want 'LLM Plays Pokémon'; I'm more interested in steerable code-based automation.

Orchestrator
    ↓
Mission System
    ↓
Goal Executors
    ↓
Reusable Tools
    ↓
Data + Memory

Bot architecture: goal framework

Taking inspiration from JMacro and other scriptable bots, we abstract the game into a series of missions and goals. This is purely declarative configuration which specifies the tasks we want the bot to complete at a high level.

Mission Registry (`scripts/missions/__init__.py`)

MISSION_SEQUENCE: list[str] = [
    "beat_brock",
    "beat_misty",
    # Future: "beat_surge", "beat_erika", etc.
]

Mission Structure (`scripts/missions/beat_brock.py`)

MISSION_NAME = "beat_brock"

GOALS = [
    # Oak's Parcel prerequisite
    FetchParcelGoal(),
    DeliverParcelGoal(),

    # Pewter preparation
    HealGoal(city="pewter"),
    BuyGoal(item_id=4, quantity=10, city="pewter"),   # Poké Balls
    BuyGoal(item_id=13, quantity=5, city="pewter"),   # Potions

    # Party building
    CatchGoal(species="Pikachu", target_map=(1, 0)),

    # Training
    TrainGoal(
        condition=MinLevelCondition(slot=0, level=12),
        grinding_map=(3, 20),  # Route 2
        heal_threshold=0.5,
        heal_city="pewter",
    ),

    # Gym battle
    HealGoal(city="pewter"),
    GymGoal(gym_id="pewter"),
    HealGoal(city="pewter"),

    # Post-badge catch (Route 3 opens)
    CatchGoal(species="Geodude", target_map=(1, 1)),
]

Each Goal is then an atomic objective with success criteria.

Base Architecture

Goal Abstract Base Class:

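from abc import ABC, abstractmethod
from dataclasses import fields, is_dataclass

class GoalFailed(Exception):
    """Raised when a goal's success condition is not met."""

class Goal(ABC):
    max_retries: int = 3

    @abstractmethod
    def run(self, ctx: "MissionContext") -> None:
        """Execute goal, raise GoalFailed if success condition not met"""

    def key(self) -> str:
        """Stable string for progress tracking (dataclass fields → string)"""
        # Sketch: class name plus field values, e.g. "HealGoal(city='pewter')"
        args = [f"{f.name}={getattr(self, f.name)!r}" for f in fields(self)] if is_dataclass(self) else []
        return f"{type(self).__name__}({', '.join(args)})"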

MissionContext Dataclass (threaded through all goals):

progress: MissionProgress for completion tracking
data: GameData with static registries
battle_handler: Optional callback for goal-specific battle policy
checkpoint_fn: Optional callback for mid-goal checkpoints
run_logger: Optional goal-specific logging
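Here is a minimal sketch of the context as a Python dataclass. The field names come from the list above. `current_goal_key` is added because `run_mission()` assigns it. The `Any` types and `None` defaults are my placeholders, not the real definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class MissionContext:
    progress: Any  # MissionProgress: completion tracking
    data: Any  # GameData: static registries
    battle_handler: Optional[Callable[..., Any]] = None  # goal-specific battle policy
    checkpoint_fn: Optional[Callable[..., None]] = None  # mid-goal checkpoint callback
    run_logger: Optional[Any] = None  # goal-specific logging
    current_goal_key: Optional[str] = None  # assigned by run_mission() before each goal
```

Because the callbacks default to `None`, a goal only needs to check `if ctx.battle_handler:` before delegating.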

MissionProgress:

Loads/saves to saves/progress_{mission}.json
completed_keys: Set of goal keys already achieved
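The sketch below shows how that persistence might work, assuming the JSON file just holds a list of completed keys. Only `mission_name` and `is_complete()` appear in the original code. `mark_complete()`, `save()`, `load()` and the `save_dir` parameter are hypothetical.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MissionProgress:
    mission_name: str
    completed_keys: set[str] = field(default_factory=set)
    save_dir: Path = Path("saves")  # hypothetical: lets tests point at a temp dir

    @property
    def path(self) -> Path:
        return self.save_dir / f"progress_{self.mission_name}.json"

    def is_complete(self, key: str) -> bool:
        return key in self.completed_keys

    def mark_complete(self, key: str) -> None:
        # Save after every goal so a crash never loses finished work
        self.completed_keys.add(key)
        self.save()

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Sets are not JSON-serialisable, so store a sorted list for stable diffs
        payload = {"completed_keys": sorted(self.completed_keys)}
        self.path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, mission_name: str, save_dir: Path = Path("saves")) -> "MissionProgress":
        progress = cls(mission_name, save_dir=save_dir)
        if progress.path.exists():
            payload = json.loads(progress.path.read_text(encoding="utf-8"))
            progress.completed_keys = set(payload["completed_keys"])
        return progress
```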

GameData:

Centralized access to static game data - loads data/gyms.json, data/pokemon_centers.json, data/pokemarts.json
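Here is a minimal loader. It assumes each file is a JSON object keyed by city, like the Pokémon Center and PokéMart examples in the data layer section. The schema of `gyms.json` isn't shown, so it is treated as an opaque dict. The `center()` helper is hypothetical.

```python
import json
from pathlib import Path


class GameData:
    """Static registries loaded once from data/ and shared by every goal."""

    def __init__(self, data_dir: Path = Path("data")) -> None:
        self.gyms = self._load(data_dir / "gyms.json")
        self.centers = self._load(data_dir / "pokemon_centers.json")
        self.marts = self._load(data_dir / "pokemarts.json")

    @staticmethod
    def _load(path: Path) -> dict:
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    def center(self, city: str) -> dict:
        """Pokémon Center entry for a city, with a clearer error than a bare KeyError."""
        if city not in self.centers:
            raise KeyError(f"no Pokémon Center data for {city!r}")
        return self.centers[city]
```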

Goal Implementations

HealGoal (scripts/goals/heal_goal.py):

Resolve city from current map or parameter
Look up PC data in pokemon_centers.json
Navigate to PC, enter via warp
Interact with Nurse Joy (dialog + YES/NO menu)
Verify full HP via FullPartyHPCondition
Exit PC via south warp
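The first two HealGoal steps could look like this. The sketch assumes `pokemon_centers.json` is loaded as a dict and that "resolve from current map" means matching the player's map against each center's `outdoor_map`. `resolve_city()` is a hypothetical helper. It only handles a player standing in the city itself, not one on a nearby route.

```python
from typing import Optional

MapId = tuple[int, int]


def resolve_city(centers: dict, current_map: MapId, city: Optional[str] = None) -> str:
    """Return the explicit city if given, else the city whose outdoor map we're on."""
    if city is not None:
        if city not in centers:
            raise KeyError(f"unknown city {city!r}")
        return city
    for name, entry in centers.items():
        # JSON stores map ids as [group, num] lists, so compare as tuples
        if tuple(entry["outdoor_map"]) == tuple(current_map):
            return name
    raise LookupError(f"no Pokémon Center registered for map {current_map}")


centers = {"pewter": {"map_id": [6, 0], "outdoor_map": [3, 2], "entrance": [12, 10]}}
print(resolve_city(centers, (3, 2)))  # pewter
```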

BuyGoal (scripts/goals/buy_goal.py):

Navigate to specified mart
Enter shop, talk to clerk
Navigate item list to target item_id
Confirm purchase, verify bag count
Exit shop
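Navigating the item list is simple arithmetic if you assume three things: the cursor starts on the first row, it moves one row per press, and it doesn't wrap. Here is a sketch under those assumptions, using the first few entries of the Pewter `item_list` from `pokemarts.json`. `menu_presses()` and `purchase_ok()` are hypothetical helpers.

```python
def menu_presses(item_list: list[int], item_id: int, cursor: int = 0) -> list[str]:
    """Presses needed to move the shop cursor from row `cursor` onto item_id."""
    if item_id not in item_list:
        raise ValueError(f"item {item_id} is not sold at this mart")
    delta = item_list.index(item_id) - cursor
    return ["DOWN"] * delta if delta >= 0 else ["UP"] * -delta


def purchase_ok(count_before: int, count_after: int, quantity: int) -> bool:
    """Bag verification: the item count must rise by exactly the quantity bought."""
    return count_after - count_before == quantity


pewter_items = [4, 13, 14, 15, 24]  # first entries of the Pewter list
print(menu_presses(pewter_items, 13))  # ['DOWN']
```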

TrainGoal (scripts/goals/train_goal.py):

Navigate to grinding_map
Loop: Walk in grass until encounter
Battle with strategy="fight" (flee from wild)
Check condition.met(state) after each battle
Heal at specified city when HP < heal_threshold
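The condition objects only need a `met(state)` method. This sketch uses a hypothetical `GameState`/`PartyMember` snapshot and a `MinLevelCondition` matching the constructor in the mission file. It reads `heal_threshold` as a fraction of total party HP. That reading is my assumption; the threshold could equally apply to the lead Pokémon.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class PartyMember:
    species: str
    level: int
    hp: int
    max_hp: int


@dataclass
class GameState:
    party: list[PartyMember]


class Condition(Protocol):
    def met(self, state: GameState) -> bool: ...


@dataclass
class MinLevelCondition:
    slot: int
    level: int

    def met(self, state: GameState) -> bool:
        return state.party[self.slot].level >= self.level


def needs_heal(state: GameState, heal_threshold: float) -> bool:
    """True when total party HP has fallen below heal_threshold of its maximum."""
    hp = sum(p.hp for p in state.party)
    max_hp = sum(p.max_hp for p in state.party)
    return hp < heal_threshold * max_hp


def after_battle(condition: Condition, state: GameState, heal_threshold: float) -> str:
    """Decide the training loop's next step once a battle ends."""
    if condition.met(state):
        return "done"
    return "heal" if needs_heal(state, heal_threshold) else "keep_grinding"
```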

GymGoal (scripts/goals/gym_goal.py):

Check if badge already earned (skip if yes)
Navigate to gym entrance, enter via warp
Fight each trainer in registry trainers list
Navigate to leader tile, press A for battle
Execute leader battle with strategy="smart"
Verify badge bit set in flag array
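Gen 3 games keep event flags in a packed bit array, so "badge bit set" reduces to a byte index plus a bit mask. A sketch follows. The flag id `0x820` for the first badge, with the other seven following consecutively, is an assumption based on the pokefirered decompilation's constants and should be confirmed with a probe. `has_badge()` is a hypothetical helper. The same check covers GymGoal's "skip if already earned" step.

```python
FLAG_BADGE01_GET = 0x820  # assumption: Boulder Badge flag id; badges 2-8 follow consecutively


def flag_is_set(flags: bytes, flag_id: int) -> bool:
    """Flag n lives in byte n // 8, bit n % 8 of the saved flag array."""
    return bool(flags[flag_id // 8] & (1 << (flag_id % 8)))


def has_badge(flags: bytes, badge_index: int) -> bool:
    """badge_index 0 is Brock's Boulder Badge, 1 is Misty's Cascade Badge, and so on."""
    return flag_is_set(flags, FLAG_BADGE01_GET + badge_index)
```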

CatchGoal (scripts/goals/catch_goal.py):

Navigate to target_map
Walk in grass/cave until target species encountered
Battle with policy="catch" (weaken + throw balls)
Verify species is caught
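The "weaken" half of the catch policy has to avoid knocking the target out. One way to express it: throw once HP is low enough, otherwise use the strongest move that can't faint the target. This assumes the battle state can supply an upper-bound damage estimate per move. `catch_action()`, `THROW_BALL` and the `max_damage` mapping are hypothetical.

```python
THROW_BALL = "THROW_BALL"


def catch_action(max_damage: dict[str, int], hp: int, max_hp: int, throw_below: float = 0.3) -> str:
    """Return a move name that weakens without fainting the target, or THROW_BALL."""
    if hp <= throw_below * max_hp:
        return THROW_BALL
    # Safe moves deal some damage but are expected to leave the target on at least 1 HP
    safe = {move: dmg for move, dmg in max_damage.items() if 0 < dmg < hp}
    if not safe:
        return THROW_BALL  # every damaging move risks a knockout
    return max(safe, key=safe.get)
```

Using the maximum damage roll rather than an average keeps the policy conservative. Critical hits can still exceed it, so a real implementation would need a fallback.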

NavigateGoal:

Thin wrapper around navigate_to_map()
Success condition: player at target map within tolerance

InteractGoal (scripts/goals/interact_goal.py):

Navigate to target_pos (or adjacent if occupied)
Press A to interact
Advance through dialogue
Verify success_condition.met(read_state(client))
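"Adjacent if occupied" means picking a neighbouring tile and facing the target before pressing A. Here is a sketch that assumes a set of walkable `(x, y)` tiles with y increasing downwards, as on the GBA screen. `interaction_spot()` is a hypothetical helper, not part of the original code.

```python
Coord = tuple[int, int]
FACINGS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def interaction_spot(target: Coord, walkable: set[Coord], player: Coord) -> tuple[Coord, str]:
    """Pick a walkable tile beside target, plus the direction to face before pressing A."""
    candidates = []
    for facing, (dx, dy) in FACINGS.items():
        # Standing one tile "behind" the target along this direction means facing it
        spot = (target[0] - dx, target[1] - dy)
        if spot in walkable:
            candidates.append((spot, facing))
    if not candidates:
        raise ValueError(f"no walkable tile next to {target}")
    # Prefer the candidate closest to the player (Manhattan distance) to keep the walk short
    return min(candidates, key=lambda c: abs(c[0][0] - player[0]) + abs(c[0][1] - player[1]))
```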

Bot architecture: Runtime orchestration

Missions and goals are run by orchestration wrappers that handle retries, checkpoints and progress tracking. These executors sit behind a top-level entry point, run_game.py, which boots the system and handles startup and shutdown tasks.

Core Execution Loop (`run_mission()`)

def run_mission(goals, ctx, name, checkpoint_fn=None):
    ctx.progress.mission_name = name

    # Give repeated goals distinct keys (beat_brock lists HealGoal(city="pewter") three times)
    effective_keys = _deduplicate_keys([g.key() for g in goals])

    for goal, key in zip(goals, effective_keys):
        if ctx.progress.is_complete(key):
            log_goal(ctx, name, key, 0, "skip", None)
            continue

        ctx.current_goal_key = key
        _run_with_retries(goal, ctx, key, name)

        if checkpoint_fn:
            checkpoint_fn(key, ctx)

Bot architecture: tools

I've already mentioned reusing functions (e.g. optimal battle policies) across multiple Goals. Codifying these makes sense: it saves the decision engine (an LLM or other system brain) from having to find an elegant solution every time. The proviso is that we don't go overboard and codify every tiny detail, which would defeat the point of having a bot.

Tool example: Battle Engine (`tools/autobattle.py`)

State Modeling:

BattlePokemon: Species, HP, status, moves, stats
BattleState: Current battle snapshot (participants, turn state)
BattleAction: FIGHT, BAG, SWITCH, RUN with parameters

Decision Logic (decide_action()):

Move selection policy: based on scores (e.g. power × effectiveness × STAB) and other heuristics
Healing policy: HP < threshold + item available → ItemAction
Switch policy: Bad matchup detection + better alternative available
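The move-selection score can be sketched directly from the formula above. The type chart is a tiny excerpt covering a few matchups relevant to Brock's and Misty's Pokémon, and `STAB = 1.5` is the standard same-type bonus. The helper names are hypothetical, and the move powers in the example are illustrative. A fuller policy would also weigh stats, accuracy and remaining PP.

```python
STAB = 1.5  # same-type attack bonus

# Excerpt of the type chart: (attacking type, defending type) -> multiplier; missing pairs are 1.0
TYPE_CHART = {
    ("Grass", "Rock"): 2.0, ("Grass", "Ground"): 2.0, ("Grass", "Water"): 2.0,
    ("Water", "Rock"): 2.0, ("Water", "Ground"): 2.0, ("Fighting", "Rock"): 2.0,
    ("Electric", "Water"): 2.0, ("Electric", "Ground"): 0.0,
    ("Normal", "Rock"): 0.5, ("Fire", "Rock"): 0.5,
}


def effectiveness(move_type: str, defender_types: tuple[str, ...]) -> float:
    """Multiply the chart entry for each of the defender's types."""
    multiplier = 1.0
    for defender_type in defender_types:
        multiplier *= TYPE_CHART.get((move_type, defender_type), 1.0)
    return multiplier


def score_move(power: int, move_type: str, attacker_types: tuple[str, ...],
               defender_types: tuple[str, ...]) -> float:
    """power × effectiveness × STAB; status moves (power 0) score 0."""
    stab = STAB if move_type in attacker_types else 1.0
    return power * effectiveness(move_type, defender_types) * stab


def best_move(moves: list[tuple[str, int, str]], attacker_types: tuple[str, ...],
              defender_types: tuple[str, ...]) -> str:
    """moves are (name, power, type) tuples; returns the name of the top-scoring move."""
    return max(moves, key=lambda m: score_move(m[1], m[2], attacker_types, defender_types))[0]


# Illustrative numbers: a Grass/Poison attacker against a Rock/Ground defender
moves = [("Tackle", 35, "Normal"), ("Vine Whip", 35, "Grass")]
print(best_move(moves, ("Grass", "Poison"), ("Rock", "Ground")))  # Vine Whip
```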

Strategies:

"fight": Best type-effective move (default)
"smart": Type-aware + voluntary switching
"flee": Run from wild battles
"catch": Weakening moves + Poké Ball throwing

Execution Flow:

def run_battle(client, strategy="fight", objective="win"):
    while in_battle(client):
        state = read_battle_state(client)
        action = decide_action(state, strategy, objective)
        execute_action(client, action)
        wait_for_action_menu(client)

Tool example: Within-Map Navigation (`tools/navigator.py`)

Core Algorithm: Breadth-First Search (BFS) on collision grids

find_path(): BFS over walkable tiles with complex constraints
execute_path(): Step execution with verification via prevCoords
navigate_to(): Primary primitive (find_path + execute_path with replanning)
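Here is a minimal BFS over a collision grid. It assumes the grid has already been decoded into rows of `'.'` (walkable) and `'#'` (solid), and that NPC tiles are passed in as extra blocked coordinates. The original `find_path()` applies further constraints that aren't modelled here, so treat this as an illustration of the core search only.

```python
from collections import deque
from typing import Optional

Coord = tuple[int, int]
STEPS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def find_path(grid: list[str], start: Coord, goal: Coord,
              blocked: frozenset = frozenset()) -> Optional[list[str]]:
    """Shortest list of d-pad presses from start to goal, or None if unreachable."""
    height, width = len(grid), len(grid[0])

    def passable(tile: Coord) -> bool:
        x, y = tile
        return 0 <= x < width and 0 <= y < height and grid[y][x] == "." and tile not in blocked

    came_from: dict[Coord, tuple[Coord, str]] = {}
    visited = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Follow predecessors back to start, then reverse into press order
            presses = []
            while current != start:
                current, press = came_from[current]
                presses.append(press)
            return presses[::-1]
        for press, (dx, dy) in STEPS.items():
            nxt = (current[0] + dx, current[1] + dy)
            if nxt not in visited and passable(nxt):
                visited.add(nxt)
                came_from[nxt] = (current, press)
                queue.append(nxt)
    return None


grid = ["....", ".##.", "...."]
print(find_path(grid, (0, 0), (3, 2), blocked=frozenset({(0, 1)})))
# ['RIGHT', 'RIGHT', 'RIGHT', 'DOWN', 'DOWN']
```

BFS suits tile grids because every step costs the same, so the first time it reaches the goal it has found a shortest path.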

Soft-Blocking Recovery:

Detect blocking NPC via read_npc_positions()
Attempt replanning around NPC
Wait timeout for NPC movement
Raise NavigationError if permanently blocked
Attempt recovery to a neutral state on persistent failures
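The recovery steps amount to a bounded retry loop. In this sketch the game-facing pieces are injected as callables, all hypothetical. `plan` returns a path that treats the given NPC tiles as walls. `execute` returns `False` when a step fails to change the player's coordinates. The final "recover to a neutral state" step is left to the caller, which would catch `NavigationError`.

```python
import time
from typing import Callable, Optional


class NavigationError(Exception):
    """Raised when the player stays blocked after every replanning attempt."""


def navigate_with_recovery(
    plan: Callable[[frozenset], Optional[list[str]]],
    read_npc_tiles: Callable[[], frozenset],
    execute: Callable[[list[str]], bool],
    max_replans: int = 3,
    npc_wait_s: float = 1.0,
) -> None:
    for _ in range(max_replans):
        # Re-read NPC positions on every attempt: they wander between tries
        path = plan(read_npc_tiles())
        if path is not None and execute(path):
            return  # arrived
        # Either no route around the NPCs or a step was blocked: give them time to move
        time.sleep(npc_wait_s)
    raise NavigationError(f"still blocked after {max_replans} attempts")
```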

Goals are then composed of tool calls with sufficient flexibility to explore and set parameters (e.g. target coordinates). For example, within GymGoal.run() there will be calls to:

navigate_to_map() → bring the player to the city's Gym
step_through_warp() → find the door tile and enter the Gym
run_battle() → for every battle in the Gym, run the battle engine

Bot architecture: data layer

Some game information is hard-coded, but everything has been worked out by the agent at some point. My agent's main operating constraint is that it can't actually see the game graphics; it has to get all of its information from programmatic reads of the game's internal memory state. So I typically ask it to write a "probe script" which looks at memory addresses to deduce location waypoints, unique tile behaviours, NPC locations, etc.

My rationale for this approach is that a human player would typically pull up a world map to see which routes connect different cities, rather than navigating blindly. To be honest, I'd prefer the agent to probe everything live during a run and avoid pre-existing hardcoded knowledge, but the bot isn't at that point yet.
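One common way a probe script can work is in two steps. First, diff memory snapshots taken before and after a known action (such as one step north) to find candidate addresses. Then decode the bytes at a confirmed address. A sketch of both steps follows. The struct layout is an assumption for illustration: two little-endian signed 16-bit coordinates, then one signed byte each for map group and map number. The helper names are hypothetical. The example values are borrowed from the Pewter entries in the data files.

```python
import struct
from typing import NamedTuple


class PlayerLocation(NamedTuple):
    x: int
    y: int
    map_group: int
    map_num: int


# Assumed layout: s16 x, s16 y, s8 map group, s8 map number (little-endian, like the GBA)
LOCATION_FORMAT = "<hhbb"


def changed_offsets(before: bytes, after: bytes) -> list[int]:
    """Offsets whose byte differs between two equal-length memory snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must cover the same address range")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def parse_player_location(raw: bytes, offset: int = 0) -> PlayerLocation:
    """Decode a PlayerLocation starting at `offset` in a memory read."""
    return PlayerLocation(*struct.unpack_from(LOCATION_FORMAT, raw, offset))


raw = struct.pack(LOCATION_FORMAT, 12, 10, 3, 2)
print(parse_player_location(raw))
# PlayerLocation(x=12, y=10, map_group=3, map_num=2)
```

Moving one tile north should only change the bytes holding the y coordinate. That is how a diff narrows a large memory region down to a handful of candidates.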

Pokémon Centers (data/pokemon_centers.json):

{
  "pewter": {
    "map_id": [6, 0],
    "outdoor_map": [3, 2],
    "entrance": [12, 10],
    "nurse_tile": [4, 4],
    "exit_tile": [4, 8]
  }
}

PokéMarts (data/pokemarts.json):

{
  "pewter": {
    "map_id": [6, 1],
    "outdoor_map": [3, 2],
    "entrance": [4, 8],
    "counter_tile": [5, 4],
    "item_list": [4, 13, 14, 15, 24, ...]
  }
}

Typical workflow

I typically ask the agent to develop the next Mission or Goal it needs to progress through the storyline. My instructions are to utilise existing tools where possible, and to use ad hoc probes to bake offline data only where needed for 'difficult to see' locations. I also ask it to write reusable probe scripts so that its knowledge accumulates.

It runs its prototype script in a Ralph loop against live tests (usually verified by logs or game-state memory reads) until it meets the objective. At that point the script is added to the overarching game run for deterministic execution. With agentic development and Ralph loops, it is crucial that the agent has rich feedback signals. I constantly have to remind it to add verbose logging to everything it does, so that it can debug and iterate.

Sometimes it misinterprets logs (it especially struggles to detect timing and race conditions) or persistently reads the wrong memory address. If it gets stuck (e.g. it encounters a previously unseen NPC type that it can't figure out how to engage), I'll ask it to take a screenshot. If it gets really stuck, the best thing to do is stream the run over VNC and watch what happens. One funny example came early in development, when it had limited knowledge of the different tile types: it insisted that its navigation route should be passable and couldn't understand why it kept getting blocked.

Next steps

I'm hoping that subsequent storyline missions will become faster and faster to execute, although I anticipate some tricky moments when the agent encounters new obstacles for the first time (not looking forward to ice tiles and boulder puzzles...).

When we have enough scripted building blocks, it will also be interesting to recombine them in new ways. It will be possible to make an optimisation harness that tests different Goal sequencing and parameters to find the quickest playthrough time, or to adapt them for different playstyles (like going for Pokédex completion runs). And maybe I can strip out the hardcoding so that this codebase becomes a game-specific harness that an LLM agent can drive from first principles.