Most current "AI plays Pokémon" projects try to make a single agent reason through the entire game from first principles. I wanted something closer to a programmable automation framework: deterministic tools for solved subproblems, with an agent orchestrating higher-level goals.

Making a bot

I've always been interested in bots that can play video games. As a kid, I was fascinated by bots like JMacro and RSBot, which could seemingly play for hours without being detected, but I had no idea how they worked. I now know that they used scripted macros to walk the player through a loop of predetermined states, monitoring game state with a combination of colour recognition and direct client memory reads.

Getting bots to play the Pokémon games is a pretty classic experiment nowadays. The Game Boy Advance game Pokémon FireRed is a good environment: it has a limited and discrete action space (keypresses), a long-horizon challenge chunked up into clear goals (gyms), and reasonably understandable reward signals (enemy HP, Pokémon levels, Pokédex count).

FireRed automated playthrough
My bot playing through the first 2 gyms of Pokémon FireRed

Unless the bot is a fully deterministic macro, it needs some kind of decision engine which decides what to do based on the current environment state. There are examples where the decision engine has been a pre-trained LLM (Claude or GPT), an RL model, or even a Twitch hivemind.

The trend is to go fully agentic: harness an LLM and let it reason through the entire game from first principles. I wanted something more macro-based, where I could kick off a full run and let it go without any additional prompts, or where a central control LLM could draw on a collection of scripts to complete different types of playthrough. It feels like a good ratchet: once a capability works reliably, freeze it into repeatable code instead of paying tokens to rediscover it every run. For example, why should an LLM decide how to play through every battle if I can 'solve' battles once and apply an optimal fight policy every time? I don't necessarily want 'LLM Plays Pokémon'; I'm more interested in steerable, code-based automation.

Orchestrator
    ↓
Mission System
    ↓
Goal Executors
    ↓
Reusable Tools
    ↓
Data + Memory

Bot architecture: goal framework

Taking inspiration from JMacro and other scriptable bots, we abstract the game into a series of missions and goals. This is purely declarative configuration which specifies the tasks we want the bot to complete at a high level.

Mission Registry (scripts/missions/__init__.py)

MISSION_SEQUENCE: list[str] = [
    "beat_brock",
    "beat_misty",
    # Future: "beat_surge", "beat_erika", etc.
]

Mission Structure (scripts/missions/beat_brock.py)

MISSION_NAME = "beat_brock"

GOALS = [
    # Oak's Parcel prerequisite
    FetchParcelGoal(),
    DeliverParcelGoal(),

    # Pewter preparation
    HealGoal(city="pewter"),
    BuyGoal(item_id=4, quantity=10, city="pewter"),   # Poké Balls
    BuyGoal(item_id=13, quantity=5, city="pewter"),   # Potions

    # Party building
    CatchGoal(species="Pikachu", target_map=(1, 0)),

    # Training
    TrainGoal(
        condition=MinLevelCondition(slot=0, level=12),
        grinding_map=(3, 20),  # Route 2
        heal_threshold=0.5,
        heal_city="pewter",
    ),

    # Gym battle
    HealGoal(city="pewter"),
    GymGoal(gym_id="pewter"),
    HealGoal(city="pewter"),

    # Post-badge catch (Route 3 opens)
    CatchGoal(species="Geodude", target_map=(1, 1)),
]

Each Goal is then an atomic objective with success criteria.

Base Architecture

Goal Abstract Base Class:

from abc import ABC, abstractmethod


class Goal(ABC):
    max_retries: int = 3

    @abstractmethod
    def run(self, ctx: "MissionContext") -> None:
        """Execute goal, raise GoalFailed if success condition not met"""

    def key(self) -> str:
        """Stable string for progress tracking (dataclass fields → string)"""

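The key() docstring hints that goals are dataclasses whose fields are turned into a stable string. As a rough sketch of how that might work (the ClassName(field=value) format, the GoalFailed definition, and the stub HealGoal below are illustrative assumptions, not the project's actual code):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from typing import Optional


class GoalFailed(Exception):
    """Raised when a goal's success condition is not met (assumed definition)."""


class Goal(ABC):
    max_retries: int = 3

    @abstractmethod
    def run(self, ctx) -> None:
        """Execute goal, raise GoalFailed if success condition not met."""

    def key(self) -> str:
        # Assumed format: ClassName(field=value, ...), built from dataclass fields.
        # Only works for subclasses that are dataclasses.
        parts = ", ".join(f"{f.name}={getattr(self, f.name)!r}" for f in fields(self))
        return f"{type(self).__name__}({parts})"


@dataclass(frozen=True)
class HealGoal(Goal):
    """Illustrative stub only; the real HealGoal does the navigation work."""
    city: Optional[str] = None

    def run(self, ctx) -> None:
        raise NotImplementedError("stub for demonstrating key()")


print(HealGoal(city="pewter").key())
```

Because the key depends only on the class name and field values, two goals configured identically produce the same string across runs, which is what makes it usable for resuming from saved progress.
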
MissionContext Dataclass (threaded through all goals):

MissionProgress:

GameData:

Goal Implementations

HealGoal (scripts/goals/heal_goal.py):

  1. Resolve city from current map or parameter
  2. Look up PC data in pokemon_centers.json
  3. Navigate to PC, enter via warp
  4. Interact with Nurse Joy (dialog + YES/NO menu)
  5. Verify full HP via FullPartyHPCondition
  6. Exit PC via south warp

BuyGoal (scripts/goals/buy_goal.py):

  1. Navigate to specified mart
  2. Enter shop, talk to clerk
  3. Navigate item list to target item_id
  4. Confirm purchase, verify bag count
  5. Exit shop

TrainGoal (scripts/goals/train_goal.py):

  1. Navigate to grinding_map
  2. Loop: Walk in grass until encounter
  3. Battle with strategy="fight" (flee from wild)
  4. Check condition.met(state) after each battle
  5. Heal at specified city when HP < heal_threshold

GymGoal (scripts/goals/gym_goal.py):

  1. Check if badge already earned (skip if yes)
  2. Navigate to gym entrance, enter via warp
  3. Fight each trainer in registry trainers list
  4. Navigate to leader tile, press A for battle
  5. Execute leader battle with strategy="smart"
  6. Verify badge bit set in flag array
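
The final verification step boils down to testing one bit in a packed array. A minimal sketch, assuming flags are stored eight per byte with the least-significant bit first (the flag id used here is made up for the example and is not a real FireRed constant):

```python
def flag_is_set(flags: bytes, flag_id: int) -> bool:
    """Return True if bit `flag_id` is set in a packed, LSB-first flag array."""
    byte_index, bit_index = divmod(flag_id, 8)
    return bool(flags[byte_index] & (1 << bit_index))


# Hypothetical flag id, chosen only to exercise the helper.
HYPOTHETICAL_BADGE_FLAG = 10

flags = bytearray(4)
flags[1] |= 1 << 2  # flag 10 lives in byte 1, bit 2

print(flag_is_set(flags, HYPOTHETICAL_BADGE_FLAG))
```

In the bot, `flags` would come from a memory read of the save data rather than a locally built bytearray.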

CatchGoal (scripts/goals/catch_goal.py):

  1. Navigate to target_map
  2. Walk in grass/cave until target species encountered
  3. Battle with policy="catch" (weaken + throw balls)
  4. Verify species is caught

NavigateGoal:

InteractGoal (scripts/goals/interact_goal.py):

  1. Navigate to target_pos (or adjacent if occupied)
  2. Press A to interact
  3. Advance through dialogue
  4. Verify success_condition.met(read_state(client))
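
Several goals above finish by calling .met(state) on a condition object. Here is one way such conditions could be written. The shape of `state` (a dict with a "party" list of level/hp/max_hp entries) is an assumption made for this sketch; the real read_state() output is not shown in this post.

```python
from dataclasses import dataclass
from typing import Protocol


class Condition(Protocol):
    def met(self, state: dict) -> bool: ...


@dataclass(frozen=True)
class MinLevelCondition:
    slot: int
    level: int

    def met(self, state: dict) -> bool:
        party = state["party"]
        # An empty slot can never satisfy a level requirement.
        return self.slot < len(party) and party[self.slot]["level"] >= self.level


@dataclass(frozen=True)
class FullPartyHPCondition:
    def met(self, state: dict) -> bool:
        return all(p["hp"] == p["max_hp"] for p in state["party"])


state = {"party": [{"level": 12, "hp": 30, "max_hp": 35}]}
print(MinLevelCondition(slot=0, level=12).met(state))
print(FullPartyHPCondition().met(state))
```

Keeping conditions as small value objects means the same check can gate a TrainGoal loop and verify an InteractGoal without either goal knowing how the state is read.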

Bot architecture: runtime orchestration

Missions and goals are run by orchestration wrappers which handle retries, checkpoints, and progress tracking. These executors sit behind a top-level entry point, run_game.py, which boots the system and handles startup and shutdown tasks.

Core Execution Loop (run_mission())

def run_mission(goals, ctx, name, checkpoint_fn=None):
    ctx.progress.mission_name = name

    # Deduplicate identical goal keys
    effective_keys = _deduplicate_keys([g.key() for g in goals])

    for goal, key in zip(goals, effective_keys):
        if ctx.progress.is_complete(key):
            log_goal(ctx, name, key, 0, "skip", None)
            continue

        ctx.current_goal_key = key
        _run_with_retries(goal, ctx, key, name)

        if checkpoint_fn:
            checkpoint_fn(key, ctx)

Bot architecture: tools

I already mentioned sharing functions (e.g. optimal battle policies) across multiple Goals. Codifying solved problems feels more sensible than asking the decision engine (LLM, system brain) to find an elegant solution every time, provided that we don't go overboard and codify every tiny detail, which would defeat the point of having a bot.

Tool example: Battle Engine (tools/autobattle.py)

State Modeling:

Decision Logic (decide_action()):

Strategies:

Execution Flow:

def run_battle(client, strategy="fight", objective="win"):
    while in_battle(client):
        state = read_battle_state(client)
        action = decide_action(state, strategy, objective)
        execute_action(client, action)
        wait_for_action_menu(client)
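
The body of decide_action() is not shown here. Purely as an illustration of the kind of rule a "fight" strategy could apply, the sketch below picks the usable move with the highest power × type effectiveness. The Move type, its fields, and the scoring rule are all assumptions, not the project's battle logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    name: str
    power: int
    effectiveness: float  # type multiplier against the current opponent
    pp: int


def best_move(moves: list[Move]) -> Move:
    """Pick the move with the highest power x effectiveness among moves with PP left."""
    usable = [m for m in moves if m.pp > 0]
    if not usable:
        raise ValueError("no usable moves")
    return max(usable, key=lambda m: m.power * m.effectiveness)


moves = [
    Move("Tackle", power=40, effectiveness=1.0, pp=35),
    Move("Thunder Shock", power=40, effectiveness=2.0, pp=30),
    Move("Growl", power=0, effectiveness=1.0, pp=40),
]
print(best_move(moves).name)
```

A real policy would also need to weigh accuracy, status moves, switching, and item use; this only shows how a deterministic rule can replace a per-turn LLM call.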

Tool example: Within-Map Navigation (tools/navigator.py)

Core Algorithm: Breadth-First Search (BFS) on collision grids
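
For reference, a minimal BFS over a collision grid might look like the following. The grid encoding (0 = walkable, 1 = blocked), the (x, y) tile convention, and the function name are assumptions for this sketch rather than the navigator's real interface.

```python
from collections import deque
from typing import Optional

Tile = tuple[int, int]  # (x, y)


def bfs_path(grid: list[list[int]], start: Tile, goal: Tile) -> Optional[list[Tile]]:
    """Shortest 4-directional path on a grid where 0 = walkable, 1 = blocked."""
    height, width = len(grid), len(grid[0])
    came_from: dict[Tile, Optional[Tile]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            node: Optional[Tile] = current
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_bounds = 0 <= nx < width and 0 <= ny < height
            if in_bounds and grid[ny][nx] == 0 and nxt not in came_from:
                came_from[nxt] = current
                queue.append(nxt)
    return None  # goal unreachable


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(bfs_path(grid, (0, 0), (0, 2)))
```

Since every step costs the same, BFS returns a shortest path in number of tiles, so no weighted search is needed for plain walking.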

Soft-Blocking Recovery:

  1. Detect blocking NPC via read_npc_positions()
  2. Attempt replanning around NPC
  3. Wait timeout for NPC movement
  4. Raise NavigationError if permanently blocked
  5. Attempt recovery to a neutral state on persistent failures
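
Steps 1 to 4 can be sketched as a small retry loop. This is only an outline under stated assumptions: the planner, the NPC reader, and the wait function are injected as callables with hypothetical signatures, and step 5 (recovery to a neutral state) is left to the caller.

```python
from typing import Callable, Iterable, Optional

Tile = tuple[int, int]
Path = list[Tile]


class NavigationError(RuntimeError):
    """Raised when no route opens up within the wait budget."""


def plan_around_npcs(
    plan: Callable[[set], Optional[Path]],
    read_npc_positions: Callable[[], Iterable[Tile]],
    wait: Callable[[], None],
    max_waits: int = 3,
) -> Path:
    for attempt in range(max_waits + 1):
        blocked = set(read_npc_positions())  # 1. detect blocking NPCs
        path = plan(blocked)                 # 2. replan around them
        if path is not None:
            return path
        if attempt < max_waits:
            wait()                           # 3. give the NPC time to move
    # 4. still blocked after the full wait budget
    raise NavigationError(f"no path found after {max_waits} waits")
```

Passing the collaborators in as functions keeps the loop testable without an emulator: a test can feed in a fixed sequence of NPC snapshots and count how many waits occurred.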

Goals are then composed of tool calls with sufficient flexibility to explore and set parameters (e.g. target coordinates). For example, within GymGoal.run() there will be calls to:

Bot architecture: data layer

Some game information is hard-coded, but everything has been worked out by the agent at some point. My agent's main operating constraint is that it can't actually see the game graphics; it has to get all of its information from programmatic reads of the game's internal memory state. So I typically ask it to write a "probe script" which looks at memory addresses to deduce location waypoints, unique tile behaviours, NPC locations, etc.

My rationale for this approach is that a human player would typically pull up a world map to see which routes connect different cities, rather than blindly navigating. To be honest, I prefer the idea that the agent could probe everything live during a run, and avoid pre-existing hardcoded knowledge, but the bot isn't currently at that point.

Pokémon Centers (data/pokemon_centers.json):

{
  "pewter": {
    "map_id": [6, 0],
    "outdoor_map": [3, 2],
    "entrance": [12, 10],
    "nurse_tile": [4, 4],
    "exit_tile": [4, 8]
  }
}

PokéMarts (data/pokemarts.json):

{
  "pewter": {
    "map_id": [6, 1],
    "outdoor_map": [3, 2],
    "entrance": [4, 8],
    "counter_tile": [5, 4],
    "item_list": [4, 13, 14, 15, 24, ...]
  }
}
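
A short sketch of how a goal might load one of these files into typed records. The PokemonCenter dataclass and parse_centers() helper are hypothetical names; the JSON is the Pewter entry from above, inlined so the example is self-contained.

```python
import json
from dataclasses import dataclass

Tile = tuple[int, int]


@dataclass(frozen=True)
class PokemonCenter:
    map_id: Tile
    outdoor_map: Tile
    entrance: Tile
    nurse_tile: Tile
    exit_tile: Tile


def parse_centers(raw: str) -> dict[str, PokemonCenter]:
    """Parse the JSON text, converting each [a, b] pair into a tuple."""
    data = json.loads(raw)
    return {
        city: PokemonCenter(**{name: tuple(value) for name, value in entry.items()})
        for city, entry in data.items()
    }


RAW = """
{
  "pewter": {
    "map_id": [6, 0],
    "outdoor_map": [3, 2],
    "entrance": [12, 10],
    "nurse_tile": [4, 4],
    "exit_tile": [4, 8]
  }
}
"""

centers = parse_centers(RAW)
print(centers["pewter"].nurse_tile)
```

Converting the JSON lists to tuples makes them hashable and directly comparable with the tuple map ids used in the goal definitions, such as target_map=(1, 0).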

Typical workflow

I typically ask the agent to develop the next Mission or Goal that it needs to progress through the storyline, with instructions to utilise existing tools where possible, and to use ad hoc probes to bake offline data only where needed for determining 'difficult to see' locations. I also ask it to write reusable probe scripts so that its knowledge accumulates.

It runs its prototype script in a Ralph loop against live tests (usually verified by logs or gamestate memory reads) until it passes the objective, at which point the script is added into the overarching game run for deterministic execution. It is so important with agentic development and Ralph loops to ensure that the agent has rich feedback signals - I have to consistently remind it to add verbose logging to everything it is doing, so that it can debug and iterate.

Sometimes it misinterprets logs (it especially struggles to detect timing/race conditions) or persistently reads the wrong memory address. If it gets stuck (e.g. it encounters a previously unseen NPC type that it can't figure out how to engage) then I'll ask it to take a screenshot. If it gets really stuck then the best thing to do is stream the run over VNC and watch what happens. There was a funny example at the start of development, when it was working with limited knowledge of different tile types: it insisted that its navigation route should be passable and couldn't understand why it kept getting blocked.

Next steps

I'm hoping that subsequent storyline missions will become faster and faster to execute, although I anticipate some tricky moments when the agent encounters new obstacles for the first time (not looking forward to ice tiles and boulder puzzles...).

When we have enough scripted building blocks, it will also be interesting to recombine them in new ways. It will be possible to make an optimisation harness that tests different Goal sequencing and parameters to find the quickest playthrough time, or to adapt them for different playstyles (like going for Pokédex completion runs). And maybe I can strip out the hardcoding so that this codebase becomes a game-specific harness that an LLM agent can drive from first principles.