Most current "AI plays Pokémon" projects try to make a single agent reason through the entire game from first principles. I wanted something closer to a programmable automation framework: deterministic tools for solved subproblems, with an agent orchestrating higher-level goals.
Making a bot¶
I've always been interested in bots which can play video games. As a kid, I was fascinated by bots like JMacro and RSBot which could seemingly play for hours without being detected, but I had no idea how they worked (I now know that they used scripted macros to walk the player through a loop of predetermined states, monitoring game state with a combination of colour recognition and direct client memory reads).
Getting bots to play the Pokémon games is a pretty classic experiment nowadays. The Game Boy Advance game Pokémon FireRed is a good environment: it has a limited and discrete action space (keypresses), a long-horizon challenge chunked up into clear goals (gyms), and reasonably understandable reward signals (enemy HP, Pokémon levels, Pokédex count).
Unless the bot is a fully deterministic macro, it needs some kind of decision engine that chooses what to do based on the current environment state. There are examples where the decision engine has been a pre-trained LLM (Claude or GPT), an RL model, or even a Twitch hivemind.
The trend is to go fully agentic: harness an LLM and let it reason through the entire game from first principles. But I wanted to develop something a bit more macro-based, where I could kick off a full run and let it go without any additional prompts, or where a central control LLM would draw on a collection of scripts to complete different types of playthrough. It feels like a good ratchet: once a capability works reliably, freeze it into repeatable code instead of paying tokens to rediscover it every run. For example, why should I have an LLM decide how to play through every battle if I can 'solve' all battles and apply an optimal fight policy every time? I don't necessarily want 'LLM Plays Pokémon'; I'm more interested in steerable code-based automation.
Orchestrator
↓
Mission System
↓
Goal Executors
↓
Reusable Tools
↓
Data + Memory
Bot architecture: goal framework¶
Taking inspiration from JMacro and other scriptable bots, we abstract the game into a series of missions and goals. This is purely declarative configuration which specifies the tasks we want the bot to complete at a high level.
Mission Registry (scripts/missions/__init__.py)¶
MISSION_SEQUENCE: list[str] = [
    "beat_brock",
    "beat_misty",
    # Future: "beat_surge", "beat_erika", etc.
]
Mission Structure (scripts/missions/beat_brock.py)¶
MISSION_NAME = "beat_brock"

GOALS = [
    # Oak's Parcel prerequisite
    FetchParcelGoal(),
    DeliverParcelGoal(),
    # Pewter preparation
    HealGoal(city="pewter"),
    BuyGoal(item_id=4, quantity=10, city="pewter"),  # Poké Balls
    BuyGoal(item_id=13, quantity=5, city="pewter"),  # Potions
    # Party building
    CatchGoal(species="Pikachu", target_map=(1, 0)),
    # Training
    TrainGoal(
        condition=MinLevelCondition(slot=0, level=12),
        grinding_map=(3, 20),  # Route 2
        heal_threshold=0.5,
        heal_city="pewter",
    ),
    # Gym battle
    HealGoal(city="pewter"),
    GymGoal(gym_id="pewter"),
    HealGoal(city="pewter"),
    # Post-badge catch (Route 3 opens)
    CatchGoal(species="Geodude", target_map=(1, 1)),
]
Each Goal is then an atomic objective with success criteria.
Base Architecture¶
Goal Abstract Base Class:
from abc import ABC, abstractmethod

class Goal(ABC):
    max_retries: int = 3

    @abstractmethod
    def run(self, ctx: "MissionContext") -> None:
        """Execute goal, raise GoalFailed if success condition not met"""

    def key(self) -> str:
        """Stable string for progress tracking (dataclass fields → string)"""
MissionContext Dataclass (threaded through all goals):
- progress: MissionProgress for completion tracking
- data: GameData with static registries
- battle_handler: Optional callback for goal-specific battle policy
- checkpoint_fn: Optional callback for mid-goal checkpoints
- run_logger: Optional goal-specific logging
MissionProgress:
- Loads/saves to saves/progress_{mission}.json
- completed_keys: Set of goal keys already achieved
GameData:
- Centralized access to static game data
- Loads data/gyms.json, data/pokemon_centers.json, data/pokemarts.json
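To make the key and progress-tracking ideas concrete, here is a minimal sketch rather than the project's actual code. The key format `ClassName(field=value,...)`, the `save_dir` parameter, the `mark_complete()` helper and the JSON layout are all assumptions for illustration; only `key()`, `is_complete()`, `completed_keys` and the `saves/progress_{mission}.json` path come from the description above.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from pathlib import Path


class Goal(ABC):
    max_retries: int = 3

    @abstractmethod
    def run(self, ctx) -> None:
        """Execute the goal."""

    def key(self) -> str:
        # Assumed format: ClassName(field=value,...) in field declaration order
        parts = [f"{f.name}={getattr(self, f.name)!r}" for f in fields(self)]
        return f"{type(self).__name__}({','.join(parts)})"


@dataclass(frozen=True)
class HealGoal(Goal):
    city: str

    def run(self, ctx) -> None:
        pass  # placeholder: the real goal walks to the Pokémon Center and heals


class MissionProgress:
    def __init__(self, mission_name: str, save_dir: str = "saves"):
        self.mission_name = mission_name
        self.path = Path(save_dir) / f"progress_{mission_name}.json"
        self.completed_keys: set[str] = set()
        if self.path.exists():
            saved = json.loads(self.path.read_text(encoding="utf-8"))
            self.completed_keys = set(saved["completed_keys"])

    def is_complete(self, key: str) -> bool:
        return key in self.completed_keys

    def mark_complete(self, key: str) -> None:
        # Hypothetical helper: record the key and persist it immediately
        self.completed_keys.add(key)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"completed_keys": sorted(self.completed_keys)}
        self.path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


print(HealGoal(city="pewter").key())  # HealGoal(city='pewter')
```

Because the key depends only on the goal's class and parameters, a restarted run can rebuild the same keys from the mission definition and skip whatever is already recorded on disk.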
Goal Implementations¶
HealGoal (scripts/goals/heal_goal.py):
- Resolve city from current map or parameter
- Look up PC data in pokemon_centers.json
- Navigate to PC, enter via warp
- Interact with Nurse Joy (dialog + YES/NO menu)
- Verify full HP via FullPartyHPCondition
- Exit PC via south warp
BuyGoal (scripts/goals/buy_goal.py):
- Navigate to specified mart
- Enter shop, talk to clerk
- Navigate item list to target item_id
- Confirm purchase, verify bag count
- Exit shop
TrainGoal (scripts/goals/train_goal.py):
- Navigate to grinding_map
- Loop: Walk in grass until encounter
- Battle with strategy="fight" (flee from wild)
- Check condition.met(state) after each battle
- Heal at specified city when HP < heal_threshold
GymGoal (scripts/goals/gym_goal.py):
- Check if badge already earned (skip if yes)
- Navigate to gym entrance, enter via warp
- Fight each trainer in registry trainers list
- Navigate to leader tile, press A for battle
- Execute leader battle with strategy="smart"
- Verify badge bit set in flag array
CatchGoal (scripts/goals/catch_goal.py):
- Navigate to target_map
- Walk in grass/cave until target species encountered
- Battle with policy="catch" (weaken + throw balls)
- Verify species is caught
NavigateGoal:
- Thin wrapper around navigate_to_map()
- Success condition: player at target map within tolerance
InteractGoal (scripts/goals/interact_goal.py):
- Navigate to target_pos (or adjacent if occupied)
- Press A to interact
- Advance through dialogue
- Verify success_condition.met(read_state(client))
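Several goals above end by checking a condition object through `.met(state)`. A minimal sketch of how such conditions could look follows. The `state` dictionary layout (a `party` list with `level`, `hp` and `max_hp` entries) is an assumption made for this example, as is the `Condition` protocol; only the class names and the `met()` call appear in the goal descriptions.

```python
from dataclasses import dataclass
from typing import Protocol


class Condition(Protocol):
    """Anything with a met(state) -> bool method can act as a success check."""

    def met(self, state: dict) -> bool: ...


@dataclass(frozen=True)
class MinLevelCondition:
    slot: int
    level: int

    def met(self, state: dict) -> bool:
        party = state["party"]
        # An empty slot can never satisfy the level requirement
        return self.slot < len(party) and party[self.slot]["level"] >= self.level


@dataclass(frozen=True)
class FullPartyHPCondition:
    def met(self, state: dict) -> bool:
        party = state["party"]
        # Require at least one Pokémon so an unread party does not pass vacuously
        return bool(party) and all(p["hp"] == p["max_hp"] for p in party)


state = {"party": [{"level": 12, "hp": 30, "max_hp": 30}]}
print(MinLevelCondition(slot=0, level=12).met(state))  # True
print(FullPartyHPCondition().met(state))               # True
```

Keeping conditions as small frozen dataclasses means the same check can serve both as a goal's stopping rule (TrainGoal) and as its post-hoc verification (HealGoal).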
Bot architecture: runtime orchestration¶
Missions and goals are run by orchestration wrappers that handle retries, checkpoints, and progress tracking. These executors share a top-level entry point, run_game.py, which boots the system and handles startup and shutdown tasks.
Core Execution Loop (run_mission())¶
def run_mission(goals, ctx, name, checkpoint_fn=None):
    ctx.progress.mission_name = name
    # Deduplicate identical goal keys
    effective_keys = _deduplicate_keys([g.key() for g in goals])
    for goal, key in zip(goals, effective_keys):
        if ctx.progress.is_complete(key):
            log_goal(ctx, name, key, 0, "skip", None)
            continue
        ctx.current_goal_key = key
        _run_with_retries(goal, ctx, key, name)
        if checkpoint_fn:
            checkpoint_fn(key, ctx)
Bot architecture: tools¶
I already mentioned reusing functions (e.g. optimal battle policies) across multiple Goals. Codifying solved behaviour feels more sensible than asking the decision engine (LLM, system brain) to find an elegant solution every time, provided that we don't go overboard and codify every tiny detail, which would defeat the point of having a bot.
Tool example: Battle Engine (tools/autobattle.py)¶
State Modeling:
- BattlePokemon: Species, HP, status, moves, stats
- BattleState: Current battle snapshot (participants, turn state)
- BattleAction: FIGHT, BAG, SWITCH, RUN with parameters
Decision Logic (decide_action()):
- Move selection policy: based on scores (e.g. power × effectiveness × STAB) and other heuristics
- Healing policy: HP < threshold + item available → ItemAction
- Switch policy: Bad matchup detection + better alternative available
Strategies:
- "fight": Best type-effective move (default)
- "smart": Type-aware + voluntary switching
- "flee": Run from wild battles
- "catch": Weakening moves + Poké Ball throwing
Execution Flow:
def run_battle(client, strategy="fight", objective="win"):
    while in_battle(client):
        state = read_battle_state(client)
        action = decide_action(state, strategy, objective)
        execute_action(client, action)
        wait_for_action_menu(client)
Tool example: Within-Map Navigation (tools/navigator.py)¶
Core Algorithm: Breadth-First Search (BFS) on collision grids
- find_path(): BFS over walkable tiles with complex constraints
- execute_path(): Step execution with verification via prevCoords
- navigate_to(): Primary primitive (find_path + execute_path with replanning)
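As a rough illustration of the BFS core, here is a minimal sketch on a boolean collision grid. The grid representation (`grid[y][x]` is `True` when walkable), the `(x, y)` tuple coordinates and the `blocked` parameter are assumptions for this example; the real `find_path()` is described as handling more complex constraints (ledges, warps and so on) that are not modelled here.

```python
from collections import deque


def find_path(grid, start, goal, blocked=frozenset()):
    """Return the shortest list of (x, y) tiles from start to goal, or None.

    grid[y][x] is True for walkable tiles; blocked holds extra tiles to
    avoid, for example the current positions of NPCs.
    """
    if start == goal:
        return [start]
    height, width = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nxt = (x + dx, y + dy)
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # off the map
            if nxt in came_from or not grid[ny][nx] or nxt in blocked:
                continue  # already visited, wall, or temporarily blocked
            came_from[nxt] = (x, y)
            if nxt == goal:
                # Walk the parent links back to the start, then reverse
                path = [nxt]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


W, X = True, False
grid = [
    [W, W, W],
    [X, X, W],
    [W, W, W],
]
print(find_path(grid, (0, 0), (0, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```

Passing NPC tiles through `blocked` lets a caller replan around an obstacle without mutating the static collision grid; a `None` result signals that no detour exists and the caller should wait or give up.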
Soft-Blocking Recovery:
- Detect blocking NPC via read_npc_positions()
- Attempt replanning around NPC
- Wait timeout for NPC movement
- Raise NavigationError if permanently blocked
- Attempt recovery to a neutral state on persistent failures
Goals are then composed of tool calls with sufficient flexibility to explore and set parameters (e.g. target coordinates). For example, within GymGoal.run() there will be calls to:
- navigate_to_map() → bring player to the Gym of city
- step_through_warp() → find the door tile and enter the Gym
- run_battle() → for every battle in the Gym, run the battle engine
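The composition described above might look roughly like the sketch below. Everything here beyond the three tool names is an assumption: the tools are injected as plain callables (the `GymTools` bundle is hypothetical) so the example runs without an emulator, the real functions presumably take a client and richer arguments, the gym registry shape (`city`, `trainers`) is guessed, and using `"fight"` for regular trainers is my choice for the example (the source only states that the leader uses `"smart"`).

```python
from dataclasses import dataclass
from typing import Callable


class GoalFailed(Exception):
    """Raised when the gym could not be cleared."""


@dataclass
class GymTools:
    # Hypothetical bundle of tool callables, injected for testability
    navigate_to_map: Callable[[str], None]
    step_through_warp: Callable[[], None]
    run_battle: Callable[[str], bool]  # takes a strategy, returns True on a win
    has_badge: Callable[[], bool]


def run_gym(gym: dict, tools: GymTools) -> None:
    if tools.has_badge():
        return  # badge already earned, nothing to do
    tools.navigate_to_map(gym["city"])
    tools.step_through_warp()
    for _trainer in gym["trainers"]:
        if not tools.run_battle("fight"):
            raise GoalFailed("lost to a gym trainer")
    if not tools.run_battle("smart"):
        raise GoalFailed("lost to the gym leader")
    if not tools.has_badge():
        raise GoalFailed("badge flag not set after the leader battle")


# Usage with stub tools that just record what was called
log = []
badge = {"earned": False}


def fake_battle(strategy: str) -> bool:
    log.append(strategy)
    if strategy == "smart":
        badge["earned"] = True
    return True


tools = GymTools(
    navigate_to_map=lambda city: log.append(f"nav:{city}"),
    step_through_warp=lambda: log.append("warp"),
    run_battle=fake_battle,
    has_badge=lambda: badge["earned"],
)
run_gym({"city": "pewter", "trainers": ["trainer_1"]}, tools)
print(log)  # ['nav:pewter', 'warp', 'fight', 'smart']
```

The goal itself holds no game logic beyond ordering and verification, which is the point of the design: each step can be improved or debugged in isolation.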
Bot architecture: data layer¶
Some game information is hard-coded, but everything has been worked out by the agent at some point. My agent's main operating constraint is that it can't actually see the game graphics; it has to get all of its information from programmatic reads of the game's internal memory state. So I typically ask it to write a "probe script" which looks at memory addresses to deduce location waypoints, unique tile behaviours, NPC locations, etc.
My rationale for this approach is that a human player would typically pull up a world map to see which routes connect different cities, rather than blindly navigating. To be honest, I prefer the idea that the agent could probe everything live during a run, and avoid pre-existing hardcoded knowledge, but the bot isn't currently at that point.
Pokémon Centers (data/pokemon_centers.json):
{
  "pewter": {
    "map_id": [6, 0],
    "outdoor_map": [3, 2],
    "entrance": [12, 10],
    "nurse_tile": [4, 4],
    "exit_tile": [4, 8]
  }
}
PokéMarts (data/pokemarts.json):
{
  "pewter": {
    "map_id": [6, 1],
    "outdoor_map": [3, 2],
    "entrance": [4, 8],
    "counter_tile": [5, 4],
    "item_list": [4, 13, 14, 15, 24, ...]
  }
}
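A registry like the Pokémon Center file above could be loaded into typed records along these lines. This is a sketch, not the actual GameData class: the `PokemonCenter` dataclass and `parse_centers()` function are hypothetical names, and the JSON is inlined as a string so the example runs without the data files.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PokemonCenter:
    map_id: tuple
    outdoor_map: tuple
    entrance: tuple
    nurse_tile: tuple
    exit_tile: tuple


def parse_centers(raw: str) -> dict:
    """Parse the registry JSON into {city: PokemonCenter}."""
    data = json.loads(raw)
    return {
        # JSON arrays become tuples so coordinates are hashable and immutable;
        # an unexpected or missing field raises TypeError at load time
        city: PokemonCenter(**{name: tuple(value) for name, value in entry.items()})
        for city, entry in data.items()
    }


RAW = """
{
  "pewter": {
    "map_id": [6, 0],
    "outdoor_map": [3, 2],
    "entrance": [12, 10],
    "nurse_tile": [4, 4],
    "exit_tile": [4, 8]
  }
}
"""

centers = parse_centers(RAW)
print(centers["pewter"].nurse_tile)  # (4, 4)
```

Failing loudly on a malformed entry at startup is cheaper than discovering a missing `exit_tile` halfway through a long unattended run.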
Typical workflow¶
I typically ask the agent to develop the next Mission or Goal that it needs to progress through the storyline, with instructions to use existing tools where possible and to bake in offline data from ad hoc probes only where needed to determine 'difficult to see' locations. I also ask it to write reusable probe scripts so that its knowledge accumulates.
It runs its prototype script in a Ralph loop against live tests (usually verified by logs or gamestate memory reads) until it passes the objective, at which point the script is added into the overarching game run for deterministic execution. It is so important with agentic development and Ralph loops to ensure that the agent has rich feedback signals - I have to consistently remind it to add verbose logging to everything it is doing, so that it can debug and iterate.
Sometimes it misinterprets logs (it especially struggles to detect timing/race conditions) or persistently reads the wrong memory address. If it gets stuck (e.g. it encounters a previously unseen NPC type that it can't figure out how to engage) then I'll ask it to take a screenshot. If it gets really stuck then the best thing to do is stream the run over VNC and watch what happens. There was a funny example at the start of development (where it was working with limited knowledge of different tile types) where it insisted that its navigation route should be passable and it couldn't understand why it kept getting blocked.

Next steps¶
I'm hoping that subsequent storyline missions will become faster and faster to execute, although I anticipate some tricky moments when the agent encounters new obstacles for the first time (not looking forward to ice tiles and boulder puzzles...).
When we have enough scripted building blocks, it will also be interesting to recombine them in new ways. It will be possible to make an optimisation harness that tests different Goal sequencing and parameters to find the quickest playthrough time, or to adapt them for different playstyles (like going for Pokédex completion runs). And maybe I can strip out the hardcoding so that this codebase becomes a game-specific harness that an LLM agent can drive from first principles.