One-Shot Quest Generation: LLMs Meet Legacy Java
ScapeRune is a RuneScape server emulator. The codebase is about 7.8 million characters of Kotlin and Java spread across a client, server, and world-server. Quests are the game's primary content, and each one is a sprawling implementation: NPC dialogues, item checks, combat encounters, cutscenes, variable tracking, journal entries, and reward logic, all wired together through the server's event system.
Writing a quest by hand takes hours: you have to understand the server's quest framework, its dialogue system, its NPC scripting model, and the item and variable databases, then write hundreds of lines of Kotlin that integrate correctly with all of it. Most private server developers copy existing quests and modify them, which is slow and error-prone.
I wanted to see if an LLM could generate a complete, functional quest implementation from a single prompt. Not a skeleton. Not pseudocode. Actual Kotlin that compiles and runs on the server.
The Data Pipeline
The one-shot prompt needs to contain everything the model needs to produce correct code. For a RuneScape quest, "everything" is a lot:
- Server framework documentation: How the quest system works. The dialogue DSL, NPC scripting API, item manipulation, variable storage, journal system.
- NPC and item definitions: The model needs to know NPC IDs, item IDs, object IDs. These are numeric constants in the game's cache.
- Protocol data from RSProx: RSProx is a packet proxy that sits between the RuneScape client and server. It captures and decodes every packet, giving us the exact data formats, IDs, and behaviors of the live game.
- Wiki data: The Old School RuneScape wiki has detailed quest guides with step-by-step walkthroughs, NPC locations, required items, and reward tables. This is the specification for what the quest should do.
- Existing quest examples: The model needs to see how existing quests are structured in this specific codebase. Not generic Kotlin, but the exact patterns, imports, and APIs that this server uses.
RSProx Capture Data
RSProx sits as a local proxy between the game client and server. It decodes the binary protocol and logs structured data:
- NPC spawn locations, movement patterns, combat stats
- Item definitions (names, IDs, examine text, equipment stats)
- Object definitions (clickable world objects, their interactions)
- Map data (walkable tiles, region boundaries)
- Interface layouts (dialogue windows, shop inventories)
This data gets exported and fed into the prompt as structured context. When the model needs to know "what NPC ID is the cook in Lumbridge?", the answer comes from RSProx captures, not from hard-coded guesses.
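As a sketch of how that lookup works (the real RSProx export format and these IDs are assumptions for illustration), the captures can be flattened into a simple in-memory index the prompt builder queries by name and region:

```kotlin
// Hypothetical flattened record from an RSProx capture export.
// The field names and IDs here are illustrative, not the real export schema.
data class NpcRecord(val id: Int, val name: String, val region: String)

class CaptureIndex(records: List<NpcRecord>) {
    private val byName = records.groupBy { it.name.lowercase() }

    // Resolve "the Cook in Lumbridge" to a concrete NPC ID, or null if the
    // capture never saw that NPC -- better than letting the model guess one.
    fun npcId(name: String, region: String): Int? =
        byName[name.lowercase()]?.firstOrNull { it.region == region }?.id
}

val index = CaptureIndex(
    listOf(
        NpcRecord(4626, "Cook", "Lumbridge"),
        NpcRecord(2813, "Cook", "Camelot"),
    )
)
```

Returning `null` for an unseen NPC matters: a missing ID can be flagged before prompt assembly, while a guessed ID becomes a silent runtime bug.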
Wiki Fetching
A scraper pulls the relevant quest guide from the OSRS wiki. The wiki's quest guides are remarkably detailed (the OSRS community is thorough about documentation). The scraper extracts:
- Quest requirements (skill levels, prerequisite quests)
- Items needed (with quantities)
- Step-by-step walkthrough
- NPC dialogue transcripts (what the NPCs actually say)
- Reward details (XP, items, quest points)
This becomes the functional specification. The model knows what the quest is supposed to do because the wiki tells it exactly what happens at each step.
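To show the shape of that specification (the data class and the "N x Item" line format are assumptions about the scraper's output, not its actual code), here is a minimal parser for an "Items needed" section:

```kotlin
// Hypothetical structured spec extracted from a wiki quest guide.
data class QuestSpec(
    val name: String,
    val itemsNeeded: Map<String, Int>,
    val questPoints: Int,
)

// Minimal sketch: turn scraped "Items needed" lines like
// "2 x Bucket of milk" or "Pot of flour" into item -> quantity pairs.
// Real wiki markup is far richer; this only shows the output shape.
fun parseItemsNeeded(lines: List<String>): Map<String, Int> =
    lines.associate { line ->
        val m = Regex("""^(\d+)\s*x\s*(.+)$""").find(line.trim())
        if (m != null) m.groupValues[2] to m.groupValues[1].toInt()
        else line.trim() to 1
    }
```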
The Prompt Assembly
The one-shot prompt is assembled from these sources:
```mermaid
graph TD
    A[RSProx Captures] -->|NPC/Item/Object IDs| E[Prompt Builder]
    B[Wiki Scraper] -->|Quest Specification| E
    C[Codebase Examples] -->|Framework Patterns| E
    D[Server API Docs] -->|DSL Reference| E
    E --> F[Single Prompt ~50K tokens]
    F --> G[Claude/GPT-4]
    G --> H[Quest Implementation .kt files]
```
The assembled prompt is typically 40-60K tokens. It includes:
- A system section explaining the code generation task
- The server's quest framework API (the KDialogue DSL, quest state management, NPC scripting patterns)
- 2-3 complete existing quest implementations as examples
- The target quest's wiki specification
- Relevant NPC/item/object ID constants from RSProx
- Explicit instructions about file structure, imports, and naming conventions
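A sketch of the assembly step (section names and the rough 4-characters-per-token heuristic are my assumptions, not the project's actual builder):

```kotlin
// Each source above contributes one section to the final prompt.
data class PromptSection(val title: String, val body: String)

fun assemblePrompt(sections: List<PromptSection>, maxTokens: Int = 60_000): String {
    val prompt = sections.joinToString("\n\n") { "## ${it.title}\n${it.body}" }
    // Rough budget check: ~4 characters per token for mixed English and code.
    val estimatedTokens = prompt.length / 4
    require(estimatedTokens <= maxTokens) {
        "Prompt too large: ~$estimatedTokens tokens (budget $maxTokens); drop an example quest"
    }
    return prompt
}
```

Failing fast on the token budget beats silent truncation: if the prompt overflows, the right fix is dropping the least relevant example quest, not losing the tail of the spec.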
The Dialogue System
ScapeRune uses a Kotlin DSL for NPC dialogues (KDialogue). A dialogue script looks like:
```kotlin
class CookDialogue : KDialogue() {
    override fun npcIds() = arrayOf(NPCs.COOK)

    override suspend fun DialogueContext.dialogue() {
        npc(FacialExpression.SAD,
            "What am I to do? I need ingredients for the Duke's birthday cake!")
        val choice = options(
            "I'll help you!",
            "Not my problem.",
            "What ingredients do you need?"
        )
        when (choice) {
            1 -> {
                player(FacialExpression.HAPPY, "I'll help you!")
                npc(FacialExpression.HAPPY, "Thank you, thank you!")
                startQuest(Quests.COOKS_ASSISTANT)
            }
            2 -> {
                player(FacialExpression.DEFAULT, "Not my problem.")
                npc(FacialExpression.SAD, "Oh dear, oh dear...")
            }
            3 -> {
                npc(FacialExpression.DEFAULT,
                    "I need an egg, a pot of flour, and a bucket of milk.")
            }
        }
    }
}
```

The model sees this pattern (and several more complex examples) in the prompt. It learns the DSL's conventions: `npc()` for NPC lines, `player()` for player lines, `options()` for choice menus, and the `FacialExpression` enum for animations.
AI-Powered NPCs
Beyond scripted quests, I integrated Claude directly into the NPC dialogue system. The AIMysticDialogue class in the codebase demonstrates this: an NPC whose responses are generated in real-time by Claude with a personality prompt that keeps them in character.
```kotlin
private const val MYSTIC_PERSONALITY = """
    You are a mysterious mystic NPC in ScapeRune, a medieval fantasy RPG world.
    Your name is Zara the Seer. You live in a small tent near the wilderness border.
    Keep responses brief (1-3 sentences). Never break character.
    Never mention being an AI or language model.
"""
```

The NPC uses the same KDialogue DSL but calls Claude's API for each response. The integration goes through a `ChatterBotFactory` that abstracts the LLM provider, so you can swap Claude for a local model (for NPCs that need lower latency or run without API access).
The RS-SDK: Agentic Bots
The companion project (RS-SDK) takes this further. It's a framework for building LLM-powered bots that play the game autonomously. The bot gets a typed SDK with:
- `sdk.getState()` for world state (position, inventory, nearby entities)
- `bot.chopTree()`, `bot.attackNpc()`, `bot.walkTo()` for high-level actions
- MCP integration so Claude Code can control bots directly
The interesting research angle is goal-directed program synthesis in a complex environment. The game world has hundreds of interacting systems (combat, skilling, quests, economy, pathfinding). An LLM agent needs to understand all of them to make useful decisions. The SDK abstracts the low-level protocol but the strategic decisions (what to do, where to go, how to handle failures) are up to the agent.
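The division of labor can be sketched like this (the state fields and action names echo the `bot.chopTree()`-style API above but are simplified assumptions, with a rule stub standing in for the LLM's decision):

```kotlin
// Illustrative world-state snapshot; the real SDK exposes far more.
data class SdkState(val inventoryFull: Boolean, val nearTree: Boolean)

// The agent picks the next high-level action from the current state;
// pathing, timing, and protocol details stay inside the SDK. In the real
// system an LLM makes this choice -- a rule stub keeps the sketch runnable.
fun nextAction(state: SdkState): String = when {
    state.inventoryFull -> "bankInventory"
    state.nearTree -> "chopTree"
    else -> "walkTo(nearestTree)"
}
```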
What Actually Works
The one-shot quest generation works for simple to medium-complexity quests. "Cook's Assistant" (fetch three items, bring them back) generates correctly on the first try about 80% of the time. "Desert Treasure" (multi-stage quest with combat, puzzles, and branching paths) needs manual fixes about half the time.
The failure modes are predictable:
- ID mismatches: The model uses an NPC ID that doesn't exist, or uses the wrong ID for a specific NPC. This happens less with RSProx data in the prompt but still happens when the model "helpfully" fills in IDs it wasn't given.
- State machine bugs: Complex quests have many states, and the model sometimes skips transitions or creates unreachable states. The wiki walkthrough helps, but translating "go to location X and talk to Y" into correct quest state checks requires understanding the spatial model.
- Framework misuse: The model sometimes calls API methods that don't exist or passes wrong argument types. Having complete examples in the prompt reduces this, but novel interactions (custom cutscenes, special combat encounters) still need manual correction.
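The state-machine failures above come down to guards: each wiki step maps to a numbered stage, and a handler must check the player's exact stage or stages become skippable or unreachable. A minimal sketch (stage names and values are assumptions, not the server's actual constants):

```kotlin
// Linear quest stages; each wiki walkthrough step advances the number.
object CooksAssistant {
    const val NOT_STARTED = 0
    const val STARTED = 1
    const val INGREDIENTS_DELIVERED = 2
    const val COMPLETE = 3
}

// Guard on the exact stage, not just "quest started": if generated code
// checks stage >= STARTED here, delivering twice (or before starting)
// corrupts the quest state.
fun canDeliverIngredients(stage: Int, hasAllItems: Boolean): Boolean =
    stage == CooksAssistant.STARTED && hasAllItems
```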
The win rate improves significantly when I include error messages from failed compilations in a follow-up prompt. The model is good at fixing its own mistakes when it can see the compiler output.
What I'd Do Differently
The prompt should include a type checker pass. Right now the model generates code, I try to compile it, and if it fails I feed errors back manually. An automated loop that generates, compiles, feeds errors back, and regenerates would push the success rate for complex quests much higher.
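That loop is simple to state in code. In this sketch, `generate` and `compile` are stand-ins for the LLM call and a `kotlinc` invocation (neither signature comes from the project):

```kotlin
// Generate -> compile -> feed errors back -> regenerate, up to a bounded
// number of attempts. Returns compiling source, or null to hand off to a human.
fun repairLoop(
    generate: (feedback: String?) -> String,
    compile: (source: String) -> List<String>, // compiler error messages
    maxAttempts: Int = 3,
): String? {
    var feedback: String? = null
    repeat(maxAttempts) {
        val source = generate(feedback)
        val errors = compile(source)
        if (errors.isEmpty()) return source
        // Feed the compiler output back verbatim; the model is good at
        // fixing its own mistakes when it can see the errors.
        feedback = errors.joinToString("\n")
    }
    return null
}
```

Bounding the attempts matters: if the model keeps misusing the framework, the loop should stop burning tokens and escalate instead of retrying forever.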
The wiki scraper is brittle. Wiki page formats change, and the scraper uses CSS selectors that break when the wiki template gets updated. A more resilient approach would use an LLM to extract structured quest data from the raw wiki HTML rather than relying on fixed selectors.
I should be using the AST-aware chunking from Polaris (CHEMMRAG) for the codebase examples. Right now I manually select which quest implementations to include in the prompt. Polaris's semantic search could automatically find the most relevant examples based on the target quest's characteristics (combat-heavy quest? Include an example with combat scripting. Dialogue-heavy? Include a dialogue-focused example).