AWS AgentCore Part 1: Building a Multi-Agent Error Debugger

This is part one of a two-part series on multi-agent systems with AgentCore.

AWS's newly dropped AgentCore is the most exciting AWS AI tool to date. Not because it’s jam-packed with features, but because it’s the first time we can build a production-ready multi-agent application without having to jerry-rig 10 different services together, collectively costing an arm, a leg, and your firstborn child.

So I’ve been keen to build something to demonstrate its true potential. Not a chatbot. Not a summariser. Something that actually solves a problem I’ve got.

Stack traces are the problem. I spend heaps of time copying errors into Google, scrolling through Stack Overflow, then forgetting what the fix was three weeks later. What if an agent could sort that for me?

Today I’m building a multi-agent error debugger. Paste in a stack trace, get back the root cause and a working fix. The system uses a Supervisor that orchestrates 4 specialist agents, each doing one thing well.

This isn’t a silver bullet though. Multi-agent systems add complexity. You need to understand when they’re the right tool and when they’re overkill.

All source code is on my GitHub.

Why Multiple Agents?

Single-prompt LLM calls hit a ceiling fast. Ask an LLM to parse, analyse and fix an error in one shot, with no room to think, and you’ll get average results across the board. The model tries to do everything and does nothing particularly well.

I tested this. Threw a gnarly TypeScript error at Claude with a single prompt asking for “complete analysis and fix”. The response was vague. Generic suggestions like “check your types” and “ensure the value isn’t undefined”.

Cheers Claude, real helpful.

But modern AI systems don’t work like that. They think, iterate and loop back on themselves until they’ve solved every part of your ask. This is called a multi-agent system.

Multi-agent system at work (cursor)


Multi-agent systems fix this by splitting the work. Each agent focuses on one job and does it properly.

Agent      | Job
Parser     | Extract language, stack frames and error type
Security   | Scan for secrets and PII
Root Cause | Figure out why the error happened
Fix        | Generate code that actually solves it

The Supervisor orchestrates these four. It decides what to call, when to call it and whether the results are good enough to return.

This mirrors how senior devs actually debug. You don’t just look at an error and immediately start coding a fix. You gather context first. What language? What framework? Where in the stack did it fail? Then you form a hypothesis. Test it. Refine if wrong. The multi-agent approach lets us encode this workflow into the system.

Architecture

Here’s what we’re building:

Architecture Diagram


The request flow goes:

  1. Frontend sends error text to a Lambda Function URL
  2. Lambda proxy invokes AgentCore Runtime
  3. Runtime executes the Supervisor
  4. Supervisor calls Lambda tools via Gateway and LLM agents directly
  5. Results stream back to the frontend

Why Lambda Function URL instead of API Gateway? Timeout. API Gateway caps at 29 seconds. Error analysis can take two minutes easy. Lambda Function URLs give us 15 minutes. Sweet as.

Why a Lambda proxy at all? AWS APIs don’t support CORS. Browsers can’t call AgentCore directly. The proxy handles that for us.

AgentCore Runtime

The Runtime is where your agent code runs. Think of it as Lambda for agents. You give it a Docker image, it runs your code when invoked. But unlike Lambda, it’s built for long-running agent workflows.

AWS provides the Strands SDK for building agents. Here’s the bare minimum:

from strands import Agent

agent = Agent(
    system_prompt="You are a helpful assistant.",
    tools=[]
)

response = agent("What is 2 + 2?")

That’s it. The SDK handles the conversation loop, tool execution and streaming. You focus on the prompt and tools.

Deploying the Runtime needs a Docker image and some Terraform:

resource "aws_bedrockagentcore_agent_runtime" "main" {
  agent_runtime_name = "${var.prefix}-runtime"
  description        = "Error Debugger Supervisor Runtime"
  role_arn           = aws_iam_role.agentcore_runtime.arn
  
  network_configuration {
    network_mode = "PUBLIC"
  }
}

Cold starts run about 10-15 seconds. Warm invocations are sub-second. That’s actually pretty reasonable for a system doing complex analysis. Not fast enough for a chat interface, but choice for debugging where you’re happy to wait a minute for a good answer.

The Supervisor Pattern

The Supervisor is the brain. It doesn’t do the work. It decides what work to do.

Most agent tutorials show linear pipelines. Call A, then B, then C, done. Real debugging doesn’t work that way. Sometimes you need more context. Sometimes your first guess is wrong.

I use a THINK → ACT → OBSERVE → REFLECT → DECIDE loop:

1. THINK: What do I know? What do I need?
2. ACT: Call a tool
3. OBSERVE: What came back? Useful?
4. REFLECT: Confident enough?
5. DECIDE: If ≥80% confident → output. Otherwise → loop back.

This is baked into the system prompt. The LLM learns to reason iteratively, not just execute a checklist.

Why 80%? Lower and you get rubbish. Higher and the system loops forever second-guessing itself. I tuned this through testing. 80% hits the sweet spot where results are good but the system doesn’t overthink.
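Here’s a condensed sketch of how that loop might read inside the system prompt. This is my paraphrase for illustration, not the exact prompt shipped in the repo:

```python
# A condensed, paraphrased supervisor prompt -- illustrative only,
# not the exact SUPERVISOR_PROMPT from the repo.
SUPERVISOR_PROMPT = """
You are a debugging supervisor. For every request, loop:

1. THINK: What do I know about this error? What is still missing?
2. ACT: Call exactly one tool (parser, security, root cause or fix).
3. OBSERVE: Summarise what the tool returned and whether it helped.
4. REFLECT: State your confidence (0-100%) in the current diagnosis.
5. DECIDE: If confidence >= 80%, output the final answer.
   Otherwise, go back to step 1 with your updated understanding.

Never output a fix while confidence is below 80%.
"""
```

The important bit is step 5: the loop has an explicit exit condition, so the model knows when to stop gathering context and commit to an answer.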

AWS AgentCore Runtime


Gateway vs Runtime

This is the key architectural decision. AgentCore gives you two places for logic:

Gateway → Lambda functions exposed as MCP tools

Runtime → Code in your Docker container, including LLM calls

When to use Gateway (Lambda):

  • Deterministic stuff (parsing, validation and API calls)
  • Things that don’t need LLM reasoning
  • Operations that should scale independently

When to use Runtime (LLM):

  • Reasoning tasks
  • Analysis that benefits from context
  • When the “tool” is really another agent

Here’s the breakdown for my debugger:

Agent      | Where              | Why
Parser     | Lambda via Gateway | Regex + Comprehend. Deterministic.
Security   | Lambda via Gateway | Pattern matching. Deterministic.
Root Cause | Runtime            | Needs Claude reasoning
Fix        | Runtime            | Needs Claude code generation

Cost matters here too. Lambda invocations are cheap. LLM calls are expensive. Moving the deterministic work to Lambda keeps costs down while reserving the expensive LLM capacity for tasks that actually need reasoning.

Building the Lambda Tools

Parser Tool

The Parser extracts structure from raw stack traces. Language, stack frames and error type. It uses regex patterns and AWS Comprehend for language detection.

def lambda_handler(event, context):
    error_text = event.get('error_text', '')
    
    # Detect programming language via patterns
    prog_language = detect_programming_language(error_text)
    
    # Extract stack frames
    stack_frames = extract_stack_frames(error_text, prog_language)
    
    return {
        'language': prog_language,
        'stack_frames': stack_frames,
        'core_message': extract_core_message(error_text)
    }

No LLM needed. Regex handles it reliably and cheaply.
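The repo’s actual patterns are more thorough, but a minimal sketch of what `detect_programming_language` could look like follows. The signatures below are my own guesses, not the ones from the repo:

```python
import re

# Hypothetical per-language signatures -- illustrative, not the repo's patterns.
LANGUAGE_PATTERNS = {
    'python': r'Traceback \(most recent call last\)|File ".+", line \d+',
    'java': r'Exception in thread ".+"|at [\w.$]+\(\w+\.java:\d+\)',
    'javascript': r'at .+ \(.+\.[jt]s:\d+:\d+\)|ReferenceError:',
}

def detect_programming_language(error_text: str) -> str:
    """Return the first language whose signature matches, else 'unknown'."""
    for language, pattern in LANGUAGE_PATTERNS.items():
        if re.search(pattern, error_text):
            return language
    return 'unknown'
```

Each runtime leaves a distinctive fingerprint in its stack traces, which is why plain pattern matching gets you surprisingly far here.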

The Gateway exposes this Lambda as a tool:

resource "aws_bedrockagentcore_gateway_target" "parser" {
  gateway_identifier = aws_bedrockagentcore_gateway.main.gateway_id
  name               = "parser_agent_tool"
  
  target_configuration {
    lambda_target_configuration {
      lambda_arn = aws_lambda_function.parser.arn
    }
  }
}

AgentCore Gateway


Security Tool

Before storing or processing anything, the Security tool scans for leaked credentials and PII:

import re

import boto3

# Comprehend client used below for PII detection
comprehend = boto3.client('comprehend')

SECRET_PATTERNS = {
    'aws_access_key': r'AKIA[0-9A-Z]{16}',
    'github_token': r'gh[ps]_[A-Za-z0-9]{36}',
    'api_key': r'(?i)(api[_-]?key)["\s]*[:=]["\s]*[A-Za-z0-9\-_]{20,}',
}

def lambda_handler(event, context):
    error_text = event.get('error_text', '')
    
    secrets_found = []
    for secret_type, pattern in SECRET_PATTERNS.items():
        if re.findall(pattern, error_text):
            secrets_found.append(secret_type)
    
    # Also use Comprehend for PII
    pii_response = comprehend.detect_pii_entities(
        Text=error_text[:5000],
        LanguageCode='en'
    )
    
    return {
        'risk_level': calculate_risk(secrets_found, pii_response),
        'secrets_found': secrets_found,
        'safe_to_store': len(secrets_found) == 0
    }

Again, deterministic. Pattern matching and Comprehend. It’s fast, it’s cheap and it works.

You might wonder why not just let the LLM detect secrets. Two reasons. First, LLMs hallucinate. Pattern matching doesn’t. Second, this runs on every request BEFORE we even think about storing anything. Speed matters here.
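The `calculate_risk` helper isn’t shown above. A plausible sketch, using my own scoring rules rather than necessarily the repo’s:

```python
def calculate_risk(secrets_found: list, pii_response: dict) -> str:
    """Hypothetical scoring: any leaked secret is HIGH risk, PII alone
    is MEDIUM, otherwise LOW. The real repo may weight these differently."""
    if secrets_found:
        return 'HIGH'
    if pii_response.get('Entities'):
        return 'MEDIUM'
    return 'LOW'
```

The asymmetry is deliberate: a single leaked AWS key is game over, while PII warrants caution but not an outright block.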

Building the LLM Agents

Now for the interesting bit.

Root Cause Agent

The Root Cause agent takes parsed info and determines why the error happened. This needs LLM reasoning. No way around it.

ROOTCAUSE_PROMPT = """
You are an expert debugger. Given error information, determine:
1. The immediate cause (what triggered the error)
2. The root cause (why that condition existed)
3. Your confidence level (0-100%)

Be specific. Don't say "check your code" - explain WHAT to check and WHY.
"""

@tool
def rootcause_agent_tool(error_text: str, parsed_info: dict) -> dict:
    agent = Agent(system_prompt=ROOTCAUSE_PROMPT, tools=[])
    
    prompt = f"""
    Analyse this {parsed_info.get('language')} error:
    
    Error: {parsed_info.get('core_message')}
    Stack: {parsed_info.get('stack_frames')}
    
    Full text: {error_text[:2000]}
    """
    
    return agent(prompt)

This runs inside the Runtime. The agent considers context and patterns and produces explanations a regex never could.

Fix Agent

The Fix agent generates actual code solutions based on the root cause:

FIX_PROMPT = """
You are an expert software engineer. Generate a specific, actionable fix.

Requirements:
1. Provide BEFORE and AFTER code
2. Explain WHY the fix works
3. Match the detected language
4. Address the identified root cause

Don't give generic advice. Give working code.
"""

@tool
def fix_agent_tool(error_text: str, root_cause: dict, language: str) -> dict:
    agent = Agent(system_prompt=FIX_PROMPT, tools=[])
    
    prompt = f"""
    Fix this {language} error:
    
    Root Cause: {root_cause.get('root_cause')}
    Error: {error_text[:1500]}
    """
    
    return agent(prompt)

The prompt engineering matters heaps here. “Don’t give generic advice” stops Claude from falling back to useless suggestions. “Provide BEFORE and AFTER code” forces concrete output.

Wiring It Together

The Supervisor ties everything together. It’s an Agent with access to all 4 tools:

supervisor = Agent(
    system_prompt=SUPERVISOR_PROMPT,
    tools=[
        parser_agent_tool,
        security_agent_tool,
        rootcause_agent_tool,
        fix_agent_tool,
    ],
)

The SUPERVISOR_PROMPT teaches the iterative reasoning loop. Parse first. Check security. Analyse root cause. Generate fix. Only output when confidence hits 80%.
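That threshold can also be backstopped in plain code rather than trusting the prompt alone. A hypothetical guard (`should_return` is my own helper, not part of the Strands SDK):

```python
def should_return(result: dict, threshold: int = 80) -> bool:
    """Return True once the agent's self-reported confidence clears the
    threshold. A hypothetical post-processing check, not a Strands feature."""
    return result.get('confidence', 0) >= threshold
```

A belt-and-braces check like this catches the case where the model claims low confidence but tries to output anyway.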

Error Debugger App - Prompt


Frontend and API Proxy

The frontend is dead simple. A textarea for the error, a button to analyse and sections to display results.

The Lambda proxy handles the AgentCore invocation:

import json
import os
import uuid

import boto3

agentcore_client = boto3.client('bedrock-agentcore')
AGENT_RUNTIME_ARN = os.environ['AGENT_RUNTIME_ARN']

def handler(event, context):
    # Function URL events deliver the POST body as a JSON string
    body = json.loads(event.get('body') or '{}')
    error_text = body.get('error_text', '')
    
    response = agentcore_client.invoke_agent_runtime(
        agentRuntimeArn=AGENT_RUNTIME_ARN,
        payload=json.dumps({'prompt': error_text}).encode('utf-8'),
        runtimeSessionId=f"session-{uuid.uuid4().hex}",
    )
    
    # Process streaming response, extract structured data
    return process_response(response)

The proxy collects streaming events, extracts the structured results from each agent and returns clean JSON to the frontend. Sorted.
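I’m simplifying here and assuming the stream arrives as newline-delimited JSON events with a `type` field. The real AgentCore event stream is richer than this, so treat `process_response` as a sketch:

```python
import json

def process_response(event_lines) -> dict:
    """Fold a stream of JSON-encoded events into one result dict.
    Assumes newline-delimited JSON with a 'type' field -- a simplified
    stand-in for the actual AgentCore event stream format."""
    result = {}
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Keep only the structured 'result' events; drop thinking chatter
        if event.get('type') == 'result':
            result.update(event.get('data', {}))
    return result
```

The frontend never sees the intermediate reasoning events, just the merged final payload.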

Error Debugger App - Results


Deployment

Everything deploys via GitHub Actions with Terraform:

- name: Terraform Apply
  run: |
    cd terraform/agentcore
    terraform init
    terraform apply -auto-approve \
      -var="container_tag=${{ github.sha }}"

The container_tag ensures each deploy creates a new Runtime version. AgentCore won’t pick up changes to a latest tag automatically. Learned that one the hard way.

CICD: GitHub Actions


Conclusion

AgentCore finally gives us a production-ready way to build multi-agent systems on AWS without duct-taping half a dozen services together. Runtime runs your agents. Gateway routes your tools. You write the logic.

The biggest lesson from this build? Keep each agent focused on one job and let the Supervisor worry about coordination. Shove the deterministic stuff into Lambda where it’s cheap and reliable, save the LLM calls for when you actually need reasoning. And don’t build a pipeline — build a loop. Real debugging doesn’t go in a straight line, and neither should your agents.

Oh, and if your frontend needs to talk to AgentCore, you’re going to need a proxy. Lambda Function URLs are your mate here — no 29 second timeout, no CORS headaches.

In Part 2, we’ll add Memory so the system learns from past errors, pull in external context from GitHub and Stack Overflow and track stats. The goal is a debugger that gets smarter the more you use it.

Full source code: github.com/JeremyRitchie/agentcore-error-debugger
