Series · Amazon Bedrock for Production AI · Part 1 of 8 Foundations · Part 2: RAG with Bedrock Knowledge Bases →
What an AI agent actually is
An AI agent in 2026 is not a chatbot with extra steps. It is a closed-loop system that observes a state, reasons about what to do, decides on an action, executes it against a real system, observes the result, and either declares done or loops again. The "agent" word covers the whole loop — the model is one component of it, not the whole thing.
The architecture that matters is four layers:
- The model — the foundation model that produces the reasoning. Claude, Llama, Mistral, Titan, Cohere — Amazon Bedrock unifies access to them under a single API. The choice of model is a real engineering decision (cost, latency, context window, tool-use quality), but it is one decision among several.
- The tools — the actions the agent can take. Lambda functions, REST APIs, MCP servers, AWS API calls. The agent reasons in tokens but acts through tools.
- The memory — what the agent remembers across turns. Short-term (the current conversation), long-term (what it learned in previous sessions), semantic (the documents and facts the agent has been given access to). Memory is what separates a useful agent from one that hallucinates the same answer every Tuesday morning.
- The orchestration — the loop itself. How the agent decides whether to call a tool, call another model, return to the user, or stop. AWS's managed answer to this is Bedrock Agents and the newer AgentCore platform; the open-source answers (LangChain, LlamaIndex, Strands) are the subject of Part 3.
This series treats each of those layers — and the operational concerns wrapped around them (security, cost, observability, multi-step orchestration) — as its own piece. This first piece is the vocabulary.
What Amazon Bedrock is, narrowly
Amazon Bedrock is two things, usefully distinguished:
- A unified inference API over a catalogue of foundation models. You write code against the Bedrock
InvokeModelorConverseAPI; Bedrock routes the call to Claude, Llama, Mistral, Titan, Cohere, or any other catalogued model. No model-vendor SDKs, no separate billing per provider, no juggling API keys. One IAM role, one API surface, one bill. - A set of managed services that sit on top of that API: Bedrock Agents for tool-using agents, Knowledge Bases for retrieval-augmented generation, Guardrails for content safety, AgentCore for production-grade agent runtime.
The first part is the foundation. The second part is what most teams will actually consume.
Foundation models worth knowing in 2026
The Bedrock catalogue is the working list. The models that matter for production agent work, in the rough shape the current production landscape suggests:
- Claude (Anthropic) — the strongest tool-use and long-context model in the catalogue. Claude 4.x family. Default choice for production agents where reasoning quality matters more than per-token cost. Native support for "extended thinking" — a reasoning mode where the model can take more compute before responding.
- Llama 4 (Meta) — strong general-purpose model, attractive cost profile, available in multiple parameter sizes. Llama 4 Maverick and Scout are the production-relevant variants.
- Mistral Large 3 and Mixtral — strong European models, useful when data-residency or model-provenance arguments matter for regulated EU customers.
- Amazon Titan and Nova — AWS's own families. Titan Embeddings and the multimodal Titan models cover the embedding and image-to-text use cases.
- Cohere Command R+ — strong RAG-tuned model. Useful as the reasoning model in Knowledge-Base-heavy pipelines.
The right architectural posture is model-agnostic by default. Code your agent against the Bedrock Converse API, externalise the model ID as configuration, swap models per environment or per task. The model the agent uses today is not the model it should use in twelve months — pricing and capability move fast.
Bedrock Agents — the managed agent runtime
Bedrock Agents is AWS's managed implementation of the agent loop. You define:
- A foundation model — which model powers the reasoning
- A set of instructions — the system prompt, the agent's role, what it should and should not do
- Action groups — collections of tools the agent can call, defined by an OpenAPI schema or by Lambda function specifications
- Knowledge bases — vector stores the agent can query for retrieved context
- Guardrails — policies that filter input and output
Bedrock runs the loop. When you invoke the agent, Bedrock:
- Sends the user's query plus the agent's instructions to the chosen foundation model
- Parses the model's response — has it decided to call a tool, query a knowledge base, or return to the user?
- If the model wants to call a tool, Bedrock invokes the Lambda function in the action group, takes the response, and feeds it back to the model
- Repeats until the model decides it is done or until a configured turn limit is reached
- Returns the final response to the caller
The advantage is that the orchestration is handled. You do not write the agent loop yourself. The trade-off is reduced control: you cannot intervene mid-loop, the model selection is constrained to Bedrock-catalogued models, and the customisation surface is smaller than what you get with LangChain or Strands. Whether that trade-off is right depends on the use case — the open-source-vs-managed decision is the subject of Part 3.
Action groups, the tools the agent can call
An action group is a named collection of actions the agent can invoke. Each action is either:
- A Lambda function with an OpenAPI schema describing its inputs, outputs, and purpose, or
- A function declared directly in the agent's configuration, again with a schema
The OpenAPI schema is what the model reads to decide whether to call the tool. The schema's description fields matter enormously — the model decides which tool to call based on the description text. A clear description means correct tool selection; a vague description means the agent hallucinates which tool to call. This is the most important piece of prompt engineering in any agent deployment, and it lives in OpenAPI YAML rather than in a system prompt.
The Lambda function the action group points to does the actual work — it queries CloudWatch Logs, calls a payment API, updates a record in DynamoDB. The Lambda is just an action behind a schema. The agent does not know it is calling Lambda; it knows it is calling "the query_logs tool" or "the restart_service tool" with structured inputs.
Knowledge Bases — managed RAG
A knowledge base is a managed retrieval-augmented-generation pipeline. You point it at an S3 bucket of documents; Bedrock orchestrates the chunking, embedding, and indexing into a vector store of your choice (OpenSearch Serverless, pgvector on Aurora, Pinecone, MongoDB Atlas, Redis Enterprise Cloud). At query time, the agent or your application queries the knowledge base, gets the top-k most relevant chunks, and includes them in the prompt context.
Knowledge Bases are the right answer for most production RAG. They handle the embedding-pipeline plumbing that most teams should not be reinventing. They support hybrid search (lexical + vector), filtered retrieval, chunking strategies (fixed-size, semantic, hierarchical), and re-ranking. The depth of what is configurable per knowledge base is the subject of Part 2.
When you wire a knowledge base into an agent, the agent gets a built-in retrieval capability — when the model decides it needs additional context, it queries the knowledge base implicitly through the agent loop. No separate retrieval step you have to write.
Guardrails — content safety as configuration
Bedrock Guardrails is a separate service that filters input and output for content safety. You configure:
- Denied topics — categories the agent must refuse to discuss
- Content filters — categories like hate, violence, sexual content, with thresholds per category
- Sensitive information filters — PII detection and redaction (emails, phone numbers, credit card numbers, custom regex patterns)
- Word filters — blocked terms or patterns
- Contextual grounding — checks that the model's output is supported by the retrieved knowledge-base context
- Automated reasoning checks — formal-logic-based validation that the model's output is consistent with a declared policy
Guardrails apply both before the prompt reaches the model (filtering user input) and after the model responds (filtering output). For regulated workloads, Guardrails is not optional. The detail of how to design Guardrails policies is the subject of Part 6.
AgentCore — the production runtime layer
In 2025 AWS introduced Bedrock AgentCore, a higher-level platform layer that addresses what was missing from Bedrock Agents when you tried to take an agent to production scale. The five core capabilities:
- Runtime — a serverless agent runtime with longer execution windows than Lambda (multi-hour tasks become viable), session isolation, and stateful execution
- Memory — managed short-term and long-term memory for agents, with semantic search across past sessions
- Identity — built-in OAuth flows so the agent can act on behalf of the user against external services, with token management handled
- Tools — built-in tools: a browser the agent can drive (autonomous web actions), a code interpreter (Python execution sandbox), and a Model Context Protocol (MCP) gateway that exposes external tool stacks
- Observability — built-in tracing, metrics, and audit logging mapped to AWS X-Ray and CloudWatch
AgentCore is the layer to reach for when the agent has to do real work over time — research a customer's account history across multiple systems, drive a browser to complete a multi-step process, hold context across hour-long sessions, audit-log every action for compliance. Bedrock Agents alone is sufficient for shorter-running, single-task agents; AgentCore is sufficient for long-running, multi-task agents that need production-grade observability.
The MCP gateway in particular is interesting. Model Context Protocol is the open standard (originated by Anthropic, broadly adopted in 2024-2025) for exposing tools to LLMs in a structured, model-agnostic way. AgentCore's MCP gateway means an agent can use any MCP server in the ecosystem — internal tools the team builds, public MCP servers (filesystem, GitHub, databases), or third-party MCP servers from SaaS vendors — without rewriting the agent's tool integrations per vendor.
A minimal Bedrock Agent — concrete shape
A minimal production-grade Bedrock Agent has four configuration concerns:
agent:
name: customer-support-agent
foundationModel: anthropic.claude-sonnet-4-6-20251022
instruction: |
You are a customer support agent for [Company]. You have access to
tools that let you query a customer's account, recent transactions,
and the knowledge base of FAQs and policies. Always verify the
customer's identity before discussing account-specific information.
If a request involves a refund or account change, do not act —
return a structured handoff to a human agent.
actionGroups:
- name: account_query
lambdaArn: arn:aws:lambda:us-east-1:123:function:account-query
apiSchema: s3://bucket/account-query.openapi.yaml
- name: transaction_history
lambdaArn: arn:aws:lambda:us-east-1:123:function:transaction-history
apiSchema: s3://bucket/transaction-history.openapi.yaml
- name: human_handoff
lambdaArn: arn:aws:lambda:us-east-1:123:function:human-handoff
apiSchema: s3://bucket/human-handoff.openapi.yaml
knowledgeBases:
- knowledgeBaseId: KB-faqs-policies-001
description: Company FAQs and policy documents
guardrailConfiguration:
guardrailId: GR-customer-support-001
guardrailVersion: "DRAFT"
memoryConfiguration:
enabled: true
sessionSummaryConfiguration:
maxRecentSessions: 5
That is roughly the smallest defensible production configuration. The pieces that matter:
- The model is named with a specific version (not "the latest Claude") so production behaviour is reproducible
- Each action group has its own Lambda function and OpenAPI schema (no one big tool — each action is its own narrow tool)
- The knowledge base is referenced by ID, decoupled from the agent definition (so the KB can be updated independently)
- Guardrails are referenced, not inlined
- Memory is enabled with a finite session-history window
The agent definition itself is small. The work is in the OpenAPI schemas, the Lambda functions behind them, the knowledge-base content, and the guardrail configuration.
The deployment reference architecture
Configuration is one half. The other half is where the agent actually sits inside the AWS account. The reference deployment shape we use:
Every line in that diagram is deliberate. The five most important are:
- Identity at the perimeter, not just at the API. IAM Identity Center brokers human and CI/CD access; workload identities are IRSA or IAM Roles, never long-lived access keys. The agent's IAM role permissions are scoped per action group, not granted at the agent level.
- No public IPs on workloads. Action-group Lambdas, Knowledge Base components, and the agent itself live in private subnets. The traffic to AWS APIs (bedrock-runtime, S3, KMS, Secrets) goes through VPC interface endpoints, not the public internet.
- Guardrails wrap every invocation. Input and output. There is no path where a user prompt reaches the model without passing the guardrail, and no path where a model response reaches the user without passing it back through.
- Model invocation logs are first-class. Every call to the model is logged to a dedicated S3 bucket with KMS encryption and Object Lock. This is what makes audit, debugging, and cost attribution possible — and what Part 6 (Security & Observability) builds on.
- Cost tags everywhere. Each component (agent, Lambda, KB, S3 bucket) is tagged with workload, team, and environment so the dashboards in Part 7 can attribute spend by feature, not just by service.
Seven architectural decisions and the reasoning
Every Bedrock agent deployment makes the same seven decisions, knowingly or by drift. The defensible default for each — and the reason — sits in one table for ease of reference:
| # | Decision | Default we recommend | Reasoning | When to revisit |
|---|---|---|---|---|
| 1 | Orchestration substrate — managed loop or open-source | Bedrock Agents for single-task, short-running; AgentCore + Strands for long-running production; LangChain only when cross-cloud or model-vendor portability is a hard requirement | Bedrock Agents has the lowest start cost; AgentCore + Strands has the strongest AWS-native production story; LangChain has the broadest ecosystem but the most plumbing | When the agent has to run for hours, hold state across sessions, or drive a browser → graduate to AgentCore. When the same agent has to run on Azure or GCP → consider LangChain. |
| 2 | Foundation model — which model powers reasoning | Claude 4.x as default for production agents; Llama 4 when cost matters more than reasoning depth; Mistral when EU provenance argument is load-bearing | Claude has the strongest tool-use and long-context behaviour in the catalogue; the other choices are cost or compliance moves | Quarterly. Bedrock's catalogue moves fast; what was state-of-the-art in February may be eclipsed in August. Re-evaluate with the same eval harness against the new model. |
| 3 | Model versioning — pinned or rolling | Always pinned to a specific dated model ID — never "the latest Claude" | A rolling model ID changes production behaviour on AWS's schedule, not yours. Pinned IDs make rollback meaningful and audit defensible. | When a pinned model is deprecated. Bedrock gives 90–120 days notice; use that window to re-evaluate against the successor before rolling forward. |
| 4 | Tool design — broad tools or narrow tools | Many narrow tools, each with its own OpenAPI schema and Lambda, not one tool with branching parameters | The model decides which tool to call based on the schema's description field. Narrow tools with sharp descriptions get correct routing; broad tools with branched parameters get hallucinated routing. |
Never. This pattern holds across every agent we ship. |
| 5 | Knowledge base coupling — inlined or referenced | Referenced by ID, with the KB defined and versioned independently of the agent | Documents change weekly; agent configuration changes monthly. Coupling them means every KB refresh requires an agent redeploy. Decoupling lets each evolve at its own cadence. | When the KB is single-purpose and tightly bound to a single agent (rare). |
| 6 | Guardrails posture — opt-in or default-on | Default-on, with the guardrail referenced (not inlined) so it can be tuned without redeploying the agent | Inline guardrails mean every policy update redeploys the agent and breaks the audit trail. Referenced guardrails have their own versioned lifecycle. | Never reduce below default-on. Tune the policies; don't disable. |
| 7 | Memory configuration — enabled or off | Enabled with a finite session window and explicit cross-session policy | Memory is the biggest PII leak surface in agent deployments. An unbounded memory accumulates personal data across sessions without explicit consent. A finite window plus explicit retention policy is the defensible default. | When the use case explicitly requires unlimited cross-session recall (e.g., a personal assistant). Even then, with explicit retention and erasure controls. |
Each of these decisions has a piece in this series that goes deeper. The table is the map; the parts that follow are the territory.
What this series will cover
This is a foundations piece. The seven subsequent parts go deeper on what production agent deployment actually requires:
- Part 2 — RAG with Bedrock Knowledge Bases: chunking strategies, embedding models, vector store selection (OpenSearch Serverless vs pgvector vs Pinecone), hybrid search, re-ranking, evaluation.
- Part 3 — Open-source Agent Frameworks on Bedrock: when LangChain, LlamaIndex, or Strands beat the managed Bedrock Agents path. Deployment patterns for self-managed agent loops on EC2 or EKS.
- Part 4 — Model Customization on Bedrock: continued pre-training, fine-tuning, distillation, custom model imports, evaluation. When customization beats prompt engineering.
- Part 5 — Multi-step AI Workflows with Step Functions and Bedrock: chaining model calls, integrating Bedrock with the 9,000+ AWS APIs Step Functions covers, the choice between Step Functions workflows and Bedrock Agents.
- Part 6 — Security Guardrails and Observability for Bedrock: Guardrails policy design, IAM patterns for least-privilege Bedrock access, VPC endpoints / PrivateLink, CloudTrail audit, model-invocation logging.
- Part 7 — Cost Optimization on Bedrock: token economics per model, cost allocation tags, prompt and response caching, batch inference, model-tier routing, provisioned throughput vs on-demand.
- Part 8 — Case Study: An SRE AI Agent on Bedrock for CloudWatch Log Triage: the reference implementation that ties everything together — an agent that observes CloudWatch logs, diagnoses incidents, and executes remediation actions through Lambda and Step Functions, under Guardrails, with cost and observability instrumented from the start.
What is hard about agents in production
The pieces that follow address specific surfaces. The honest summary of what makes agent deployment hard, in the order it usually bites:
- The agent does the wrong thing fluently. A misconfigured agent looks like it is working — it returns plausible text, calls tools, completes turns. Whether it is actually doing the right thing is something only evaluation can tell you. Build the evaluation harness before the agent goes near production traffic.
- The cost surface is opaque. Token-by-token pricing across calls, retrieval rounds, retries, and tool invocations adds up quickly. Without instrumentation you do not see the bill coming. Cost tracking is a Phase-1 concern, not a Phase-3 concern.
- The action surface is dangerous. An agent that can call "restart service" can call "restart the wrong service" or "restart the right service at the wrong time." Approval gates, dry-run modes, and human-in-the-loop for destructive actions are not optional. The case study in Part 8 unpacks this.
- The memory becomes a leak. Agents accumulate context. Without explicit policies on what is stored, for how long, and across which sessions, you have built a PII-leaking system without realising it. Guardrails address part of this; memory configuration addresses the rest.
- The model changes underneath you. Bedrock's model catalogue evolves; what worked in February breaks in August because a model version was deprecated. Version pinning, regression testing, and rolling-evaluation pipelines are operational requirements, not nice-to-haves.
Each of those is a piece in this series. The agent that handles them is the agent that ships.
What to do next
If you are reading this as someone who is about to build their first Bedrock-backed agent, three concrete next steps:
- Stand up a sandbox account. Enable Bedrock model access in the AWS console (this is a model-by-model approval flow; it takes a day or two for some models). Pin one specific Claude or Llama version. Provision a Bedrock-enabled IAM role. You are now ready to call the Converse API.
- Build a no-tool agent first. Define a Bedrock Agent with a system prompt and a knowledge base, but no action groups. Test that the model answers questions from the KB correctly before giving it the ability to take actions. Most agent failures come from rushing to actions before the reasoning is reliable.
- Add one tool. The first tool is a read-only tool — query a database, query CloudWatch, fetch a record. Validate that the agent uses the tool correctly and integrates the response into its reasoning. Only then add a tool that can mutate state.
Part 2 picks up at the retrieval layer — what makes Knowledge Bases work in production, and what to choose when the managed path is not the right fit.
This is Part 1 of an eight-part series on Amazon Bedrock for production AI. The series accompanies the Hardening-before-AWS series and the AWS-for-banks architecture series.
