Security Guardrails and Observability for Amazon Bedrock

Series · Amazon Bedrock for Production AI · Part 6 of 8 ← Part 5: Multi-step Workflows with Step Functions · Security Guardrails and Observability · Part 7: Cost Optimization on Bedrock →

Why these two ship together

Security and observability are usually written as separate concerns. For production Bedrock workloads, they aren't — they're two ends of the same chain. Guardrails are what prevents the bad output from leaving the model; observability is what tells you when Guardrails caught something, what the agent did up to the catch, and whether the configuration is drifting toward future incidents. A team that ships one without the other is shipping half the control.

Across Parts 1-5 we've assumed both exist. This piece documents what they actually look like in production: the Guardrails policy structure that meets the NDPA, NIS2, and CBN CSAT bars; the IAM and network controls that constrain Bedrock access; and the observability stack that makes the system auditable in seconds rather than weeks.

The security and observability layer, in one picture

The architecture has five layers. Each is its own configuration and each contributes to the audit-evidence story.

Layer 1 — Identity and network perimeter

Identity is the perimeter, not the network. The IAM patterns:

No long-lived static credentials anywhere in the Bedrock call path. Human callers authenticate via IAM Identity Center (federated from the bank or enterprise IdP). Workloads use IRSA on EKS, IAM Roles on EC2, IAM Execution Roles on Lambda. CI/CD uses OIDC federation. There is no BEDROCK_API_KEY environment variable. Anywhere.
Per-agent IAM role with minimal Bedrock permissions. Each agent has its own role with bedrock:InvokeModel scoped to specific model ARNs, bedrock:InvokeAgent scoped to that agent's ID, and Knowledge Base access scoped by ID. No bedrock:* permissions on production roles, ever.
Action group Lambdas under their own scoped roles. Each action group's Lambda runs with permissions to do exactly that action's work — read from a specific DynamoDB table, write to a specific SQS queue. Not "any DynamoDB read."
Resource-based policies on shared resources. Knowledge Bases, custom models, guardrails — each can restrict invocation to specific principals or accounts. Belt and braces against IAM mistakes.

The network controls:

VPC endpoints for every Bedrock surface. Interface endpoints for bedrock-runtime (model invocation), bedrock-agent-runtime (agent invocation), bedrock (control plane), bedrock-agent (agent management). All traffic stays on AWS's network; no public internet.
Private subnets only for workloads. Action group Lambdas, EKS pods, ECS tasks running agents — no public IPs, no internet gateway for the workload subnets.
Security groups scoped tight. Egress only to the specific VPC endpoints needed. The "default allow all egress" pattern is how data exfiltration happens.
KMS CMK for every encrypted resource. S3 buckets for model invocation logs, Knowledge Base source documents, custom model artefacts, guardrail configurations — all encrypted with customer-managed keys, with key policies that restrict decryption to specific roles.

For regulated workloads — financial services, healthcare, NIS2-scoped operators — these aren't suggestions. They're the baseline a competent examiner will check on the first day.

Layer 2 — Bedrock Guardrails policy design

Guardrails apply at two points in every invocation: input filtering (before the prompt reaches the model) and output filtering (before the response reaches the user). The policy categories matter:

Denied topics

Explicit topics the agent must refuse to discuss, named in plain language with examples.

{
  "denyTopics": [
    {
      "name": "investment_advice",
      "definition": "Specific recommendations to buy, sell, or hold particular financial instruments.",
      "examples": [
        "Should I buy NVDA?",
        "What's a good ETF to invest in?",
        "Which crypto should I put my pension into?"
      ]
    },
    {
      "name": "medical_diagnosis",
      "definition": "Diagnosing medical conditions or recommending treatments.",
      "examples": [
        "What's wrong with me if I have these symptoms?",
        "What medication should I take?"
      ]
    }
  ]
}

The agent sees an attempt to discuss a denied topic and either refuses politely (configurable response) or routes to a human. For regulated industries — banking, healthcare, legal — denied topics are often the regulatory floor: "this agent does not give investment advice / medical diagnosis / legal opinions" is a defensible posture.

Content filters

Categorical filters for hate, violence, sexual content, misconduct, insults — each with a threshold (NONE / LOW / MEDIUM / HIGH). HIGH thresholds block aggressively; NONE disables the category.

For most enterprise workloads, MEDIUM on hate / violence / sexual / misconduct and HIGH on prompt injection (described below) is the working baseline. Tune from there based on the false-positive rate the use case can tolerate.

Sensitive information filters

PII detection and redaction, with per-type configuration. Detected types include name, email, phone, SSN, credit card, IP address, address, age, US passport, IBAN, and many regional identifiers. For each type, the action is BLOCK (refuse the request), ANONYMIZE (redact the PII before processing), or OBSERVE (log but don't act).

The defensible defaults for a Nigerian banking agent:

PII type	Input action	Output action
Name	OBSERVE	OBSERVE
Email	OBSERVE	BLOCK (prevent leak)
Phone number	OBSERVE	BLOCK
NDPA-class personal identifiers (NIN, BVN)	BLOCK (regex via custom pattern)	BLOCK
Credit card	BLOCK	BLOCK
Bank account	BLOCK	BLOCK

Custom regex patterns extend the default detectors. For Nigerian-context workloads, BVN (11-digit pattern) and NIN (11-digit pattern with checksum) deserve their own custom filters because the default detector list is US/EU-centric.

Word filters

Blocked terms or patterns — useful for brand-specific blocking (competitor names, banned phrases) or for preventing the agent from saying specific things internal policy forbids.

Contextual grounding

For RAG workloads, contextual grounding checks that the model's output is actually supported by the retrieved context. The check operates on two dimensions: grounding (does the output follow from the retrieved chunks?) and relevance (is the output relevant to the user's question?). Configurable thresholds; below the threshold, the response is blocked or flagged.

For the RAG pipelines from Part 2, contextual grounding is the layer that catches the hallucination class of failures. An answer with no grounding score is, structurally, an unsupported answer.

Automated reasoning checks

The newest Guardrail category, introduced in 2024-25. Formal-logic-based validation that the model's output is consistent with a declared policy expressed in a structured logic language. Useful for high-stakes workloads where the agent's output must satisfy specific rules (contract terms, compliance constraints, mathematical correctness in financial calculations).

Operationally complex to configure; high value where it fits. The right move is to start without automated reasoning and add it only when the workload demands it — typically legal-document review, financial calculation agents, regulatory compliance assistants.

Prompt attack / jailbreak filter

Built-in filter for known jailbreak patterns (DAN-style attacks, instruction injection, role-play bypass attempts). Set to HIGH for production by default; trust the false-positive rate is low enough that the rare blocked legitimate query is acceptable trade-off for the entire jailbreak attack surface being covered.

How to apply Guardrails — referenced, not inlined

Per the architectural decision from Part 1, Guardrails should be referenced by ID, not inlined per agent. This means:

The guardrail has its own ARN and version history
Multiple agents can share the same guardrail
Policy updates propagate immediately to all consuming agents without redeployment
Version pinning lets you roll back a bad policy change

# Agent configuration referencing a Guardrail by ID
agent_config = {
    "agentName": "customer-support-agent",
    "foundationModel": "anthropic.claude-sonnet-4-6-20251022",
    "guardrailConfiguration": {
        "guardrailIdentifier": "arn:aws:bedrock:us-east-1:123:guardrail/GR-001",
        "guardrailVersion": "DRAFT",  # or specific version number
    },
    # ...
}

Layer 3 — CloudTrail data events for model invocations

Standard CloudTrail captures management-plane events (who created the agent, who updated the guardrail) but does not capture data-plane events (each InvokeModel call) unless explicitly enabled. For Bedrock workloads, data events are what makes invocation-level audit possible.

Enable CloudTrail data events for Bedrock:

resource "aws_cloudtrail" "bedrock_data_events" {
  name           = "bedrock-data-events"
  s3_bucket_name = aws_s3_bucket.audit_bucket.id

  advanced_event_selector {
    name = "Bedrock invocations"

    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }

    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::AgentAlias", "AWS::Bedrock::Model"]
    }
  }
}

This produces one CloudTrail record per InvokeModel and InvokeAgent call, with the caller identity, the model ID, the request and response sizes, and the IAM role used. The record does not include the prompt or response contents — those go to model invocation logs (next layer). The trail records that an invocation happened, by whom, with what scope.

CloudTrail data events are how you answer "which user invoked which model, when, under what role" for a compliance audit. They are how you detect "someone is invoking an expensive model in a region they're not supposed to" for cost-leak detection.

Layer 4 — Model invocation logging

Bedrock model invocation logging captures the actual request/response contents — the prompt sent to the model, the completion returned, the embeddings produced, the guardrail trace. This is the layer that makes debugging possible after the fact and that auditors actually want to see for high-stakes workloads.

Configure once per region:

resource "aws_bedrock_model_invocation_logging_configuration" "main" {
  logging_config {
    s3_config {
      bucket_name = aws_s3_bucket.bedrock_invocation_logs.id
      key_prefix  = "invocation-logs/"
    }

    cloudwatch_config {
      log_group_name = "/aws/bedrock/invocations"
    }

    embedding_data_delivery_enabled = false  # large; enable only if needed
    image_data_delivery_enabled     = true
    text_data_delivery_enabled      = true
    video_data_delivery_enabled     = false
  }
}

Two operational rules:

The invocation log bucket is sensitive. It contains prompts (which often include PII) and responses (which can leak training data, internal documents, or PII regenerated from training). KMS CMK encryption, Object Lock for retention, strict bucket policy. Access only to a small set of audit and IR roles.
Retention policy aligned with regulatory requirement. NDPA / GDPR / NIS2 have specific retention requirements for processing records; CBN CSAT has its own. Object Lock with compliance mode locks the retention duration so even root cannot delete.

The invocation log is what lets you answer "what did the agent actually say to the user about their balance" months later. Without it, the agent is a black box even to the team that built it.

Layer 5 — CloudWatch metrics and X-Ray traces

The metrics that matter to instrument:

Per-agent invocation count — BedrockAgentInvocations with AgentId and AgentAlias dimensions. Alarms on unusual spikes.
Per-model latency — ModelInvocationLatency per model ID. Slow Claude calls indicate either model-side issues or large context windows; tune accordingly.
Per-model error rate — ModelInvocationErrors per model ID, broken out by error type. Throttling vs validation vs context-too-long all have different responses.
Guardrail trip rate — GuardrailIntervention count by guardrail and category. A spike here is either an attack campaign or a content-filter tuning problem.
Token consumption — InputTokens and OutputTokens per agent. Direct input to the cost dashboard from Part 7.
Tool call distribution — per action group, call count and error rate. Helps identify under-used or over-used tools that need design changes.

X-Ray traces tie the user request to every downstream call. Enable X-Ray on the API Gateway / Lambda / agent / Step Functions surfaces; the trace shows the request flow end-to-end. For multi-step workflows from Part 5, X-Ray shows which workflow steps dominated latency or cost — actionable signal for tuning.

The observability layer for AgentCore-hosted agents has additional first-class instrumentation: per-session traces, per-tool execution timings, OpenTelemetry-compatible exports. For long-running agents, AgentCore's observability is materially better than what bolt-on instrumentation can give you on Lambda or EKS.

The regulatory mapping

The security and observability configuration above maps cleanly to the obligations the operator faces:

Obligation	Where the config delivers
NDPA 2023 — Security of processing (Section 39)	KMS CMK encryption, IAM least-privilege, VPC endpoints, model invocation logs as the processing record
NDPA — Breach response (Section 40)	CloudTrail data events + invocation logs answer "what data was processed, by which model, when" within hours of incident notification
NIS2 Article 21 — Incident handling	X-Ray traces + CloudWatch metrics + invocation logs provide the incident-investigation evidence base
NIS2 Article 21 — Supply-chain security	Custom Model Import provenance, Knowledge Base source attribution, action-group Lambda dependency tracking
CBN CSAT — Application security	Guardrails as the input/output control; CloudTrail data events as the access audit
CBN CSAT — Cryptography	KMS CMK on every encrypted resource; key rotation per regulatory cadence
NCC Cyber Resilience Framework — Continuous monitoring	CloudWatch metrics + alarms on Guardrail trips, error rates, anomalous invocations
Pre-IPO disclosure (SEC cybersecurity governance)	Quarterly review of Guardrail interventions, anomalous invocations, and material incident assessment — invocation logs provide the evidence

For each, the configuration produces an answer in minutes, not days, when an examiner asks.

Common pitfalls

Five things that show up in production audits:

Guardrails configured but not actually invoked. The agent has a guardrailIdentifier field set but the version is "DRAFT" and the draft is empty. Always check the production version is the configured one.
Model invocation logging disabled. A team turns it off "for cost reasons" then has no answer when a regulator asks what the agent said. Cost the storage; don't disable.
CloudTrail data events not enabled. Standard CloudTrail captures only management events. The data events configuration above is the explicit opt-in for invocation auditing.
Wildcards in IAM. bedrock:* on a production agent role is a finding. Scope to specific actions and specific model / agent ARNs.
Audit logs in the same account as the workload. A compromised account can delete its own logs. Stream CloudTrail and invocation logs to a separate Security OU account where the workload account has no delete permission.

The Claude-first / multi-model rule applies here too

Guardrails wrap every Bedrock invocation regardless of model tier. The Haiku router, the Sonnet reasoning, the Opus synthesis, the custom-tuned Llama for narrow extraction — each invocation passes through the same Guardrails policy. The observability is unified: invocation logs from all models land in the same S3 bucket, CloudWatch metrics tag every invocation with the model ID, X-Ray traces span the full multi-model topology.

For the cost dashboards in Part 7, this unification is what makes per-model spend attribution possible. For the case study in Part 8, it's what makes the SRE agent's actions defensible — every model call, every tool invocation, every Guardrails intervention is in the audit trail before the agent's decision becomes an action.

What's next

Part 7 takes the observability data this piece installs and turns it into the cost discipline that makes production AI economically defensible: per-tier spend attribution, the cascade routing pattern in depth, prompt and response caching, batch inference, provisioned throughput vs on-demand decisions.

The full series:

Part 1 — Foundations: Building AI Agents on Amazon Bedrock
Part 2 — RAG with Bedrock Knowledge Bases
Part 3 — Open-source Agent Frameworks on Bedrock
Part 4 — Model Customization on Amazon Bedrock
Part 5 — Multi-step AI Workflows with Step Functions and Bedrock
Part 6 — Security Guardrails and Observability for Bedrock (this piece)
Part 7 — Cost Optimization on Bedrock (deepest multi-model routing)
Part 8 — Case Study: An SRE AI Agent on Bedrock for CloudWatch Log Triage

The security and observability substrate from this piece is assumed by every subsequent and prior piece in the series. The case study in Part 8 demonstrates the full stack in operation against a real workload.