The Blueprint for Air-Gapped LLM Deployments on AWS Bedrock

An operator-grade pattern from the CreativeMinds Development (cmdev) AI engineering practice. Companion to the Amazon Bedrock for Production AI series and the Hardening-before-AWS series.

The question we get first

In every conversation with a CISO at a Nigerian Tier 1 bank, every meeting with the head of digital at a European energy operator, every introductory call with a healthcare network's IT leadership — the first technical question is the same. It is not "which model should we use?". It is not "how do agents work?". It is:

"Can we deploy this without our data leaving our controlled boundary?"

The answer matters because most of the AI deployment options on the market in 2026 require sending the data somewhere else. The OpenAI API processes prompts on OpenAI's infrastructure. The Anthropic Claude API processes them on Anthropic's. The "AI features" baked into SaaS products typically route prompts through the SaaS vendor's tenancy. For a bank under CBN CSAT, an EU energy operator under NIS2, a Nigerian fintech under NDPA, or a defence-adjacent operator under any number of national-security regimes, "we'll send the prompts to a third party for processing" is not a deployable architecture. The compliance officer says no before the CISO finishes evaluating it. The board never hears the pitch.

Amazon Bedrock's architecture answers this question — if it is configured correctly. The "if" is doing a lot of work in that sentence. A misconfigured Bedrock deployment looks superficially private but routes traffic through the public internet, holds data temporarily outside the customer's KMS boundary, or carries an IAM policy that an examiner can drive through. The pattern this article documents is the configuration we ship for regulated enterprise customers. It is the configuration that survives an examiner's first technical pass, a CISO's red-team review, and the operational reality of multi-thousand-query-per-second production traffic.

This piece is the buying-audience answer to the first question. The architectural diagram is the lead artefact; the regulatory mapping closes the loop; the friction points at the end are the ones we hit in real deployments and engineered past.

The reference architecture

Every line in that architecture is deliberate. Five layers, each with its own configuration discipline.

Layer 1 — Network: PrivateLink endpoints, private subnets, no internet gateway

The architectural commitment that makes "air-gapped" meaningful is that the workload subnets have no internet gateway and no NAT gateway, so AI workload traffic has no route to the public internet. The Lambda functions, EKS pods, ECS tasks, and EC2 instances that invoke Bedrock cannot reach the public internet even if they wanted to. Traffic to every AWS service the AI workload touches (Bedrock, KMS, S3, Secrets Manager, CloudWatch) goes through VPC endpoints: interface endpoints (PrivateLink) for most services, and a gateway endpoint for S3.

The VPC endpoints we deploy at minimum:

Service	Endpoint	Why
`bedrock-runtime`	`com.amazonaws.<region>.bedrock-runtime`	Model invocation (`InvokeModel`, `Converse`)
`bedrock-agent-runtime`	`com.amazonaws.<region>.bedrock-agent-runtime`	Agent invocation, knowledge-base retrieval
`bedrock`	`com.amazonaws.<region>.bedrock`	Model and Guardrails management
`bedrock-agent`	`com.amazonaws.<region>.bedrock-agent`	Agent and Knowledge Base management
`kms`	`com.amazonaws.<region>.kms`	Encryption key operations
`s3`	`com.amazonaws.<region>.s3` (gateway endpoint)	Model invocation log storage, KB source bucket
`secretsmanager`	`com.amazonaws.<region>.secretsmanager`	Credential retrieval
`logs`	`com.amazonaws.<region>.logs`	CloudWatch Logs ingestion
`monitoring`	`com.amazonaws.<region>.monitoring`	CloudWatch metrics
`sts`	`com.amazonaws.<region>.sts`	Workload identity assumption

Each interface endpoint is deployed in private subnets only, with a security group that allows ingress on port 443 from the workload subnets and nothing else. The endpoint policy on each VPC endpoint restricts the actions and resources further: the bedrock-runtime endpoint permits only bedrock:InvokeModel (the IAM action that also authorises Converse calls), against the specific model ARNs the workload is authorised to use.
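As a rough illustration of the endpoint-policy restriction described above, the sketch below builds a policy document in Python so it can be serialised and attached to the bedrock-runtime endpoint. The function name, the region, and the model ID are hypothetical placeholders. The sketch also assumes the Converse API is authorised by the bedrock:InvokeModel action, so only that action is listed.

```python
import json


def bedrock_runtime_endpoint_policy(region, model_ids):
    """Build a VPC endpoint policy that allows invoking the named models only."""
    model_arns = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",
                "Effect": "Allow",
                # Endpoint policies carry a Principal element; per-role
                # narrowing is left to each workload role's identity policy.
                "Principal": "*",
                "Action": ["bedrock:InvokeModel"],
                "Resource": model_arns,
            }
        ],
    }


# "example.model-v1" is a placeholder, not a real model ID.
policy = bedrock_runtime_endpoint_policy("eu-central-1", ["example.model-v1"])
print(json.dumps(policy, indent=2))
```

Generating the document from a list of approved model IDs keeps the endpoint policy and the workload role policies derivable from one source of truth, which is easier to show an examiner than hand-edited JSON.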

What this configuration eliminates:

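The "exfiltration via DNS" attack: the workload has no route to external resolvers, and the VPC's own resolver is restricted to allow-listed domains (for example with Route 53 Resolver DNS Firewall), because removing the internet gateway does not by itself stop the VPC resolver from answering queries for public names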
The "compromised dependency calls home" attack — outbound TCP/443 to anywhere except the allowed VPC endpoints is dropped at the security group
The accidental "prompt logged to a third party" leak — there is no network path to a third party

Egress filtering through AWS Network Firewall is the belt-and-braces option for environments where the security model assumes the workload itself may be compromised. For most regulated deployments, the security-group + endpoint-policy combination is the production baseline.
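One way to keep the no-internet-path commitment honest is to check it mechanically. The sketch below is an assumption about how such a check might look, not a tool taken from the deployments described here. It scans route tables in the shape returned by EC2's DescribeRouteTables call and reports any route that targets an internet gateway, an egress-only internet gateway, or a NAT gateway. The function name and the sample IDs are made up for illustration.

```python
def internet_routes(route_tables):
    """List routes that leave the VPC via an internet, egress-only, or NAT gateway.

    `route_tables` is a list of dicts shaped like the "RouteTables" entries
    returned by EC2 DescribeRouteTables.
    """
    findings = []
    for table in route_tables:
        for route in table.get("Routes", []):
            gateway_id = route.get("GatewayId", "")
            # The "local" route and gateway endpoints ("vpce-...") also use
            # GatewayId, so only "igw-" prefixes count as internet gateways.
            target = (
                (gateway_id if gateway_id.startswith("igw-") else "")
                or route.get("EgressOnlyInternetGatewayId", "")
                or route.get("NatGatewayId", "")
            )
            if target:
                destination = (
                    route.get("DestinationCidrBlock")
                    or route.get("DestinationIpv6CidrBlock")
                )
                findings.append((table["RouteTableId"], destination, target))
    return findings


# Hypothetical sample: one clean private table, one with a NAT default route.
sample_tables = [
    {
        "RouteTableId": "rtb-private",
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationPrefixListId": "pl-example", "GatewayId": "vpce-example"},
        ],
    },
    {
        "RouteTableId": "rtb-leaky",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"},
        ],
    },
]

for finding in internet_routes(sample_tables):
    print(finding)  # ('rtb-leaky', '0.0.0.0/0', 'nat-example')
```

Run against the real DescribeRouteTables output in a pipeline, an empty result becomes a piece of evidence that the workload subnets have stayed closed since the last review.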

Layer 2 — Encryption: KMS customer-managed keys on every persistent artefact

PrivateLink keeps the data inside AWS's network. KMS keeps it inside the customer's cryptographic boundary. Every artefact Bedrock writes or reads is encrypted with a customer-managed key (CMK) held in the customer's account, with a key policy that restricts decryption to specific IAM principals and operational contexts.

The artefacts that need CMK:

Model invocation logs — the S3 bucket where Bedrock writes the full prompt, response, and Guardrails trace per invocation. CMK encryption at rest; Object Lock in Compliance mode for the regulatory retention period.
Knowledge Base source documents — the S3 bucket of documents ingested into the KB. CMK encryption; bucket policy restricting access to specific roles.
Knowledge Base vector store — OpenSearch Serverless, pgvector on Aurora, or whichever store the deployment uses. Encryption at rest with CMK.
Custom model artefacts — if Custom Model Import is in scope, the imported model's storage is CMK-encrypted.
Secrets Manager entries — any credentials the workload needs (third-party API keys, database credentials) are CMK-encrypted.
CloudWatch Logs — log groups that may contain prompt or response fragments are CMK-encrypted.

The KMS key policy itself is the most important configuration. The principals authorised to decrypt are explicitly named: a small set of workload roles, the audit and IR roles, and the break-glass administrative role. Wildcard principals do not appear in production CMK policies (the "Resource": "*" element of a key policy statement is not a concern, since in a key policy it refers only to the key the policy is attached to). The policy includes condition keys that restrict use to the specific VPC endpoints and source AWS accounts.
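The no-wildcard rule is easy to state and easy to regress, so it is a natural candidate for an automated check. The following is a minimal sketch under stated assumptions: the function name, role ARN, and Sids are hypothetical, and the check looks only at Principal elements, because "Resource": "*" in a key policy refers to the key itself.

```python
def wildcard_principal_sids(key_policy):
    """Return the Sid of every Allow statement whose Principal contains '*'."""
    flagged = []
    for statement in key_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, str):
            values = [principal]  # the bare "*" form
        else:
            values = []
            for entry in principal.values():
                # Each entry is either a single ARN string or a list of them.
                values.extend([entry] if isinstance(entry, str) else entry)
        if any("*" in value for value in values):
            flagged.append(statement.get("Sid", "<no Sid>"))
    return flagged


# Hypothetical key policy: one named role, one overly broad statement.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WorkloadDecrypt",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-workload"},
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*",
        },
        {
            "Sid": "TooBroad",
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "kms:Decrypt",
            "Resource": "*",
        },
    ],
}

print(wildcard_principal_sids(example_policy))  # ['TooBroad']
```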

Key rotation is automatic on an annual cycle for most regulated workloads; some compliance regimes require shorter rotation periods, which the CMK supports natively. Cross-region replication of the key is the option to consider when the deployment needs multi-region failover and the regulator permits the secondary region.

The data residency question — "does our data ever leave Nigeria / the EU / the country specified in the regulator's directive?" — is answered by the combination of region selection (the deployment runs entirely in regions inside the regulatory boundary) and the CMK boundary (the key never leaves, so even if data did, it would be cryptographically meaningless). For Nigerian banks, the deployment in eu-central-1 (Frankfurt) or af-south-1 (Cape Town) with a Nigeria-pinned KMS key satisfies the NDPA cross-border processing requirements when the cryptographic boundary argument is documented properly. For EU operators, the same pattern in eu-central-1 or eu-west-1 with an EU-pinned KMS key satisfies NIS2 and GDPR data-residency expectations.

Layer 3 — Identity: federated workload identities, no static credentials

The IAM architecture for an air-gapped Bedrock deployment has one rule: no long-lived static credentials anywhere in the call path. No BEDROCK_API_KEY environment variable. No AWS access keys in Lambda configuration. No service-account JSON files committed to repositories. The unit of access is a short-lived federated session, scoped to a specific workload's permissions — the identity-first migration pattern we apply across every regulated engagement.

The implementation:

Human identities federate from the customer's IdP (Azure AD / Microsoft Entra, Okta, ADFS) into IAM Identity Center. Sessions are 15 minutes for production, 1 hour for non-production. MFA is enforced at the IdP level — hardware security keys for privileged users, platform authenticators for everyone else.
Workload identities on EKS use IRSA (IAM Roles for Service Accounts). Each microservice gets its own role with permissions scoped to its function. The Bedrock-invoking service has bedrock:InvokeModel on specific model ARNs; the knowledge-base-querying service has bedrock:Retrieve on a specific knowledge base ARN (the IAM service prefix is bedrock even for calls made through the bedrock-agent-runtime endpoint); neither has anything else.
Workload identities on Lambda use the Lambda execution role, again scoped to the specific Bedrock actions and resources the function needs.
CI/CD systems authenticate via OIDC federation — GitHub Actions and GitLab CI both support this directly. No long-lived AWS access keys in CI secrets.

The IAM policy pattern for a Bedrock-invoking workload role (the Converse API is authorised through the bedrock:InvokeModel action, so it needs no separate entry):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6-20251022",
        "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-haiku-4-5-20251001"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "vpce-0abc123def456789a"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeAgent"
      ],
      "Resource": [
        "arn:aws:bedrock:eu-central-1:123456789012:agent-alias/AGENT_ID/prod"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "vpce-0abc123def456789a"
        }
      }
    }
  ]
}

The condition on aws:SourceVpce is the line that closes the loop. Even if a credential leaks, it cannot be used from outside the customer's VPC endpoint. The condition makes the workload role's authority context-bound to the network topology.
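Because the whole argument rests on that condition being present on every Allow statement, it can be worth linting role policies for it before they ship. The sketch below is one hypothetical way to do that. The function name and the sample policy are illustrative, and the check matches the condition key by exact string, whereas IAM itself treats condition key names case-insensitively.

```python
def allows_without_vpce(policy, approved_vpce_ids):
    """Return indexes of Allow statements not pinned to an approved VPC endpoint."""
    approved = set(approved_vpce_ids)
    unpinned = []
    for index, statement in enumerate(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        pinned = (
            statement.get("Condition", {})
            .get("StringEquals", {})
            .get("aws:SourceVpce", [])
        )
        # The condition value may be a single ID or a list of IDs.
        pinned = [pinned] if isinstance(pinned, str) else list(pinned)
        if not pinned or not set(pinned) <= approved:
            unpinned.append(index)
    return unpinned


# Hypothetical policy: the second statement forgot the condition.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": ["arn:aws:bedrock:eu-central-1::foundation-model/example.model-v1"],
            "Condition": {"StringEquals": {"aws:SourceVpce": "vpce-0abc123def456789a"}},
        },
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeAgent"],
            "Resource": ["arn:aws:bedrock:eu-central-1:123456789012:agent-alias/AGENT_ID/ALIAS_ID"],
        },
    ],
}

print(allows_without_vpce(sample_policy, ["vpce-0abc123def456789a"]))  # [1]
```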

Layer 4 — Audit: data events, model invocation logs, cross-account log forwarding

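The audit-evidence layer is what makes the air-gapped deployment defensible to an examiner. A default CloudTrail trail captures management events only. For Bedrock that covers the control plane (who created the agent, who updated the guardrail) and, per AWS's CloudTrail documentation for Bedrock at the time of writing, the core runtime calls such as InvokeModel and Converse. Agent and knowledge-base runtime calls such as InvokeAgent and Retrieve are classified as data events and are not recorded unless a trail asks for them. For regulated Bedrock workloads, data events must be enabled explicitly: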

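resource "aws_cloudtrail" "bedrock_data_events" {
  name           = "bedrock-data-events-prod"
  s3_bucket_name = aws_s3_bucket.audit_bucket.id
  kms_key_id     = aws_kms_key.audit_cmk.arn

  # One selector per resource type, so each selector carries a single
  # resources.type value.
  advanced_event_selector {
    name = "Bedrock agent alias invocations"
    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }
    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::AgentAlias"]
    }
  }

  advanced_event_selector {
    name = "Bedrock model invocations"
    field_selector {
      field  = "eventCategory"
      equals = ["Data"]
    }
    field_selector {
      field  = "resources.type"
      equals = ["AWS::Bedrock::Model"]
    }
  }
}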

That gives you the access audit: who invoked which model at what time under which IAM role from which VPC endpoint. The data event record does not include the prompt or response — those go to model invocation logging, configured once per region:

resource "aws_bedrock_model_invocation_logging_configuration" "main" {
  logging_config {
    s3_config {
      bucket_name = aws_s3_bucket.invocation_logs.id
      key_prefix  = "invocation-logs/"
    }
    text_data_delivery_enabled      = true
    image_data_delivery_enabled     = true
    embedding_data_delivery_enabled = false  # large; enable only if needed
  }
}

The invocation log bucket holds the full prompt, completion, and Guardrails trace per invocation. It is one of the most sensitive buckets in the entire deployment — prompts include the questions users ask (often PII), responses can contain regenerated training data and PII, the Guardrails trace shows what the safety system caught. KMS CMK encryption is mandatory. Object Lock in Compliance mode locks the retention period so even the root account cannot delete the records. Bucket policy restricts access to a small set of audit and IR roles.

Cross-account log forwarding is the configuration that closes the compromised-account scenario. CloudTrail data events and model invocation logs forward to a separate AWS account in the Security OU. The workload account has no delete permission on the audit bucket — even if the workload account is compromised, the attacker cannot destroy the audit trail. This is the configuration that turns "we have logs" into "we have logs an examiner can rely on."

For regulated workloads, the Object Lock retention is set to match the regulatory requirement: 7 years for NDPA processing records, 6 years for NIS2 incident evidence, the bank's specific CSAT retention period for CBN-supervised data.

Layer 5 — Stronger isolation: AWS Nitro Enclaves when the threat model demands it

For the highest-trust workloads (defence-adjacent, sovereign-wealth, ultra-high-value financial) the regulatory threat model sometimes includes the cloud-provider operator itself. AWS Nitro Enclaves provide cryptographic attestation that workload code is running in a tamper-resistant isolation environment, with no operator-side debugging hooks, no console access, and the host's own root operator unable to inspect the enclave's memory.

For Bedrock specifically, Nitro Enclaves matter when the workload calling Bedrock needs to process the response in an attestable isolation environment before any other code sees it — for example, when applying a customer-side decryption step to a Bedrock response that contained encrypted PII, or when running a regulatory-required validation step that must demonstrate it was the only code with access to the plaintext response. The broader AWS security posture for AI workloads provides the surrounding controls.

Nitro Enclaves are not a default for typical regulated AI workloads. They are the right choice for the specific subset where the threat model assumes adversary capability that goes beyond standard cloud security. Most cmdev engagements stop at Layers 1-4; Nitro Enclaves get added when the customer's threat model demands them.

The regulatory mapping

The architecture above maps to specific obligations:

Obligation	Where the architecture delivers
NDPA 2023 — Section 39 (security of processing)	KMS CMK on every artefact + IAM least-privilege + VPC endpoints + model invocation logs as the processing record
NDPA — Section 40 (72-hour breach notification)	CloudTrail data events + invocation logs answer "what data was processed, by which model, when, under whose authority" within minutes — well within the 72-hour clock
NDPA — Cross-border processing	KMS key boundary + region selection inside the regulatory perimeter + documented cryptographic argument satisfies the cross-border test
NIS2 Article 21 — Incident handling	CloudTrail data events + invocation logs + X-Ray traces provide the incident-investigation evidence base, sealed against tampering
NIS2 Article 21 — Supply-chain security	Custom Model Import provenance, Knowledge Base source attribution, action-group Lambda dependency tracking
NIS2 Article 21 — Cryptography	KMS CMK with documented key policies, annual rotation, condition-bound use
CBN CSAT — Application security	Guardrails as input/output control; CloudTrail data events as access audit; the entire VPC-endpoint perimeter
CBN CSAT — Cryptography	KMS CMK on every encrypted resource; key policies that the bank's risk team can review
NCC Cyber Resilience Framework — Continuous monitoring	CloudWatch alarms on Guardrail interventions, model invocation anomalies, IAM role usage
GDPR Article 32 (security of processing)	Same controls; EU-region deployment with EU-pinned KMS keys
HIPAA Security Rule	Same controls + HIPAA-eligible region selection + signed Business Associate Agreement with AWS

The mapping matters because every one of those obligations is something an examiner will ask about. The architecture produces audit-grade evidence per obligation in minutes, not weeks.

The friction points — what bites in real deployments

The reference architecture is the easy part. The friction shows up at deployment time, in production, when the configuration meets reality. Five frictions cmdev engineers have hit and engineered past:

1. VPC endpoint service quotas

A regional account has a default soft quota on the number of VPC endpoints per VPC and on the number of network interfaces per endpoint. For a deployment with ten or more services consuming Bedrock + KMS + S3 + Secrets + CloudWatch + Logs + STS + monitoring across multiple subnets, the default quotas are hit before the deployment finishes. The fix is straightforward (open a quota-increase request) but the lead time is 24-72 hours and it is the kind of friction that breaks a launch timeline if it surfaces late. We pre-file the quota increases at engagement start, before the architecture is built.
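The numbers to put in the quota request can be estimated up front. The sketch below assumes each interface endpoint creates one network interface in every subnet it is attached to. The three-subnet, two-VPC layout is an illustrative assumption rather than a figure from a real deployment, and the service list is the interface endpoints from the table earlier in this article.

```python
def endpoint_footprint(interface_services, subnets_per_endpoint, vpc_count=1):
    """Estimate interface endpoints and their network interfaces for a layout."""
    endpoints = len(interface_services) * vpc_count
    return {
        "interface_endpoints": endpoints,
        # One network interface per endpoint per attached subnet.
        "network_interfaces": endpoints * subnets_per_endpoint,
    }


# Interface endpoints from the minimum set (S3 uses a gateway endpoint).
INTERFACE_SERVICES = [
    "bedrock-runtime",
    "bedrock-agent-runtime",
    "bedrock",
    "bedrock-agent",
    "kms",
    "secretsmanager",
    "logs",
    "monitoring",
    "sts",
]

footprint = endpoint_footprint(INTERFACE_SERVICES, subnets_per_endpoint=3, vpc_count=2)
print(footprint)  # {'interface_endpoints': 18, 'network_interfaces': 54}
```

Comparing these figures with the account's current quotas at engagement start shows which increases need filing before any infrastructure exists.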

2. KMS key policy complexity grows non-linearly

A simple deployment has three IAM principals authorised on the workload CMK. A real production deployment has the workload role, the audit role, the IR role, the break-glass admin role, the backup role, the cross-region replication role, the CloudTrail role, the model invocation log writer, the KB ingestion role, and the secrets-rotation Lambda role. Each needs its own statement. The policy quickly hits AWS's key-policy size limit (32 KB) for complex deployments.

The architectural fix is per-purpose CMKs rather than one CMK for everything: one for invocation logs, one for KB sources, one for vector store, one for general application secrets. Each policy stays under 8 KB. Operationally cleaner, audit-trail per purpose, and rotation can happen per-key on different cadences.
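A size check in the deployment pipeline catches policy growth before KMS rejects it. The sketch below is a minimal illustration: the helper names, role names, and account ID are hypothetical, and it measures a compact JSON serialisation, which is an assumption about how the limit is counted rather than a documented guarantee.

```python
import json

KEY_POLICY_LIMIT_BYTES = 32 * 1024  # the 32 KB key-policy limit cited above


def policy_size_bytes(policy):
    """Size of the policy when serialised as compact UTF-8 JSON."""
    return len(json.dumps(policy, separators=(",", ":")).encode("utf-8"))


def per_purpose_key_policy(role_names, account_id="123456789012"):
    """Build one decrypt statement per named role for a single-purpose CMK."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"Allow{name.title().replace('-', '')}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{name}"},
                "Action": ["kms:Decrypt", "kms:DescribeKey"],
                "Resource": "*",  # in a key policy this means "this key"
            }
            for name in role_names
        ],
    }


# Hypothetical principals for the invocation-log CMK only.
invocation_log_policy = per_purpose_key_policy(["audit-reader", "ir-responder", "break-glass-admin"])
size = policy_size_bytes(invocation_log_policy)
print(f"{size} bytes used, {KEY_POLICY_LIMIT_BYTES - size} bytes of headroom")
```

Failing the build at a threshold well below the hard limit, for example the 8 KB per-policy target mentioned above, leaves room for the statements that inevitably get added later.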

3. Cold-start latency on the Bedrock invocation path adds up

The VPC endpoint adds ~5-10ms per request on the wire. KMS decryption of the inbound credential context adds another 3-8ms. For latency-sensitive workloads — real-time conversational agents, sub-second classification pipelines — these add up to a perceptible "the air-gapped path is slower" experience.

The fixes that work: connection pooling on the Bedrock client (boto3's botocore.config.Config with tcp_keepalive=True and a sensible connect_timeout), reusing the Bedrock client across invocations rather than constructing one per call, and prompt caching (per Part 7 of the Bedrock series) to keep the recurring input-token portion of the latency near zero. For workloads where every millisecond matters, provisioned throughput on the model converts on-demand variance to predictable steady-state latency.
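The client-reuse and keep-alive advice can be sketched as follows. This is an illustration, not the configuration shipped in any particular deployment: the timeout, pool-size, and retry values are placeholder assumptions to tune per workload, and the helper name is hypothetical. boto3 is imported inside the function so the module can still be imported where boto3 is not installed.

```python
from functools import lru_cache

# Illustrative values only; tune per workload.
CLIENT_SETTINGS = {
    "tcp_keepalive": True,        # keep idle connections to the endpoint alive
    "connect_timeout": 2,         # seconds
    "read_timeout": 60,           # seconds; long generations need headroom
    "max_pool_connections": 50,   # size of the HTTP connection pool
    "retries": {"max_attempts": 3, "mode": "standard"},
}


@lru_cache(maxsize=None)
def bedrock_runtime_client(region_name="eu-central-1"):
    """Create the Bedrock runtime client once per region and reuse it."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "bedrock-runtime",
        region_name=region_name,
        config=Config(**CLIENT_SETTINGS),
    )
```

Each call to bedrock_runtime_client() with the same arguments returns the same client object, so warm connections to the VPC endpoint are reused rather than re-established on every invocation. In Lambda, the cached client survives for the lifetime of the execution environment.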

4. Cross-account log forwarding has subtle delete-permission edge cases

The pattern of "audit logs go to the Security OU account" is straightforward in concept. The subtlety: the workload account's CloudTrail service-linked role needs s3:PutObject on the Security OU's audit bucket but not s3:DeleteObject. AWS's default IAM templates sometimes grant both. We've shipped deployments where a misconfiguration meant the workload account's compromised role could have deleted its own audit trail — the kind of finding that surfaces in a penetration test six months in and triggers a mid-engagement remediation.

The defensive pattern: explicit Deny on s3:DeleteObject and s3:DeleteObjectVersion in the workload account's CloudTrail role, plus Object Lock in Compliance mode on the audit bucket so even the Security OU's own admin cannot delete records before retention expires.
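The explicit Deny half of that pattern is small enough to generate and attach automatically. The sketch below shows one possible shape for it as an identity policy. The function name and bucket name are hypothetical, and it covers only the two delete actions named above.

```python
import json


def deny_audit_deletion_policy(bucket_name):
    """Identity policy that explicitly denies deleting objects in the audit bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAuditObjectDeletion",
                "Effect": "Deny",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-security-ou-audit" is a placeholder bucket name.
deny_policy = deny_audit_deletion_policy("example-security-ou-audit")
print(json.dumps(deny_policy, indent=2))
```

An explicit Deny overrides any Allow that a default template may have granted, which is why it is worth attaching even when the role is believed to have no delete permission.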

5. The "we need internet for one specific dependency" pressure

Every air-gapped deployment we've shipped has, at some point in the engagement, faced the request: "Can we just add an internet path for [X]? It's only used for [Y]." The X is usually a Python package installation, a customer support library that calls home for usage analytics, or an OS package update. The pressure to compromise the air-gap is constant and the compromises compound.

The architectural answer is the internal artefact mirror: AWS CodeArtifact for Python and npm packages, a private container registry (ECR) for container images, an internal yum/apt mirror for OS packages. Every dependency the workload needs is mirrored inside the customer's perimeter. The air-gap holds because there is nothing the workload needs that requires breaching it. This is one of the highest-leverage decisions in the deployment — every regulated customer cmdev has worked with eventually needs the artefact mirror, and getting it in place during initial deployment avoids the much more painful retrofit later.
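For the Python side of the mirror, pointing pip at CodeArtifact comes down to an index URL. The sketch below assembles one in the shape AWS documents for pip at the time of writing. The domain, repository, and token values are placeholders, and the token is assumed to come from the aws codeartifact get-authorization-token command. It also assumes the workload reaches CodeArtifact through its own VPC endpoints, which would need adding to the endpoint set.

```python
from urllib.parse import quote


def codeartifact_pip_index_url(domain, domain_owner, region, repository, auth_token):
    """Build a pip index URL for a CodeArtifact PyPI repository."""
    host = f"{domain}-{domain_owner}.d.codeartifact.{region}.amazonaws.com"
    # The token is passed as the password for the fixed user name "aws".
    return f"https://aws:{quote(auth_token, safe='')}@{host}/pypi/{repository}/simple/"


# Placeholder values for illustration only.
index_url = codeartifact_pip_index_url(
    domain="example-domain",
    domain_owner="123456789012",
    region="eu-central-1",
    repository="python-mirror",
    auth_token="token123",
)
print(index_url)
```

Setting this as pip's index URL in the build image means a package that has not been mirrored fails at install time inside the perimeter, rather than tempting anyone to open an internet path.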

What this taught us about enterprise scaling

Five things hold up across the deployments cmdev has shipped under this pattern:

1. Air-gapped is a configuration discipline, not a product feature. Bedrock can run air-gapped. So can OpenSearch Serverless. So can Lambda. The difference between a deployment that satisfies an examiner and one that doesn't is not the choice of services — it's the discipline applied to the network, KMS, IAM, and audit configuration. The architecture above is the minimum discipline; the deployments that survive examiner scrutiny have additional discipline layered on top per regulatory regime.

2. The audit trail is the deployable artefact. The architecture is just plumbing. The thing that converts a deployment into a regulatory asset is the audit trail it produces — CloudTrail data events, model invocation logs, X-Ray traces, all sealed against tampering and forwarded to a separate account. We design audit-trail-first now: what queries will the regulator run, what evidence will those queries return, what does the trail look like in a worst-case incident scenario.

3. The buying conversation is shorter when the architecture is published. Customers we engage with under this pattern read the architecture before the first meeting. The meeting becomes about their specific data residency, their specific examiner relationship, their specific operational constraints — not about whether air-gapped AI is possible. The published architecture is doing customer-acquisition work continuously.

4. The friction points compound if engineering teams haven't seen them before. The VPC quota, KMS policy size, cold-start latency, cross-account forwarding edge cases, and dependency mirror — each is a 1-3 day issue in isolation, but compounded with a launch deadline they become a release-blocking incident. The cmdev engagement model includes the standing list of these frictions in the Phase 0 diagnostic specifically because we have hit them all in production.

5. The cost premium of air-gapped is smaller than the discount of bad assumptions. A common pre-engagement assumption is that air-gapped AI must cost 2-3× managed-API AI because of "infrastructure overhead." In practice, the VPC endpoint charges plus the KMS operations plus the storage overhead come to a single-digit percentage of the model invocation costs that dominate the bill. The optimisations from Bedrock cost-optimization apply identically. The economics are not the obstacle. The implementation discipline is.

Engaging with cmdev

CreativeMinds Development (cmdev) is the engineering studio behind this architecture. We ship production-grade AI for regulated enterprises in Africa and the EU: banks under CBN CSAT, energy operators under NMDPRA and NIS2, fintechs under NDPA, healthcare networks under HIPAA-equivalent regional regimes. Our engagement model is a four-phase pattern: diagnostic, foundation build, co-managed operations, optional full managed services. The architecture in this article is the substrate; the engagement is what makes it production at your scale.

Cloud security services: /services/cloud-security
Companion architecture series: Amazon Bedrock for Production AI, AWS-for-banks, Hardening before AWS

Mayowa Adewole is CTO and Principal AI Engineer at CreativeMinds Development. He leads cmdev's AI engineering practice for regulated enterprises across Africa and the EU, with deployments in production for banking, energy, and critical-infrastructure customers.