Case Study: Compliance Automator — Building Audit-Grade AI for Regulated Markets in Public

An operator-grade case study from the CreativeMinds Development (cmdev) AI engineering practice. The companion open-source repository is live: github.com/Samueladewole/compliance-automator.

The buying question this answers

The regulator's letter arrives on a Tuesday. The subject line is polite: "Request for cybersecurity governance evidence under NCPS 2021 CNII obligations, Q1 2026." The body lists the specific evidence required: privileged-access changes in production over the prior quarter, the approval trail for each, the IAM policy diffs that produced them, the audit log entries that confirm execution, and the mapping to regulatory controls. Response deadline: fourteen days.

The compliance team starts on Wednesday. By the end of week one, they have an extract from CloudTrail Lake covering the right time window. By the middle of week two, they have a draft mapping of events to NDPA Section 39 and CBN CSAT control IDs. By Friday afternoon, an analyst is hand-formatting the evidence pack into a PDF the regulator will accept. The team works the weekend. The pack ships on day twelve.

It will happen again next quarter, and the quarter after. Regulated enterprises across Africa and the EU now face a steady cadence of regulator queries: NDPA processing-record requests, CBN CSAT examination cycles, NMDPRA posture reviews, NIS2 Article 21 evidence requests, GDPR data-subject inquiries, and one-off probes from sector regulators. The work is repetitive, time-pressured, and expert-intensive, and it is a poor use of senior compliance time.

The compliance-automator is our open-source answer. A regulator's evidence query goes in; a structured, citation-rich, audit-grade evidence pack comes out. The target is twelve minutes, not twelve days. This piece is the case study: what we are building, the architecture, the repo, the build roadmap, and the engineering decisions a CISO needs to see before trusting the output.

What it actually does

The target interface is a side-by-side comparison portal. Left pane: the regulator's query, or a draft policy, contract, or operational document. Right pane: the AI agent's response, with non-compliant clauses highlighted and deep-linked to the exact page and section of the official regulation, and the evidence pack auto-generated underneath.
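The right pane's deep-linking can be illustrated with a small data model. This is a minimal sketch, not the repo's schema: `Citation`, `Finding`, and `render_finding` are hypothetical names, and the deep link relies on the `#page=N` PDF open parameter, which common browser PDF viewers honour but which is not guaranteed for every viewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A pointer into an official regulation (hypothetical schema)."""
    instrument: str  # e.g. "NDPA 2023"
    section: str     # e.g. "Section 39"
    source_url: str  # URL of the official PDF
    page: int        # 1-based page number within that PDF

    def deep_link(self) -> str:
        # "#page=N" is a PDF open parameter honoured by common browser viewers.
        if self.page < 1:
            raise ValueError("page numbers are 1-based")
        return f"{self.source_url}#page={self.page}"


@dataclass(frozen=True)
class Finding:
    clause_text: str                 # the highlighted clause from the left pane
    issue: str                       # why the clause was flagged
    citations: tuple[Citation, ...]  # at least one is required to render


def render_finding(finding: Finding) -> str:
    """Render one right-pane finding as plain text with a deep link per citation."""
    if not finding.citations:
        raise ValueError("a finding without a citation is not audit-grade")
    links = "; ".join(
        f"{c.instrument} {c.section}: {c.deep_link()}" for c in finding.citations
    )
    return f"FLAGGED: {finding.clause_text}\nISSUE: {finding.issue}\nSOURCE: {links}"
```

Refusing to render an uncited finding is a deliberate design choice: a highlight without a source is an opinion, not evidence. Any URL or page number used with this sketch is a placeholder; real links would point at the official published text of each instrument.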

Three concrete examples that exercise the pipeline end-to-end:

"Show me all privileged-access changes in production for the past 90 days, with the approval trail." The agent queries CloudTrail Lake for IAM policy changes against production-scoped resources, joins them with the approval workflow records, maps each change to NDPA Section 39 / CBN CSAT controls, and returns a PDF evidence pack with citations.
"Review this draft third-party vendor contract for NIS2 Article 21 supply-chain compliance." The agent retrieves the relevant NIS2 supply-chain clauses, highlights mismatches in the draft contract, and deep-links each finding to the regulatory source.
"Produce the quarterly CSAT board-level evidence pack for Q1 2026." The agent runs the standing CSAT query set against the bank's evidence sources, formats the output to the regulator's preferred template, and signs the artefact with KMS.

Compliance officers have told us that each of these query types is the thing that eats their week.
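The query and join steps of the first example can be sketched as follows. This is a minimal sketch under stated assumptions: the event data store ID is a placeholder, the `Change` and `Approval` record shapes and the 72-hour approval window are hypothetical, and the IAM event-name list is illustrative rather than exhaustive. The SQL follows CloudTrail Lake's documented style, where the `FROM` clause names the event data store; values are interpolated directly only because they come from trusted configuration, never from the regulator's free-text query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Real IAM API event names that change permissions (illustrative, not exhaustive).
PRIVILEGE_EVENTS = (
    "AttachRolePolicy",
    "DetachRolePolicy",
    "PutRolePolicy",
    "DeleteRolePolicy",
    "CreatePolicyVersion",
    "UpdateAssumeRolePolicy",
)


def build_lake_query(event_data_store_id: str, since: datetime) -> str:
    """Build a CloudTrail Lake SQL statement for IAM permission changes after `since`.

    Values are interpolated directly, so both inputs must come from trusted config.
    """
    names = ", ".join(f"'{n}'" for n in PRIVILEGE_EVENTS)
    return (
        "SELECT eventTime, eventName, userIdentity.arn, requestParameters "
        f"FROM {event_data_store_id} "
        "WHERE eventSource = 'iam.amazonaws.com' "
        f"AND eventName IN ({names}) "
        f"AND eventTime > '{since:%Y-%m-%d %H:%M:%S}'"
    )


@dataclass(frozen=True)
class Change:  # hypothetical: one parsed row of the query result
    event_time: datetime
    event_name: str
    role_name: str


@dataclass(frozen=True)
class Approval:  # hypothetical: one record from the approval workflow
    ticket: str
    role_name: str
    approved_at: datetime


def join_approvals(
    changes: list[Change],
    approvals: list[Approval],
    window: timedelta = timedelta(hours=72),
) -> list[tuple[Change, Optional[Approval]]]:
    """Pair each change with the most recent approval for the same role granted
    within `window` before it. None means no approval trail: an audit finding."""
    paired = []
    for change in changes:
        candidates = [
            a for a in approvals
            if a.role_name == change.role_name
            and timedelta(0) <= change.event_time - a.approved_at <= window
        ]
        paired.append((change, max(candidates, key=lambda a: a.approved_at, default=None)))
    return paired
```

For the 90-day window in the example, `since` would be `datetime.now(timezone.utc) - timedelta(days=90)`. In a deployment, the statement would be submitted with the boto3 CloudTrail client's `start_query(QueryStatement=...)` and read back with `get_query_results(QueryId=...)`. Parsing rows into `Change` records, scoping to production roles, and mapping each change to NDPA Section 39 and CBN CSAT controls are left out of this sketch.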

The architecture

Every component is documented in docs/architecture.md in the repo. The architecture composes pieces from the prior cmdev articles:

Air-gapped Bedrock deployment — the air-gapped pattern is the substrate. The agent never sends customer data outside the customer's VPC. PrivateLink endpoints to bedrock-runtime, bedrock-agent-runtime, KMS, S3, and Secrets Manager. CMK encryption on every persistent artefact. CloudTrail data events forwarded to a separate Security OU account.
Evaluation harness — the eval-driven engineering pattern ships alongside the agent. 300-item golden set across the three regulatory regimes, LLM-as-judge calibration against human SME labels, drift detection in production.
Strands + AgentCore — the open-source Strands Agents SDK running on Amazon Bedrock AgentCore, from Part 3 of the Bedrock series. Hooks for audit, steering handlers for safety, and event.interrupt() gates on the evidence-pack generation step.
Multi-model routing — Claude Haiku as the router that picks evidence sources; Claude Sonnet for synthesis; Cohere Embed v3 and Rerank v3 for retrieval. The Bedrock cost-optimization pattern keeps per-query cost predictable.
Security + observability — Bedrock Guardrails wrap every invocation with the PII filters documented in Part 6, plus a custom denied topic for "production-mutating-action-without-approval." Bedrock model invocation logs serve as the regulatory artefact.
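One property of the air-gapped substrate that lends itself to an automated check is endpoint coverage: every AWS service the agent calls must be reachable through a VPC endpoint. The sketch below is an illustration under assumptions, not code from the repo; the function names are hypothetical, and the Secrets Manager endpoint is identified by its service name `secretsmanager`.

```python
# VPC endpoints named by the air-gapped pattern. Endpoint service names
# follow the format com.amazonaws.<region>.<service>.
REQUIRED_SERVICES = (
    "bedrock-runtime",
    "bedrock-agent-runtime",
    "kms",
    "s3",
    "secretsmanager",
)


def service_names_from_response(resp: dict) -> set[str]:
    """Collect ServiceName values from an EC2 DescribeVpcEndpoints response dict."""
    return {ep["ServiceName"] for ep in resp.get("VpcEndpoints", [])}


def missing_endpoints(region: str, present: set[str]) -> list[str]:
    """Return required endpoint service names that the VPC does not have."""
    expected = [f"com.amazonaws.{region}.{svc}" for svc in REQUIRED_SERVICES]
    return [name for name in expected if name not in present]
```

In a deployment, `resp` would come from `boto3.client("ec2").describe_vpc_endpoints(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])`, paginated for large VPCs, and a non-empty result from `missing_endpoints` would fail the deployment check. This only proves the endpoints exist; proving the VPC has no internet egress route is a separate check.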

What's in the repo right now

The repository is live at github.com/Samueladewole/compliance-automator. Current shape:

```shell
git clone https://github.com/Samueladewole/compliance-automator
cd compliance-automator
make install
make run     # returns a structurally valid scaffold evidence pack
```

Shipped now:

README with quickstart, repo structure, and live status table
LICENSE (MIT)
Python project skeleton (pyproject.toml, Makefile, ruff + mypy + pytest configured)
agent/cli.py — runnable CLI returning a valid-shape scaffold evidence pack so the end-to-end path is exercisable from day one
docs/architecture.md — system overview, component map, ADR index
docs/local-aws-setup.md — Bedrock model access, IAM, region selection, cost expectations
terraform/README.md and cdk/README.md — parallel infrastructure-as-code roadmaps
Folder structure ready to populate (agent/tools/, agent/hooks/, agent/prompts/, eval/, data/regulations/, data/synthetic/)
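A "structurally valid" scaffold pack implies a shape check that runs before any content is trusted. Here is a minimal sketch of such a check, assuming a JSON-style pack with `query`, `generated_at`, and `findings` fields; those field names and `validate_pack` are hypothetical, not the repo's actual schema.

```python
# Hypothetical schema: field names are illustrative, not taken from the repo.
REQUIRED_TOP_LEVEL = ("query", "generated_at", "findings")
REQUIRED_FINDING = ("control_id", "evidence", "citations")


def validate_pack(pack: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the shape is valid."""
    errors = [f"missing top-level field: {k}" for k in REQUIRED_TOP_LEVEL if k not in pack]
    findings = pack.get("findings", [])
    if not isinstance(findings, list):
        return errors + ["findings must be a list"]
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            errors.append(f"findings[{i}] must be an object")
            continue
        errors += [f"findings[{i}] missing field: {k}" for k in REQUIRED_FINDING if k not in finding]
        # An empty citation list is structurally present but not audit-grade.
        if "citations" in finding and not finding["citations"]:
            errors.append(f"findings[{i}] has no citations")
    return errors
```

A scaffold pack with an empty `findings` list passes, which matches the intent of the current scaffold: the path is exercisable end to end before any real evidence flows through it.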

Building toward:

Working Strands agent in agent/pipeline.py with the five action tools wired
Terraform modules and CDK constructs for the full air-gapped deployment
300-item evaluation golden set with LLM-as-judge harness and signed monthly PDF for the regulator
Side-by-side comparison portal web UI (Next.js)
Public regulatory corpus: NDPA 2023, CBN CSAT extracts (where publicly available), EU NIS2 Article 21, NIST SP 800-53 subset
Synthetic CloudTrail + Security Lake data so end-to-end runs work without prod data
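Synthetic data is what lets end-to-end runs and the eval harness exercise real code paths without production logs. Below is a minimal sketch of a seeded generator for CloudTrail-shaped IAM events. The role names, principals, and field subset are hypothetical; real CloudTrail records carry many more fields, and `111122223333` is the placeholder account ID AWS uses in its documentation.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical names, used for synthetic data only.
ROLES = ("prod-admin", "prod-db-operator", "staging-deployer")
PRINCIPALS = ("alice", "bob", "ci-pipeline")
EVENT_NAMES = ("AttachRolePolicy", "PutRolePolicy", "DetachRolePolicy")
ACCOUNT = "111122223333"  # placeholder account ID from AWS documentation


def synthetic_iam_events(n: int, start: datetime, days: int = 90, seed: int = 7) -> list[dict]:
    """Return n CloudTrail-shaped IAM records spread over `days` from `start`.

    A fixed seed makes the output identical across runs, so eval results
    computed on this data are reproducible.
    """
    if start.utcoffset() is None:
        raise ValueError("start must be timezone-aware")
    start = start.astimezone(timezone.utc)
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        t = start + timedelta(seconds=rng.randrange(days * 86_400))
        events.append({
            "eventTime": t.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "eventSource": "iam.amazonaws.com",
            "eventName": rng.choice(EVENT_NAMES),
            "awsRegion": "us-east-1",  # IAM is global; CloudTrail records it in us-east-1
            "userIdentity": {
                "type": "AssumedRole",
                "arn": f"arn:aws:sts::{ACCOUNT}:assumed-role/operator/{rng.choice(PRINCIPALS)}",
            },
            "requestParameters": {"roleName": rng.choice(ROLES)},
            "eventID": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        })
    # ISO-8601 UTC strings sort chronologically as plain strings.
    return sorted(events, key=lambda e: e["eventTime"])
```

Each record carries a `requestParameters.roleName` field, so the output can feed an approval-trail join without touching production data.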

Target ship date: end of June 2026. Watch the repo or creativeminds.dev/blog for the milestone announcements.

Why we are building it in public

Three reasons, each of which a CISO will recognise as a trust signal:

1. The architecture is the trust signal. A CISO does not buy a compliance system based on a vendor's marketing deck. They buy based on reading the architecture, asking whether the security properties hold under their threat model, and watching the deployment behave under real load. An open-source repo makes the architecture fully visible and immediately auditable. In our experience, buying decisions move faster when the code is open.

2. The buyer's data never leaves the buyer's tenancy. The compliance-automator deploys inside the customer's AWS account using the air-gapped Bedrock pattern. There is no cmdev-hosted SaaS to send queries through. The customer's CloudTrail Lake, Security Lake, IAM events, and Knowledge Base of policies stay within their cryptographic boundary. The open-source architecture is what makes this credible — the customer can read every line of what touches their data.

3. Build-in-public compounds. Every commit, every ADR, every eval-result publication is a signal that cmdev is doing real engineering. The repo's commit history is a continuous credibility surface that no marketing campaign can match. For a consulting / project-delivery practice, this kind of asset compounds — by the time a buying conversation reaches a deal review, the buyer has already evaluated us on the work.

The four-week build roadmap

The work is sequenced to ship a working end-to-end agent by end of June 2026:

Week	Milestone	Repo signal
Week 1 (this week)	Repo bootstrap, architecture documented, sample regulatory corpus ingested	Scaffold + first ADRs land
Week 2	Strands agent + Knowledge Base wired against real Bedrock + Cohere; CloudTrail-query and retrieve-regulation tools shipped	`make run` returns a real evidence pack against synthetic data
Week 3	Terraform modules and CDK constructs deployable to a fresh AWS account; air-gapped pattern validated	`terraform apply` produces a working deployment in a clean test account
Week 4	300-item golden set + LLM-as-judge eval harness + drift detection; PDF evidence-pack template	`make eval` produces the audit-grade quality report

Each weekly milestone is a tagged release in the repo. Each ships with a short blog post in this case-study series — what we built, what surprised us, what we engineered past.
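The eval harness calibrates the LLM judge against human SME labels. The source does not say which agreement statistic the repo uses; Cohen's kappa is a common choice because it discounts the agreement two raters would reach by chance. A minimal sketch, assuming each judgment is a categorical label such as "pass" or "fail":

```python
from collections import Counter


def cohens_kappa(judge: list[str], human: list[str]) -> float:
    """Cohen's kappa between LLM-judge labels and SME labels for the same items."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_observed = sum(j == h for j, h in zip(judge, human)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    jc, hc = Counter(judge), Counter(human)
    p_chance = sum(jc[label] * hc[label] for label in jc.keys() | hc.keys()) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)
```

For judge labels `["pass", "pass", "fail", "fail"]` against SME labels `["pass", "fail", "fail", "fail"]`, raw agreement is 0.75 but chance agreement is 0.5, so kappa is 0.5. What kappa counts as "calibrated enough" is a policy decision the source does not specify.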

What this teaches us about enterprise scaling — so far

Three things have already surfaced in the bootstrap week that warrant flagging for the buying audience:

1. The repo structure matters more than the architecture diagram. A CISO evaluating an open-source compliance tool will spend their first ten minutes in the repository. The first ten minutes need to convey: clear quickstart, honest status table (what is shipped vs. what is not), runnable scaffold so the end-to-end path is exercisable from day one. We rewrote the README three times in week one to land the structure that doesn't waste those ten minutes.

2. Building parallel Terraform and CDK costs more than building either alone, but the cost is small and the trust signal is large. Most platform teams have a strong preference for one or the other; shipping both meets them where they are. The cost shows up in maintaining two infrastructure expressions of the same system. We mitigate it by treating Terraform as the canonical source and CDK as the synthesised equivalent, with terraform plan snapshots committed alongside the CDK synth output and compared as a regression check.

3. The "case study" framing is itself the wrong product framing. The compliance-automator is not a one-off engagement we documented after the fact. It is a reference implementation that we and our customers can fork. The case-study article you are reading is a snapshot of an evolving product, not a retrospective. We are adjusting the cmdev publishing voice to reflect this — the reference architecture series and this case study compose as the front door of a working open-source practice, not as portfolio items.
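The Terraform/CDK regression check in the second lesson compares two artefacts that are not directly comparable: `terraform show -json` renders a plan with a `resource_changes` array, while `cdk synth` emits a CloudFormation template with a `Resources` map. One coarse way to compare them, sketched here as an assumption rather than the repo's actual check, is to normalise both to counts of CloudFormation resource types. The type mapping below is partial and illustrative, and the check is deliberately coarse, because one Terraform resource does not always map to exactly one CloudFormation resource (S3 bucket settings, for example, are split across several Terraform resources).

```python
from collections import Counter

# Partial, illustrative mapping from Terraform AWS provider types to CloudFormation types.
TF_TO_CFN = {
    "aws_kms_key": "AWS::KMS::Key",
    "aws_s3_bucket": "AWS::S3::Bucket",
    "aws_vpc_endpoint": "AWS::EC2::VPCEndpoint",
    "aws_cloudtrail": "AWS::CloudTrail::Trail",
}


def tf_created_types(plan: dict) -> Counter:
    """Count resources a `terraform show -json <planfile>` document would create,
    translated to CloudFormation type names where a mapping exists."""
    return Counter(
        TF_TO_CFN.get(rc["type"], rc["type"])
        for rc in plan.get("resource_changes", [])
        if "create" in rc["change"]["actions"]
    )


def cfn_types(template: dict) -> Counter:
    """Count resource types in a synthesised CloudFormation template,
    ignoring the CDKMetadata resource the CDK adds by default."""
    return Counter(
        res["Type"]
        for res in template.get("Resources", {}).values()
        if res["Type"] != "AWS::CDK::Metadata"
    )


def parity_diff(plan: dict, template: dict) -> tuple[dict, dict]:
    """Return (types only Terraform creates, types only the CDK template declares)."""
    tf, cfn = tf_created_types(plan), cfn_types(template)
    # Counter subtraction keeps only positive counts, so each side shows its surplus.
    return dict(tf - cfn), dict(cfn - tf)
```

In CI, both inputs would be loaded with `json.load` from the committed snapshots, and any non-empty side of the diff would fail the build. Unmapped Terraform types surface on the left side by design, which forces the mapping to be extended rather than silently ignored.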

How to engage

Three concrete moves you can make if this is the shape of work your team needs:

Star the repo. github.com/Samueladewole/compliance-automator — the star count is the public signal of demand and helps the project compound. Watching with GitHub's custom "Releases" setting gives you the milestone-release notifications without inbox noise.

Read the air-gapped Bedrock article and the eval harness article. They are the architectural substrate. If those resonate with your CISO and compliance leadership, the compliance-automator is the natural fit.

Email [email protected] for a deployment consultation. We engage with regulated enterprises in Africa and the EU on a four-phase model (diagnostic → foundation build → co-managed operations → optional MSSP). The compliance-automator is the substrate; the engagement is what makes it production-grade at your scale. Direct, no sales fluff.

Companion content

Live repo: github.com/Samueladewole/compliance-automator
Architecture substrate: Air-Gapped LLM Deployments on AWS Bedrock, Custom Evaluation Frameworks for Enterprise LLMs
Reference series: Amazon Bedrock for Production AI (8 parts)
Banking reference: AWS Architecture for Nigerian Banks (3 parts) and Hardening before AWS (3 parts)

Mayowa A. is CTO of CreativeMinds Development. CreativeMinds Development (cmdev) ships production AI for regulated enterprises across Africa and the EU.