Case Study 04

AI-Driven Program Operations at Federal Scale

Supporting 9 application development programs inside a federal Digital Transformation initiative, I replaced manual documentation cycles, fragmented communications, and reactive reporting with AI-assisted workflows built on Claude, ChatGPT, and Gemini.

Engagement Details
Organization: Federal Technology Contractor
Client: Large Federal Agency, Digital Transformation Division
Role: Senior Analyst, Product Management Support & Org. Readiness
Duration: Dec. 2023 to May 2025
Programs Supported: 9 application development programs
Team Scale: 50+ cross-functional team members
AI Stack: Claude (primary) · ChatGPT · Gemini
Clearance: Interim Public Trust
68% reduction in documentation drafting time
50+ team members tracked across reporting cycles
100% on-time financial reporting delivery rate
5-Pillar Case Study
Pillar One
Diagnostic Framing · The Operational Bottlenecks

The client's Digital Transformation team was running 9 simultaneous application development programs under a single program manager. Three bottlenecks were costing hours every week.

Documentation was built from scratch every cycle. Quick reference guides (QRGs), agile session materials, and stakeholder presentation decks were drafted manually, with no reusable templates. A single QRG update could consume 3 to 4 hours of analyst time.

Reporting for 50+ people was a manual collection exercise. The weekly status report, monthly status report, and PTO tracker required individual inputs consolidated by hand. On-time delivery depended entirely on manual coordination.

Stakeholder communications had no consistent standard. All-hands meeting content, onboarding materials, and cross-functional updates were written from scratch each time.
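The consolidation step in the reporting bottleneck is mechanical enough to sketch in a few lines. The version below is a hypothetical illustration, not the engagement's workflow: given a roster and whatever updates have arrived, it formats each submitted update and marks every member without one as "Pending · Input Required" instead of leaving a gap or guessing. The `consolidate` function, the member names, and the dict-based input shape are all assumptions.

```python
PENDING = "Pending · Input Required"


def consolidate(roster: list[str], updates: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (report rows, members still pending), preserving roster order."""
    rows, pending = [], []
    for member in roster:
        update = updates.get(member, "").strip()
        if update:
            rows.append(f"{member}: {update}")
        else:
            # Never fill in a missing update; mark it for follow-up instead.
            rows.append(f"{member}: {PENDING}")
            pending.append(member)
    return rows, pending


# Hypothetical 50-person roster where three members have not submitted yet.
roster = [f"Member {i}" for i in range(1, 51)]
late = {"Member 12", "Member 31", "Member 44"}
updates = {m: "On track" for m in roster if m not in late}

rows, pending = consolidate(roster, updates)
print(len(rows) - len(pending), "formatted,", len(pending), "pending")  # 47 formatted, 3 pending
```

Returning the pending list separately makes the follow-up list available without rescanning the report.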

Measurable Goal

Cut documentation production time by at least 50%, achieve 100% on-time reporting delivery, and establish reusable AI-assisted templates that any analyst could operate without starting from zero.

Pillar Two
Prompt Iteration Logs · Evidence of Judgment

The first versions of my prompts produced generic output. A prompt asking Claude to write a quick reference guide returned documentation that lacked the agency's specific terminology, stakeholder hierarchy, and compliance framing; correcting it took more effort than writing from scratch. The fix was constraint layering: adding explicit role, audience, tone, format, and guardrail constraints to each prompt.

Version 1 · Naive Prompt
"Write a quick reference guide for the agile release process for our federal project team."
Version 4 · Production Prompt
"You are a senior federal program analyst supporting a Digital Transformation initiative at a Large Federal Agency. Write a QRG for the agile release readiness process. Audience: mid-level analysts new to product management in a federal context. Tone: direct, procedural, no jargon. Format: numbered steps with decision points clearly marked. Never include recommendations requiring security clearance escalation without flagging them as SME-review items. Use the following process inputs: [inputs]."
What Changed

Adding role context, audience definition, tone constraints, and a structural format reduced revision cycles from an average of 3 rounds to 1. From Version 3 onward, Claude's output was usable on the first pass in roughly 80% of documentation tasks. Status report consolidation dropped from 3.5 hours to under 50 minutes per cycle.
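The constraint layering described in this pillar can be sketched as a small prompt builder that stacks the layers (role, task, audience, tone, format, guardrails, inputs) in a fixed order. This is a minimal illustration, not the production tooling: the `PromptSpec` class, its field names, and `build_prompt` are hypothetical. The strings come from the Version 4 prompt, which this builder reproduces word for word.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the constraint layers of one prompt."""
    role: str
    task: str
    audience: str
    tone: str
    output_format: str
    guardrails: list[str] = field(default_factory=list)
    inputs: str = "[inputs]"


def build_prompt(spec: PromptSpec) -> str:
    """Join the layers in a fixed order so every prompt has the same shape."""
    parts = [
        spec.role,
        spec.task,
        f"Audience: {spec.audience}.",
        f"Tone: {spec.tone}.",
        f"Format: {spec.output_format}.",
        *spec.guardrails,
        f"Use the following process inputs: {spec.inputs}.",
    ]
    return " ".join(parts)


qrg = PromptSpec(
    role=("You are a senior federal program analyst supporting a Digital "
          "Transformation initiative at a Large Federal Agency."),
    task="Write a QRG for the agile release readiness process.",
    audience="mid-level analysts new to product management in a federal context",
    tone="direct, procedural, no jargon",
    output_format="numbered steps with decision points clearly marked",
    guardrails=[
        "Never include recommendations requiring security clearance escalation "
        "without flagging them as SME-review items."
    ],
)

print(build_prompt(qrg))
```

Keeping each layer in its own field means the role and guardrails can be reused across QRGs, agile session materials, and status reports while only the task and format change.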

Pillar Three
Hallucination Guardrails & Governance

Federal documentation carries compliance risk that commercial work does not. I built governance rules into every production prompt.

Standing System Rules

1. Never generate policy language. Flag any output requiring policy interpretation as "SME Review Required" and stop.

2. Never infer missing inputs. Return a list of required inputs before proceeding.

3. Never use language that implies organizational authority. Use procedural framing only.

4. All financial figures must be sourced from provided inputs. No estimated numbers in reporting outputs.
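Rules 2 and 4 are mechanical enough to also enforce outside the model. The sketch below is a hypothetical pre-check and post-check pair, not the tooling used on the engagement. `missing_inputs` returns the required fields that are absent or blank (Rule 2), and `unsourced_figures` lists any dollar amounts in a draft that appear in none of the provided inputs (Rule 4). The field names, sample values, and the simplified dollar-figure regex are assumptions for illustration.

```python
import re

# Simplified pattern for dollar figures such as $95,000 or $1,200.50 (illustrative only).
DOLLAR_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def missing_inputs(required: list[str], provided: dict[str, str]) -> list[str]:
    """Rule 2: list required fields that are absent or blank; never infer them."""
    return [name for name in required if not provided.get(name, "").strip()]


def unsourced_figures(draft: str, provided: dict[str, str]) -> list[str]:
    """Rule 4: list dollar figures in the draft that appear in no provided input."""
    allowed = set(DOLLAR_PATTERN.findall(" ".join(provided.values())))
    return [fig for fig in DOLLAR_PATTERN.findall(draft) if fig not in allowed]


inputs = {"program": "Program A", "budget": "$120,000", "spend_to_date": ""}
print(missing_inputs(["program", "budget", "spend_to_date"], inputs))  # ['spend_to_date']

draft = "Program A has a $120,000 budget and will likely spend $95,000."
print(unsourced_figures(draft, inputs))  # ['$95,000']
```

Rules 1 and 3 turn on meaning rather than surface patterns, which makes them a better fit for prompt constraints plus the human review step than for a script like this.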

Every AI-generated document went through a human review step before distribution. My role was to define what the AI was allowed to produce, catch what it got wrong, and make the judgment call on what required escalation.

Pillar Four
Systematic Evaluation
Test Scenario | Expected Behavior | Actual Result | Status
QRG prompt with a missing process step | Flag the gap and request clarification | Returned an input-gap list; did not fabricate | PASS
Status report with 3 members missing inputs | Mark as "Pending · Input Required" | Correctly flagged 3; formatted the remaining 47 | PASS
All-hands draft referencing an unverified decision | Exclude or flag as unverified | Initially included inferred language; caught in review, constraint added | FLAGGED → FIXED
Agile session materials with no prior template | Generate using role and audience constraints | Usable draft on first pass, 1 minor revision | PASS
Financial projection with one program's data missing | Stop, list missing data, do not estimate | Returned a structured request for missing inputs | PASS
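Scenarios like these can be kept as a small regression suite that pairs each input condition with a check on the model's response, then re-run whenever a prompt changes. A minimal sketch, assuming responses are captured as plain text. The `Scenario` class, the keyword checks, and the sample responses are hypothetical stand-ins, not engagement data, and the checks are deliberately crude, so they supplement the human review step rather than replace it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    response: str                 # captured model output (hypothetical samples below)
    check: Callable[[str], bool]  # encodes the "Expected Behavior" column


def requests_inputs(text: str) -> bool:
    """Crude keyword check: did the model ask for missing inputs?"""
    lowered = text.lower()
    return "missing" in lowered or "required input" in lowered


def requests_inputs_without_figures(text: str) -> bool:
    """Ask for inputs AND emit no dollar figure, i.e. no estimates."""
    return requests_inputs(text) and "$" not in text


def run_suite(scenarios: list[Scenario]) -> dict[str, str]:
    """Map each scenario name to PASS or FAIL."""
    return {s.name: "PASS" if s.check(s.response) else "FAIL" for s in scenarios}


suite = [
    Scenario(
        "QRG prompt with missing process step",
        "Required inputs before proceeding: approval gate owner for step 4.",
        requests_inputs,
    ),
    Scenario(
        "Financial projection with one program's data missing",
        "Cannot proceed. Missing inputs: Program 7 Q3 spend.",
        requests_inputs_without_figures,
    ),
    Scenario(
        "Financial projection, estimating instead of asking",
        "Program 7 Q3 spend is likely around $95,000.",
        requests_inputs_without_figures,
    ),
]

for name, status in run_suite(suite).items():
    print(f"{status}  {name}")
```

The third scenario is included to show a failing case: an estimated figure where the expected behavior is a request for data.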
Human Judgment Boundary

The all-hands communication test was the clearest example of where I had to intervene. Claude inferred an organizational decision that the source context never explicitly stated. I caught it in review, added a constraint, and re-ran the prompt. AI handles production volume. I handle the accuracy boundary.

Pillar Five
Visual Proof & Business Impact
Photo: Joaquin Wilson presenting at KPMG
68% reduction in average documentation drafting time
~2.5 hours saved per weekly reporting cycle
100% on-time financial reporting delivery rate
80% first-pass usability from prompt Version 3 onward
9 programs with standardized templates
3→1 average revision rounds per document
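The reporting figure can be cross-checked against the status-report timings in Pillar Two (3.5 hours before, under 50 minutes after). A minimal sketch: `cycle_savings` is a hypothetical helper, and treating 50 minutes as the upper bound of the new duration is an assumption taken from the wording "under 50 minutes".

```python
def cycle_savings(before_hours: float, after_minutes: float) -> tuple[float, float]:
    """Return (hours saved, percent reduction) for one reporting cycle."""
    before_minutes = before_hours * 60
    saved_minutes = before_minutes - after_minutes
    return saved_minutes / 60, 100 * saved_minutes / before_minutes


hours_saved, pct = cycle_savings(before_hours=3.5, after_minutes=50)
print(f"{hours_saved:.2f} h saved, {pct:.0f}% reduction")  # 2.67 h saved, 76% reduction
```

Because the new duration is under 50 minutes, the saving is at least 2.67 hours per cycle, which fits the ~2.5-hour figure above. The 76% applies to status-report consolidation only and is a separate measure from the 68% documentation-drafting reduction.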
Scale Hypothesis

If the prompt library and reporting workflow were deployed across all 9 programs simultaneously with dedicated adoption support, projected time savings would exceed 40 analyst-hours per month: material cost avoidance at federal billing rates, without a headcount increase.
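One way to pressure-test the 40-hour projection is to see how much of it the reporting workflow alone would cover, and what remains per program. A minimal sketch under stated assumptions: the 40 hours and 9 programs come from the hypothesis above, while the 52/12 weeks-per-month conversion and counting the current ~2.5-hour weekly reporting saving toward the 40-hour total are assumptions, not the basis the projection was built on.

```python
WEEKS_PER_MONTH = 52 / 12  # average weeks per month (assumption)
PROGRAMS = 9               # from the scale hypothesis
TARGET = 40.0              # analyst-hours per month, from the scale hypothesis


def monthly_from_weekly(hours_per_week: float) -> float:
    """Convert a recurring weekly saving into an average monthly saving."""
    return hours_per_week * WEEKS_PER_MONTH


def remaining_gap(target: float, *monthly_savings: float) -> float:
    """Monthly hours still needed after the listed savings are counted."""
    return max(0.0, target - sum(monthly_savings))


reporting = monthly_from_weekly(2.5)  # ~2.5 h weekly reporting saving (assumed to count)
gap = remaining_gap(TARGET, reporting)

print(f"Reporting alone: {reporting:.1f} h/month")   # Reporting alone: 10.8 h/month
print(f"Still needed elsewhere: {gap:.1f} h/month")  # Still needed elsewhere: 29.2 h/month
print(f"Per program: {gap / PROGRAMS:.1f} h/month")  # Per program: 3.2 h/month
```

Under these assumptions, reporting covers roughly a quarter of the target. The remainder, about 3.2 analyst-hours per program per month, would have to come from documentation and communications templates.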