Play: Navigating the AI Autonomy Continuum
Executive Summary (The Play in Brief)
AI in the SDLC isn't a binary leap from "manual" to "fully automated." It's a continuum of AI autonomy patterns — from assistive tools embedded in workflows to self-optimizing ecosystems capable of continuous innovation.
This play provides a DoD-aligned framework for understanding four AI autonomy patterns that exist along this continuum, helping leaders architect secure, mission-aligned systems while maintaining calibrated trust. Unlike traditional maturity models that suggest linear progression, these patterns can be implemented simultaneously across different SDLC workflows based on mission requirements, risk tolerance, and operational context.
Each pattern along the autonomy continuum has distinct architectural, governance, and workforce implications. Organizations can operate multiple patterns in parallel — using assistive tools for sensitive code while deploying orchestrated agents for CI/CD pipelines. Selecting the wrong pattern for a given workflow, or implementing patterns without proper readiness, can create mission risk, security gaps, and compliance challenges.
- Intended audience: CIOs, Chief Engineers, Program Managers, Software Factory Architects, DevSecOps Leads, AI/ML Engineers, Policy Makers
- Key takeaway: The AI autonomy continuum offers multiple patterns that can coexist. Success depends on matching the right pattern to the right workflow — autonomy is earned through readiness, not granted through technology adoption.
1. Why This Play Matters
- DoD and federal agencies face pressure to leverage AI for speed, adaptability, and scale.
- Selecting inappropriate AI autonomy patterns for specific workflows risks security, compliance, and operational stability.
- A structured continuum of patterns helps teams make mission-aligned decisions and implement AI capabilities with confidence.
2. What is Autonomy and What is an AI Agent?
2.1 What is AI Autonomy?
AI Autonomy is the ability of an artificial intelligence system to operate and make decisions with minimal or no human intervention. Autonomy describes the degree to which a system can sense its environment, plan actions, and execute them without continuous human input.
Over the past fifty years, multiple taxonomies have emerged to describe levels of autonomy across domains such as avionics, manufacturing, and power systems. These taxonomies generally range from no automation (full reliance on humans) to full automation (a system acting entirely on its own within expected contexts).
- NIST ALFUS Framework: NIST’s Autonomy Levels for Unmanned Systems (ALFUS) describes autonomy in terms of context complexity, human independence, and mission capability.2
- DoD Levels of Autonomy (LOA): The Department of Defense defines autonomy in operational systems with an emphasis on governance, human–machine teaming, and mission risk.
📌 Key insight: Most existing taxonomies emphasize “automation” rather than “teaming.” For the SDLC, our focus is on autonomy as part of human–AI collaboration, not replacement.
2.2 What is an AI Agent?
For the purpose of this play, we define an AI Agent as:
A software system that can interact with its environment, gather information, and use that information to plan and execute actions in pursuit of goals, often with some degree of autonomy.
Key properties:
- Observation: Agents interface with APIs, data, metrics, or user inputs, often with memory of past interactions.
- Planning: Agents reason about goals, constraints, and roles, using models such as LLMs for reasoning or domain-specific logic.
- Action: Agents act by calling tools, updating code or artifacts, delegating to other agents, or prompting humans for clarification.3

Expert Perspectives
Different communities frame AI agents through complementary lenses:
Researchers like Masterman et al. define agents as “entities able to plan and take actions to execute goals.”1 MITRE experts build on this, with Dr. Peter Schwartz emphasizing that agents should be judged by what they do, not how they do it. Dr. Keith Miller extends this view, describing agentic systems as ranging from simple copilots to orchestrated multi-agent collectives with shared awareness of state, beliefs, and goals.
For this play, we focus on AI Agents that leverage LLMs, while recognizing that smaller or domain-specific agents may not require LLMs.
Evolution of Intelligent Agents
The progression of intelligent agents parallels the SDLC’s increasing automation — from static helpers to adaptive collaborators.
- 1950s–1970s: Rule-Based Systems - Early expert systems like MYCIN followed hard-coded if–then rules. They were predictable but brittle, with no learning.4
SDLC Parallel: Early linters and static checkers that flagged rule violations without adapting to context.
- 1980s–1990s: Adaptive Agents - Machine learning and reinforcement learning introduced adaptability, enabling agents to refine behavior from data and feedback.5
SDLC Parallel: Bug prediction models that learned from historical commit data to anticipate likely error-prone code.
- 2000s–2010s: Deep Learning and Context Awareness - Deep neural networks allowed agents to process unstructured data (text, images, speech) and recognize patterns across large contexts.6
SDLC Parallel: Automated anomaly detection in CI/CD pipelines, identifying unusual build failures or performance regressions.
- 2018–Present: LLMs and Multi-Agent Systems - Large Language Models unlocked reasoning, planning, and natural language interaction. Agents now collaborate in multi-agent systems, coordinating tasks across the SDLC.7
SDLC Parallel: Code copilots and orchestrators that generate, review, and integrate code, or coordinate multiple agents in DevSecOps pipelines.
📌 Key Insight: The trajectory is clear: from deterministic rule-followers → adaptive learners → context-aware analyzers → reasoning collaborators. This same arc defines the autonomy continuum and the patterns we now apply to the SDLC.
How are AI Agents Used Today?
Modern AI agents often mimic the cognitive loop humans follow: observe, plan, act.
- Observe: Gather data from APIs, sensors, logs, or prior context.
- Plan: Define goals, sequence steps, allocate tasks, or query reasoning models.
- Act: Execute via digital systems, delegate to other agents, or request clarification from humans.
This cycle is continuous, enabling agents to adapt to new information in near real time.
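This loop can be sketched in a few lines of Python. The sketch below is a minimal illustration, assuming a hypothetical `Agent` class with stubbed helpers; a real agent would call model APIs, tools, and data sources where the placeholders sit.

```python
# Minimal observe-plan-act loop (illustrative sketch; names and helpers are hypothetical).
from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)  # retains prior observations and actions

    def observe(self, environment: dict) -> dict:
        # Gather data from APIs, logs, or user input (stubbed as a dict here).
        observation = {"goal": self.goal, **environment}
        self.memory.append(("observation", observation))
        return observation

    def plan(self, observation: dict) -> list:
        # Reason about goals and constraints; a real agent might query an LLM here.
        steps = [f"analyze {k}" for k in observation if k != "goal"]
        self.memory.append(("plan", steps))
        return steps

    def act(self, steps: list) -> list:
        # Execute via tools, delegate to other agents, or ask a human for clarification.
        results = [f"completed: {step}" for step in steps]
        self.memory.append(("actions", results))
        return results


# One pass through the loop; in practice this cycle repeats as new information arrives.
agent = Agent(goal="triage failing build")
obs = agent.observe({"build_log": "test_login failed", "branch": "main"})
print(agent.act(agent.plan(obs)))
```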
When Are AI Agents Beneficial?
- Automating repetitive tasks to free humans for higher-value work.
- Operating in complex or dynamic environments where context must be tracked across interactions.
- Enhancing customer or user support with consistent, context-aware responses.
When to Avoid AI Agents
- Tasks requiring human judgment, creativity, or ethics (e.g., strategic decision-making, empathy-driven contexts).
- High-consequence domains where errors may cause material harm and are difficult to detect.
- Scenarios demanding emotional intelligence, which AI cannot authentically replicate.
📌 Key Insight "The concept of software agents has been around for a long time, but of course now they’re supercharged by LLMs. LLMs suddenly give them the ability to interact with humans through natural language and pictures, provide reasonable zero-shot responses, perform commonsense reasoning, construct their own plans, perform in-context learning, generate/proofread/summarize documents, generate/debug/execute code, etc. This opens the door to a much higher degree of task complexity than they were capable of just a few years ago."
- Dr. Peter Schwartz, MITRE
For the purpose of this play, AI Agents are limited to those that utilize LLMs, bearing in mind that emerging technology and processes may change the landscape.
Understanding these agent capabilities helps teams select appropriate autonomy patterns for different SDLC workflows, matching agent sophistication to mission requirements and risk tolerance.
3. AI Autonomy Patterns for the Software Development Lifecycle
Understanding the AI Autonomy Continuum
AI autonomy in the SDLC isn't a single implementation choice—it's a continuum of capabilities that spans from simple assistive tools to self-optimizing ecosystems. Rather than treating this as a linear progression where teams must "advance" from one level to the next, real-world adoption follows a pattern-based approach where multiple autonomy levels operate simultaneously across different parts of the software delivery pipeline.
This section introduces four foundational AI autonomy patterns that can be applied independently or in combination, based on mission requirements, risk tolerance, and the specific characteristics of each SDLC workflow.
Why Patterns, Not Stages?
Traditional maturity models suggest organizations should progress through sequential stages, eventually "graduating" to the highest level. But AI autonomy in mission-critical environments works differently:
Real-world complexity demands flexibility:
- Authentication code may require Pattern 1 (human-controlled tools) for security assurance
- Test generation can safely use Pattern 2 (delegated agents) for efficiency
- CI/CD pipelines might benefit from Pattern 3 (orchestrated systems) for coordination
- Performance optimization could experiment with Pattern 4 (adaptive ecosystems) in controlled environments
Risk varies by context, not organizational maturity: A highly mature DevSecOps team might deliberately choose Pattern 1 for cryptographic code while simultaneously running Pattern 3 for deployment orchestration. The "best" pattern depends on the specific workflow, data sensitivity, and consequences of failure—not the team's overall AI sophistication.
Patterns enable parallel adoption: Teams can introduce new patterns incrementally without abandoning proven approaches. A successful Pattern 1 implementation doesn't become obsolete when Pattern 2 is introduced—it continues operating in the contexts where human oversight provides the most value.
The Four AI Autonomy Patterns
Each pattern represents a distinct approach to human-AI collaboration in the SDLC, with different architectural requirements, governance needs, and risk profiles:
- Pattern 1 - Assistive Tools: AI augments human capabilities with bounded, human-triggered assistance
- Pattern 2 - Delegated Agents: AI executes complete workflows within defined scopes and human oversight
- Pattern 3 - Orchestrated Systems: Multiple AI agents coordinate across SDLC phases under policy-driven governance
- Pattern 4 - Adaptive Ecosystems: Self-optimizing AI systems that continuously improve based on real-world feedback
Selecting the Right Pattern
The most important architectural decision isn't which pattern to implement—it's which pattern fits each specific use case. This requires teams to evaluate:
- Mission criticality: How much risk can this workflow tolerate?
- Human expertise: What level of AI oversight can the team provide?
- Governance requirements: What audit, compliance, and traceability standards apply?
- Technical boundaries: What are the integration points and failure modes?
The following sections detail each pattern's characteristics, architectural implications, and guidance for responsible implementation. Remember: the goal isn't to achieve the "highest" pattern—it's to match the right pattern to the right problem while maintaining mission assurance and operational control.
Pattern 1 – AI as a Tool (AI-Augmented Tools)
Definition: AI assists humans by automating bounded, repetitive tasks under direct human control. Tools in this pattern operate within predefined rules or narrow functions. They require a human to initiate, supervise, and validate their use. They improve efficiency but do not alter team dynamics or overall workflows.
SDLC Lens: At this level, AI appears as assistive add-ons inside existing tools and environments. Examples include autocomplete in an IDE, inline documentation, or static analysis hints. Importantly, all usage is human-triggered and human-reviewed.
Examples:
- Code completion (e.g., Copilot-style suggestions).
- Inline documentation and comment generation.
- Backlog grooming aids (AI-generated user stories).
- Static analysis augmentation with AI explanations.
Architectural & Risk Implications:
- Integration: Minimal complexity — typically IDE plug-ins, SaaS integrations, or extensions to existing tools.
- Validation: Humans must validate all outputs; unchecked adoption risks code quality and security.
- Auditability: Logging of prompts, outputs, and tool usage is essential to ensure traceability (a minimal logging sketch appears below).
- Governance: The biggest risk at this pattern is "Shadow AI" — when developers use unapproved AI tools outside organizational visibility and control.8
This aligns with the DoD DevSecOps Reference Design, which requires human-in-the-loop review for all code and configuration changes as part of CI/CD pipelines.
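As a concrete illustration of the auditability point above, the sketch below wraps a hypothetical `suggest_code()` call with audit logging. The tool name, record fields, and JSONL file are assumptions; an operational deployment would route these records to the organization's approved audit or SIEM pipeline. Hashing rather than storing prompt text is one way to preserve traceability without duplicating sensitive content.

```python
# Illustrative sketch of prompt/output audit logging for an assistive AI tool.
# The suggest_code() helper and log format are hypothetical placeholders.
import json
import hashlib
from datetime import datetime, timezone

AUDIT_LOG = "ai_tool_audit.jsonl"


def suggest_code(prompt: str) -> str:
    # Placeholder for a call to an approved code-assistant API.
    return "# suggested code for: " + prompt


def audited_suggestion(user: str, tool: str, prompt: str) -> str:
    output = suggest_code(prompt)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,                      # human accountable for the output
        "tool": tool,                      # sanctioned tool identifier
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "human_reviewed": False,           # flipped after mandatory review
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output


print(audited_suggestion("jdoe", "ide-copilot", "write a unit test for parse_config"))
```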
Shadow AI Risk at Pattern 1: Shadow AI refers to the use of AI tools, plug-ins, or services outside approved governance, visibility, and security controls.
- Example: A developer pastes source code into ChatGPT, Copilot, or an unvetted SaaS tool, unintentionally leaking proprietary, export-controlled, or classified data.
- Analogy: This is the AI version of "shadow IT," where teams spin up unsanctioned cloud services without organizational approval.
- Implication: Without governance, organizations lose visibility into how AI is being used, what data it touches, and what risks are being introduced.
Pattern 1 Selection Guidance
When Pattern 1 is the Right Choice:
Pattern 1 is ideal for workflows that benefit from AI assistance but require human judgment, involve sensitive data, or operate in high-assurance environments where human accountability cannot be delegated.
Optimal Use Cases:
- Authentication and authorization code development
- Cryptographic implementations (review assistance only)
- Requirements analysis and documentation
- Code review and static analysis augmentation
- Learning and skill development for developers
- Prototyping and experimentation in sandboxed environments
Risk-Benefit Analysis:
- Low to moderate risk tolerance contexts
- High human expertise available for review and validation
- Sensitive or classified data handling requirements
- Regulatory compliance environments requiring human accountability
- Mission-critical systems where AI errors could have significant consequences
Integration Considerations:
- Fits well into existing DevSecOps pipelines without major architectural changes
- Minimal training overhead for development teams
- Can be implemented incrementally across different SDLC phases
- Maintains clear human accountability for all outputs
When Pattern 1 May Not Be Sufficient:
- High-volume, repetitive tasks where human review becomes a bottleneck
- Workflows requiring complex coordination across multiple tools or systems
- Teams seeking to automate entire workflows rather than augment individual tasks
- Environments where the overhead of constant human review outweighs the benefits
Pattern 1 Implementation
For detailed readiness assessment, implementation checklists, monitoring guidance, and health metrics, see the companion AI Autonomy Implementation Guide.
Pattern 1 Success Indicator: Teams demonstrate calibrated trust in AI assistive tools—leveraging them effectively for appropriate tasks while maintaining rigorous human oversight and accountability. AI enhances productivity without compromising security, quality, or mission assurance.
Pattern 2 – AI as a Team Member (Human-Triggered, Finely Scoped Agent)
Definition: An AI agent assigned to a specific role or bounded task, executing an entire workflow only after a human trigger. Unlike Pattern 1 tools, these agents operate across a slightly broader context (multiple steps instead of one), but remain narrow in scope. They are still subordinate to human decision-making and validation.
SDLC Lens:
Pattern 2 AI feels like a "copilot teammate" in the workflow. It doesn't just suggest a line of code — it can carry out a full slice of work inside the SDLC once instructed.
Examples include:
- In an IDE: generating unit tests for a new module.
- In requirements management: drafting a traceability matrix or generating test cases from requirements.
- In a CI/CD pipeline: executing a dependency update, opening a pull request, and running regression tests.
- In system design: creating stubbed APIs from user stories.
Crucially, humans initiate the task and retain review/approval responsibility.
Architectural & Risk Implications:
- Governance: Agents must be registered, sanctioned, and monitored within approved platforms (IDEs, CI/CD). Policies should explicitly prohibit unsanctioned SaaS agents. Governance must also enforce human-in-the-loop review and traceability of all agent outputs.
- Scoped Permissions: Agents should operate with least-privilege access (e.g., can open PRs but not merge); a permission-gate sketch follows this list.
- Audit Logging: Every action must be logged and attributed to the triggering human.
- Explainability & Feedback Loops: Outputs must be observable and reviewable, with structured acceptance/rejection cycles to build calibrated trust.
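The scoped-permissions and audit-logging implications above can be illustrated with a small permission gate. The action names, policy set, and in-memory log below are hypothetical; in practice, least-privilege boundaries are enforced by the hosting platform (repository roles, pipeline service accounts), not by the agent's own code.

```python
# Illustrative permission gate for a Pattern 2 agent (names and policy values are hypothetical).
from datetime import datetime, timezone

# Least-privilege policy: the agent may propose changes but never merge or deploy.
ALLOWED_ACTIONS = {"open_pull_request", "run_regression_tests", "update_dependency_manifest"}

audit_log = []


def perform_agent_action(action: str, triggered_by: str, details: dict) -> bool:
    """Execute an agent action only if policy allows it; log everything with human attribution."""
    allowed = action in ALLOWED_ACTIONS
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "triggered_by": triggered_by,   # the human who initiated the workflow
        "allowed": allowed,
        "details": details,
    })
    if not allowed:
        # Denied actions are surfaced for human review rather than silently dropped.
        print(f"DENIED: {action} requires human execution (requested by {triggered_by})")
        return False
    print(f"EXECUTED: {action} on behalf of {triggered_by}")
    return True


perform_agent_action("open_pull_request", "jdoe", {"repo": "payments", "branch": "dep-update"})
perform_agent_action("merge_pull_request", "jdoe", {"repo": "payments", "pr": 42})
```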
Shadow AI Risk at Pattern 2: As in Pattern 1, Shadow AI remains a concern — but the stakes are higher. At Pattern 1, unauthorized use mostly risked exposing sensitive data. At Pattern 2, unsanctioned agents can directly modify SDLC artifacts (code, test scripts, configuration files).
This introduces new risks:
- Supply Chain Vulnerability — a rogue or unsanctioned agent could insert unvetted dependencies or insecure code that propagates downstream.
- Compliance Gaps — activity outside sanctioned channels may bypass audit trails, reviews, and security checks.
Example: A developer connects an unapproved SaaS test-generation agent to a repo. The agent commits code that passes functional tests but introduces a hidden security flaw, bypassing governance because its activity wasn't logged or reviewed.
Pattern 2 Selection Guidance
When Pattern 2 is the Right Choice:
Pattern 2 works well for bounded, repeatable workflows that benefit from end-to-end automation but still require human oversight and accountability.
Optimal Use Cases:
- Test suite generation and maintenance
- Documentation creation and updates
- Dependency management and updates
- API scaffolding and boilerplate code generation
- Code refactoring within defined parameters
- Requirements traceability matrix generation
Risk-Benefit Analysis:
- Moderate risk tolerance with structured oversight
- Well-defined workflows with clear success criteria
- Repetitive tasks where human review overhead is justified by efficiency gains
- Non-critical systems where agent errors can be caught and corrected
- Teams with agent supervision experience or strong review processes
Integration Considerations:
- Requires more sophisticated tooling than Pattern 1
- Benefits from standardized prompt libraries and agent templates
- Needs robust testing and validation pipelines
- Should integrate with existing code review and approval processes
When Pattern 2 May Not Be Appropriate:
- High-stakes, mission-critical code where errors have severe consequences
- Workflows requiring creative problem-solving or strategic decision-making
- Environments with limited capacity for comprehensive agent output review
- Systems handling classified or highly sensitive data without proper controls
Pattern 2 Implementation
For detailed readiness assessment, implementation checklists, monitoring guidance, and health metrics, see the companion AI Autonomy Implementation Guide.
Pattern 2 Success Indicator: Teams effectively delegate bounded workflows to AI agents while maintaining appropriate human oversight, accountability, and trust calibration. Agents enhance productivity for repetitive tasks without compromising quality or introducing unacceptable risk.
Pattern 3 – AI as a Fully Autonomous Team (Orchestrated Multi-Agent Systems)
Definition: Pattern 3 introduces multiple autonomous agents working together across the SDLC. These agents coordinate to complete interconnected workflows end-to-end, under the supervision of human operators. While humans no longer trigger every task, they still provide oversight, policy boundaries, and final accountability.
SDLC Lens (with Examples): Instead of a single teammate agent, Pattern 3 resembles an AI team within the team. Different agents specialize in roles (testing, compliance, deployment, monitoring) and are coordinated by an orchestration layer.
Examples:
- A requirement-change event triggers coordinated updates across test plans, security scans, and documentation — each handled by different agents.
- A CI/CD pipeline has one agent managing builds, another running security scans, and another validating compliance before release.
- Agents cross-check each other's outputs to detect inconsistencies before delivery.
Humans no longer initiate every action but instead monitor orchestration dashboards, approve exceptions, and intervene if the system escalates issues.
Architectural & Risk Implications:
- Orchestration Layer: Needed to sequence workflows, resolve conflicts, and provide observability into agent interactions (a minimal orchestration sketch follows this list).
- Inter-Agent Trust: Outputs from one agent must be verified before another consumes them.
- Error Containment: Safeguards must prevent cascading errors across the SDLC.
- Governance: Clear escalation rules define when humans are reintroduced into the loop.
- Audit Logging: Must capture not just agent actions, but cross-agent interactions for accountability.
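A minimal sketch of the orchestration-layer, inter-agent trust, and escalation points above, assuming three hypothetical agents and trivial verifiers; a real orchestration layer would add state persistence, retries, cross-agent audit records, and observability.

```python
# Simplified orchestration sketch: agents run in sequence, each output is verified
# before the next agent consumes it, and verification failures escalate to a human
# operator. The agent roles, verifiers, and escalation hook are hypothetical placeholders.
from typing import Callable, Optional

def build_agent(state: dict) -> dict:
    return {**state, "artifact": f"build-{state['commit']}"}

def security_scan_agent(state: dict) -> dict:
    return {**state, "critical_findings": 0}

def compliance_agent(state: dict) -> dict:
    return {**state, "compliance": "controls satisfied"}

# Each step pairs an agent with a verifier that checks its output before handoff.
PIPELINE = [
    ("build", build_agent, lambda s: "artifact" in s),
    ("security", security_scan_agent, lambda s: s.get("critical_findings", 1) == 0),
    ("compliance", compliance_agent, lambda s: s.get("compliance") == "controls satisfied"),
]

def escalate_to_human(step: str, state: dict) -> None:
    print(f"ESCALATION: step '{step}' failed verification; holding pipeline for operator review")

def orchestrate(change: dict) -> Optional[dict]:
    state = change
    for step, agent, verify in PIPELINE:
        state = agent(state)
        if not verify(state):        # inter-agent trust: never pass unverified output downstream
            escalate_to_human(step, state)
            return None              # error containment: stop the cascade here
        print(f"[{step}] output verified; handing off to next agent")
    return state

print(orchestrate({"commit": "abc123"}))
```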
Shadow AI Risks at Pattern 3: At this level, Shadow AI introduces systemic risks. If unsanctioned agents are introduced into the orchestration layer, they may:
- Bypass trust models and inject unvalidated outputs.
- Break observability by operating outside governance and logging.
- Amplify propagation risk where rogue outputs cascade through workflows.
Example: An unapproved SaaS deployment agent bypasses compliance checks and pushes builds into production, leaving no trace in the sanctioned audit system.
Pattern 3 Selection Guidance
When Pattern 3 is the Right Choice:
Pattern 3 is suitable for complex, interconnected workflows that benefit from specialized agent coordination but can operate under human supervision rather than direct control.
Optimal Use Cases:
- End-to-end CI/CD pipeline automation with multiple validation stages
- Coordinated testing across multiple environments and test types
- Multi-stage compliance and security scanning workflows
- Integrated documentation, code generation, and testing pipelines
- Cross-system integration testing and validation
- Automated incident response with multiple containment and notification steps
Risk-Benefit Analysis:
- Moderate to high risk tolerance with robust oversight mechanisms
- Complex workflows where agent coordination provides clear efficiency gains
- Mature DevSecOps environments with strong observability and governance
- Repeatable processes with well-defined success criteria and error handling
- Teams experienced with Pattern 1 and Pattern 2 implementations
Integration Considerations:
- Requires sophisticated orchestration and monitoring infrastructure
- Benefits from standardized agent interfaces and communication protocols
- Needs comprehensive testing of agent interaction scenarios
- Should implement gradual rollout with fallback to human-controlled processes
When Pattern 3 May Not Be Appropriate:
- Organizations without proven success with Pattern 1 and Pattern 2
- High-stakes environments where multi-agent failures could have severe consequences
- Workflows requiring frequent human judgment or creative problem-solving
- Systems with limited observability or debugging capabilities
- Teams lacking experience with complex distributed system management
Pattern 3 Success Indicator: Organizations successfully coordinate multiple specialized AI agents to complete complex workflows while maintaining human oversight, system resilience, and full accountability. Multi-agent orchestration delivers efficiency gains without sacrificing governance or introducing unacceptable systemic risks.
Pattern 4 – AI as a Self-Optimizing Engine (Software Flywheel)
Definition: Pattern 4 represents the culmination of AI autonomy in the SDLC: a self-sustaining AI ecosystem that continuously designs, builds, tests, deploys, monitors, and improves software with minimal human intervention. At this level, humans set strategic direction and guardrails, but the system adapts and optimizes itself in near real time.
SDLC Lens (with Examples): Pattern 4 envisions a closed-loop flywheel: telemetry, user feedback, and mission context feed directly into the design and deployment cycle. Unlike Patterns 1–3, humans do not trigger or oversee every action. Instead, the system continuously evolves based on real-world signals.
Examples (aspirational):
- An AI system releases a security patch minutes after detecting a new exploit in production traffic.
- Performance monitoring drives automatic refactoring: inefficient code paths are rewritten and redeployed autonomously.
- User feedback is ingested at scale, reprioritizing backlog items and generating features without human tasking.
At this pattern, humans transition from operators to stewards and governors, ensuring the flywheel remains aligned with mission objectives, compliance standards, and ethical principles.
Architectural & Risk Implications:
- Adaptive Governance: Policies and oversight must operate continuously, not periodically. Governance becomes a dynamic system that can adapt as agents learn and optimize.
- Compliance Enforcement: Automated checks for regulatory, security, and export control compliance must be embedded into every loop.
- Emergent Behavior Monitoring: Systems must actively detect and interpret behaviors not explicitly trained for — from anomalous optimizations to goal drift.
- Rollback & Resilience: Circuit breakers, rollback pipelines, and human override switches must be first-class citizens, tested as rigorously as production code (see the circuit-breaker sketch after this list).
- Explainability at Scale: Decisions must remain interpretable to humans for mission assurance, auditing, and trust.
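The rollback and override requirements can be illustrated with a simple circuit breaker around an autonomous optimization loop. The error budget, metric, and loop structure below are hypothetical; the point is that rollback, breaker, and human-override paths exist and are exercised like any other production code.

```python
# Conceptual sketch of a circuit breaker and human override wrapped around a
# self-optimizing loop. The optimization step, metric, and thresholds are hypothetical.
import random

ERROR_BUDGET = 3          # consecutive regressions allowed before the breaker opens
human_override = False    # operators can flip this at any time to halt autonomy

def apply_optimization() -> float:
    # Placeholder for an autonomous change; returns observed performance delta (%).
    return random.uniform(-2.0, 2.0)

def rollback() -> None:
    print("ROLLBACK: reverting last autonomous change")

def flywheel_loop(iterations: int = 10) -> None:
    consecutive_regressions = 0
    for i in range(iterations):
        if human_override:
            print("HALT: human override engaged; autonomy suspended")
            return
        delta = apply_optimization()
        if delta < 0:
            consecutive_regressions += 1
            rollback()
        else:
            consecutive_regressions = 0
        print(f"iteration {i}: delta={delta:+.2f}% regressions={consecutive_regressions}")
        if consecutive_regressions >= ERROR_BUDGET:
            print("CIRCUIT BREAKER OPEN: escalating to human stewards, loop stopped")
            return

flywheel_loop()
```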
Shadow AI Risks at Pattern 4: In a self-optimizing loop, unauthorized tools or agents can destabilize the entire system:
- Skewing optimization by injecting false telemetry or manipulated test results.
- Breaking compliance by introducing unapproved features or updates into the loop.
- Undermining trust by evolving the system in ways misaligned with mission priorities.
Example: An unsanctioned monitoring agent feeds corrupted metrics into the flywheel, causing the system to prioritize the wrong performance goals while bypassing compliance checks.
Pattern 4 Selection Guidance
When Pattern 4 Might Be Appropriate:
Pattern 4 is suitable only for organizations with proven mastery of Patterns 1-3 and specific use cases where continuous optimization provides mission-critical value.
Potential Use Cases (Experimental):
- High-frequency, low-risk optimization tasks (performance tuning, resource allocation)
- Continuous security monitoring and response in well-defined threat environments
- Automated experimentation and A/B testing in non-critical systems
- Self-healing infrastructure in controlled environments
- Adaptive user experience optimization with safety guardrails
Risk-Benefit Analysis:
- High risk tolerance with extensive safeguards and human oversight
- Mission-critical optimization needs that justify the complexity and risk
- Mature AI/ML capabilities across the entire organization
- Proven success with sophisticated Pattern 3 implementations
- Regulatory environments that can accommodate adaptive autonomous systems
Critical Prerequisites:
- Demonstrated success with Pattern 3 multi-agent orchestration
- Robust governance frameworks proven effective across complex AI systems
- Advanced monitoring and explainability capabilities
- Comprehensive risk management and incident response procedures
When Pattern 4 Should Be Avoided:
- Organizations without proven success with Patterns 1-3
- High-stakes environments where autonomous optimization errors could have severe consequences
- Regulatory environments requiring explicit human approval for all changes
- Systems with limited observability or explainability capabilities
- Teams lacking deep expertise in AI safety and autonomous system governance
Pattern 4 Implementation
For detailed readiness assessment, implementation checklists, monitoring guidance, and health metrics, see the companion AI Autonomy Implementation Guide.
Current Reality Check (Aspirational):
Pattern 4 is not yet real at operational scale — it remains a vision for the future:
- Research Prototypes: Universities and labs are experimenting with autonomous repair loops, adaptive MLOps, and self-healing pipelines. These systems remain brittle, limited in scope, and require close human oversight.
- Commercial Building Blocks: Tools like GitHub Copilot, OpenAI's o1 reasoning models, LangGraph, and MLOps orchestration platforms provide fragments of the flywheel, but not a fully self-sustaining ecosystem.
- Governance Gaps: Current DoD and NIST frameworks assume explicit human accountability at critical decision points, a direct contradiction to Pattern 4 minimal-intervention autonomy.
- Trust and Safety: Multi-agent orchestration creates emergent risks that no governance model has yet fully contained.
📌 Key Insight: Pattern 4 is best treated as an aspirational North Star — valuable for anticipating risks, guiding governance design, and preparing workforce evolution. It is not a current maturity requirement, nor should it be assumed to be inevitable.
Pattern 4 Success Indicator: Organizations successfully operate self-optimizing AI ecosystems that continuously improve software quality, performance, and user value while maintaining mission alignment, regulatory compliance, and human stewardship. Autonomous optimization delivers transformational capabilities without compromising safety, governance, or accountability.
4. Cross-Cutting Considerations
While the autonomy patterns describe how AI evolves in the SDLC, certain considerations cut across every stage. These ensure that autonomy does not compromise mission assurance, compliance, or security.
Cybersecurity by Design
AI systems must be designed with security-first principles. This includes threat modeling against AI-specific attack vectors (prompt injection, data poisoning), enforcing Zero Trust architectures for agent communication, and ensuring that AI supply chains are secure. At all stages, AI must be subject to the same secure coding, hardening, and accreditation processes as other mission software.
Calibrated Trust
Trust in AI outputs should never be absolute. Teams must set confidence thresholds, establish escalation triggers, and ensure explainability of decisions. Early stages may require near-100% human review, while later stages can adopt risk-based review strategies. Trust must be measurable and iteratively calibrated, not assumed.
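One way to make calibrated trust operational is a risk-based review gate, sketched below with hypothetical confidence thresholds per criticality tier. The thresholds are assumptions to be tuned against observed reliability over time, and model-reported confidence itself must be validated before it is trusted.

```python
# Illustrative risk-based review gate: AI outputs are routed to human review based on
# model confidence and workflow criticality. Thresholds and categories are hypothetical.
REVIEW_THRESHOLDS = {          # minimum confidence required to skip mandatory human review
    "mission_critical": 1.01,  # > 1.0 means every output is reviewed, regardless of confidence
    "standard": 0.90,
    "low_risk": 0.75,
}

def requires_human_review(confidence: float, criticality: str) -> bool:
    threshold = REVIEW_THRESHOLDS.get(criticality, 1.01)  # unknown tiers default to full review
    return confidence < threshold

for conf, crit in [(0.95, "mission_critical"), (0.95, "standard"), (0.70, "low_risk")]:
    print(crit, conf, "-> human review" if requires_human_review(conf, crit) else "-> sampled review")
```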
Data Stewardship
AI systems rely on high-quality, well-governed data. Provenance tracking, classification, and secure handling are essential to prevent misuse or leakage. In DoD contexts, this also means aligning with data sovereignty, export control, and classification handling rules. Data integrity is foundational: corrupted or untrusted data undermines autonomy at every level.
Workforce Evolution
As autonomy increases, human roles shift from being direct executors to acting as supervisors, auditors, and curators. This demands new skills in governance, oversight, and AI literacy. Workforce design must anticipate changes in responsibility — for example, engineers becoming reviewers of AI outputs, or acquisition officers becoming stewards of AI-enabled contracts.
Governance Alignment
Every stage of autonomy must remain aligned with evolving governance frameworks. This includes NIST AI RMF, DoD AI Ethical Principles, and acquisition policies that mandate human accountability. Governance is not static: it must adapt in parallel with technology, ensuring compliance and safety nets as systems become more autonomous.
📌 Key Insight: These cross-cutting considerations are not optional add-ons — they are the safety rails that keep autonomy from becoming fragility. Whether operating with Pattern 1 or experimenting with Pattern 4 proto-flywheel systems, these dimensions must be addressed continuously.
5. How to Use This Play
This play is not a maturity roadmap. It is a pattern selection guide: a way for organizations to match the right AI autonomy approach to each workflow while maintaining mission assurance and operational control. Teams will likely implement multiple patterns simultaneously across different parts of their SDLC.
1. Map Your Current AI Usage
Before selecting new patterns, understand what's already in use:
- Inventory existing AI tools across all SDLC phases (development, testing, deployment, operations)
- Identify which patterns are already active (even informally)
- Assess governance coverage - are all AI uses visible and approved?
- Map Shadow AI risks - what unauthorized tools might teams be using?
This creates a baseline reality of your current AI autonomy footprint.
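A lightweight way to capture this baseline is a structured inventory record per AI usage, sketched below with hypothetical fields; organizations would extend it with accreditation status, data categories, and ownership.

```python
# Minimal sketch of an AI usage inventory record; field names and values are illustrative.
from dataclasses import dataclass, asdict

@dataclass
class AIToolRecord:
    name: str
    sdlc_phase: str        # e.g., development, testing, deployment, operations
    pattern: int           # 1-4, which autonomy pattern the usage represents
    approved: bool         # sanctioned by governance, or a Shadow AI candidate
    data_sensitivity: str  # e.g., public, CUI, classified

inventory = [
    AIToolRecord("ide-code-assistant", "development", 1, True, "CUI"),
    AIToolRecord("saas-test-generator", "testing", 2, False, "CUI"),  # flagged: Shadow AI risk
]

shadow_ai = [asdict(t) for t in inventory if not t.approved]
print(f"{len(shadow_ai)} unapproved AI usages found:", shadow_ai)
```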
2. Match Patterns to Workflows
Different SDLC workflows have different risk profiles, complexity needs, and human oversight requirements (a simple selection sketch follows the lists below):
Workflow-Based Selection:
- High-stakes code (auth, crypto, safety-critical): Pattern 1 (human-controlled assistance)
- Repetitive tasks (testing, documentation): Pattern 2 (delegated agents)
- Complex orchestration (CI/CD pipelines): Pattern 3 (multi-agent coordination)
- Continuous optimization (performance tuning): Pattern 4 (experimental, if at all)
Risk-Based Selection:
- Sensitive data handling: Patterns 1-2 maximum
- Mission-critical systems: Pattern 1 preferred, Pattern 2 with extensive review
- Development/testing environments: Higher pattern experimentation acceptable
- Production systems: Conservative pattern selection with proven governance
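The workflow- and risk-based criteria above can be expressed as a first-pass selection rule, sketched below. The rules mirror this play's guidance but are illustrative, not prescriptive; final selection rests with engineering and governance judgment.

```python
# Illustrative first-pass pattern recommendation; inputs and rules are simplified.
def recommend_pattern(data_sensitivity: str, mission_critical: bool,
                      repetitive: bool, needs_orchestration: bool) -> int:
    if data_sensitivity == "classified" or mission_critical:
        return 1  # human-controlled assistance for high-stakes or sensitive workflows
    if needs_orchestration:
        return 3  # multi-agent coordination, assuming Patterns 1-2 are already proven
    if repetitive:
        return 2  # delegated agents for bounded, repeatable tasks
    return 1      # default to the most conservative pattern

print(recommend_pattern("CUI", False, True, False))          # -> 2 (test generation)
print(recommend_pattern("classified", True, True, False))    # -> 1 (sensitive code)
```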
3. Plan Multi-Pattern Architecture
Most organizations will operate multiple patterns simultaneously:
Architectural Considerations:
- Different trust boundaries for each pattern
- Varied governance controls based on pattern risk levels
- Integrated audit trails across all patterns
- Consistent human oversight models that scale across patterns
Implementation Strategy:
- Start with Pattern 1 broadly across low-risk workflows
- Introduce Pattern 2 selectively for bounded, repeatable tasks
- Consider Pattern 3 only after demonstrating success with Patterns 1-2
- Treat Pattern 4 as experimental research, not operational deployment
4. Build Pattern-Specific Readiness
Each pattern requires different organizational capabilities:
- Pattern 1: Human review processes, Shadow AI detection, prompt governance
- Pattern 2: Agent supervision skills, scoped permission management, escalation procedures
- Pattern 3: Multi-agent orchestration, system resilience, complex monitoring
- Pattern 4: Autonomous system stewardship, emergent behavior detection, advanced AI safety
Focus readiness efforts on the patterns you plan to implement, not on abstract "AI maturity."
5. Monitor Pattern Health
Success is measured by how well each pattern serves its intended workflows:
- Pattern effectiveness: Is the pattern solving the problem it was designed for?
- Governance coverage: Are all pattern implementations visible and controlled?
- Risk management: Are pattern-specific risks being adequately managed?
- Team confidence: Do operators trust and effectively supervise each pattern?
📌 Key insight: The goal is not to "advance" through patterns, but to optimize pattern selection for mission outcomes while maintaining appropriate control and assurance.
✅ Visual Checklist for Using This Play
Before implementing any AI autonomy pattern:
- Map Current State
  - [ ] Have we inventoried all existing AI tool usage across the SDLC?
  - [ ] Do we understand which patterns are already in use (formally or informally)?
  - [ ] Have we identified and addressed Shadow AI risks?
- Match Patterns to Workflows
  - [ ] Have we assessed the risk profile and requirements of each target workflow?
  - [ ] Is the selected pattern appropriate for the data sensitivity and mission criticality?
  - [ ] Do we understand why this pattern is better than alternatives for this use case?
- Build Readiness
  - [ ] Do we have the infrastructure, governance, and skills needed for this specific pattern?
  - [ ] Are human oversight and escalation procedures designed and tested?
  - [ ] Have we established pattern-specific success metrics and monitoring?
- Deploy with Safeguards
  - [ ] Are we starting with controlled, low-risk implementations?
  - [ ] Do we have rollback plans and circuit breakers for pattern failures?
  - [ ] Are we measuring pattern effectiveness and adjusting as needed?
6. Measures and Success Indicators
Implementing AI autonomy patterns requires evidence of effectiveness, not just adoption. This section provides the strategic measurement framework that leaders need to evaluate whether each pattern is serving its intended purpose, operating safely, and delivering value to mission outcomes.
📋 Strategic vs. Tactical Measurement: This section focuses on what to measure and why - the executive-level indicators needed for pattern selection, organizational readiness assessment, and cross-pattern governance. For how to measure and when - including specific thresholds, operational dashboards, monitoring procedures, and continuous improvement actions - see the AI Autonomy Implementation Guide.
Key Distinction:
- This section (Strategic): Quarterly reviews, pattern effectiveness, organizational health indicators
- Implementation Guide (Tactical): Daily monitoring, operational thresholds, alerting procedures, process adjustments
Pattern Success Philosophy
Each pattern has distinct success criteria reflecting its intended use and risk profile:
Pattern 1 (Assistive Tools) Success: Teams demonstrate calibrated trust in AI assistive tools—leveraging them effectively for appropriate tasks while maintaining rigorous human oversight and accountability. AI enhances productivity without compromising security, quality, or mission assurance.
Pattern 2 (Delegated Agents) Success: Teams effectively delegate bounded workflows to AI agents while maintaining appropriate human oversight, accountability, and trust calibration. Agents enhance productivity for repetitive tasks without compromising quality or introducing unacceptable risk.
Pattern 3 (Orchestrated Systems) Success: Organizations successfully coordinate multiple specialized AI agents to complete complex workflows while maintaining human oversight, system resilience, and full accountability. Multi-agent orchestration delivers efficiency gains without sacrificing governance or introducing unacceptable systemic risks.
Pattern 4 (Adaptive Ecosystems) Success: Organizations successfully operate self-optimizing AI ecosystems that continuously improve software quality, performance, and user value while maintaining mission alignment, regulatory compliance, and human stewardship. Autonomous optimization delivers transformational capabilities without compromising safety, governance, or accountability.
Cross-Pattern Health Indicators
These organizational-level metrics apply across all patterns and provide baseline health indicators for multi-pattern environments:
Governance Effectiveness:
- Pattern Selection Appropriateness: % of workflows using the most suitable pattern based on risk, complexity, and oversight requirements
- Cross-Pattern Governance Consistency: Uniform application of security, audit, and compliance controls across different autonomy patterns
- AI Implementation Coverage: % of AI implementations operating within approved boundaries and oversight mechanisms
Organizational Readiness:
- Multi-Pattern Coordination: Successful isolation and coordination between different pattern implementations to prevent conflicts or security issues
- Resource Allocation Efficiency: Appropriate distribution of human oversight and technical resources based on pattern risk profiles
- Workforce Adaptation: Effectiveness of role transitions from direct executors to supervisors, auditors, and curators
Trust and Safety:
- Trust Calibration Maturity: Alignment between team confidence in AI outputs and actual reliability across all implemented patterns
- Shadow AI Prevention: Declining trend in unauthorized AI tool use as governance matures and approved alternatives meet user needs
- Incident Response Effectiveness: Speed and quality of response to AI-related security, quality, or compliance events
Leading vs. Lagging Indicators
Leading Indicators (predict future performance):
- Training completion rates for pattern-specific skills
- Governance policy coverage for new AI implementations
- Readiness assessment scores before pattern deployment
- Pilot program success rates in controlled environments
Lagging Indicators (measure actual outcomes):
- Production incident rates attributed to AI implementations
- Compliance audit findings related to AI governance
- Mission outcome improvements attributable to AI patterns
- Long-term trust calibration trends across teams
🔴 Warning Signs Across All Patterns
Monitor these risk indicators regardless of which patterns are in use:
- Trust Miscalibration: Teams over- or under-trusting AI capabilities relative to actual performance
- Governance Gaps: AI implementations operating outside approved oversight mechanisms
- Review Fatigue: Declining quality of human oversight due to volume or repetition
- Pattern Drift: Gradual expansion of AI autonomy beyond intended boundaries
- Shadow AI Growth: Increasing use of unauthorized tools despite approved alternatives
- Cross-Pattern Conflicts: Different autonomy patterns interfering with each other or creating security vulnerabilities
Measurement Strategy Guidelines
Start with Baseline Measurement:
- Capture pre-implementation metrics for workflows targeted for AI augmentation
- Establish current trust calibration and governance coverage levels
- Document existing human effort and error rates for comparison
Focus on Pattern Effectiveness:
- Measure whether each pattern is solving the problems it was designed to address
- Evaluate pattern selection decisions based on actual outcomes vs. intended benefits
- Adjust pattern implementation based on evidence of effectiveness
Scale Monitoring to Risk:
- Apply more intensive monitoring to higher autonomy patterns
- Tailor governance controls to pattern-specific risk profiles
- Adjust measurement frequency based on pattern maturity and observed stability
Evolve Metrics with Maturity:
- Regularly reassess whether patterns are delivering intended value
- Update success criteria as patterns mature and organizational needs change
- Refine measurement approaches based on lessons learned across patterns
AI Autonomy Health Index
This index provides a snapshot of organizational readiness for operating AI-enabled workflows with any combination of autonomy patterns. Unlike a static maturity model, it measures operational health in key domains that can go up or down over time.
| Category | ✅ Green (Healthy) | ⚠️ Yellow (At Risk) | ❌ Red (Unhealthy) |
|---|---|---|---|
| Governance & Compliance | All AI tools/agents are sanctioned, registered, and monitored. NIST RMF/DoD AI guidance explicitly applied. | Most AI tools governed, but occasional Shadow AI reported. Logging/monitoring incomplete. | Significant Shadow AI use; no consistent policy for agent approval or monitoring. |
| Trust Calibration | AI outputs consistently reviewed & validated; structured feedback loops; trust scores improving. | Outputs inconsistently reviewed; feedback loops ad hoc; uneven team trust. | AI outputs bypass review; blind acceptance or outright rejection without calibration. |
| Workforce Readiness | Roles evolving into supervisors, auditors, curators. Training on AI oversight & ethics complete. | Workforce understands tools but lacks formal oversight/ethics training. Role shifts unclear. | Workforce either over-relies on AI without oversight or resists adoption entirely. |
| Risk Management | Playbooks for rollback, error containment, human overrides tested regularly. AI incident response defined. | Some safeguards exist but untested or inconsistent. AI risks not fully in playbooks. | No structured risk management. AI errors cause disruption without mitigation. |
How to use this index: Leaders can score themselves (✅ / ⚠️ / ❌) across categories quarterly to assess pattern implementation health. This avoids the trap of chasing "advanced" patterns and instead ensures effectiveness at whatever patterns are currently in use.
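A simple way to record the quarterly self-assessment is sketched below. The category names come from the table above; the roll-up convention (any Red dominates the overall score) is an illustrative assumption, not part of this play.

```python
# Minimal sketch for recording a quarterly health-index self-assessment.
CATEGORIES = ["Governance & Compliance", "Trust Calibration", "Workforce Readiness", "Risk Management"]
VALID = {"green", "yellow", "red"}

def overall_health(scores: dict) -> str:
    missing = [c for c in CATEGORIES if c not in scores]
    if missing or any(s not in VALID for s in scores.values()):
        raise ValueError(f"incomplete or invalid assessment: {missing}")
    if "red" in scores.values():
        return "red"      # any unhealthy category dominates the roll-up
    if "yellow" in scores.values():
        return "yellow"
    return "green"

q3_assessment = {
    "Governance & Compliance": "green",
    "Trust Calibration": "yellow",
    "Workforce Readiness": "green",
    "Risk Management": "green",
}
print("Overall AI autonomy health:", overall_health(q3_assessment))  # -> yellow
```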
📌 Key Insight: Success is not measured by implementing "higher" patterns, but by optimizing pattern selection and implementation for mission outcomes. The best measurement strategy focuses on whether each pattern is solving the problems it was designed to address while maintaining appropriate risk controls.
For detailed metrics, monitoring procedures, and operational thresholds for each pattern, see the AI Autonomy Implementation Guide.
7. Recommendations & Next Best Play
- Treat AI autonomy patterns not as maturity levels, but as operational choices — reflecting how well different approaches serve specific workflows while maintaining mission assurance and appropriate control.
- A "higher" pattern is not "better" by default. The right pattern depends on workflow characteristics, risk tolerance, and operational context — not organizational sophistication.
- Use Pattern 1 and Pattern 2 as proving grounds for governance and trust calibration: they provide the safest environments to build AI oversight capabilities, measure effectiveness, and refine human-AI collaboration models before considering more complex implementations.
- Continuously reassess "AI health" with a pattern effectiveness index — evaluating whether each pattern implementation is solving its intended problems while maintaining security, governance, and mission alignment.
- Recognize that Pattern 4 is aspirational. Treat it as a design horizon for governance planning and workforce development, but focus operational efforts on optimizing Patterns 1-3 for current mission needs.
- Plan for multi-pattern architectures from the start — most mature organizations will operate multiple patterns simultaneously, requiring integrated governance, consistent oversight models, and coordinated risk management.
📌 Key Insight: By shifting from "maturity ladder" to health index, autonomy becomes about operating safely, responsibly, and mission-aligned with the patterns you've chosen — not about reaching the "highest" level of automation.
8. Closing Thoughts
The AI Autonomy Continuum is not a prescriptive path or an inevitable end state — it is a navigation aid. Each pattern provides a lens for understanding how autonomy shapes the SDLC, the risks that emerge, and the safeguards required.
Leaders should treat autonomy as a dynamic health index, not a maturity ladder. The real measure of success is whether teams can operate safely, securely, and effectively with their chosen patterns, while making thoughtful decisions about when and how to evolve their approach.
Different patterns serve different purposes — Pattern 1 may be the optimal long-term choice for sensitive workflows, while Pattern 3 might be ideal for complex orchestration needs. There is no "graduation" requirement, only the imperative to match the right pattern to the right problem while maintaining mission assurance.
This play is intentionally open-ended. Future plays may expand on topics such as governance models for multi-agent systems, workforce transformation, or testing methods for emergent behavior. Until then, the charge is clear: advance only as fast as your guardrails allow — and never confuse speed for readiness.
End of Play
References
1. T. Masterman, S. Besen, M. Sawtell, and A. Chao, “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey,” arXiv:2404.11584, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2404.11584
2. National Institute of Standards and Technology, Autonomy Levels for Unmanned Systems (ALFUS) Framework, Volume I: Terminology, NIST Special Publication 1011, Version 1.1, Sep. 2004. [Online]. Available: https://www.nist.gov/system/files/documents/el/isd/ks/NISTSP_1011_ver_1-1.pdf. [Accessed: Aug. 18, 2025].
3. Boston Consulting Group, “AI Agents: What They Are and Their Business Impact,” BCG, Apr. 2025. [Online]. Available: https://www.bcg.com/capabilities/artificial-intelligence/ai-agents. [Accessed: Aug. 18, 2025].
4. E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN. New York, NY, USA: Elsevier, 1976.
5. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
6. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 25, 2012, pp. 1097–1105.
7. T. B. Brown et al., “Language models are few-shot learners,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 1877–1901.
8. Department of Defense Chief Information Officer (DoD CIO), DoD Enterprise DevSecOps Reference Design, Version 1.0, Aug. 12, 2019. [Online]. Available: https://dodcio.defense.gov/Portals/0/Documents/DoD%20Enterprise%20DevSecOps%20Reference%20Design%20v1.0_Public%20Release.pdf. [Accessed: Aug. 18, 2025].