The Agentic Manifesto: Engineering in the Era of Autonomy

Canonical URL: This manifesto is also available at agenticmanifesto.ai.

The Agentic Manifesto

We are uncovering better ways of developing intelligent systems by doing it and helping others do it. Through this work we have come to value (adapting the original Agile Manifesto):

Emergent behavior over predefined logic
Dynamic goals and guardrails over static requirements
Continuous tuning over binary testing
Automated governance over manual management

That is, while there is value in the items on the right, we value the items on the left more.

Principles behind the Agentic Manifesto
The Breaking Point
The Determinism Gap
Agentic Delivery Lifecycle (ADLC)
The New Mandate

Principles behind the Agentic Manifesto

We follow these principles, adapted from the original Agile principles:

Our highest priority is to satisfy the user through continuous delivery of trustworthy, goal-oriented behavior.
Welcome dynamic goals, even late in operation. Agentic systems harness contextual shifts for the user’s advantage.
Deliver verifiable actions frequently, from seconds to minutes, with a preference for smaller, safer steps.
Humans and agents must work together continuously throughout the task.
Build systems as societies of agents. We govern the interactions and emergent culture of these digital organizations, not just individual actions.
The most efficient method of conveying intent to an agent is natural language, and from an agent is observable action.
Successful outcomes are the primary measure of progress.
Agentic processes promote sustainable oversight. We prefer asynchronous “Human-on-the-Loop” review over blocking “Human-in-the-Loop” intervention whenever safety permits.
Continuous attention to behavioral safety and ethical alignment enhances autonomy.
Economy of Action—maximizing goals achieved with minimal computational and cognitive load—is essential for scalable autonomy.
The best architectures are not built, but cultivated. We design the constraints and incentives that allow effective solutions to emerge from agent interactions.
At every interaction, the system reflects on its effectiveness, then automatically tunes its approach in real-time.

The Breaking Point

Your unit tests are a security blanket. For fifty years, we’ve relied on a single, comforting assumption: that if we write the logic, we control the outcome. That era is over.

From Waterfall’s rigidity to DevOps’ velocity, every framework we use is built on that now-obsolete assumption.

The arrival of Agentic AI—where systems act not as passive tools but as autonomous collaborators—represents a paradigm shift as profound as the move to cloud computing. These systems are non-deterministic by design. They do not just execute instructions; they interpret goals, form plans, and exhibit emergent behaviors that no human explicitly coded.

Attempting to build these systems with a traditional Software Delivery Lifecycle (SDLC) is not just inefficient; it’s dangerous. The 2024 DORA State of DevOps report provided the first data-driven warning: while AI boosted individual coding speed, it failed to improve—and often hurt—overall software delivery performance. We were using 20th-century tools to manage 21st-century entities, and the cracks were showing.

By the 2025 report, the nuance had become clear: organizations that adapt their practices to this new reality can revolutionize their delivery, while those that don’t will continue to struggle.

To build trustworthy agents at scale, we must move beyond the SDLC. A new operating model is needed that embraces uncertainty rather than trying to legislate it away. We need the Agentic Delivery Lifecycle (ADLC).

The Determinism Gap

The chasm between traditional software and agentic systems is what we call the “Determinism Gap.” It is the difference between a system whose output is known in advance and one whose output is discovered in real-time. This gap renders our most trusted engineering practices insufficient.

The Failure of Binary Testing

In the SDLC, a test passes or fails. In an agentic system, success is a spectrum. An agent’s response might be factually accurate but tonally disastrous. It might achieve a goal but use an inefficient or expensive path. Traditional unit tests cannot measure these qualitative nuances. The industry is moving from a world of verification (did it do what I said?) to validation (did it do what I wanted?), a shift that challenges the very foundation of some long-held testing dogmas (see: “Integrated Tests Are a Scam”).

The Invisibility of Regression

In deterministic code, we can trace the impact of a change. In agentic systems, a seemingly minor tweak to a master prompt or the addition of a single document to a RAG (Retrieval-Augmented Generation) knowledge base can radically alter the system’s “personality.” These are not code changes, yet they cause cascading, unpredictable regressions. It’s like “dependency hell,” but for behavior. A single new document in your RAG is effectively a silent, breaking API change to your entire system.

The Accountability Void

DevOps gave us observability for code. AgentOps must give us observability for decisions. When an autonomous agent takes an action that causes financial or reputational harm, “the model hallucinated” is not an acceptable root cause analysis. Without a new layer of automated governance, these “black box” decision-makers become unacceptable liabilities in an enterprise environment. This is a modern take on an old theme: “Your Software Is Made of People.”

The Multi-Agent Exponential

As we move from single agents to Multi-Agent Systems (MAS), complexity doesn’t grow linearly; it grows exponentially. We face entirely new classes of “sociological” failure modes: runaway feedback loops between agents, resource hoarding, and conflicting goal-states. Traditional APM (Application Performance Monitoring) cannot debug a negotiation between three autonomous agents; we need distributed behavioral tracing—observability tools that can visualize the chain of thought, tool use, and inter-agent dialogue across the entire system.

Dimension	Traditional SDLC (Deterministic)	Agentic Delivery Lifecycle (ADLC)
Core Unit	Code / Features	Behavior / Goals
Primary Goal	Functionality (Does it work?)	Goal-Achievement (Is it effective?)
System Behavior	Deterministic (Predictable)	Non-deterministic (Emergent)
Core Activity	Coding / Implementation	Tuning / Optimization
Planning Focus	Fixed Requirements	Goals & Guardrails
Testing Focus	Verification (Binary Pass/Fail)	Validation (Qualitative / Robustness)
Key Artifact	Executable Binary / Container	Versioned Agent (Prompts, Tools, Model)
Primary Risk	Logic Bugs / Scope Creep	Emergent Behaviors / Unpredictable Actions

Agentic Delivery Lifecycle (ADLC)

The Agentic Delivery Lifecycle (ADLC) is a continuous, tuning-centric methodology for governing autonomous AI. It shifts engineering focus from static code to dynamic behavior, from fixed requirements to flexible goals, and from binary testing to real-time oversight.

ADLC is not a rejection of established engineering principles, but an evolution of them. It does not replace the SDLC; it wraps it. The deterministic tools and APIs our agents rely on must still be built with rigorous SDLC discipline. ADLC extends our practices to govern the emergent behavior of the agents themselves.

It is a continuous, non-linear lifecycle composed of five core phases, synthesizing frameworks from Salesforce, among others, with real-world engineering experience.

graph TD
    P1[Phase 1: Ideation & Guardrails] --> P2

    subgraph "The Inner Loop"
        P2[Phase 2: Development & Empowerment] <-->|Iterate| P3[Phase 3: Validation & Robustness]
    end

    P3 -->|Pass| P4[Phase 4: Deployment & Release]
    P4 --> P5[Phase 5: Monitoring & Tuning]

    P5 -->|"The Outer Loop (Tuning)"| P2
    P5 -.->|"New Goals (Ideation)"| P1

    style P2 stroke-width:2px,stroke:#0288d1
    style P3 stroke-width:2px,stroke:#0288d1
    style P5 stroke-width:2px,stroke:#e65100

Phase 1: Ideation & Guardrails

Shift: From Requirements to Boundaries

We stop writing rigid functional specifications and start defining flexible goal-states. The critical engineering work here shifts from construction to urban planning. We do not define exactly how the agent will work; we define the zoning laws (what it must never do) and the incentive structures (e.g., task completion rates, user satisfaction scores) that it should strive for. We establish ethical guardrails, persona constraints, and escalation paths—both blocking “Human-in-the-Loop” for high-stakes decisions and asynchronous “Human-on-the-Loop” for continuous oversight—before a single user prompt is written.

Phase 2: Development & Empowerment (The Inner Loop)

Shift: From Coding Logic to Engineering Intent

The “inner loop” of development evolves from writing procedural code to cultivating the agent’s environment. This involves meticulous prompt engineering—from foundational principles to the art of meta-prompting, as detailed in “Unlock Elite Agents: The Art of Evolving LLM Prompts into System Masterpieces”—curating the knowledge base (e.g., RAG), and securely integrating the tools (e.g., APIs) the agent will use. We are not programming the agent’s every move; we are equipping it to plan its own moves.

Phase 3: Validation & Robustness

Shift: From Test Coverage to Behavioral Evaluation

We replace simple code coverage metrics with a “Well-Curated Evaluation Suite.” This is the new gold standard artifact of engineering—a massive, version-controlled set of scenarios, adversarial prompts (“red teaming”), and qualitative scoring rubrics that measure semantic intent rather than just exact string matches. This suite is the only way to objectively measure a non-deterministic system and detect the “invisible regressions” caused by prompt tuning.

Phase 4: Deployment & Release

Shift: From Binary Launch to Phased Introduction

Because agent behavior is emergent, we cannot fully trust it on Day 1. Deployments must use canary releases and phased rollouts, exposing the new agent version to a tiny fraction of real-world traffic while closely monitoring its behavioral metrics before a full release.

Phase 5: Monitoring & Tuning (The Outer Loop)

Shift: From Cost Center to Value Engine

This is the heart of the ADLC. In traditional software, “maintenance” is a cost center. In agentic systems, this “outer loop” is not maintenance; it is the primary engine of ROI. We continuously monitor production interactions to identify high-leverage tuning opportunities. Every hallucination fixed and every goal optimized directly increases the value of the digital employee. The system is never “finished”; it is a continuously appreciating asset.

The New Mandate

The transition to the Agentic Delivery Lifecycle is not optional for organizations that intend to lead in the AI era. It requires engineering leaders to make uncomfortable shifts: investing in governance as heavily as features, and valuing a robust evaluation suite as highly as production code.

If you are a technical leader, your job has changed. You are no longer just a builder of machines; you are a governor of digital societies. Build accordingly.