Agentic AI

The Agentic AI Spectrum

There's a wide spectrum between asking ChatGPT a question and running a full agentic system.

The difference in outcomes is dramatic.

Level 0

Prompting

The starting point. You open ChatGPT or Claude, describe what you need, and get an answer. Modern AI tools have expanded well beyond Q&A. You can upload files, generate documents, write and run code, and work through multi-step problems in a single conversation.

But the AI still has no persistent context about your project, your domain, or your team's patterns. Every session starts from zero. It's powerful for one-off tasks. It's not a system.

You → ChatGPT → Answer

Level 1

Assisted Work

You give the AI access to your materials and a set of instructions. It can now read your files, follow your guidelines, and produce work directly. This feels like a big leap, and it is. But it's still missing the infrastructure that makes a real system.

Instructions → AI Tool → Produces output → You review → Done

Most teams stop here and think they've built an agent system.

Level 2

Structured Agents

You define specialized agents with distinct roles: a researcher, a producer, a reviewer, a quality checker. Each agent has a specific job and doesn't do the others'. The system starts to have structure. But without conventions, skills, and per-project context, the agents still operate without the full picture.

Researcher: context gathering before any work begins
Producer: the specialist who does the work
Reviewer: quality and accuracy assessment
Quality: standards and consistency checks
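
One way to picture the role separation above is a small registry in which each agent declares the single job it owns. This is only a sketch: the `AgentRole` class, the `accepts` field, and the task-kind strings are hypothetical names chosen for illustration, not part of any specific agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """A hypothetical description of one specialized agent."""
    name: str
    responsibility: str
    accepts: frozenset  # task kinds this agent is allowed to handle


# The four roles from the list above; the task-kind strings are assumptions.
ROLES = [
    AgentRole("researcher", "context gathering before any work begins", frozenset({"explore"})),
    AgentRole("producer", "the specialist who does the work", frozenset({"produce"})),
    AgentRole("reviewer", "quality and accuracy assessment", frozenset({"review"})),
    AgentRole("quality", "standards and consistency checks", frozenset({"check"})),
]


def agent_for(task_kind: str) -> AgentRole:
    """Return the single agent that owns this kind of task."""
    owners = [role for role in ROLES if task_kind in role.accepts]
    if len(owners) != 1:
        raise ValueError(f"expected exactly one owner for {task_kind!r}, found {len(owners)}")
    return owners[0]
```

Because each task kind maps to exactly one role, a request that no agent owns fails loudly instead of being picked up by whichever agent happens to be running.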

Level 3

Full Agentic System

Every task starts with exploration. Conventions and skills are loaded per-project. Specialized agents handle production, quality, and review. Nothing is finalized without passing automated gates and explicit human approval. The system compounds; each project gets better as conventions accrete.

Explore → Produce → Quality → Review → Human Gate → Done
   ↑                                           |
   └───────────────── loop back ───────────────┘

Exploration first

Every task starts with research and context gathering before any work begins. The agent understands the existing patterns, conventions, and domain before proposing anything.

Conventions & skills

Project-specific rules and reusable skill libraries that agents follow. This is what makes the output consistent and aligned with your team's standards.

Quality gates

Automated checks and critic review run on every piece of work. Nothing reaches the human gate without passing defined quality standards first.
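
As a rough illustration of the gate idea, the checks can be modeled as named predicates that all have to pass before work moves on. The gate names and thresholds below are invented for the example; real gates might run linters, test suites, or a critic agent.

```python
from typing import Callable, Dict, List

# A gate takes the work product (plain text here) and returns True if it passes.
Gate = Callable[[str], bool]

# Hypothetical gates, defined only for this sketch.
GATES: Dict[str, Gate] = {
    "not_empty": lambda work: bool(work.strip()),
    "no_todo_markers": lambda work: "TODO" not in work,
    "within_length_limit": lambda work: len(work) <= 2000,
}


def failed_gates(work: str) -> List[str]:
    """Return the names of every gate the work fails, in definition order."""
    return [name for name, gate in GATES.items() if not gate(work)]


def ready_for_human(work: str) -> bool:
    """Work reaches the human gate only when no automated gate fails."""
    return not failed_gates(work)
```

Returning the names of the failed gates, rather than a bare pass or fail, gives the producer something concrete to fix.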

Human in the loop

Nothing is finalized without explicit human approval at a review gate. The human stays in control and can reject, request changes, or approve with confidence.
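
The outer loop and its human gate can be sketched as follows. The stage names come from the flow described in this section; the `Decision` enum, the `run_task` function, the round limit, and the choice to restart from exploration after a change request are assumptions made for the example.

```python
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    APPROVE = "approve"
    REQUEST_CHANGES = "request_changes"
    REJECT = "reject"


def run_task(task: str, human_gate: Callable[[str], Decision], max_rounds: int = 3) -> List[str]:
    """Run the outer loop and return a log of the stages that executed.

    The stages are stand-ins: each one simply records that it ran.
    """
    log: List[str] = []
    for round_number in range(1, max_rounds + 1):
        for stage in ("explore", "produce", "quality", "review"):
            log.append(f"round {round_number}: {stage}")
        draft = f"{task} (draft {round_number})"
        decision = human_gate(draft)  # nothing is finalized without this call
        log.append(f"round {round_number}: human gate -> {decision.value}")
        if decision is Decision.APPROVE:
            log.append("done")
            return log
        if decision is Decision.REJECT:
            log.append("rejected")
            return log
        # REQUEST_CHANGES: loop back and start the next round.
    log.append("stopped: round limit reached")
    return log
```

The human gate is passed in as a function, so the same loop works whether approval comes from a command-line prompt, a pull-request review, or a chat message.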

The Machinery

Two-Level Architecture

Toolkit-level org control and project-level specialization, with a template system for new projects.

The system operates at two levels. The toolkit level defines how work gets done everywhere: Primary agents orchestrate, Router agents dispatch to the right specialist, global skills encode reusable workflows, and global tools are available to all agents. This is org-wide control: a consistent process across every project.

The project level defines what the work looks like for a specific codebase: Specialist agents (the actual implementers), project-specific skills, conventions that accrete as real patterns emerge, and project tools. Project-level rules override or extend toolkit defaults, so a .NET project and a React project can share the same orchestration layer while having completely different implementation specialists.

New projects of a known type (static web, .NET API, mobile app) are stamped out from templates. The right specialist agents, convention stubs, and configuration are installed automatically. The new project inherits org patterns from day one and adds its own conventions as work accretes.

┌─────────────────────────────────────────────────────┐
│                    TOOLKIT LEVEL                    │
│  Primary agents · Router agents · Global skills     │
│  Org-wide standards · Template system               │
└──────────────────────┬──────────────────────────────┘
                       │ inherits + overrides
┌──────────────────────▼──────────────────────────────┐
│                    PROJECT LEVEL                    │
│  Specialist agents · Project skills                 │
│  Project conventions · Project tools                │
└─────────────────────────────────────────────────────┘
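
The "inherits + overrides" relationship between the two levels can be sketched as a recursive merge of project settings over toolkit defaults. The configuration keys, agent names, and the `resolve_config` function are hypothetical; the source does not say how its configuration is actually stored.

```python
from typing import Any, Dict


def resolve_config(toolkit: Dict[str, Any], project: Dict[str, Any]) -> Dict[str, Any]:
    """Merge project settings over toolkit defaults.

    Nested dicts are merged key by key (extend); any other project value
    replaces the toolkit value (override). Neither input is modified.
    """
    merged = dict(toolkit)
    for key, value in project.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical org-wide defaults.
TOOLKIT = {
    "agents": {"primary": "orchestrator", "router": "dispatcher"},
    "conventions": {"review_required": True, "line_length": 100},
}

# Hypothetical project-level settings for a .NET codebase.
DOTNET_PROJECT = {
    "agents": {"specialist": "dotnet-api-implementer"},
    "conventions": {"line_length": 120},
}

config = resolve_config(TOOLKIT, DOTNET_PROJECT)
```

Here the project adds its own specialist and changes one convention, while the orchestration agents and the review requirement are inherited unchanged.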

Automatic Looping

Inner loops run fully automated: Quality↔Produce↔Critic. The human only enters at the gate.

The outer loop (Explore → Produce → Quality → Review → Human Gate) is visible in the flow diagram above. But inside that loop, there's an automated inner loop that runs without human involvement.

When Quality finds a problem, it triggers a re-delegation back to the producer. When a Critic flags a blocking issue, the fix is delegated and the critic re-runs. This inner loop can execute multiple times automatically. The human only sees work that has already passed all automated gates. No babysitting retries.

[ automated inner loop ]
Quality → flag → Produce → fix → Quality → pass
Critic  → flag → Produce → fix → Critic  → pass
                    ↓
               Human Gate
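
A minimal sketch of the inner loop, assuming each checker returns either nothing (pass) or a description of the problem, and that the producer is a function that applies a fix. The function names, the toy checks, and the attempt budget are all assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A checker returns None when the work passes, or a description of the problem.
Checker = Callable[[str], Optional[str]]


def inner_loop(
    work: str,
    checkers: List[Tuple[str, Checker]],
    fix: Callable[[str, str], str],
    max_attempts: int = 5,
) -> Tuple[str, List[str]]:
    """Re-delegate to the producer until every checker passes.

    Returns the final work and a trace of what happened. Raises when the
    attempt budget runs out, so the loop cannot spin forever.
    """
    trace: List[str] = []
    for _ in range(max_attempts):
        problem = None
        for name, check in checkers:
            problem = check(work)
            if problem is not None:
                trace.append(f"{name} flagged: {problem}")
                work = fix(work, problem)  # producer applies the fix
                break  # restart so earlier checkers re-run on the new work
            trace.append(f"{name} passed")
        if problem is None:
            trace.append("ready for human gate")
            return work, trace
    raise RuntimeError("inner loop did not converge; escalate to a human")


# Toy checkers and producer, invented for this example.
def quality(work: str) -> Optional[str]:
    return None if work.startswith("# ") else "missing title"


def critic(work: str) -> Optional[str]:
    return None if len(work) >= 20 else "too short"


def producer_fix(work: str, problem: str) -> str:
    if problem == "missing title":
        return "# Title\n" + work
    return work + " with more supporting detail"


final_work, trace = inner_loop("notes", [("quality", quality), ("critic", critic)], producer_fix)
```

The attempt budget is a design choice worth keeping: an automated loop with no upper bound can hide a producer that never converges.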

MCP

A standard protocol connecting agents to external tools and data sources.

Model Context Protocol is a uniform standard for giving AI agents access to external tools: file systems, databases, APIs, browsers, and code execution environments. Any MCP server can connect to any MCP-compatible agent without custom integration work.

What makes it powerful in a full agentic system is the authorization model. The agent never holds API keys. The MCP layer handles credentials out-of-band, so the agent can only access what it's been explicitly given, and the code it writes can't leak secrets it was never shown.

Agent → MCP Client → MCP Server → Tool / Data source
                        ↑
        auth handled here, not in agent code
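
The credential boundary can be illustrated with plain classes. This is not the real MCP SDK or wire protocol: `ToolServer`, `ToolClient`, the `get_ticket` tool, and the key value are hypothetical stand-ins that only show where the secret lives.

```python
from typing import Any, Callable, Dict, List


class ToolServer:
    """Hypothetical stand-in for an MCP server; not the real MCP SDK.

    The server owns the credential and exposes only named tools.
    """

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key  # used server-side only, never returned
        self._tools: Dict[str, Callable[..., Any]] = {
            "get_ticket": self._get_ticket,
        }

    def _get_ticket(self, ticket_id: int) -> Dict[str, Any]:
        # A real server would call the backing API with self._api_key here.
        return {"id": ticket_id, "status": "open"}

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, tool: str, **arguments: Any) -> Any:
        if tool not in self._tools:
            raise PermissionError(f"tool {tool!r} is not exposed to this agent")
        return self._tools[tool](**arguments)


class ToolClient:
    """Hypothetical client the agent talks to; it forwards calls to the server."""

    def __init__(self, server: ToolServer) -> None:
        self._server = server

    def list_tools(self) -> List[str]:
        return self._server.list_tools()

    def call(self, tool: str, **arguments: Any) -> Any:
        return self._server.call(tool, **arguments)


# The credential is supplied when the server is configured, outside agent code.
client = ToolClient(ToolServer(api_key="secret-held-by-server"))

# Agent side: only tool names, arguments, and results are visible.
ticket = client.call("get_ticket", ticket_id=42)
```

In this single-process sketch the separation is only by convention. In a real deployment the server would typically run as a separate process or service, which is what actually keeps the key out of the agent's reach.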

Embeddings

Semantic context loading: the right information at the right time, not everything at once.

Embeddings convert text (code, documentation, tasks, conventions) into vectors that capture meaning. This lets the system find semantically relevant context for any given task, rather than loading everything and hoping the model figures it out.

The balance matters. Too little context and the agent misses patterns it needs. Too much and the signal drowns in noise and the model loses track of what's actually relevant. A well-designed system loads just enough: the conventions for this layer, the files most similar to the task, the prior decisions that apply here.

Too little → agent misses patterns, makes inconsistent decisions
Just right → relevant context loaded, accurate and consistent output
Too much   → noise overwhelms signal, model loses focus
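
A small sketch of similarity-based context selection, using cosine similarity with a relevance floor and a cap on how many items are loaded. The three-dimensional vectors, file names, and thresholds are toy values; a real system would obtain vectors from an embedding model and tune the limits.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_context(
    task_vector: Sequence[float],
    documents: Dict[str, Sequence[float]],
    top_k: int = 2,
    min_similarity: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return up to top_k documents, most similar first.

    min_similarity guards against loading noise; top_k guards against
    loading too much.
    """
    scored = [(name, cosine_similarity(task_vector, vec)) for name, vec in documents.items()]
    relevant = [(name, score) for name, score in scored if score >= min_similarity]
    relevant.sort(key=lambda item: item[1], reverse=True)
    return relevant[:top_k]


# Toy vectors standing in for real embeddings.
DOCUMENTS = {
    "api-conventions.md": [0.9, 0.1, 0.0],
    "orders_controller.cs": [0.8, 0.3, 0.1],
    "marketing-copy.md": [0.0, 0.2, 0.9],
    "release-checklist.md": [0.1, 0.9, 0.2],
}
task_vector = [1.0, 0.2, 0.0]  # hypothetical embedding of "add an orders endpoint"

selected = select_context(task_vector, DOCUMENTS)
```

With these toy values, the API conventions and the existing controller are selected, while the marketing copy and the release checklist fall below the relevance floor.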

Code Mode

LLMs write typed code to call tools, handling complexity that raw tool calls can't.

Most agents expose MCP tools directly to the LLM as JSON function calls. The LLM has to figure out which tool to call, with what parameters, in what order. It struggles as the number of tools and steps grows, because it has seen almost no real-world training data for tool calls.

Code Mode takes a different route: the tools are presented to the LLM as a typed API. The LLM writes code against that API, something it's extraordinarily good at, having trained on millions of real codebases. The result handles far more tools, far more complexity, and chains calls together without feeding every intermediate result back through the model.

Old: LLM → JSON tool call → result → LLM → JSON tool call → result → ...
New: LLM → writes TypeScript → executes in sandbox → final result → LLM
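
The pattern can be sketched as typed wrapper functions plus a short script of the kind a model might write against them. The diagram above names TypeScript; this sketch uses Python with type hints purely for illustration. The `Order` type, both tool functions, and the sample data are hypothetical, and no real sandbox is involved.

```python
from typing import List, TypedDict


class Order(TypedDict):
    id: int
    customer: str
    total: float


# --- Typed API exposed to the model (hypothetical tools with canned data) ---

_ORDERS: List[Order] = [
    {"id": 1, "customer": "acme", "total": 120.0},
    {"id": 2, "customer": "globex", "total": 80.0},
    {"id": 3, "customer": "acme", "total": 45.5},
]


def list_orders(customer: str) -> List[Order]:
    """Stand-in for a tool call that fetches a customer's orders."""
    return [order for order in _ORDERS if order["customer"] == customer]


def send_summary(customer: str, order_count: int, total: float) -> str:
    """Stand-in for a tool call that delivers a summary message."""
    return f"{customer}: {order_count} orders, total {total:.2f}"


# --- Code the model might write against that API ---

def summarize_customer(customer: str) -> str:
    orders = list_orders(customer)  # intermediate result stays in the sandbox
    total = sum(order["total"] for order in orders)
    return send_summary(customer, len(orders), total)  # only this returns to the model


final_result = summarize_customer("acme")
```

The two tool calls are chained inside one script, so the list of orders never has to pass back through the model as an intermediate step.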

The difference is compounding

A full agentic system doesn't just make you faster. It makes you more consistent. Every task follows the same quality gates, the same review process, the same conventions. Over time, this consistency compounds into a process and output that are easier to maintain, easier to extend, and easier to hand off.

You also get speed. Not just from the AI working faster, but from the automation removing friction. No manual checking, no guessing whether the work is aligned with your standards. The system tells you.

And you get trust. Because the human is always in the loop, and the system is transparent about what it's doing and why. You're not handing your work to an AI. You're augmenting your team with one.