Agent Harness Capabilities

The agent harness is the core runtime that provides Deep Agent with its advanced capabilities. It wraps AI SDK's ToolLoopAgent with built-in tools and features that enable complex, multi-step reasoning.

Think of the harness as an "agent operating system" - it provides the environment and tools agents need to tackle complex tasks.

Overview

The harness provides:

File system access - Six tools for file operations
Task planning - Built-in write_todos tool for decomposition
Subagent spawning - task tool for delegating work
Tool result eviction - Automatic context management
Human-in-the-loop - Approval workflows for sensitive operations
Event streaming - Real-time observability

File System Access

The harness provides six tools for file system operations, making files first-class citizens in the agent's environment:

Available Tools

Tool	Description
`ls`	List files in a directory with metadata (size, modified time)
`read_file`	Read file contents with line numbers, supports offset/limit for large files
`write_file`	Create new files
`edit_file`	Perform exact string replacements in files (with global replace mode)
`glob`	Find files matching patterns (e.g., `**/*.ts`)
`grep`	Search file contents with multiple output modes (files only, content with context, or counts)
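
To make the `read_file` row concrete, here is a minimal sketch of line-numbered reads with an offset and limit. The function name, the default limit of 2000 lines, and the tab-separated output format are assumptions made for illustration, not the SDK's actual implementation.

```typescript
// Hypothetical helper for illustration; not the SDK's actual implementation.
function readWithLineNumbers(content: string, offset = 0, limit = 2000): string {
  return content
    .split("\n")
    .slice(offset, offset + limit) // window of at most `limit` lines
    .map((line, i) => `${offset + i + 1}\t${line}`) // 1-based numbers from the full file
    .join("\n");
}

// Skip the first line of a four-line file, then read two lines.
const page = readWithLineNumbers("alpha\nbeta\ngamma\ndelta", 1, 2);
console.log(page); // lines 2 and 3, each prefixed with its line number
```

Numbering lines by their position in the whole file, rather than within the window, lets the agent refer to the same line number across several paged reads.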

Tool Usage Examples

import { createDeepAgent } from 'deepagentsdk';
import { anthropic } from '@ai-sdk/anthropic';

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
});

// Agent can use all filesystem tools
const result = await agent.generate({
  prompt: `
  1. List all TypeScript files in the src directory
  2. Read the main.ts file
  3. Search for "TODO" comments across all files
  4. Create a summary file
  `,
});
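
The three `grep` output modes listed in the table can be sketched over an in-memory file map. Everything here (`grepSketch`, the mode names, the output format) is hypothetical, and the sketch omits surrounding context lines for brevity.

```typescript
type GrepMode = "files" | "content" | "count";

// `files` maps a path to its text. Hypothetical shape, for illustration only.
function grepSketch(files: Record<string, string>, needle: string, mode: GrepMode): string[] {
  const out: string[] = [];
  for (const path of Object.keys(files)) {
    const lines = (files[path] || "").split("\n");
    const hits: string[] = [];
    lines.forEach((line, i) => {
      if (line.indexOf(needle) !== -1) hits.push(`${path}:${i + 1}: ${line}`);
    });
    if (hits.length === 0) continue; // files without a match are omitted in every mode
    if (mode === "files") out.push(path);
    else if (mode === "count") out.push(`${path}: ${hits.length}`);
    else out.push(...hits);
  }
  return out;
}

const demoFiles: Record<string, string> = {
  "src/main.ts": "// TODO: wire up router\nstart();",
  "src/util.ts": "export const x = 1;",
};
console.log(grepSketch(demoFiles, "TODO", "content"));
```

The "files" and "count" modes are cheap ways for an agent to locate matches before it spends tokens on the full matching lines.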

Tool Result Eviction

The harness automatically evicts large tool results to the file system when they exceed a token threshold, which keeps them from saturating the context window.

How it works:

Monitors tool call results for size (default threshold: 20,000 tokens)
When exceeded, writes the result to a file instead
Replaces the tool result with a concise reference to the file
Agent can later read the file if needed
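
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the four-characters-per-token estimate, the `/large_tool_results/` path, and the wording of the reference message are all hypothetical, not what the harness actually uses.

```typescript
// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Returns the result unchanged if it is small, otherwise stores it in `files`
// and returns a short reference that the agent can follow up on.
function evictIfLarge(
  toolCallId: string,
  result: string,
  files: Map<string, string>,
  limitTokens = 20000,
): string {
  const tokens = estimateTokens(result);
  if (tokens <= limitTokens) return result;
  const path = `/large_tool_results/${toolCallId}`; // hypothetical location
  files.set(path, result);
  return `Result too large (~${tokens} tokens). Saved to ${path}; use read_file with offset/limit to inspect it.`;
}

const store = new Map<string, string>();
const reference = evictIfLarge("call_42", "x".repeat(200000), store);
console.log(reference.length < 200); // the reference is tiny compared to the result
```

Because the full output is still on the file system, eviction loses no information. It only defers the token cost until the agent asks for specific lines.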

Configuration:

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  toolResultEvictionLimit: 20000, // Default: 20,000 tokens
});

Token Savings: Without eviction, reading a 10,000-line file would consume ~50,000 tokens. With eviction, it uses ~500 tokens (file reference) + the specific lines the agent actually reads.

Pluggable Storage Backends

The harness abstracts file system operations behind a protocol, allowing different storage strategies for different use cases.

Built-in Backends

Backend	Description	Use Case
StateBackend	Ephemeral in-memory storage	Temporary working files, single-thread conversations
FilesystemBackend	Real filesystem access	Local projects, CI sandboxes, mounted volumes
PersistentBackend	Cross-conversation storage	Long-term memory, knowledge bases
CompositeBackend	Route different paths to different backends	Hybrid storage strategies

Backend Configuration

import {
  StateBackend,
  FilesystemBackend,
  PersistentBackend,
  CompositeBackend
} from 'deepagentsdk';

// Example 1: Simple filesystem access
const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  backend: new FilesystemBackend({ rootDir: './workspace' }),
});

// Example 2: Hybrid storage (ephemeral + persistent)
const backend = new CompositeBackend(
  new StateBackend(), // Default: ephemeral
  {
    '/memories/': new PersistentBackend({ store: myStore }),
  }
);

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  backend,
});
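
To illustrate the routing idea behind `CompositeBackend`, here is a self-contained sketch that sends each path to the backend with the longest matching prefix. `MiniBackend`, `MemoryBackend`, and `PrefixRouter` are hypothetical names, and longest-prefix matching is an assumption about how routing could work, not a description of the SDK's internals.

```typescript
interface MiniBackend {
  read(path: string): string | undefined;
  write(path: string, content: string): void;
}

class MemoryBackend implements MiniBackend {
  private files = new Map<string, string>();
  read(path: string): string | undefined { return this.files.get(path); }
  write(path: string, content: string): void { this.files.set(path, content); }
}

class PrefixRouter implements MiniBackend {
  private fallback: MiniBackend;
  private routes: Record<string, MiniBackend>;

  constructor(fallback: MiniBackend, routes: Record<string, MiniBackend>) {
    this.fallback = fallback;
    this.routes = routes;
  }

  // Choose the backend whose prefix is the longest match, else the fallback.
  private pick(path: string): MiniBackend {
    let chosen = this.fallback;
    let bestLength = 0;
    for (const prefix of Object.keys(this.routes)) {
      const candidate = this.routes[prefix];
      if (candidate !== undefined && path.startsWith(prefix) && prefix.length > bestLength) {
        chosen = candidate;
        bestLength = prefix.length;
      }
    }
    return chosen;
  }

  read(path: string): string | undefined { return this.pick(path).read(path); }
  write(path: string, content: string): void { this.pick(path).write(path, content); }
}

const scratch = new MemoryBackend();
const longTerm = new MemoryBackend();
const routed = new PrefixRouter(scratch, { "/memories/": longTerm });
routed.write("/memories/prefs.md", "prefers concise answers");
routed.write("/notes.txt", "temporary working notes");
```

The agent sees a single file system, while writes under `/memories/` land in the long-lived store and everything else stays in scratch space.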

See: Backends Documentation for complete backend guide.

Task Delegation (Subagents)

The harness allows the main agent to create ephemeral "subagents" for isolated multi-step tasks.

Why Use Subagents?

Context isolation - Subagent's work doesn't clutter main agent's context
Parallel execution - Multiple subagents can run concurrently
Specialization - Subagents can have different tools/configurations
Token efficiency - Large subtask context is compressed into a single result

How It Works

Main agent has a task tool
When invoked, creates a fresh agent instance with its own context
Subagent executes autonomously until completion
Returns a single final report to the main agent
Subagents are stateless (can't send multiple messages back)
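
The context-isolation property can be illustrated with a small simulation. No model is called here: `runSubagentSketch` is a hypothetical stand-in in which `steps` plays the role of the subagent's intermediate work.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Hypothetical stand-in for a subagent run. The subagent builds up its own
// message list; only the final message in that list is returned as the report.
function runSubagentSketch(task: string, steps: string[]): string {
  const ownContext: Msg[] = [{ role: "user", content: task }]; // fresh context
  for (const step of steps) ownContext.push({ role: "assistant", content: step });
  const last = ownContext[ownContext.length - 1];
  return last ? last.content : "";
}

const mainContext: Msg[] = [{ role: "user", content: "Analyze this codebase" }];
const report = runSubagentSketch("Find all API endpoints", [
  "Scanned the routes directory",
  "Read 12 files",
  "Found 3 endpoints: /users, /login, /health",
]);
// The main agent's context grows by one message, however long the subagent worked.
mainContext.push({ role: "assistant", content: report });
console.log(mainContext.length);
```

The intermediate steps never reach `mainContext`, which is where the token savings come from.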

General-Purpose Subagent

In addition to any user-defined subagents, Deep Agent has access to a general-purpose subagent at all times:

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  // No subagents needed - general-purpose is always available
});

// Agent can delegate complex tasks automatically
const result = await agent.generate({
  prompt: 'Analyze this codebase and find all API endpoints',
  // Agent may use task tool to delegate to general-purpose subagent
});

See: Subagents Documentation for complete subagent guide.

To-Do List Tracking

The harness provides a write_todos tool that agents can use to maintain a structured task list.

Features

Track multiple tasks with statuses (pending, in_progress, completed)
Persisted in agent state
Helps agent organize complex multi-step work
Useful for long-running tasks and planning
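
As a rough model of the state this tool maintains, the sketch below types a todo with the three statuses listed above and applies an immutable status update. The `content` and `status` fields match the usage example in this section; the helper names are hypothetical.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  content: string;
  status: TodoStatus;
}

// Return a new list with the matching todo moved to `status`.
function setStatus(todos: Todo[], content: string, status: TodoStatus): Todo[] {
  return todos.map((todo) => (todo.content === content ? { ...todo, status } : todo));
}

// Summarize progress as "completed/total".
function progress(todos: Todo[]): string {
  const done = todos.filter((todo) => todo.status === "completed").length;
  return `${done}/${todos.length} completed`;
}

const initial: Todo[] = [
  { content: "Design API endpoints", status: "completed" },
  { content: "Implement authentication middleware", status: "in_progress" },
  { content: "Write tests", status: "pending" },
];
const updated = setStatus(initial, "Implement authentication middleware", "completed");
console.log(progress(updated));
```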

Example Usage

const result = await agent.generate({
  prompt: 'Build a REST API with authentication',
});

// Access the todo list
result.state.todos.forEach(todo => {
  console.log(`[${todo.status}] ${todo.content}`);
});

// Output:
// [completed] Design API endpoints
// [completed] Set up project structure
// [in_progress] Implement authentication middleware
// [pending] Add input validation
// [pending] Write tests

Automatic Planning: Agents are prompted to use write_todos before starting complex tasks. This happens automatically - you don't need to instruct them to do it.

Human-in-the-Loop

The harness pauses agent execution at specified tool calls to allow human approval/modification.

Configuration

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  interruptOn: {
    write_file: true,  // Pause before every write
    edit_file: true,
    execute: true,
  },
});

Approval Workflow

for await (const event of agent.streamWithEvents({
  prompt: 'Delete all test files',
  onApprovalRequest: async ({ toolName, args }) => {
    console.log(`\n⚠️  Tool "${toolName}" requires approval`);
    console.log('Arguments:', JSON.stringify(args, null, 2));

    // Prompt the user for approval (promptUser is a placeholder for your own input helper)
    const answer = await promptUser('Approve? (y/n): ');
    return answer.toLowerCase() === 'y';
  },
})) {
  // Handle events...
}
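
The decision logic behind this workflow can be sketched as a small gate. It is a simplified, synchronous illustration: the real callback shown above is asynchronous, and `gateToolCall`, `PendingCall`, and the example arguments are hypothetical.

```typescript
type InterruptOn = Record<string, boolean>;
type PendingCall = { toolName: string; args: unknown };
type Decision = "run" | "skip";

// Tools not listed in `interruptOn` run immediately; listed tools run only if approved.
function gateToolCall(
  call: PendingCall,
  interruptOn: InterruptOn,
  approve: (call: PendingCall) => boolean,
): Decision {
  if (interruptOn[call.toolName] !== true) return "run";
  return approve(call) ? "run" : "skip";
}

const policy: InterruptOn = { write_file: true, edit_file: true, execute: true };
const denyAll = (): boolean => false;

console.log(gateToolCall({ toolName: "read_file", args: { path: "a.ts" } }, policy, denyAll)); // "run"
console.log(gateToolCall({ toolName: "execute", args: { command: "rm -rf tests" } }, policy, denyAll)); // "skip"
```

Read-only tools stay fast because they never reach the approval callback. Only tools that are explicitly listed pay the cost of a human round trip.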

See: Human-in-the-Loop Documentation for complete approval workflow guide.

Conversation History Summarization

The harness automatically compresses old conversation history when token usage exceeds a configured threshold.

Configuration

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  summarization: {
    tokenThreshold: 170000,  // Trigger at 170k tokens
    keepMessages: 6,          // Keep 6 most recent messages
    model: anthropic('claude-haiku-4-5-20251001'), // Model for summarization
  },
});

How It Works

Monitors conversation token count
When threshold exceeded, summarizes old messages
Keeps recent messages intact (default: 6)
Replaces old messages with a summary
Transparent to agent (appears as special system message)
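
The steps above can be sketched as a pure function. The token estimate (about four characters per token), the `compactHistory` name, and the injected `summarize` callback are assumptions for illustration. In the harness the summary comes from the configured summarization model.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function compactHistory(
  messages: ChatMessage[],
  tokenThreshold: number,
  keepMessages: number,
  summarize: (old: ChatMessage[]) => string,
): ChatMessage[] {
  // Rough estimate (assumption): about 4 characters per token.
  const total = messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  if (total <= tokenThreshold || messages.length <= keepMessages) return messages;

  const cut = messages.length - keepMessages;
  const old = messages.slice(0, cut); // replaced by one summary message
  const recent = messages.slice(cut); // kept verbatim
  const summary: ChatMessage = {
    role: "system",
    content: `Summary of earlier conversation: ${summarize(old)}`,
  };
  return [summary, ...recent];
}

const history: ChatMessage[] = [];
for (let i = 0; i < 10; i++) {
  history.push({ role: i % 2 === 0 ? "user" : "assistant", content: `message ${i} `.repeat(4) });
}
const compacted = compactHistory(history, 50, 6, (old) => `${old.length} earlier messages`);
console.log(compacted.length); // 7: one summary plus the 6 most recent messages
```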

Benefit: Enables very long conversations without hitting context limits while preserving recent context for continuity.

Interrupt Message Repair

The harness fixes message history when tool calls are interrupted or cancelled before receiving results.

The Problem

Agent requests tool call: "Please run X"
Tool call is interrupted (user cancels, error, etc.)
Agent sees tool_call in AIMessage but no corresponding ToolMessage
This creates an invalid message sequence

The Solution

The harness detects AIMessages with tool_calls that have no results and creates synthetic ToolMessage responses indicating the call was cancelled, then repairs the message history before agent execution.
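
A minimal sketch of this repair step, using a simplified message shape. The `HistoryMessage` type and the wording of the synthetic result are hypothetical; the real harness works on its own message types.

```typescript
type ToolCallPart = { id: string; name: string };
type HistoryMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCallPart[] }
  | { role: "tool"; toolCallId: string; content: string };

function repairDanglingToolCalls(history: HistoryMessage[]): HistoryMessage[] {
  // Collect the ids of tool calls that already have a result.
  const answered = new Set<string>();
  for (const message of history) {
    if (message.role === "tool") answered.add(message.toolCallId);
  }

  const repaired: HistoryMessage[] = [];
  for (const message of history) {
    repaired.push(message);
    if (message.role !== "assistant" || !message.toolCalls) continue;
    for (const call of message.toolCalls) {
      if (answered.has(call.id)) continue;
      // Insert a synthetic result right after the assistant message.
      repaired.push({
        role: "tool",
        toolCallId: call.id,
        content: `Tool call "${call.name}" was cancelled before it returned a result.`,
      });
    }
  }
  return repaired;
}

const interrupted: HistoryMessage[] = [
  { role: "user", content: "Run the test suite" },
  { role: "assistant", content: "", toolCalls: [{ id: "call_1", name: "execute" }] },
  { role: "user", content: "Never mind, stop." },
];
console.log(repairDanglingToolCalls(interrupted).length); // 4: one synthetic tool result added
```

Running the repair a second time adds nothing, because every tool call now has a matching result.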

Why This Matters: Prevents agent confusion from incomplete message chains and gracefully handles interruptions and errors, maintaining conversation coherence.

Prompt Caching (Anthropic)

The harness enables Anthropic's prompt caching feature to reduce redundant token processing.

How It Works

Caches portions of the prompt that repeat across turns
Significantly reduces latency and cost for long system prompts
Automatically skips for non-Anthropic models

Configuration

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  enablePromptCaching: true, // Default: true for Anthropic models
});

Performance: System prompts (especially with filesystem docs) can be 5k+ tokens. Caching can reduce latency and cost for the cached portions by up to ~10x; actual gains depend on how often the cache is hit.

Event Streaming

The harness provides real-time events for observability and debugging.

Event Types

for await (const event of agent.streamWithEvents({
  prompt: 'Build a web app',
})) {
  switch (event.type) {
    case 'text':
      // Streaming text chunks
      process.stdout.write(event.text);
      break;

    case 'step-start':
      // New reasoning step
      console.log(`\n--- Step ${event.step} ---`);
      break;

    case 'tool-call':
      // Tool being executed
      console.log(`Tool: ${event.toolName}`);
      break;

    case 'todos-changed':
      // Todo list updated
      console.log(`Todos: ${event.todos.length} total`);
      break;

    case 'file-written':
      // File created
      console.log(`Created: ${event.path}`);
      break;

    case 'subagent-start':
      // Subagent spawned
      console.log(`Subagent: ${event.subagentType}`);
      break;

    case 'done':
      // Task complete
      console.log('\n✅ Done!');
      break;
  }
}

All Event Types

Event Type	Description
`text`	Streaming text chunks
`step-start`	New reasoning step began
`step-finish`	Reasoning step completed
`tool-call`	Tool was called
`tool-result`	Tool returned a result
`todos-changed`	Todo list was updated
`file-write-start`	File write is starting
`file-written`	File was written successfully
`file-edited`	File was edited
`file-read`	File was read
`ls`	Directory was listed
`glob`	Glob search completed
`grep`	Grep search completed
`web-search-start`	Web search started
`web-search-finish`	Web search completed
`http-request-start`	HTTP request started
`http-request-finish`	HTTP request completed
`subagent-start`	Subagent was spawned
`subagent-finish`	Subagent completed
`error`	An error occurred
`done`	Agent finished successfully
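
One way to consume these events in a type-safe manner is to model them as a discriminated union and fold them into a run summary. The sketch below covers only a subset of the event types, with payload fields taken from the example above. `HarnessEvent`, `RunSummary`, and `summarizeRun` are hypothetical names, not types exported by the SDK.

```typescript
type HarnessEvent =
  | { type: "text"; text: string }
  | { type: "step-start"; step: number }
  | { type: "tool-call"; toolName: string }
  | { type: "file-written"; path: string }
  | { type: "done" };

interface RunSummary {
  text: string;
  steps: number;
  toolCalls: Record<string, number>;
  filesWritten: string[];
  finished: boolean;
}

function summarizeRun(events: HarnessEvent[]): RunSummary {
  const summary: RunSummary = { text: "", steps: 0, toolCalls: {}, filesWritten: [], finished: false };
  for (const event of events) {
    switch (event.type) {
      case "text":
        summary.text += event.text;
        break;
      case "step-start":
        summary.steps += 1;
        break;
      case "tool-call":
        summary.toolCalls[event.toolName] = (summary.toolCalls[event.toolName] || 0) + 1;
        break;
      case "file-written":
        summary.filesWritten.push(event.path);
        break;
      case "done":
        summary.finished = true;
        break;
    }
  }
  return summary;
}

const recorded: HarnessEvent[] = [
  { type: "step-start", step: 1 },
  { type: "tool-call", toolName: "write_file" },
  { type: "file-written", path: "/app/index.html" },
  { type: "text", text: "Created the page." },
  { type: "done" },
];
console.log(summarizeRun(recorded));
```

Because `type` is the discriminant, the compiler narrows `event` inside each `case`, so reading `event.path` in the `text` branch would be a compile-time error.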

Web Tools (Optional)

When TAVILY_API_KEY is set, the harness automatically adds web search, HTTP request, and URL fetch tools.

Available Web Tools

Tool	Description	Requires
`web_search`	Search the web using Tavily API	`TAVILY_API_KEY`
`http_request`	Make HTTP requests	`TAVILY_API_KEY`
`fetch_url`	Fetch and read URL content	`TAVILY_API_KEY`

Configuration

# Set environment variable
export TAVILY_API_KEY=tvly-your-key-here

// Web tools are automatically available
const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
});

// Agent can now search the web
const result = await agent.generate({
  prompt: 'Search for recent AI news and summarize',
});

Command Execution (Optional)

When using the LocalSandbox backend, the harness adds an execute tool for running shell commands.

Configuration

import { LocalSandbox } from 'deepagentsdk';

const agent = createDeepAgent({
  model: anthropic('claude-sonnet-4-5-20250929'),
  backend: new LocalSandbox({
    cwd: './workspace',
    timeout: 60000, // 60 second timeout
  }),
});

// execute tool is automatically added
const result = await agent.generate({
  prompt: 'Initialize a Node.js project and install dependencies',
});

Safety: The execute tool is powerful. Consider enabling interruptOn for approval workflows when using it.

Best Practices

1. Choose the Right Backend

// ✅ Good: Match backend to use case

// Use for: Working with existing codebases
const projectAgent = createDeepAgent({
  backend: new FilesystemBackend({ rootDir: './project' }),
});

// Use for: Temporary scratch space
const scratchAgent = createDeepAgent({
  backend: new StateBackend(),
});

// Use for: Hybrid ephemeral + persistent storage
const hybridAgent = createDeepAgent({
  backend: new CompositeBackend(stateBackend, {
    '/memories/': persistentBackend,
  }),
});

2. Enable Tool Result Eviction for Large Files

// ✅ Good: Prevent context bloat
const agent = createDeepAgent({
  toolResultEvictionLimit: 20000,
});

3. Use Human-in-the-Loop for Destructive Operations

// ✅ Good: Require approval for dangerous tools
const agent = createDeepAgent({
  interruptOn: {
    write_file: true,
    edit_file: true,
    execute: true,
  },
});

4. Leverage Subagents for Complex Tasks

// ✅ Good: Define specialized subagents
const agent = createDeepAgent({
  subagents: [
    {
      name: 'researcher',
      description: 'Conducts in-depth research',
      systemPrompt: 'You are a research specialist...',
      tools: [webSearchTool],
    },
    {
      name: 'coder',
      description: 'Writes and reviews code',
      systemPrompt: 'You are a software engineer...',
      tools: [executeTool, fileTools],
    },
  ],
});

5. Monitor Events for Observability

// ✅ Good: Stream events for debugging
for await (const event of agent.streamWithEvents({
  prompt: complexTask,
  onEvent: (event) => {
    // Log all events for debugging
    console.log('[EVENT]', event.type, event);
  },
})) {
  // Handle UI updates...
}

Summary

The agent harness provides:

Capability	Tool/Feature	Benefit
File operations	`ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`	Persistent context management
Task planning	`write_todos`	Automatic decomposition
Subagent spawning	`task` tool	Context isolation
Storage abstraction	Backend system	Flexible persistence
Human-in-the-loop	`interruptOn`	Safety controls
Context management	Tool result eviction	Token efficiency
Observability	Event streaming	Real-time monitoring
Long conversations	Summarization	Stays within context limits
Web access	`web_search`, `http_request`, `fetch_url`	Live information
Code execution	`execute` (with LocalSandbox)	Project automation

Key Insight: The harness transforms a basic tool-calling agent into a sophisticated system capable of planning, context management, delegation, and long-running tasks.

Next Steps

Backends Documentation - Deep dive into storage options
Subagents Documentation - Master subagent patterns
Human-in-the-Loop Documentation - Implement approval workflows
Middleware Documentation - Extend harness with custom behavior
