14 changes: 13 additions & 1 deletion content/Agents/creating-agents.mdx
@@ -197,7 +197,15 @@ All implement [StandardSchema](https://github.com/standard-schema/standard-schem

### Type Inference

TypeScript automatically infers types from your schemas:
TypeScript automatically infers types from your schemas. Don't add explicit type annotations to handler parameters:

```typescript
// Good: types inferred from schema
handler: async (ctx, input) => { ... }

// Bad: explicit annotations can conflict with the schema-inferred types
handler: async (ctx: AgentContext, input: MyInput) => { ... }
```

```typescript
const agent = createAgent('Search', {
@@ -364,6 +372,10 @@ handler: async (ctx, input) => {

## Next Steps

<Callout type="tip" title="AI-Assisted Development">
The [OpenCode plugin](/Reference/CLI/opencode-plugin) provides AI-assisted development for full-stack Agentuity projects, including agents, routes, frontend, and deployment.
</Callout>

- [Using the AI SDK](/Agents/ai-sdk-integration): Add LLM capabilities with generateText and streamText
- [Managing State](/Agents/state-management): Persist data across requests with thread and session state
- [Calling Other Agents](/Agents/calling-other-agents): Build multi-agent workflows
46 changes: 46 additions & 0 deletions content/Agents/evaluations.mdx
@@ -5,6 +5,21 @@ description: Automatically test and validate agent outputs for quality and compl

Evaluations (evals) are automated tests that run after your agent completes. They validate output quality, check compliance, and monitor performance without blocking agent responses.

## Why Evals?

Most evaluation tools test the LLM: did the model respond appropriately? That's fine for chatbots, but agents aren't single LLM calls. They're entire runs with multiple model calls, tool executions, and orchestration working together.

Agent failures can happen anywhere in the run: a tool call that returns bad data, a state bug that corrupts context, an orchestration step that takes the wrong path. Testing only the LLM response misses most of these failures.

Agentuity evals test the whole run—every tool call, state change, and orchestration step. They run on every session in production, so you catch issues with real traffic.

**The result:**

- **Full-run evaluation**: Test the entire agent execution, not just LLM responses
- **Production monitoring**: Once configured, evals run automatically on every session
- **Async by default**: Evals don't block responses, so users aren't waiting
- **Preset library**: Common checks (PII, safety, hallucination) available out of the box

Evals come in two types: **binary** (pass/fail) for yes/no criteria, and **score** (0-1) for quality gradients.

<Callout type="info" title="Where Scores Appear">
@@ -426,6 +441,37 @@ export const politenessCheck = agent.createEval(politeness({

All preset evals use a default model optimized for cost and speed. Override `model` when you need specific capabilities.
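
For instance, overriding the default model might look like this. This is a hedged sketch: whether `model` takes a model id string or an AI SDK model instance depends on the `@agentuity/evals` package, so treat the value below as illustrative.

```typescript
import { politeness } from '@agentuity/evals';

export const politenessCheck = agent.createEval(politeness({
  // Assumption: `model` accepts a model identifier; it may instead expect
  // an AI SDK model instance. Check the preset reference for the exact type.
  model: 'gpt-4o',
}));
```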

### Lifecycle Hooks

Preset evals support `onStart` and `onComplete` hooks for custom logic around eval execution:

```typescript
import { politeness } from '@agentuity/evals';

export const politenessCheck = agent.createEval(politeness({
onStart: async (ctx, input, output) => {
ctx.logger.info('Starting politeness eval', {
inputLength: input.request?.length,
});
},
onComplete: async (ctx, result) => {
// Track results in external monitoring
if (!result.passed) {
ctx.logger.warn('Politeness check failed', {
score: result.score,
reason: result.reason,
});
}
},
}));
```

**Use cases for lifecycle hooks:**
- Log eval execution for debugging
- Send results to external monitoring systems
- Track eval performance metrics
- Trigger alerts on failures
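
As a concrete example of the monitoring case, an `onComplete` hook can forward results to an external service. This is a hedged sketch: the endpoint URL and payload shape are placeholders, while the `result` fields mirror those shown in the example above.

```typescript
import { politeness } from '@agentuity/evals';

// Hypothetical monitoring endpoint; replace with your own service
const MONITORING_URL = 'https://monitoring.example.com/api/eval-results';

export const politenessCheck = agent.createEval(politeness({
  onComplete: async (ctx, result) => {
    // Forward every eval result so failures can be tracked and alerted on
    await fetch(MONITORING_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        eval: 'politeness',
        passed: result.passed,
        score: result.score,
        reason: result.reason,
      }),
    });
  },
}));
```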

### Schema Middleware

Preset evals expect a standard input/output format:
23 changes: 23 additions & 0 deletions content/Agents/schema-libraries.mdx
@@ -103,6 +103,29 @@ type User = s.infer<typeof User>;
// { name: string; age: number; role: 'admin' | 'user' }
```

### JSON Schema Generation

Convert schemas to JSON Schema for use with LLM structured output:

```typescript
import { s } from '@agentuity/schema';

const ResponseSchema = s.object({
answer: s.string(),
confidence: s.number(),
});

// Generate JSON Schema
const jsonSchema = s.toJSONSchema(ResponseSchema);

// Generate strict JSON Schema for LLM structured output
const strictSchema = s.toJSONSchema(ResponseSchema, { strict: true });
```

<Callout type="tip" title="Strict Mode for LLMs">
Use `{ strict: true }` when generating schemas for LLM structured output (e.g., OpenAI's `response_format`). Strict mode ensures the schema is compatible with model constraints and produces more reliable outputs.
</Callout>
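
For instance, the strict schema can be plugged into the AI SDK's structured output helpers. This is a hedged sketch: the `openai('gpt-4o')` model, the prompt, and the provider import are illustrative and assume the AI SDK is already set up in your project.

```typescript
import { generateObject, jsonSchema } from 'ai';
import { openai } from '@ai-sdk/openai';

const { object } = await generateObject({
  model: openai('gpt-4o'),
  // Wrap the generated JSON Schema so the AI SDK can use it for structured output
  schema: jsonSchema<{ answer: string; confidence: number }>(strictSchema),
  prompt: 'How many moons does Mars have?',
});

console.log(object.answer, object.confidence);
```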

<Callout type="info" title="When to Use">
Use `@agentuity/schema` for simple validation needs. For advanced features like email validation, string length constraints, or complex transformations, consider Zod or Valibot.
</Callout>
65 changes: 46 additions & 19 deletions content/Agents/standalone-execution.mdx
@@ -16,10 +16,20 @@ import { createAgentContext } from '@agentuity/runtime';
import chatAgent from '@agent/chat';

const ctx = createAgentContext();
const result = await ctx.invoke(() => chatAgent.run({ message: 'Hello' }));
const result = await ctx.run(chatAgent, { message: 'Hello' });
```

The `invoke()` method executes your agent with full infrastructure support: tracing, session management, and access to all storage services.
The `run()` method executes your agent with full infrastructure support: tracing, session management, and access to all storage services.

For agents that don't require input:

```typescript
const result = await ctx.run(statusAgent);
```

<Callout type="info" title="Legacy invoke() Method">
The older `ctx.invoke(() => agent.run(input))` pattern still works, but `ctx.run(agent, input)` is preferred for its cleaner syntax.
</Callout>

## Options

@@ -45,10 +55,7 @@ await createApp();
// Run cleanup every hour
cron.schedule('0 * * * *', async () => {
const ctx = createAgentContext({ trigger: 'cron' });

await ctx.invoke(async () => {
await cleanupAgent.run({ task: 'expired-sessions' });
});
await ctx.run(cleanupAgent, { task: 'expired-sessions' });
});
```

@@ -58,35 +65,33 @@ For most scheduled tasks, use the [`cron()` middleware](/Routes/cron) instead. I

## Multiple Agents in Sequence

Run multiple agents within a single `invoke()` call to share the same session and tracing context:
Run multiple agents in sequence with the same context:

```typescript
const ctx = createAgentContext();

const result = await ctx.invoke(async () => {
// First agent analyzes the input
const analysis = await analyzeAgent.run({ text: userInput });

// Second agent generates response based on analysis
const response = await respondAgent.run({
analysis: analysis.summary,
sentiment: analysis.sentiment,
});
// First agent analyzes the input
const analysis = await ctx.run(analyzeAgent, { text: userInput });

return response;
// Second agent generates response based on analysis
const response = await ctx.run(respondAgent, {
analysis: analysis.summary,
sentiment: analysis.sentiment,
});
```

Each `ctx.run()` call shares the same session and tracing context.

## Reusing Contexts

Create a context once and reuse it for multiple invocations:

```typescript
const ctx = createAgentContext({ trigger: 'websocket' });

// Each invoke() gets its own session and tracing span
// Each run() gets its own session and tracing span
websocket.on('message', async (data) => {
const result = await ctx.invoke(() => messageAgent.run(data));
const result = await ctx.run(messageAgent, data);
websocket.send(result);
});
```
@@ -104,6 +109,28 @@ Standalone contexts provide the same infrastructure as HTTP request handlers:
- **Session events**: Start/complete events for observability
</Callout>

## Detecting Runtime Context

Use `isInsideAgentRuntime()` to check if code is running within the Agentuity runtime:

```typescript
import { isInsideAgentRuntime, createAgentContext } from '@agentuity/runtime';
import myAgent from '@agent/my-agent';

async function processRequest(data: unknown) {
if (isInsideAgentRuntime()) {
// Already in runtime context, call agent directly
return myAgent.run(data);
}

// Outside runtime, create context first
const ctx = createAgentContext();
return ctx.run(myAgent, data);
}
```

This is useful for writing utility functions that work both inside agent handlers and in standalone scripts.

## Next Steps

- [Calling Other Agents](/Agents/calling-other-agents): Agent-to-agent communication patterns
4 changes: 4 additions & 0 deletions content/Agents/streaming-responses.mdx
@@ -50,6 +50,10 @@ Streaming requires both: `schema.stream: true` in your agent (so the handler ret

Enable streaming by setting `stream: true` in your schema and returning a `textStream`:

<Callout type="info" title="AI SDK Integration">
The `textStream` from AI SDK's `streamText()` works directly with Agentuity's streaming middleware. Return it from your handler without additional processing.
</Callout>

```typescript
import { createAgent } from '@agentuity/runtime';
import { streamText } from 'ai';
13 changes: 13 additions & 0 deletions content/Agents/workbench.mdx
@@ -5,6 +5,19 @@ description: Use the built-in development UI to test agents, validate schemas, a

Workbench is a built-in UI for testing your agents during development. It automatically discovers your agents, displays their input/output schemas, and lets you execute them with real inputs.

## Why Workbench?

Testing agents isn't like testing traditional APIs. You need to validate input schemas, see how responses are formatted, test multi-turn conversations, and understand execution timing. Using `curl` or Postman means manually constructing JSON payloads and parsing responses.

Workbench understands your agents. It reads your schemas, generates test forms, maintains conversation threads, and shows execution metrics. When something goes wrong, you see exactly what the agent received and returned.

**Key capabilities:**

- **Schema-aware testing**: Input forms generated from your actual schemas
- **Thread persistence**: Test multi-turn conversations without manual state tracking
- **Execution metrics**: See token usage and response times for every request
- **Quick iteration**: Test prompts appear in the UI for one-click execution

## Enabling Workbench

Add a `workbench` section to your `agentuity.config.ts`:
8 changes: 8 additions & 0 deletions content/Get-Started/quickstart.mdx
@@ -212,6 +212,14 @@ After your first deployment, the App populates with:

## What's Next?

<Callout type="tip" title="AI-Assisted Development">
Install the [OpenCode plugin](/Reference/CLI/opencode-plugin) for AI-assisted agent development. Get help writing agents, debugging, and deploying directly from your editor:

```bash
agentuity ai opencode install
```
</Callout>

**Learn the concepts:**

- [Understanding How Agents Work](/Learn/Cookbook/Tutorials/understanding-agents): Tools, loops, and autonomous behavior