Guardrails

Private Beta — Contact us to get set up.

Guardrails inspect tool arguments and responses. They catch sensitive data before it leaves, detect injection attacks, and enforce org-specific policies.

What guardrails catch

PII detection finds personally identifiable information—social security numbers, credit cards, email addresses, phone numbers, and 50+ other entity types. You can block requests containing PII, redact it before the tool executes, or just log a warning. Secret detection catches API keys, tokens, and credentials that shouldn’t be in tool arguments. AWS keys, GitHub tokens, Stripe secrets—patterns for common providers, plus generic high-entropy string detection. URL filtering controls which domains tools can reference. Allowlists for approved domains, blocklists for known-bad ones. Jailbreak detection uses an LLM to identify prompt injection in tool arguments. Attackers try to embed instructions in data that gets passed to tools—guardrails catch these before execution. Topical alignment ensures requests stay within intended bounds. If a tool is meant for customer support, guardrails can flag requests that drift into unrelated territory. Custom guards let you define org-specific policies as LLM prompts. Compliance rules, brand safety, domain-specific constraints—whatever your organization needs.

How guardrails respond

Action	What happens
Block	Request rejected, user gets error
Redact	Sensitive content masked, execution continues
Warn	Violation logged, execution continues

The action depends on the guard and the content. PII in a tool argument might get redacted. A jailbreak attempt gets blocked. A topical drift gets logged but allowed.

Configuration layers

Guardrails are configured at three levels: Org level sets defaults for all applications. Enable PII detection for everything, set the jailbreak threshold, configure which custom guards apply. App level overrides or extends org defaults. A customer-facing app might have stricter guardrails than an internal tool. An app handling financial data might enable additional compliance guards. Request level passes context that guards can use. Department, user tier, request source—information that helps guards make better decisions.

LLM-based guards

Jailbreak detection, topical alignment, and custom guards need an LLM. You configure an endpoint—OpenAI, Azure OpenAI, a self-hosted model, anything OpenAI-compatible. The guard sends content to the LLM with a system prompt that defines what to check. The LLM returns a judgment. If confidence exceeds your threshold, the guard triggers. This means you control:

Which LLM runs your guards (cost, latency, privacy tradeoffs)
What prompts define your policies
How sensitive the triggers are

Introduction

Getting Started

How-to Guides

Explanation

Private Beta

What guardrails catch

How guardrails respond

Configuration layers

LLM-based guards

Introduction

Getting Started

How-to Guides

Explanation

Private Beta

​What guardrails catch

​How guardrails respond

​Configuration layers

​LLM-based guards

What guardrails catch

How guardrails respond

Configuration layers

LLM-based guards