Private Beta — Contact us to get set up.
What guardrails catch
PII detection finds personally identifiable information—social security numbers, credit cards, email addresses, phone numbers, and 50+ other entity types. You can block requests containing PII, redact it before the tool executes, or just log a warning. Secret detection catches API keys, tokens, and credentials that shouldn’t be in tool arguments. AWS keys, GitHub tokens, Stripe secrets—patterns for common providers, plus generic high-entropy string detection. URL filtering controls which domains tools can reference. Allowlists for approved domains, blocklists for known-bad ones. Jailbreak detection uses an LLM to identify prompt injection in tool arguments. Attackers try to embed instructions in data that gets passed to tools—guardrails catch these before execution. Topical alignment ensures requests stay within intended bounds. If a tool is meant for customer support, guardrails can flag requests that drift into unrelated territory. Custom guards let you define org-specific policies as LLM prompts. Compliance rules, brand safety, domain-specific constraints—whatever your organization needs.How guardrails respond
| Action | What happens |
|---|---|
| Block | Request rejected, user gets error |
| Redact | Sensitive content masked, execution continues |
| Warn | Violation logged, execution continues |
Configuration layers
Guardrails are configured at three levels: Org level sets defaults for all applications. Enable PII detection for everything, set the jailbreak threshold, configure which custom guards apply. App level overrides or extends org defaults. A customer-facing app might have stricter guardrails than an internal tool. An app handling financial data might enable additional compliance guards. Request level passes context that guards can use. Department, user tier, request source—information that helps guards make better decisions.LLM-based guards
Jailbreak detection, topical alignment, and custom guards need an LLM. You configure an endpoint—OpenAI, Azure OpenAI, a self-hosted model, anything OpenAI-compatible. The guard sends content to the LLM with a system prompt that defines what to check. The LLM returns a judgment. If confidence exceeds your threshold, the guard triggers. This means you control:- Which LLM runs your guards (cost, latency, privacy tradeoffs)
- What prompts define your policies
- How sensitive the triggers are

