Live demo

You're on hold.
Let the agents entertain you.

Cat-Herding AI is a multi-agent chat backend I built to explore what happens when an LLM “customer-service queue” isn't elevator music — it's a rotation of specialist AI agents whose job is to keep you entertained until the thing you were actually waiting for arrives. Jokes. GIFs. YouTube picks. A round of 20 Questions. A bedtime story if it's that kind of night.

Sign in through the bubble in the corner and the hold-flow bootstraps. The agents pass you around between themselves, each staying in character, each aware of the shared conversation — a small but honest multi-agent system with handoffs, goal-seeking, and rich tool use. The sign-in itself runs against my Rust OAuth2 Server over OAuth2 + PKCE, entirely in the browser.

The scenario: 20 minutes on hold

Pretend you called support, or you're queued for a long-running AI task, or a batch job kicks off at 3am. You have dead time. The classic product answer is “show a spinner.” The better product answer is “give people something to do.” This demo is me asking: what if the hold experience were itself the product?

Concretely, when you sign in on this page, the widget goes into mode: 'demo' and the backend runs its hold-flow bootstrap: a welcome, an introduction of the agents on shift, and a proactive opener from whichever agent matches the moment. Every other page on this site mounts the same widget in mode: 'lean', so it sits quietly in the corner without pinging you until you click it.

The agents on shift

Joke Teller

Clean-ish one-liners, callbacks to earlier jokes in the conversation, won't run out.

YouTube Guru

Tool-driven: picks a curated video based on your vibe, embeds it inline via youtube-nocookie.

Game Host

Runs 20 Questions, Would You Rather, trivia. Keeps score. Triggered by ‘play a game’ intents.

Story Teller

Short interactive fiction. Will not hallucinate images — that was a bug fix (PR #172, if you're curious).

GIF Buddy

Reacts with a curated GIF when the vibe calls for one. Attachment flows through as an inline image.

Orchestrator

Picks which agent speaks next, issues handoff_event messages, keeps the conversation coherent.

Under the hood these are separate prompt+tool bundles behind a router. Streaming is Socket.IO; each token arrives individually so you see the response typed out. Handoffs surface as handoff_event frames and show up as a small “Transferring you to Agent Name…” message in the transcript.
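To make the streaming and handoff behavior concrete, here is a minimal sketch of how a client could fold incoming frames into a transcript. The frame shapes (`type`, `agent`, `text`, `to`) and the `applyFrame` helper are assumptions for illustration; only the handoff_event name and the “Transferring you to…” message come from the description above.

```javascript
// Hypothetical frame shapes (the real wire format is not shown on this page):
//   { type: 'token', agent: 'Joke Teller', text: 'Why' }
//   { type: 'handoff_event', to: 'Game Host' }

// Fold one incoming frame into the transcript and return a new array.
function applyFrame(transcript, frame) {
  const last = transcript[transcript.length - 1];
  switch (frame.type) {
    case 'token':
      // Append the token to the current agent's message if one is in progress.
      if (last && last.kind === 'message' && last.agent === frame.agent) {
        return [
          ...transcript.slice(0, -1),
          { ...last, text: last.text + frame.text },
        ];
      }
      // Otherwise start a new message for this agent.
      return [...transcript, { kind: 'message', agent: frame.agent, text: frame.text }];
    case 'handoff_event':
      // Surface the handoff as a small notice in the transcript.
      return [...transcript, { kind: 'notice', text: `Transferring you to ${frame.to}…` }];
    default:
      // Ignore frame types this sketch does not know about.
      return transcript;
  }
}
```

In a browser this would typically be called from Socket.IO client handlers registered with `socket.on(...)`; the event names used there would also be assumptions.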

Goal-seeking without a monolith

Each agent has its own lane — jokes, games, videos, stories — but the system as a whole is goal-seeking: keep the user engaged, and track any explicit goal they mention (“I just need a status update”, “I actually want to book a meeting”) so the right escalation path is always one intent away. The router and the per-agent tools make it possible to add a new persona without teaching every other agent about it.
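One way to picture "add a persona without teaching every other agent about it" is a registry the router consults. This is a sketch under assumptions, not the backend's actual code: `registerAgent`, `route`, and the intent names are hypothetical, and the real orchestrator may well use an LLM rather than a lookup to pick the next speaker.

```javascript
// Hypothetical registry: each agent declares the intents it can handle.
const agents = new Map();

function registerAgent(name, { intents, respond }) {
  agents.set(name, { intents: new Set(intents), respond });
}

// Return the first registered agent that claims the intent, else a fallback.
function route(intent, fallback = 'Joke Teller') {
  for (const [name, agent] of agents) {
    if (agent.intents.has(intent)) return name;
  }
  return fallback;
}

registerAgent('Joke Teller', {
  intents: ['joke'],
  respond: () => 'A one-liner goes here.',
});
registerAgent('Game Host', {
  intents: ['play_game', 'trivia'],
  respond: () => 'Ready for 20 Questions?',
});

// Adding a persona touches only the registry; no existing agent changes.
registerAgent('Story Teller', {
  intents: ['story'],
  respond: () => 'Once upon a time, on hold.',
});
```

Because agents never reference each other, the only shared knowledge is the intent vocabulary the router understands.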

I wrote about the broader pattern (small, cooperating, goal-aware agents instead of one giant prompt) in Goal-Seeking AI Architecture. This widget is one of the reference implementations.

What's going on under the hood

OAuth2 + PKCE, all client-side

The widget runs the full Authorization Code + PKCE flow in the browser: popup opens roauth2.cat-herding.net, user signs in, the popup posts the code back, and a same-origin proxy on the chat backend exchanges it for a JWT.
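The "popup posts the code back" step is where a widget has to be careful about who it listens to. Below is a minimal sketch of that check. The message shape (`type: 'oauth_callback'`, `code`, `state`), the `extractAuthCode` helper, and the use of a `state` value are assumptions; the expected origin is taken from the callback URL described on this page.

```javascript
// Hypothetical message shape posted by the callback page:
//   { type: 'oauth_callback', code: 'abc123', state: 'xyz' }

// Return the authorization code only if the message passes every check.
function extractAuthCode(event, { expectedOrigin, expectedState }) {
  // Reject messages from any window other than the callback page's origin.
  if (event.origin !== expectedOrigin) return null;
  const data = event.data;
  // Reject unrelated postMessage traffic.
  if (!data || data.type !== 'oauth_callback') return null;
  // Reject responses that do not match the request this widget started.
  if (data.state !== expectedState) return null;
  return typeof data.code === 'string' && data.code.length > 0 ? data.code : null;
}
```

In the browser this would run inside a `window.addEventListener('message', ...)` handler, with `expectedOrigin` set to `https://chat.cat-herding.net`.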

Multi-LLM backend

Claude (via Azure AI Foundry) + OpenAI for routing, tools, and rich media. Each agent is a prompt + tool bundle; the orchestrator picks who speaks next.

Kubernetes-native

Rust OAuth2 server and chat backend both run in AKS behind Istio — managed certs, JWT-aware policies, RequestAuthentication binding.

The sign-in request path

  1. You click the floating chat bubble, then Sign in. The widget generates a PKCE verifier and challenge in the browser and opens a popup.
  2. Popup lands on roauth2.cat-herding.net/oauth/authorize. You authenticate (optionally via GitHub or Google federation) and approve the scopes.
  3. The auth server redirects the popup to chat.cat-herding.net/embed/callback.html with an authorization code. Callback postMessages the code back to the widget.
  4. Widget POSTs the authorization code and PKCE verifier to the same-origin proxy on the chat backend, which exchanges them for a JWT. No client secrets are shipped in the client bundle.
  5. Widget stores the token, connects to Socket.IO, and starts the multi-agent chat.
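Steps 1 and 2 can be sketched with the Web Crypto API. This is an illustration of standard PKCE with the S256 method (RFC 7636), not the widget's actual source: the function names, the `state` parameter, and the exact query parameters the server expects are assumptions, while the authorize path, client id, scopes, and callback URL are the ones mentioned on this page.

```javascript
// Base64url-encode bytes without padding, as PKCE requires.
function base64UrlEncode(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Create a random verifier and its SHA-256 challenge (S256 method).
async function createPkcePair() {
  const random = crypto.getRandomValues(new Uint8Array(32));
  const verifier = base64UrlEncode(random);
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(verifier));
  return { verifier, challenge: base64UrlEncode(new Uint8Array(digest)) };
}

// Build the URL the popup opens. Only the challenge leaves the widget here;
// the verifier stays in memory until the code exchange.
function buildAuthorizeUrl({ issuer, clientId, redirectUri, scopes, challenge, state }) {
  const url = new URL('/oauth/authorize', issuer);
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: scopes,
    code_challenge: challenge,
    code_challenge_method: 'S256',
    state, // assumed: a random value echoed back by the server
  }).toString();
  return url.toString();
}
```

A caller would await `createPkcePair()`, pass the `challenge` to `buildAuthorizeUrl`, and open the result with `window.open`. Keeping the verifier out of the URL is what lets the flow work without a client secret in the bundle.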

Embed it anywhere

The widget is a ~20 KB gzipped IIFE that mounts inside a Shadow DOM so host CSS can't bleed in. Drop it into any site:

<script src="https://chat.cat-herding.net/embed/cat-herding-chat.js" defer></script>
<script>
  window.addEventListener('load', () => {
    window.CatHerdingChat.init({
      apiUrl: 'https://chat.cat-herding.net',
      mode: 'lean',
      auth: {
        type: 'oauth2',
        issuer: 'https://roauth2.cat-herding.net',
        clientId: 'cat-herding-chat-embed',
        scopes: 'openid profile email',
      },
    });
  });
</script>