# Patterns & anti-patterns

Recurring shapes we've seen work — and ones that quietly break the loop.

## Patterns that work

### One project per use case

Resist the urge to multiplex unrelated decisions through one project. "Support replies — EU" and "Support replies — US" can share a project if the policy is identical; "Support replies" and "Invoice tagging" should not.

### Risk flags are nouns, not sentences

`pii`, `legal`, `high_value`, `new_user`. Not `"this looks like it might be a refund request from a high value customer"`. The router does string matching, so a free-form sentence will never line up with a routing rule.
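
As a minimal sketch of why short tokens matter, the snippet below checks raw flags against a fixed vocabulary. The vocabulary constant and the `partitionFlags` helper are hypothetical names for illustration, and exact-match comparison is an assumption about how the router's string matching behaves.

```typescript
// Hypothetical flag vocabulary; the four names come from the examples above.
const KNOWN_FLAGS = ["pii", "legal", "high_value", "new_user"] as const;
type RiskFlag = (typeof KNOWN_FLAGS)[number];

// Type guard: true only for an exact match against the vocabulary (assumption).
function isRiskFlag(value: string): value is RiskFlag {
  return (KNOWN_FLAGS as readonly string[]).indexOf(value) !== -1;
}

// Split raw flags into ones a string-matching router could act on
// and ones it would silently ignore.
function partitionFlags(raw: string[]): { matched: RiskFlag[]; unmatched: string[] } {
  const matched: RiskFlag[] = [];
  const unmatched: string[] = [];
  for (const flag of raw) {
    if (isRiskFlag(flag)) {
      matched.push(flag);
    } else {
      unmatched.push(flag);
    }
  }
  return { matched, unmatched };
}
```

Running `partitionFlags` over the producer's output in a test is one way to catch sentence-shaped flags before they reach the router.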

### Reviewers see the input first, suggestion second

Counterintuitive but important: showing the AI suggestion first anchors the reviewer to it. The queue UI shows the original input prominently and the suggestion as the answer being evaluated.

### Calibrate before tightening

Don't raise `auto_threshold` because the queue is too big. Raise it because the calibration plot says items above 0.95 are reliable. The first move is data; the second is policy.
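
The data behind a calibration plot can be sketched as a simple binning step: group reviewed items by confidence and measure how often the reviewer kept the suggestion in each bin. The `ReviewedItem` shape and the `calibrationBins` helper are assumptions for illustration, not part of the product's API.

```typescript
// Hypothetical shape: one reviewed item with the model's confidence and
// whether the reviewer shipped the AI suggestion unchanged.
interface ReviewedItem {
  confidence: number; // model-reported, in [0, 1]
  accepted: boolean;
}

interface CalibrationBin {
  lower: number;
  upper: number;
  count: number;
  acceptanceRate: number | null; // null when the bin has no items
}

function calibrationBins(items: ReviewedItem[], binCount: number = 10): CalibrationBin[] {
  const counts: number[] = [];
  const accepted: number[] = [];
  for (let i = 0; i < binCount; i++) {
    counts.push(0);
    accepted.push(0);
  }
  for (const item of items) {
    // Clamp so that a confidence of exactly 1.0 lands in the last bin.
    const raw = Math.floor(item.confidence * binCount);
    const index = Math.min(binCount - 1, Math.max(0, raw));
    counts[index] += 1;
    if (item.accepted) {
      accepted[index] += 1;
    }
  }
  return counts.map((count, i) => ({
    lower: i / binCount,
    upper: (i + 1) / binCount,
    count,
    acceptanceRate: count === 0 ? null : accepted[i] / count,
  }));
}
```

A bin is only evidence for a threshold change if its `count` is large enough to trust; a 100% acceptance rate over three items says very little.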

### Treat learnings as a dataset, not a log

Schedule a weekly review of `learnings`. Tag clusters. Ship guideline updates. Re-export for evals. If nobody owns the dataset, you don't have a loop — you have a queue.

### Sign every webhook, always

Compute an HMAC-SHA256 over the raw request body and compare it to the received signature with `timingSafeEqual`. Skipping signature verification on a webhook endpoint that mutates state is a security incident waiting to happen.
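
A minimal verification sketch using Node's built-in `crypto` module follows. It assumes the signature arrives hex-encoded; the helper names, and how the secret and signature reach your handler, are hypothetical and will depend on your setup.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the sender transmits the HMAC-SHA256 digest as a hex string.
function signBody(rawBody: string | Buffer, secret: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

function verifySignature(
  rawBody: string | Buffer,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) {
    return false;
  }
  return timingSafeEqual(received, expected);
}
```

The check must run against the bytes exactly as received. Parsing the JSON and re-serializing it before hashing can change whitespace or key order and make valid signatures fail.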

## Anti-patterns to avoid

### Putting a human on the hot path "to be safe"

If 100% of items go to a human, you have built a slow AI, not an AI with oversight. The loop only adds value when the policy auto-approves the boring middle.

### Using confidence as the only signal

Confidence is a number your model produces. It is wrong sometimes. Risk flags are your safety net for the cases where the model is confidently wrong.
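
One way to picture the two signals working together is a routing function where a matching risk flag wins over confidence. This is a sketch under stated assumptions: the `Policy` and `Item` shapes, the `blocking_flags` field, and the rule that items at or above `auto_threshold` are auto-approved are all hypothetical.

```typescript
// Hypothetical shapes; only `auto_threshold` and the flag names appear in the text.
interface Policy {
  auto_threshold: number;
  blocking_flags: string[]; // assumption: flags that always force human review
}

interface Item {
  confidence: number;
  risk_flags: string[];
}

type Route = "auto_approve" | "human_review";

function route(item: Item, policy: Policy): Route {
  // A matching risk flag overrides confidence, however high it is.
  for (const flag of item.risk_flags) {
    if (policy.blocking_flags.indexOf(flag) !== -1) {
      return "human_review";
    }
  }
  // Assumption: items at or above the threshold are auto-approved.
  return item.confidence >= policy.auto_threshold ? "auto_approve" : "human_review";
}
```

With this ordering, an item at 0.99 confidence that carries `legal` still goes to a person, which is the confidently-wrong case the flags exist to catch.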

### Free-text reasons as the primary feedback channel

Free-text is fine as a *supplement*. The primary feedback is the structured override: what the AI said vs. what the human shipped. Reasons are unparseable at scale.
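
To illustrate why the structured override aggregates cleanly where free text does not, the sketch below counts overrides per suggestion-to-shipped pair. The `Learning` record shape and its field names are assumptions for illustration, not the actual schema.

```typescript
// Hypothetical record shape for one entry in `learnings`.
interface Learning {
  item_id: string;
  ai_output: string; // what the AI said
  human_output: string; // what the human shipped
  reason?: string; // optional free-text supplement
}

// An override is any item where the shipped output differs from the suggestion.
function isOverride(entry: Learning): boolean {
  return entry.ai_output !== entry.human_output;
}

// Count overrides per "ai -> human" pair so recurring corrections stand out.
function overrideCounts(entries: Learning[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    if (!isOverride(entry)) {
      continue;
    }
    const key = `${entry.ai_output} -> ${entry.human_output}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

The `reason` field is carried along but never parsed here; it is context for a human reading a cluster, not an input to the aggregation.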

### One giant reviewer pool for everything

Reviewers develop pattern recognition for the project they work on. Mixing all projects into one pool dilutes that expertise. Use `reviewer_pool` per project.

### Skipping the escalation tier

"Senior reviewer" exists because some decisions deserve a specific accountable person. Routing `legal` flags to the general pool is functionally the same as not having the flag.

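The last two points (a pool per project, plus an escalation tier) can be sketched as a small pool-selection function. Only `reviewer_pool` is named in the text; `escalation_pool`, `escalation_flags`, and the pool names in the usage note are hypothetical.

```typescript
// Hypothetical per-project config; only `reviewer_pool` comes from the text.
interface ProjectConfig {
  reviewer_pool: string;
  escalation_pool: string; // assumption: a named pool of senior reviewers
  escalation_flags: string[]; // assumption: flags that require the senior tier
}

function pickPool(config: ProjectConfig, riskFlags: string[]): string {
  for (const flag of riskFlags) {
    if (config.escalation_flags.indexOf(flag) !== -1) {
      // Escalation flags bypass the general pool entirely.
      return config.escalation_pool;
    }
  }
  return config.reviewer_pool;
}
```

For example, with `escalation_flags: ["legal"]`, an item flagged `legal` would resolve to the project's senior pool, while an item flagged only `new_user` stays in the project's own `reviewer_pool`.
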
### Letting the queue age silently

`max_queue_age_minutes` exists for a reason. If items routinely sit for hours, you have an SLA problem — either staff up, raise `auto_threshold`, or both.
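
A minimal sketch of surfacing aged items, so the queue cannot age without anyone noticing. The `QueuedItem` shape and the `staleItems` helper are hypothetical; only `max_queue_age_minutes` comes from the text.

```typescript
// Hypothetical shape for an item waiting in the review queue.
interface QueuedItem {
  id: string;
  enqueued_at: Date;
}

// Return the items that have waited longer than the configured maximum age.
function staleItems(
  queue: QueuedItem[],
  maxQueueAgeMinutes: number,
  now: Date = new Date(),
): QueuedItem[] {
  const cutoff = now.getTime() - maxQueueAgeMinutes * 60 * 1000;
  return queue.filter((item) => item.enqueued_at.getTime() < cutoff);
}
```

Running a check like this on a schedule and alerting when the result is non-empty turns the setting into a signal someone actually sees.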

### Storing roles on `profiles`

Don't. Roles live in `user_roles`, checked through `has_role()`. Anything else is a privilege-escalation bug waiting to happen.

### Treating the model as static

The model that calibrated at 0.92 last quarter will not necessarily calibrate at 0.92 this quarter. Re-check the calibration plot after every model swap, prompt change, or major guideline update.

