The pluggable side-channel hook for observing a run as it happens, with fail-open semantics so a broken notifier never kills a run.
Notifier
A notifier is an object you pass to agent.run(...) that gets called at every point where the loop mutates conversation history, completes an LLM call, finishes a tool, or emits a governance event. It is the official extension point for terminal output, metrics, logs, Slack/webhook fanout, real-time dashboards, and anything else that wants to watch a run without owning its durability.
Two rules apply everywhere: a notifier never blocks the run, and a notifier that raises never kills the run. Both are enforced by the framework, not by convention.
Where notifier fits next to recorder
Dendrux has two observers on the same set of hooks: the recorder and the notifier. They look similar but they do different jobs and have opposite failure policies.
See Recorder for the other side of this pair.
The hook surface
LoopNotifier and LoopRecorder share the same ten methods. The runner fires the run-level hooks; the loop fires everything else. Every hook takes run_id as its first positional argument so a single notifier instance can disambiguate concurrent runs without contextvars or implicit state.
The pairing is deliberate: every "started" hook is closed by exactly one terminal hook. For runs and LLM calls that is either the "finished"/"completed" hook or the "failed" hook, never both. Tool failures arrive on on_tool_completed as ToolResult(success=False) because the loop already converts every dispatch error into a result; there is no separate on_tool_failed.
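As a minimal sketch of the two points above, the notifier below keeps counters keyed by run_id and detects tool failures by inspecting the result passed to on_tool_completed. FakeToolCall and FakeToolResult are hypothetical stand-ins so the example runs without Dendrux installed; only the name and success attributes are taken from this page.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a tool call; only `name` is assumed."""
    name: str


@dataclass
class FakeToolResult:
    """Hypothetical stand-in for a tool result; only `success` is assumed."""
    success: bool


class ToolFailureCounter:
    """Counts tool outcomes per run, keyed by the run_id each hook receives."""

    def __init__(self):
        self.ok = defaultdict(int)
        self.failed = defaultdict(int)

    async def on_tool_completed(self, run_id, tool_call, tool_result, iteration):
        # There is no on_tool_failed hook: failure is a flag on the result.
        if tool_result.success:
            self.ok[run_id] += 1
        else:
            self.failed[run_id] += 1


async def demo():
    counter = ToolFailureCounter()
    # Two concurrent runs share one notifier instance; run_id keeps them apart.
    await asyncio.gather(
        counter.on_tool_completed("run-a", FakeToolCall("add"), FakeToolResult(True), 1),
        counter.on_tool_completed("run-b", FakeToolCall("add"), FakeToolResult(False), 1),
        counter.on_tool_completed("run-a", FakeToolCall("add"), FakeToolResult(False), 2),
    )
    return counter


counter = asyncio.run(demo())
print(dict(counter.ok), dict(counter.failed))
```

Because the state lives in dictionaries keyed by run_id, the same instance can be handed to any number of overlapping runs without their counts mixing.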
The Protocol lives in dendrux/loops/base.py:
@runtime_checkable
class LoopNotifier(Protocol):
    async def on_run_started(self, run_id, *, agent_name=None, agent_model=None): ...
    async def on_run_finished(self, run_id, result): ...
    async def on_run_failed(self, run_id, error, *, iteration=None): ...
    async def on_message_appended(self, run_id, message, iteration): ...
    async def on_llm_call_started(self, run_id, iteration, *, semantic_messages=None, semantic_tools=None): ...
    async def on_llm_call_completed(self, run_id, response, iteration, *, ...): ...
    async def on_llm_call_failed(self, run_id, iteration, error, *, duration_ms=None): ...
    async def on_tool_started(self, run_id, tool_call, iteration): ...
    async def on_tool_completed(self, run_id, tool_call, tool_result, iteration): ...
    async def on_governance_event(self, run_id, event_type, iteration, data, *, correlation_id=None): ...

runtime_checkable means a class that implements all ten methods structurally satisfies isinstance(obj, LoopNotifier). You do not need to inherit. But for everyday work you should; see the next section.
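The structural check can be seen in isolation with a toy protocol. The sketch below uses only the standard typing module; TwoHookNotifier, Complete, and Partial are hypothetical names, and the two-method protocol stands in for the real ten-method one.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TwoHookNotifier(Protocol):
    """Hypothetical two-method protocol, standing in for the ten-method one."""

    async def on_run_started(self, run_id): ...
    async def on_run_finished(self, run_id, result): ...


class Complete:
    # No inheritance: having both methods is enough.
    async def on_run_started(self, run_id): ...
    async def on_run_finished(self, run_id, result): ...


class Partial:
    # Missing on_run_finished, so the structural check fails.
    async def on_run_started(self, run_id): ...


complete_ok = isinstance(Complete(), TwoHookNotifier)
partial_ok = isinstance(Partial(), TwoHookNotifier)
print(complete_ok, partial_ok)  # prints "True False"
```

Note that a runtime isinstance check against a protocol only looks for the method names. It does not verify signatures or that the methods are coroutines, so a class can pass the check and still fail when a hook is actually awaited.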
BaseNotifier: the easy way to subclass
Implementing ten methods on every notifier is tedious, and it makes future protocol additions a breaking change for every existing implementation. dendrux.loops.base.BaseNotifier is a concrete class that ships no-op defaults for every hook. Subclass it and override only what you care about:
from dendrux.loops.base import BaseNotifier


class CaptureNotifier(BaseNotifier):
    """Minimal notifier: records every callback that fires."""

    def __init__(self):
        self.log = []

    async def on_run_started(self, run_id, *, agent_name=None, agent_model=None):
        self.log.append(f"on_run_started run={run_id} agent={agent_name!r}")

    async def on_message_appended(self, run_id, message, iteration):
        self.log.append(f"on_message_appended iter={iteration} role={message.role.value}")

    async def on_llm_call_completed(self, run_id, response, iteration, **kwargs):
        u = response.usage
        self.log.append(f"on_llm_call_completed iter={iteration} input={u.input_tokens}")

    async def on_tool_completed(self, run_id, tool_call, tool_result, iteration):
        self.log.append(f"on_tool_completed iter={iteration} tool={tool_call.name!r}")

    async def on_run_finished(self, run_id, result):
        self.log.append(f"on_run_finished status={result.status.value}")

Five overrides, ten hooks supported. The other five (on_run_failed, on_llm_call_started, on_llm_call_failed, on_tool_started, on_governance_event) inherit no-op defaults from BaseNotifier: they fire, and your subclass ignores them. When the protocol grows in a future release, your code keeps working without changes.
Plugging one in
Unlike the recorder, the notifier is a per-call argument, not a constructor keyword. You pass it each time you call agent.run() or a resume method:
from dendrux import Agent
from dendrux.llm.anthropic import AnthropicProvider
from dendrux.notifiers import ConsoleNotifier

# `add` is a tool defined elsewhere; this snippet runs inside an async function.
async with Agent(
    provider=AnthropicProvider(model="claude-haiku-4-5"),
    prompt="You are a calculator. Use the add tool.",
    tools=[add],
    database_url="sqlite+aiosqlite:///demo.db",
) as agent:
    await agent.run("What is 15 + 27?", notifier=ConsoleNotifier())

Per-call makes sense: a batch run wants metrics, a dev-loop invocation wants terminal output, and a production webhook wants Slack fanout. The same Agent can serve all three with different notifiers on different calls.
All submit_* and resume methods accept the same notifier= argument. If you pass a notifier to run() and a different one to submit_approval(), both are applied on their respective turns; nothing is carried over between calls.
What a real run looks like
Running a two-iteration ReAct query ("What is 15 + 27?" with an add tool) and printing capture.log afterwards:
on_run_started run=01HXX... agent='calculator'
on_message_appended iter=0 role=user
on_llm_call_completed iter=1 input=585
on_message_appended iter=1 role=assistant
on_tool_completed iter=1 tool='add'
on_message_appended iter=1 role=tool
on_llm_call_completed iter=2 input=667
on_message_appended iter=2 role=assistant
on_run_finished status=success

Nine callbacks for a two-iteration run. on_run_started opens it, on_run_finished closes it, and the body is interleaved messages, LLM completions, and tool completions in the order the loop executed them. Everything a human dashboard or an OpenTelemetry tracer needs to render a live transcript is here.
Why both started and completed
The lifecycle pairing was added so external observers can do correct work even when calls fail.
A pure completion-only surface looks tidy but breaks down under three real conditions:
- Provider exceptions never reach completion. provider.complete() can raise: network blip, rate limit, malformed schema. With only on_llm_call_completed, the notifier never hears about that call. With on_llm_call_started and on_llm_call_failed, the notifier opens a span on start, marks it errored on failure, and the trace tells the truth.
- Span timing is wrong without start. OpenTelemetry and similar tracers want a real start time, not "completion minus duration." The lifecycle pair gives them one.
- Some observers care about in-flight state. A dashboard that highlights "LLM call in progress" needs to know the call started before it ends.
The same logic applies to runs (on_run_started / on_run_finished / on_run_failed) and to tools (on_tool_started / on_tool_completed). Tool failures take a slightly different shape — the loop catches them and emits on_tool_completed with ToolResult(success=False) — because the loop already wraps every dispatch in error handling and there is nothing the notifier could do that the loop has not already done. LLM failures, by contrast, propagate, so the notifier needs an explicit failed hook.
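The open-on-start, close-on-completed-or-failed pattern can be sketched without any tracing library. SpanTracker below is a hypothetical illustration, not a Dendrux class; it keys spans by (run_id, iteration), which assumes at most one LLM call per iteration of a run, and uses time.monotonic() in place of a real tracer's clock.

```python
import asyncio
import time


class SpanTracker:
    """Tracks LLM calls as open/closed spans keyed by (run_id, iteration)."""

    def __init__(self):
        self.open = {}    # (run_id, iteration) -> start time
        self.closed = []  # (run_id, iteration, status, elapsed_seconds)

    async def on_llm_call_started(self, run_id, iteration, **kwargs):
        # A real start time, recorded when the call begins.
        self.open[(run_id, iteration)] = time.monotonic()

    async def on_llm_call_completed(self, run_id, response, iteration, **kwargs):
        self._close(run_id, iteration, "ok")

    async def on_llm_call_failed(self, run_id, iteration, error, **kwargs):
        self._close(run_id, iteration, "error")

    def _close(self, run_id, iteration, status):
        started = self.open.pop((run_id, iteration), None)
        if started is None:
            return  # terminal hook without a recorded start; nothing to close
        self.closed.append((run_id, iteration, status, time.monotonic() - started))


async def demo():
    tracker = SpanTracker()
    await tracker.on_llm_call_started("run-1", 1)
    await tracker.on_llm_call_completed("run-1", response=None, iteration=1)
    await tracker.on_llm_call_started("run-1", 2)
    await tracker.on_llm_call_failed("run-1", 2, TimeoutError("provider timed out"))
    await tracker.on_llm_call_started("run-1", 3)  # still in flight
    return tracker


tracker = asyncio.run(demo())
print([span[:3] for span in tracker.closed], list(tracker.open))
```

With a completion-only surface, the second call would leave no trace at all and the third could not be shown as in progress; here the failed call is closed with an error status and the in-flight call remains visible in tracker.open.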
ConsoleNotifier and CompositeNotifier
Dendrux ships two notifiers in dendrux.notifiers. Both subclass BaseNotifier.
ConsoleNotifier uses rich to render the run as a terminal panel with per-iteration steps:
╭──────────────────────────────────────────────────────╮
│ What is 15 + 27? │
╰──────────────────────────────────────────────────────╯
llm 654 tokens in 2.3s
Step 1
calling add a=15, b=27
done add 0.0s
llm 676 tokens in 0.8s
Step 2

It overrides on_message_appended, on_llm_call_completed, on_tool_completed, and on_governance_event. The lifecycle hooks (on_run_started etc.) fire and are ignored. ConsoleNotifier does not yet render them, but you could subclass it to add a banner.
CompositeNotifier fans a single set of callbacks out to a list of inner notifiers. If you want both terminal output and a metrics sink, wrap them: CompositeNotifier([console, metrics]). It implements every hook by forwarding to each child, swallowing per-child exceptions so one broken notifier does not prevent the others from running.
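The fan-out behaviour described above can be illustrated with a single hook. FanOut, Broken, and Recording are hypothetical classes written for this sketch; this is not the library's CompositeNotifier, only an assumption about what "forward to each child and swallow per-child exceptions" looks like.

```python
import asyncio
import logging

logger = logging.getLogger("fanout_sketch")


class FanOut:
    """Hypothetical fan-out over one hook; not the library's CompositeNotifier."""

    def __init__(self, children):
        self.children = list(children)

    async def on_run_started(self, run_id, *, agent_name=None, agent_model=None):
        for child in self.children:
            try:
                await child.on_run_started(
                    run_id, agent_name=agent_name, agent_model=agent_model
                )
            except Exception:
                # One failing child must not stop the remaining children.
                logger.warning("child %r failed in on_run_started", child, exc_info=True)


class Broken:
    async def on_run_started(self, run_id, **kwargs):
        raise RuntimeError("webhook unreachable")


class Recording:
    def __init__(self):
        self.seen = []

    async def on_run_started(self, run_id, **kwargs):
        self.seen.append(run_id)


recording = Recording()
asyncio.run(FanOut([Broken(), recording]).on_run_started("run-1", agent_name="calculator"))
print(recording.seen)  # prints "['run-1']"
```

The broken child is listed first on purpose: the recording child still receives the callback, which is the property that matters when a terminal renderer and a network sink share one run.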
For OpenTelemetry, install dendrux[otel] and pass OpenTelemetryNotifier() from dendrux.notifiers.otel. It emits a GenAI-semconv span tree (invoke_agent → chat → execute_tool) on the host application's existing TracerProvider. This is a V1 integration. See the OpenTelemetry recipe for setup, span shape, and what's left out for now (cross-process trace continuity, native metrics, and log signals are deferred until real usage validates the design).
Fail-open semantics
The loop does not call your notifier directly. It calls a thin wrapper in dendrux/loops/_helpers.py that swallows exceptions:
async def notify_message(notifier, run_id, message, iteration, warnings=None):
    """Notify notifier of a message append, swallowing exceptions."""
    if notifier is None:
        return
    try:
        await notifier.on_message_appended(run_id, message, iteration)
    except Exception:
        logger.warning("Notifier.on_message_appended failed", exc_info=True)
        if warnings is not None:
            warnings.append(f"on_message_appended failed at iteration {iteration}")

Four things to pull out of that wrapper:
- None is fine. Passing no notifier is the common path. The wrapper short-circuits.
- Exceptions do not propagate. If your Slack webhook times out, the run carries on.
- The warning is logged. You will see the traceback in your log stream at warning level, so the bug is not silent, it is just not fatal.
- Warnings are collected on the run. A text label is appended to a per-run warnings list, which surfaces on the final RunResult.meta["notifier_warnings"] and, when persisted by the runner, on run_events. The run still succeeds, and the operator can see which callback failed where.
Every notifier hook is wrapped the same way (notify_run_started, notify_llm_started, notify_tool_started, etc.).
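To see the fail-open behaviour end to end, the sketch below reproduces the wrapper shown earlier so the example is self-contained and drives it with a notifier that fails on purpose. FlakyNotifier and fake_loop are hypothetical; the real loop does far more than call the wrapper in a for loop.

```python
import asyncio
import logging

logger = logging.getLogger("notifier_sketch")


async def notify_message(notifier, run_id, message, iteration, warnings=None):
    """Same shape as the wrapper above, copied here so the sketch runs alone."""
    if notifier is None:
        return
    try:
        await notifier.on_message_appended(run_id, message, iteration)
    except Exception:
        logger.warning("Notifier.on_message_appended failed", exc_info=True)
        if warnings is not None:
            warnings.append(f"on_message_appended failed at iteration {iteration}")


class FlakyNotifier:
    """Hypothetical notifier that raises on even iterations."""

    def __init__(self):
        self.delivered = []

    async def on_message_appended(self, run_id, message, iteration):
        if iteration % 2 == 0:
            raise ConnectionError("simulated webhook timeout")
        self.delivered.append(iteration)


async def fake_loop():
    notifier, warnings = FlakyNotifier(), []
    for iteration in range(1, 5):
        # The loop never sees the exception; it only accumulates warnings.
        await notify_message(notifier, "run-1", f"msg {iteration}", iteration, warnings)
    # Passing no notifier short-circuits and adds nothing.
    await notify_message(None, "run-1", "ignored", 5, warnings)
    return notifier.delivered, warnings


delivered, warnings = asyncio.run(fake_loop())
print(delivered)
print(warnings)
```

All four iterations finish even though two callbacks raised. The odd iterations are delivered, the even ones leave a text label in the warnings list, and the tracebacks go to the log at warning level.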
Why a side-channel at all
You might ask: if the recorder already writes run_events and an SSE client can read them back in order, do you need a notifier?
Yes, for three reasons.
- Latency. run_events is a DB round-trip, then an SSE poll interval, then a client render. A notifier runs in the same event loop as the LLM call, and sees the event within a coroutine await. For terminal output, live metrics, or anything that wants sub-millisecond reaction, the notifier is the right channel.
- Richer payloads. A notifier receives the full Message, LLMResponse, ToolCall, and ToolResult objects. The DB event log stores a condensed projection (token counts, tool name, iteration, a correlation id). If you want the entire prompt, the entire response, or the full tool result, you get it in-memory in the callback. Persisting all of that would bloat the DB; putting it in the notifier avoids the tradeoff.
- Out-of-band destinations. Slack, Datadog, OpenTelemetry, a custom websocket, a tqdm progress bar: none of these are storage. They do not want SSE. A notifier lets them hook in without pretending to be a durability layer.
The recorder is the run's canonical record. The notifier is its live broadcast. They coexist because their jobs are different, and the failure policies match those jobs: the canonical record refuses to drop rows, the live broadcast refuses to block the source.