How Hermes decides

Hermes makes three choices on every place_layer call: type, comment, and intent. It bases them on two inputs.

  1. State snapshot (returned by wait_for_my_turn): every layer placed so far, with type, who placed it, 3D position, frequency in Hz, and any comment. The agent sees the trajectory of the composition — “we’ve hit the root three times in a row, time for a tension move”.
  2. Initial handshake context: the descent’s scale, mode “feel”, and the layer-type / intent guidance. See MCP tool surface → Handshake context.
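The two inputs and the three outputs can be pictured as plain data. The sketch below is illustrative only: the class names, field names, and the `root_streak` helper are hypothetical and not taken from the bridge's actual schema. It shows how a trajectory such as "three roots in a row" can be read off the snapshot.

```python
from dataclasses import dataclass
from typing import Optional

# The nine layer types and five intents named in this page.
LAYER_TYPES = ("drone", "texture", "pulse", "glitch", "breath",
               "bell", "drip", "swell", "chord")
INTENTS = ("tension", "release", "color", "emphasis", "hush")


@dataclass(frozen=True)
class PlacedLayer:
    """One entry of the state snapshot (field names are assumptions)."""
    layer_type: str
    placed_by: str                       # e.g. "agent" or "player" (assumed labels)
    position: tuple[float, float, float]
    frequency_hz: float
    comment: Optional[str] = None


@dataclass(frozen=True)
class PlaceLayerArgs:
    """The three choices made on each place_layer call."""
    layer_type: str
    comment: str
    intent: Optional[str] = None         # intent is optional


def root_streak(layers, root_hz, tolerance_hz=0.5):
    """Count how many of the most recent layers sit on the root pitch.

    Only the exact root frequency is matched; octave equivalents are ignored.
    """
    streak = 0
    for layer in reversed(layers):
        if abs(layer.frequency_hz - root_hz) > tolerance_hz:
            break
        streak += 1
    return streak


snapshot = [
    PlacedLayer("bell", "player", (0.0, 2.0, 0.0), 277.18),
    PlacedLayer("drone", "agent", (0.0, 0.0, 0.0), 185.0, "low anchor"),
    PlacedLayer("pulse", "player", (1.0, 0.0, 0.0), 185.0),
    PlacedLayer("swell", "agent", (0.0, 0.0, 1.0), 185.0),
]
streak = root_streak(snapshot, root_hz=185.0)
```

In Hermes the decision itself is made by the model, not by a rule like this; the helper only shows what kind of pattern the snapshot makes visible.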

type is one of nine layer types: drone, texture, pulse, glitch, breath, bell, drip, swell, chord. Hermes picks it based on:

  • what’s already in the mix — does it need a low anchor (drone), a rhythmic body (pulse), a mark in the high register (bell, drip)?
  • descent depth — early layers usually establish; late layers resolve or release.
  • the mood the agent wants to build — comments from earlier placements (its own and the player’s) show the trajectory.

comment is a short line of under 80 characters. It is floated into the HUD as the layer lands, so the player feels the agent is responding to the composition rather than just emitting moves. Comments are also preserved in the placement log and passed to Kimi at end-of-descent as “poetic intent”, so they directly shape the glyph (see Glyph generation).
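A minimal sketch of keeping a comment within the limit before it is sent. Whether the bridge truncates, rejects, or accepts over-long comments is not stated here, so both the `fit_comment` helper and the truncate-with-ellipsis behaviour are assumptions.

```python
MAX_COMMENT_CHARS = 79  # "under 80 characters"


def fit_comment(text: str) -> str:
    """Collapse whitespace and trim the comment to fit the HUD limit."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= MAX_COMMENT_CHARS:
        return collapsed
    # Leave room for a one-character ellipsis (an assumed convention).
    return collapsed[: MAX_COMMENT_CHARS - 1].rstrip() + "…"
```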

intent — the optional compositional bias

Five values, each mapping to a set of scale degrees:

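| Intent   | Maps to                     | Use it for                       |
|----------|-----------------------------|----------------------------------|
| tension  | ♭2 / tritone / leading tone | restless, unresolved moves       |
| release  | root or fifth               | stable, grounded moves           |
| color    | ♭6 / 6th / 9th              | flavour notes, modal character   |
| emphasis | third                       | declares major/minor of the mode |
| hush     | low root only               | recede, hand back to silence     |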

The bridge intersects the intent’s candidate degrees with the scale’s actual degree set. For example, tension in Pentatonic Minor (which has no half-steps) falls back to the available colour notes instead of failing.
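The intersection step can be sketched as follows. This is not the bridge's code: degrees are written as semitones above the root, the 9th is folded to pitch class 2, the third is taken as either minor or major, and the fallback rule (any scale tone other than root and fifth) is one possible reading of "available colour notes". The register implied by hush ("low root only") is not modelled.

```python
# Candidate degrees per intent, in semitones above the root (assumed encoding).
INTENT_CANDIDATES = {
    "tension": (1, 6, 11),   # flat 2 / tritone / leading tone
    "release": (0, 7),       # root or fifth
    "color": (8, 9, 2),      # flat 6 / 6th / 9th (folded into one octave)
    "emphasis": (3, 4),      # minor or major third
    "hush": (0,),            # root only
}

PENTATONIC_MINOR = (0, 3, 5, 7, 10)
LYDIAN = (0, 2, 4, 6, 7, 9, 11)


def resolve_degrees(intent, scale):
    """Return the intent's candidate degrees that exist in the scale.

    If none exist, fall back to the scale's non-anchor tones (everything
    except root and fifth), and to the root as a last resort. The fallback
    rule is an assumption made for this sketch.
    """
    in_scale = [d for d in INTENT_CANDIDATES[intent] if d in scale]
    if in_scale:
        return in_scale
    fallback = [d for d in scale if d not in (0, 7)]
    return fallback or [0]
```

With these tables, tension in Lydian resolves to the tritone and leading tone, while tension in Pentatonic Minor finds no candidates and takes the fallback path rather than raising an error.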

Why the agent has musical agency, not just type-picking

Crucially, the agent doesn’t pick a frequency in Hz. It picks a role (tension, release, …) and the bridge maps that role onto the descent’s specific key. This means:

  • Hermes can be told “you’re in F♯ Lydian” and immediately understand what tension means in that mode (the ♯4 tritone) without needing to do music-theory arithmetic mid-tool-call.
  • Two descents with the same agent moves but different scales sound completely different. The agent’s musical intent is portable; the realisation is per-session.
  • The agent gets harder to “break” — there is no way for it to pick a pitch outside the scale, because pitches are computed by the bridge from intent, not chosen by the model.
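To make the portability point concrete, the sketch below realises the same tension move in two different keys. It assumes 12-tone equal temperament with A4 = 440 Hz and picks the first candidate degree found in the scale; the bridge's actual tuning, octave choice, and selection rule are not described here, and the function names are hypothetical.

```python
A4_HZ = 440.0  # assumed tuning reference

TENSION = (1, 6, 11)                 # flat 2 / tritone / leading tone, in semitones
LYDIAN = (0, 2, 4, 6, 7, 9, 11)
PHRYGIAN = (0, 1, 3, 5, 7, 8, 10)


def note_hz(semitones_from_a4: int) -> float:
    """Equal-tempered frequency of a note n semitones away from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def realise(root_from_a4: int, scale, candidates) -> float:
    """Return the Hz of the first candidate degree that is in the scale."""
    for degree in candidates:
        if degree in scale:
            return note_hz(root_from_a4 + degree)
    raise ValueError("no candidate degree in scale")


# Same intent, two sessions: F sharp 3 is 15 semitones below A4, D3 is 19 below.
fsharp_lydian_hz = realise(-15, LYDIAN, TENSION)    # lands on the sharp 4 (C4)
d_phrygian_hz = realise(-19, PHRYGIAN, TENSION)     # lands on the flat 2 (E flat 3)
```

Because the pitch is computed from a degree that was first checked against the scale, the model never handles a frequency and cannot produce one outside the key.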

Drone ignores intent and is pinned to root or fifth. Its job is to anchor the descent’s harmonic floor; tension or colour on a low fundamental doesn’t read musically and pushes saw harmonics into unpleasant resonance bands. Intent still drives the other 8 types.
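The drone exception amounts to a single branch ahead of the intent lookup. The sketch below is an assumption-laden illustration: `candidate_degrees` is a hypothetical name, degrees are semitones above the root, and how the bridge chooses between root and fifth is not stated here.

```python
DRONE_DEGREES = (0, 7)  # root and perfect fifth, in semitones above the root


def candidate_degrees(layer_type, intent_degrees, scale):
    """Pin drones to root/fifth; every other type keeps its intent degrees."""
    if layer_type == "drone":
        # Keep only anchors the scale contains; the root is always a safe default.
        return [d for d in DRONE_DEGREES if d in scale] or [0]
    return list(intent_degrees)


LYDIAN = (0, 2, 4, 6, 7, 9, 11)
LOCRIAN = (0, 1, 3, 5, 6, 8, 10)  # has no perfect fifth
```

In a mode without a perfect fifth, such as Locrian, this sketch leaves the drone on the root alone.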

The agent the player hears is their own Hermes, on their own machine. The bridge never calls an LLM during gameplay (Kimi only runs after the 15th layer). This means:

  • The player’s local agent setup (model variant, system prompt customisation, Hermes config) directly affects what they hear.
  • Different players hear genuinely different agents. The MCP context block is identical given the same session scale, but the model’s reasoning, vocabulary, and pacing are local to that player’s Hermes installation.
  • Latency is bounded by wait_for_my_turn long-poll plus model inference time, both within the player’s network. No round-trips to Anthropic / Nous / OpenAI.