Annie meets gcx: cloning Grafana observability with audited runbooks

We write about the tools in Anyshift's ecosystem: the CLIs and platforms that Annie integrates with. This one is about gcx, Grafana Labs' agent-first CLI. It's the companion to Annie meets pup, where the same annie do muted a fleet through Datadog.

You have one service fully instrumented in Grafana: SLOs, dashboards, the complete observability picture. Four services depend on it downstream and have none of that.

annie ask, through the versioned infrastructure graph, can cross-reference the Grafana inventory, identify which services are missing coverage, and work out which service is the right template. gcx can then push manifests and execute them against the Grafana API. annie do is what joins the two.

Introducing annie do, with Grafana Labs' gcx

![Terminal running annie do "set up SLOs and dashboards for every service downstream of neo4j that doesn't already have one, model each after anyshift-backend". It prints a 24-step plan to clone observability from anyshift-backend to annie-intelligence, graph-connector, deepeval and anyshift-slackbot (write, dry-run, then push an SLO and a dashboard per service), ending in "Proceed? [y/N]".](/images/blog/annie-meets-grafana-plan.png)

Twenty-four steps, one approval: an SLO and a dashboard cloned to each uncovered service.

That single approval produces eight resources in Grafana across four services, none of them created by hand.

The shape of the problem

The obvious way to handle this would be to give Annie a Grafana API key and let her push resources directly. Three reasons we didn't:

1. Audit trail. Cloning observability across a fleet is a GitOps moment. The artifact has to be a reviewable file.

2. Trust boundary. Annie holds no Grafana keys. gcx does, via OAuth and the OS keychain.

3. Postmortem-grade evidence. When somebody asks "who decided to add an SLO for deepeval last Tuesday?", the answer is the YAML file under ~/.annie/runbooks/, with timestamps, the source-to-target mapping, and the evidence chain in its comment block.

What's new here, compared to a single mute, is reproducibility across N services. The same runbook structure scales from one target to a fleet without changing shape, the audit pattern holds at every size, and the operator's job is still a single y/N.

The thing gcx alone couldn't do

gcx can list SLOs, push manifests, and pull dashboards as YAML. The Grafana UI can show you the resources that exist. Neither answers the operator's actual question:

Which of my services should have observability but don't, and which existing one is the right template to copy from?

That's a topology and coverage question. Annie answers it by combining her usual investigation toolkit with the Grafana state annie-cli pre-fetches at the start of the call:

- The versioned infrastructure graph identifies the blast radius (four services depend on neo4j)
- SCM search finds the NEO4J_URI references across `*.tf` and `back-*.tf` files
- Sentry issue families confirm transitive dependencies (the slackbot cascade through anyshift-backend)
- The pre-fetched gcx inventory tells her which already have SLOs (anyshift-backend does, the others don't)
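Stripped of the investigation that produces its inputs, the coverage check is a set difference. A minimal Go sketch, assuming the graph traversal has already yielded a candidate list and the pre-fetched gcx inventory has been reduced to a set of service label values. `missingCoverage` is a hypothetical helper; in annie do this judgement is made by Annie in Phase 1, not by code like this:

```go
package main

import (
	"fmt"
	"sort"
)

// missingCoverage returns the candidates that have no SLO in the inventory.
// candidates: services found by the (hypothetical) graph traversal.
// sloServices: service label values seen on existing SLO definitions.
func missingCoverage(candidates []string, sloServices map[string]bool) []string {
	var missing []string
	for _, svc := range candidates {
		if !sloServices[svc] {
			missing = append(missing, svc)
		}
	}
	sort.Strings(missing) // deterministic order for a reviewable runbook
	return missing
}

func main() {
	candidates := []string{
		"anyshift-backend", "annie-intelligence", "graph-connector",
		"deepeval", "anyshift-slackbot",
	}
	sloServices := map[string]bool{"anyshift-backend": true}
	fmt.Println(missingCoverage(candidates, sloServices))
	// [annie-intelligence anyshift-slackbot deepeval graph-connector]
}
```

Sorting keeps the result stable between runs, which matters once it ends up in a file someone reviews.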

The cross-reference is the new part. The generated runbook captures it as comments at the top of the file:

```yaml
# Source: anyshift-backend → Targets (4):
#   - annie-intelligence: directly depends on neo4j (NEO4J_URI in
#         back-annie-intelligence.tf); no existing SLO in inventory
#   - graph-connector: primary neo4j writer; all graph ingestion flows
#         through it; no existing SLO in inventory
#   - deepeval: direct neo4j connection in deepeval.tf; no existing SLO
#   - anyshift-slackbot: transitive via anyshift-backend — no direct
#         NEO4J_URI but cascades on backend failure; Sentry SLACKBOT-14V/T/M
#         fires during neo4j storms; no existing SLO in inventory
# Investigation: cross-referenced Grafana SLO inventory (1 SLO tagged
#   service:anyshift-backend) against the four services downstream of
#   neo4j-production. Skipped the auto-generated SLO overview dashboard
#   (uid=grafana_slo_app-5zt2gp0xjru0hxd2eyih6) — Grafana recreates one
#   for each cloned SLO.
```

Real file references, real Sentry issue codes, the honest acknowledgement that slackbot is transitive rather than direct, and the deliberate skip of the SLO app's auto-generated dashboard. None of it comes from gcx alone, or the graph alone, or the LLM alone.

One detail worth dwelling on: deepeval. In the pup post, Annie excluded deepeval from the mute list because it has no Datadog service tag. Grafana SLOs key off labels, not Datadog tags, so that constraint doesn't apply, and here deepeval is in. Same blast radius, different vendor, different outcome. The graph identifies who; the constraints of each tool decide what's possible.
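One way to picture "the constraints of each tool decide what's possible" is a per-vendor predicate applied to the same blast radius. This is a hypothetical sketch: the `Service` type, the `constraints` table, and the tag flags on the sample services are illustrative assumptions; only deepeval's missing Datadog service tag comes from the post.

```go
package main

import "fmt"

// Service is a hypothetical view of one node in the blast radius.
type Service struct {
	Name          string
	HasDatadogTag bool // does a Datadog `service` tag exist for it?
}

// constraints maps each vendor CLI to the precondition it imposes.
var constraints = map[string]func(Service) bool{
	// pup: a Datadog mute is scoped by service tag, so untagged services drop out.
	"pup": func(s Service) bool { return s.HasDatadogTag },
	// gcx: Grafana SLOs key off labels, not Datadog tags, so no tag is required.
	"gcx": func(Service) bool { return true },
}

// actionable filters the same blast radius through one tool's constraint.
func actionable(tool string, blast []Service) []string {
	allow, found := constraints[tool]
	if !found {
		return nil
	}
	var names []string
	for _, s := range blast {
		if allow(s) {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	// Tag flags are illustrative; only deepeval's missing tag is from the post.
	blast := []Service{
		{"annie-intelligence", true},
		{"graph-connector", true},
		{"deepeval", false},
		{"anyshift-slackbot", true},
	}
	fmt.Println("pup:", actionable("pup", blast))
	fmt.Println("gcx:", actionable("gcx", blast))
}
```

The graph supplies `blast` once; only the predicate changes per vendor, which is why deepeval appears in one list and not the other.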

The handoff between Annie and gcx

When you run annie do for an observability bootstrap, two phases run in sequence.

Phase 1 — Annie investigates. Before planning anything, annie-cli fetches the current Grafana state (SLO definitions and dashboards via gcx). Annie then works the way she does for any investigation: traverse the versioned graph from the named resource, search IaC for connection strings, correlate Sentry issue families. The pre-fetched inventory tells her which services already have SLOs. She returns one JSON object: source service, target services, the SLO UUIDs to clone, the dashboard UIDs to clone.
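A minimal sketch of how the Go side might decode and sanity-check that object before rendering anything. The struct and its JSON tags are assumptions modeled on the `{ source, targets[], slo_uuids[], dashboard_uids[] }` shape in the phase diagram; `Plan` and `parsePlan` are hypothetical names, and the UID values are placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Plan mirrors the single JSON object Annie returns in Phase 1 (assumed shape).
type Plan struct {
	Source        string   `json:"source"`
	Targets       []string `json:"targets"`
	SLOUUIDs      []string `json:"slo_uuids"`
	DashboardUIDs []string `json:"dashboard_uids"`
}

// parsePlan decodes the object strictly and rejects obviously bad plans.
func parsePlan(raw []byte) (Plan, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // fail on fields the renderer doesn't understand
	var p Plan
	if err := dec.Decode(&p); err != nil {
		return Plan{}, err
	}
	if p.Source == "" || len(p.Targets) == 0 {
		return Plan{}, errors.New("plan needs a source and at least one target")
	}
	for _, t := range p.Targets {
		if t == p.Source {
			return Plan{}, fmt.Errorf("target %q is the source", t)
		}
	}
	return p, nil
}

func main() {
	raw := []byte(`{"source": "anyshift-backend",
	  "targets": ["annie-intelligence", "graph-connector", "deepeval", "anyshift-slackbot"],
	  "slo_uuids": ["<slo-uuid>"],
	  "dashboard_uids": ["<dashboard-uid>"]}`)
	p, err := parsePlan(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("clone from %s to %d targets\n", p.Source, len(p.Targets))
}
```

Strict decoding is a deliberate choice here: if the model's output drifts from the expected shape, the run stops before any runbook is written.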

Phase 2 — annie-cli renders the runbook. The Go layer pulls each source resource once via gcx, strips server-managed metadata, and derives substitutions automatically from each source-target name pair (hyphen and underscore variants, because Grafana label keys reject hyphens but values accept them). It embeds the substituted manifests inline. The operator can read every byte before approving.
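The substitution step can be sketched with `strings.NewReplacer` from Go's standard library. This assumes substitutions are plain string replacements over the serialized manifest; `substitutions` and `render` are hypothetical names, and the manifest fragment is made up to show one label value and one label key:

```go
package main

import (
	"fmt"
	"strings"
)

// substitutions derives old/new pairs from one source/target name pair:
// the names as written (valid in label values) and underscore variants
// (for label keys, which reject hyphens). When the source name has no
// hyphen both variants coincide, so only one pair is emitted.
// Not handled: a hyphen-free source cloned to a hyphenated target, where
// occurrences inside label keys would need separate treatment.
func substitutions(source, target string) []string {
	pairs := []string{source, target}
	us := strings.ReplaceAll(source, "-", "_")
	if us != source {
		pairs = append(pairs, us, strings.ReplaceAll(target, "-", "_"))
	}
	return pairs
}

// render applies every substitution to a source manifest in a single pass.
func render(manifest, source, target string) string {
	return strings.NewReplacer(substitutions(source, target)...).Replace(manifest)
}

func main() {
	// Hypothetical manifest fragment: a label value and a label key that
	// both embed the source service name.
	manifest := "labels:\n  service: anyshift-backend\n  tier_anyshift_backend: critical\n"
	fmt.Print(render(manifest, "anyshift-backend", "graph-connector"))
}
```

Because a `strings.Replacer` scans left to right without re-examining its own output, a replaced name is never matched again, so the hyphen and underscore pairs cannot interfere with each other.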

```
annie do "..."
   │
   ▼  annie-cli: gcx slo definitions list · gcx dashboards list   (pre-fetch state)
   │
   ▼  Phase 1 — Annie (LLM): traverse the versioned graph + search SCM + correlate Sentry
   │           → one JSON object { source, targets[], slo_uuids[], dashboard_uids[] }
   │
   ▼  Phase 2 — annie-cli (Go, no LLM): pull each source once, strip server metadata,
   │           derive hyphen/underscore substitutions, embed manifests inline
   │
   ▼  gcx: execute (after operator approval)
```
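The "strip server metadata" step is easiest to see on a decoded manifest. A minimal sketch, assuming Kubernetes-style resources with a `metadata` block and a `status` block; the field list in `serverManaged` is an assumption, not something gcx or Grafana documents in this post, and the sample object is made up:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverManaged lists metadata fields the API server sets on its own.
// ASSUMPTION: Kubernetes-style field names; the real list depends on the
// resource kind and on what gcx emits when pulling.
var serverManaged = []string{
	"uid", "resourceVersion", "generation", "creationTimestamp", "managedFields",
}

// stripServerMetadata removes server-managed fields in place and drops the
// status block, leaving only what a push should declare.
func stripServerMetadata(obj map[string]any) {
	delete(obj, "status")
	if md, ok := obj["metadata"].(map[string]any); ok {
		for _, k := range serverManaged {
			delete(md, k)
		}
	}
}

func main() {
	// Made-up object standing in for a pulled source resource.
	raw := []byte(`{"apiVersion": "v1", "kind": "Example",
	  "metadata": {"name": "anyshift-backend-overview", "resourceVersion": "42", "uid": "abc"},
	  "spec": {"title": "placeholder"}, "status": {"observed": true}}`)
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		panic(err)
	}
	stripServerMetadata(obj)
	out, err := json.Marshal(obj)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Stripping before substitution means a cloned manifest never carries the source's identity fields, so a push creates a new resource instead of colliding with the original.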

Annie is good at the open-ended synthesis: which services need observability, which sibling is the right template, why deepeval qualifies here even though it was skipped in the Datadog case. She's less reliable at counted iteration: emitting four parallel SLO YAML blocks with identical structure for four different targets. So Annie returns one JSON object, and annie-cli does the iteration in Go, where loops are exact.
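The counted iteration is where Go earns its place. A sketch of the expansion, assuming each target gets a write, dry-run, push sequence for its SLO and then for its dashboard (the order is an assumption based on the plan screenshot; `Step` and `planSteps` are hypothetical). Four targets times two resource kinds times three actions gives the 24 steps in the plan.

```go
package main

import "fmt"

// Step is one numbered entry in the rendered runbook plan.
type Step struct {
	N      int
	Action string // "write", "dry-run" or "push"
	Kind   string // "slo" or "dashboard"
	Target string
}

// planSteps expands targets × resource kinds × actions with plain loops,
// the counted iteration annie-cli keeps out of the LLM's hands.
func planSteps(targets []string) []Step {
	kinds := []string{"slo", "dashboard"}
	actions := []string{"write", "dry-run", "push"}
	var steps []Step
	for _, t := range targets {
		for _, k := range kinds {
			for _, a := range actions {
				steps = append(steps, Step{N: len(steps) + 1, Action: a, Kind: k, Target: t})
			}
		}
	}
	return steps
}

func main() {
	targets := []string{"annie-intelligence", "graph-connector", "deepeval", "anyshift-slackbot"}
	steps := planSteps(targets)
	for _, s := range steps[:3] {
		fmt.Printf("%2d. %s %s for %s\n", s.N, s.Action, s.Kind, s.Target)
	}
	fmt.Println("total steps:", len(steps)) // 4 × 2 × 3 = 24
}
```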

The runbook is self-contained. Every resource manifest is embedded as a heredoc with substitutions already applied, so the operator reads the exact bytes that will be pushed before approving anything.
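A sketch of how one such heredoc step could be rendered. It assumes a POSIX shell; `heredocStep` and the output path are hypothetical, and the gcx push that follows is left out because the post doesn't show how gcx is given the file:

```go
package main

import (
	"fmt"
	"strings"
)

// heredocStep renders a shell step that writes manifest to path verbatim.
// The quoted delimiter ('EOF') disables shell expansion inside the body.
func heredocStep(path, manifest string) (string, error) {
	const delim = "EOF"
	if strings.Contains(path, "'") {
		return "", fmt.Errorf("path %q contains a single quote", path)
	}
	// A body line equal to the delimiter would end the heredoc early.
	for _, line := range strings.Split(manifest, "\n") {
		if line == delim {
			return "", fmt.Errorf("manifest has a line equal to the heredoc delimiter %q", delim)
		}
	}
	return fmt.Sprintf("cat > '%s' <<'%s'\n%s\n%s\n", path, delim, manifest, delim), nil
}

func main() {
	// Hypothetical path and manifest body.
	step, err := heredocStep("/tmp/annie/graph-connector-slo.yaml", "kind: SLO\nname: graph-connector")
	if err != nil {
		panic(err)
	}
	fmt.Print(step)
}
```

Quoting the delimiter turns off parameter expansion and command substitution in the body, so the file written to disk is byte-for-byte what the operator reviewed.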

Vim showing the generated runbook YAML: a shell step writes each service's SLO manifest as a heredoc, followed by a gcx slo definitions push --dry-run step and the real push step.

The rendered runbook: each SLO manifest written inline, dry-run first, then pushed via gcx. Nothing hidden behind an API call.

And it lands in Grafana. The five service Overview dashboards, anyshift-backend's included, show up alongside their SLOs, each tagged managed-by:gcx-seed:

Grafana Dashboards list showing five new service Overview dashboards (annie-intelligence, anyshift-backend, anyshift-slackbot, deepeval, graph-connector), each tagged managed-by:gcx-seed and `service:<name>`.

Where it's going

The bootstrap path joins single-service mute and blast-radius mute in annie do's growing vocabulary. Each new capability is an LLM prompt, a Go template, and a worked example. The architectural shape is fixed: identify in Phase 1, render in Phase 2, execute via the vendor CLI.

The same graph traversal powers different verbs. annie do "mute every service downstream of neo4j" and annie do "set up SLOs for every service downstream of neo4j that doesn't have one" share their target-selection logic. The pup path mutes the fleet during a maintenance window; the gcx path scaffolds observability for it, so the next incident already has alerts. The graph is the same graph, the trust boundary is the same trust boundary, and the audit trail on disk looks identical whether the runbook is muting four services or bootstrapping observability across them.

This is an early internal build. Grafana and Anyshift customers who'd like to try it can reach out.