mgtt

When something breaks in a distributed system — and the person who built it is asleep — you open three terminals and start guessing. mgtt replaces the guessing with a constraint engine.

Describe once. Your system's dependencies in a single YAML model.
Walk the graph. At 3am, the engine probes components in order of information value and eliminates healthy branches.
Know what's next. Always — and why.

Press Y at each step yourself, or hand the loop to an AI agent — same interface either way.

See it in action

Troubleshooting at 3am: `mgtt diagnose`

This is mgtt's reason for being. An alert fires. You trigger mgtt diagnose (from a GitLab CI or GitHub Actions job, a Slack slash-command, an LLM agent, or your laptop) and get a structured report back:

$ mgtt diagnose --suspect api --max-probes 10

  ▶ probe nginx upstream_count       ✗ unhealthy
  ▶ probe api ready_replicas         ✗ unhealthy
  ▶ probe rds available              ✓ healthy  ← eliminated
  ▶ probe frontend ready_replicas    ✓ healthy  ← eliminated

  Root cause: api.degraded
  Chain:      nginx ← api
  Probes run: 4/10

4 components probed. 2 eliminated. Root cause named. The engine ranks probes by information value, so every call moves the answer forward. You didn't need to know the system — the model knew it for you. Partial visibility (RBAC refusals, transient throttles) surfaces as a visible flag in the report rather than aborting the session.
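To make "ranks probes by information value" concrete, here is a minimal sketch of one way such a ranking could work. It is not mgtt's actual implementation: the `DEPENDS_ON` graph, the single-root-cause assumption, the uniform prior over hypotheses, and every function name below are assumptions made for illustration. Each probe is scored by the entropy of the healthy/unhealthy split it would produce across the remaining hypotheses, so a probe whose outcome is already certain scores zero.

```python
import math

# Hypothetical dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "nginx": ["frontend", "api"],
    "frontend": [],
    "api": ["rds"],
    "rds": [],
}

def affected_by(root):
    """Components expected to look unhealthy if `root` is the single root cause."""
    hit = {root}
    changed = True
    while changed:
        changed = False
        for comp, deps in DEPENDS_ON.items():
            if comp not in hit and any(d in hit for d in deps):
                hit.add(comp)
                changed = True
    return hit

def information_value(component, hypotheses):
    """Entropy (in bits) of the healthy/unhealthy split this probe would produce."""
    unhealthy = sum(1 for h in hypotheses if component in affected_by(h))
    total = len(hypotheses)
    value = 0.0
    for count in (unhealthy, total - unhealthy):
        if count:
            p = count / total
            value -= p * math.log2(p)
    return value

def next_probe(hypotheses, already_probed=()):
    """Pick the unprobed component whose result splits the hypotheses most evenly."""
    candidates = [c for c in DEPENDS_ON if c not in already_probed]
    return max(candidates, key=lambda c: information_value(c, hypotheses))

def update(hypotheses, component, is_healthy):
    """Keep only the root-cause hypotheses consistent with the probe result."""
    return [h for h in hypotheses if (component in affected_by(h)) != is_healthy]

hypotheses = list(DEPENDS_ON)                          # any component may be the root cause
first = next_probe(hypotheses)                         # "api": splits 4 hypotheses 2/2
hypotheses = update(hypotheses, first, is_healthy=False)
second = next_probe(hypotheses, already_probed=(first,))  # "rds"
hypotheses = update(hypotheses, second, is_healthy=True)  # only "api" remains
```

In this toy graph, probing nginx first would be worthless, because every hypothesis predicts it is unhealthy. That is the sense in which a ranked probe always moves the answer forward.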

Failure chains are pre-enumerated into a committed scenarios.yaml at design time, so diagnose eliminates whole branches before running a probe.
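As a rough illustration of what pre-enumerating failure chains could look like, the sketch below walks a dependency graph from an entry point and lists every path. The last component on each path is the candidate root cause. The graph, the function names, and the in-memory list are all hypothetical. The real scenarios.yaml format and the way mgtt generates it are not shown here.

```python
# Hypothetical dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "nginx": ["frontend", "api"],
    "frontend": [],
    "api": ["rds"],
    "rds": [],
}

def enumerate_chains(depends_on, entry):
    """Yield every dependency path that starts at `entry`, as a list of components."""
    def walk(path):
        yield path
        for dep in depends_on.get(path[-1], []):
            if dep not in path:  # guard against dependency cycles
                yield from walk(path + [dep])
    yield from walk([entry])

def eliminate(chains, healthy_component):
    """Drop every chain that would require `healthy_component` to be failing."""
    return [chain for chain in chains if healthy_component not in chain]

chains = list(enumerate_chains(DEPENDS_ON, "nginx"))
for chain in chains:
    print(" ← ".join(chain))

# One healthy fact about rds removes the "nginx ← api ← rds" chain in a single step.
remaining = eliminate(chains, "rds")
```

Because the chains exist before any incident, a single known fact can discard several candidate explanations at once instead of requiring one probe per component.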

Full troubleshooting walkthrough | mgtt diagnose reference | scenarios.yaml

Simulation in CI: catch model drift before it matters (`mgtt simulate`)

Before the system is even running, mgtt simulate checks the model's reasoning against hand-authored scenarios. Each scenario is a tiny YAML assertion such as "if rds goes down and api crash-loops, the engine should blame rds, not api", and the run fails if the engine concludes anything else. No live system, no credentials, no cluster access. Runs anywhere Go runs.
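The idea of a scenario as an assertion can be sketched in a few lines. This is an illustration only, not mgtt's scenario schema or engine. The graph, the dictionary layout, and the deliberately simple rule "blame an unhealthy component whose own dependencies are all healthy" are assumptions. The scenario names are borrowed from the sample output on this page, and the injected facts are made up to match them.

```python
# Hypothetical dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "nginx": ["frontend", "api"],
    "frontend": [],
    "api": ["rds"],
    "rds": [],
}

def root_causes(depends_on, unhealthy):
    """Unhealthy components none of whose dependencies are themselves unhealthy."""
    return sorted(
        comp for comp in unhealthy
        if not any(dep in unhealthy for dep in depends_on.get(comp, []))
    )

# Each scenario injects synthetic facts and states the conclusion it expects.
SCENARIOS = [
    {"name": "rds unavailable",
     "unhealthy": {"rds", "api", "nginx"}, "expect": ["rds"]},
    {"name": "api crash-loop independent of rds",
     "unhealthy": {"api", "nginx"}, "expect": ["api"]},
    {"name": "frontend crash-looping, api healthy",
     "unhealthy": {"frontend", "nginx"}, "expect": ["frontend"]},
    {"name": "all components healthy",
     "unhealthy": set(), "expect": []},
]

def run_scenarios(depends_on, scenarios):
    """Map each scenario name to True if the conclusion matches the expectation."""
    return {
        s["name"]: root_causes(depends_on, s["unhealthy"]) == s["expect"]
        for s in scenarios
    }

results = run_scenarios(DEPENDS_ON, SCENARIOS)
passed = sum(results.values())
print(f"{passed}/{len(results)} scenarios passed")
```

If someone later removes the api-to-rds dependency from the graph, the first scenario starts blaming api as well as rds and the suite fails. This is the kind of drift a CI run is meant to surface.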

Wire it into every PR for:

Model drift detection — when the real system evolves (new services, renamed components, changed dependencies), a stale model silently drifts away from reality. A failing scenario tells you before the model is needed at 3am.
Architecture unit tests — each scenario is a declarative assertion. Refactor the model, break a conclusion, the suite fails. Safe renames, safe dependency moves.
Design-time validation — write the model before the system exists; reason about dependency holes before building them. The engine treats your design as executable logic.
Regression harness — the next time a real incident happens, encode it as a scenario. The engine must now identify that chain forever. Your postmortems become tests.

$ mgtt simulate --all

  rds unavailable                          ✓ passed
  api crash-loop independent of rds        ✓ passed
  frontend crash-looping, api healthy      ✓ passed
  all components healthy                   ✓ passed

  4/4 scenarios passed

Full simulation walkthrough

What mgtt gives you

One model, two moments:

Model once — describe components, dependencies, and what "healthy" means in YAML.
Simulate in CI — inject synthetic failures; assert the engine reasons correctly; catch model drift before it matters.
Troubleshoot at 3am — mgtt diagnose gives you a structured root-cause report; run it from CI, Slack, an AI agent, or your laptop.

| | Design time | At 3am |
|---|---|---|
| Command | `mgtt simulate` | `mgtt diagnose` |
| Facts from | scenario YAML | real probes |
| Driven by | CI pipeline | on-call engineer, CI job, Slack bot, or AI agent |
| Output | pass/fail | root cause + chain + eliminated components |

mgtt plan exists too — same engine, interactive press-Y mode for debugging models or teaching. Not the daily driver.

Get started

Quick Start — complete end-to-end example: model, scenarios, simulate
Install — one-liner, Go, Docker, from source

Learn

How It Works — the constraint engine and dependency graph
Simulation walkthrough — design-time model validation
Troubleshooting walkthrough — runtime incident response

Using Providers

Overview — how mgtt invokes providers at probe time
Install Methods — git build vs. pre-built Docker image
Names and Versions — FQN + version constraint resolution
Image Capabilities — needs: vocabulary and operator overrides
Registry — browse official + community providers, copy the install line

Reference

Model Schema — every field in system.model.yaml
Scenario Schema — hand-authored scenarios/*.yaml for mgtt simulate
scenarios.yaml — the generated sidecar mgtt diagnose consumes
Type Catalog — all provider types, facts, states, and failure modes
CLI Reference — every command
Full Specification — the v1.0 spec index + MCP + design principles

Extend

Writing Providers — teach mgtt about your technology