
How It Works

mgtt encodes your system's dependency graph in a YAML model. A constraint engine walks the graph, probing components and eliminating healthy branches until one failure path remains.

The same model and engine serve two modes, simulation and troubleshooting. The only difference is where facts come from.

The three artifacts

system.model.yaml       you write once, version controlled
system.state.yaml       mgtt writes during incidents, append-only
providers/              community plugins, one per technology

The model describes the system. The state file records observations. Providers supply the vocabulary (types, facts, states) and the commands to collect facts from live systems.

The constraint engine

The engine is mgtt's core. It takes four inputs:

  1. Components — from the model
  2. Failure modes — from the providers
  3. Propagation rules — from the dependency graph
  4. Current facts — from scenarios (simulation) or live probes (troubleshooting)

It produces a ranked failure path tree: which paths are still possible, which are eliminated, and which single probe would narrow the search the most.

The engine is pure — no I/O, no credentials, no side effects. The same engine powers both mgtt simulate and mgtt diagnose. Only the source of facts differs.
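Here is a minimal Python sketch of what "pure" means in practice. It rests on stated assumptions. The model is a plain dict mapping each component to the components it depends on, standing in for system.model.yaml rather than reproducing its schema. Facts are booleans (healthy or not). Elimination follows a single-fault rule: a component observed healthy cannot lie on the failure path, and a component observed unhealthy must. The names `failure_paths` and `surviving_paths` are hypothetical. mgtt's actual failure modes, propagation rules, and ranking are described in the Engine Reference.

```python
from typing import Dict, List

# Hypothetical model: component -> components it depends on.
Model = Dict[str, List[str]]
# Hypothetical facts: component -> True if observed healthy, False if unhealthy.
Facts = Dict[str, bool]


def failure_paths(model: Model, entry: str) -> List[List[str]]:
    """Enumerate chains from the outermost component inward.

    The last component of each chain is its candidate root cause."""
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        paths.append(path)
        for dep in model.get(path[-1], []):
            if dep not in path:  # skip cycles
                walk(path + [dep])

    walk([entry])
    return paths


def surviving_paths(model: Model, entry: str, facts: Facts) -> List[List[str]]:
    """Pure: no I/O, no credentials; the result depends only on the arguments."""
    alive = []
    for path in failure_paths(model, entry):
        # Single-fault rule: healthy components are off the path,
        # unhealthy components are on it.
        if all((comp in path) != healthy for comp, healthy in facts.items()):
            alive.append(path)
    return alive


model = {"frontend": ["api", "cache"], "api": ["db"], "cache": [], "db": []}
print(surviving_paths(model, "frontend", {"frontend": False, "api": True}))
# → [['frontend'], ['frontend', 'cache']]
```

In this sketch, simulation would call `surviving_paths` with facts read from a scenario, and troubleshooting would call it again after each live probe. Because the function never touches the outside world, the two modes cannot disagree about the reasoning.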

For the full internals (strategies, probe-selection heuristics, termination conditions, complexity math), see the Engine Reference. This page stays at concept level.

Two modes, same model

              Simulation                  Troubleshooting
Command       mgtt simulate               mgtt diagnose
Facts from    Scenario YAML (authored)    Live probes via installed providers
Needs         Nothing                     Environment credentials
Runs in       CI pipeline                 On-call laptop, CI job, Slack bot, or AI agent
Output        Pass/fail assertions        Structured root-cause report

Simulation (mgtt simulate)

You author scenario files that inject synthetic facts. The engine reasons over them and you assert the conclusion. This tests the model's reasoning, not the system's behavior.

model.yaml + scenario.yaml → engine → pass/fail

If someone removes a dependency from the model, the scenario fails. The PR is blocked. The blind spot never reaches production.
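The CI check can be sketched in a few lines of Python. Everything here is hypothetical. The scenario is a dict rather than scenario YAML, the engine is a compact single-fault stand-in, and `run_scenario` is an illustrative name, not part of mgtt. What the sketch does show is the mechanism: the same scenario that passes against the full model fails once the `api → db` dependency is deleted, because the expected root cause is no longer reachable.

```python
from typing import Dict, List, Set

Model = Dict[str, List[str]]


def root_causes(model: Model, entry: str, facts: Dict[str, bool]) -> Set[str]:
    """Root causes of every failure path consistent with the facts.

    Single-fault stand-in: healthy components are off the path, unhealthy ones on it."""
    causes: Set[str] = set()

    def walk(path: List[str]) -> None:
        if all((comp in path) != healthy for comp, healthy in facts.items()):
            causes.add(path[-1])
        for dep in model.get(path[-1], []):
            if dep not in path:
                walk(path + [dep])

    walk([entry])
    return causes


def run_scenario(model: Model, scenario: dict) -> bool:
    """Pass only if the facts narrow to exactly the expected root cause."""
    found = root_causes(model, scenario["entry"], scenario["facts"])
    return found == {scenario["expect_root_cause"]}


# Hypothetical scenario: the database is down; the symptom shows at the frontend.
scenario = {
    "entry": "frontend",
    "facts": {"frontend": False, "api": False, "cache": True, "db": False},
    "expect_root_cause": "db",
}

model = {"frontend": ["api", "cache"], "api": ["db"], "cache": [], "db": []}
edited = {"frontend": ["api", "cache"], "api": [], "cache": [], "db": []}  # api -> db removed

print(run_scenario(model, scenario))   # → True
print(run_scenario(edited, scenario))  # → False: the PR would be blocked
```

The scenario never runs against real infrastructure. It only checks that the model, as written, still lets the engine reach the conclusion a human expects.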

Full simulation walkthrough →

Troubleshooting (mgtt diagnose)

The engine walks the dependency graph from the outermost component inward. At each step it picks the single highest-value, lowest-cost probe, runs it, and continues until one failure path remains or the probe budget is hit.

model.yaml + live probes → engine → root cause
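The probe loop can be sketched as follows. This is a minimal illustration under stated assumptions, not mgtt's implementation. The model is a dict of dependencies. `probe` is a plain callable standing in for provider commands; here it reads a fake `live` dict, so nothing real is touched. The cost weights (1/2/4 for low/medium/high) are invented. "Value" is approximated as the number of paths eliminated in the worse of the two possible outcomes.

```python
from typing import Callable, Dict, List

Model = Dict[str, List[str]]
COST_WEIGHT = {"low": 1, "medium": 2, "high": 4}  # hypothetical weights


def all_paths(model: Model, entry: str) -> List[List[str]]:
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        paths.append(path)
        for dep in model.get(path[-1], []):
            if dep not in path:
                walk(path + [dep])

    walk([entry])
    return paths


def consistent(path: List[str], facts: Dict[str, bool]) -> bool:
    # Single-fault rule: healthy components off the path, unhealthy ones on it.
    return all((comp in path) != healthy for comp, healthy in facts.items())


def diagnose(model: Model, entry: str, probe: Callable[[str], bool],
             cost: Dict[str, str], budget: int) -> List[List[str]]:
    facts: Dict[str, bool] = {entry: False}  # the reported symptom
    remaining = all_paths(model, entry)
    for _ in range(budget):
        if len(remaining) <= 1:
            break  # one failure path left: done
        unprobed = [c for c in model if c not in facts]
        if not unprobed:
            break

        def value_per_cost(comp: str) -> float:
            inside = sum(comp in p for p in remaining)
            # Paths eliminated in the worse outcome, divided by probe cost.
            return min(inside, len(remaining) - inside) / COST_WEIGHT[cost[comp]]

        best = max(unprobed, key=value_per_cost)
        if value_per_cost(best) == 0:
            break  # no remaining probe can narrow the search
        facts[best] = probe(best)  # the only step that touches the "live" system
        remaining = [p for p in remaining if consistent(p, facts)]
    return remaining


model = {"frontend": ["api", "cache"], "api": ["db"], "cache": [], "db": []}
cost = {"frontend": "low", "api": "low", "cache": "medium", "db": "high"}
live = {"frontend": False, "api": False, "cache": True, "db": False}  # fake system
print(diagnose(model, "frontend", lambda c: live[c], cost, budget=5))
# → [['frontend', 'api', 'db']]
```

The ordering falls out of the scoring. `api` is probed first because it is cheap and splits the four paths evenly. The expensive `db` probe runs only once it is the sole probe left that can narrow the search.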

mgtt plan is an interactive variant that asks for confirmation (press Y) at each step. It uses the same engine and is meant for debugging models or for teaching.

Full troubleshooting walkthrough →

Probe ranking

Not all probes are equal. The engine ranks each candidate by:

  1. Information value — how many failure paths does this probe eliminate?
  2. Cost — how expensive/slow is this probe? (low/medium/high, declared by the provider)
  3. Access — what credentials or permissions does it need?

The engine always suggests the probe that eliminates the most uncertainty for the least cost. See Engine Reference — Probe selection heuristics for the exact algorithm.
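One simple way to combine the three criteria is to drop the probes the operator lacks access for, then sort the rest by information value per unit of cost. The sketch below does exactly that. The `Candidate` record, the probe names, the credential labels, and the 1/2/4 cost weights are all hypothetical, and the exact algorithm mgtt uses is documented in the Engine Reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

COST_WEIGHT = {"low": 1, "medium": 2, "high": 4}  # hypothetical weights


@dataclass(frozen=True)
class Candidate:
    probe: str              # hypothetical probe name
    eliminates: int         # information value: failure paths it would rule out
    cost: str               # "low" | "medium" | "high", declared by the provider
    needs: FrozenSet[str]   # credentials or permissions the probe requires


def rank(candidates: List[Candidate], granted: FrozenSet[str]) -> List[Candidate]:
    # Access: skip probes whose requirements are not a subset of what is granted.
    usable = [c for c in candidates if c.needs <= granted]
    # Value and cost: most paths eliminated per unit of cost first.
    return sorted(usable, key=lambda c: c.eliminates / COST_WEIGHT[c.cost], reverse=True)


candidates = [
    Candidate("api.ready_replicas", 2, "low", frozenset({"kubeconfig"})),
    Candidate("db.available", 3, "high", frozenset({"aws"})),
    Candidate("cache.ping", 1, "low", frozenset()),
    Candidate("db.slow_query_log", 3, "medium", frozenset({"aws", "logs"})),
]
ranked = rank(candidates, granted=frozenset({"kubeconfig", "aws"}))
print([c.probe for c in ranked])
# → ['api.ready_replicas', 'cache.ping', 'db.available']
```

Without the `logs` permission, the medium-cost log probe never appears, even though it would eliminate as many paths as `db.available`.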

Providers

Providers teach mgtt about technologies. Each provider defines:

  • Types — component types (e.g., deployment, rds_instance)
  • Facts — observable properties per type (e.g., ready_replicas, available)
  • States — derived from facts (e.g., live, degraded, stopped)
  • Failure modes — what downstream effects each non-healthy state can cause
  • Probes — the actual commands to collect facts from live systems
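To show how these five pieces fit together, here is a hypothetical provider entry for a Kubernetes-style `deployment` type, written as Python data. The field layout, the `desired_replicas` fact, the state rules, and the failure-mode names are illustrative assumptions, not mgtt's provider format (see Writing Providers for that). The probe strings are ordinary `kubectl` commands and are never executed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, int]


def deployment_state(facts: Facts) -> str:
    """Derive a state from observed facts (hypothetical rules)."""
    ready, desired = facts["ready_replicas"], facts["desired_replicas"]
    if desired == 0 or ready == 0:
        return "stopped"
    if ready < desired:
        return "degraded"
    return "live"


@dataclass
class ProviderType:
    name: str                                # component type
    facts: List[str]                         # observable properties
    derive_state: Callable[[Facts], str]     # facts -> state
    failure_modes: Dict[str, List[str]]      # non-healthy state -> downstream effects
    probes: Dict[str, str]                   # fact -> command that collects it


deployment = ProviderType(
    name="deployment",
    facts=["ready_replicas", "desired_replicas"],
    derive_state=deployment_state,
    failure_modes={
        "degraded": ["elevated_latency"],   # hypothetical effect names
        "stopped": ["upstream_errors"],
    },
    probes={
        "ready_replicas": "kubectl get deployment <name> -o jsonpath='{.status.readyReplicas}'",
        "desired_replicas": "kubectl get deployment <name> -o jsonpath='{.spec.replicas}'",
    },
)

observed = {"ready_replicas": 1, "desired_replicas": 3}
state = deployment.derive_state(observed)
print(state, deployment.failure_modes.get(state, []))
# → degraded ['elevated_latency']
```

Only non-healthy states carry failure modes in this sketch, so a `live` deployment contributes no downstream effects to the engine.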

Each technology's provider is installed separately. See the Provider Registry for the current catalog: Kubernetes, AWS, Docker, Terraform, Tempo, Quickwit, and anything else the community has authored. Writing your own provider is covered in a separate guide.

Provider Type Catalog → | Writing Providers →