mgtt: model guided troubleshooting tool

Specification v0.2.0

This page is the formal spec index. Most sections now live in the reference pages — follow the links. Two sections that don't fit elsewhere (MCP Service and Design Principles) are inlined below.


What mgtt is

mgtt lets you encode a system model once, accumulate timestamped observations into a fact store, and use constraint propagation over the model and facts to guide — not replace — the troubleshooting process.

  • Not a monitoring tool. It doesn't run continuously.
  • Not an automation tool. It doesn't fix things.
  • Not AI-dependent. AI is one possible consumer of the model, not a requirement.
  • Not system-specific. The model language works for any distributed system.

The closest analogy is Terraform: separate desired state (model) from observed state (facts), and reason over the diff. mgtt does this for understanding, not provisioning.
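As a rough illustration of the observed-state side of that split, the fact store can be pictured as an append-only log of timestamped observations. This is a minimal sketch, not mgtt's implementation: the `Fact` and `FactStore` types, their field names, and the choice of RFC 3339 timestamps are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Fact is one timestamped observation about a component.
// Hypothetical type: the field names are illustrative only.
type Fact struct {
	Component string
	Key       string
	Value     any
	At        time.Time
}

// FactStore is an append-only log. Facts are added, never edited or removed.
type FactStore struct {
	facts []Fact
}

// Append parses an RFC 3339 timestamp and records a new observation
// at the end of the log.
func (s *FactStore) Append(component, key string, value any, at string) error {
	ts, err := time.Parse(time.RFC3339, at)
	if err != nil {
		return fmt.Errorf("fact %s.%s: bad timestamp: %w", component, key, err)
	}
	s.facts = append(s.facts, Fact{Component: component, Key: key, Value: value, At: ts})
	return nil
}

// Latest returns the most recent observation for a component/key pair.
// Older observations stay in the log; they are only shadowed.
func (s *FactStore) Latest(component, key string) (Fact, bool) {
	var best Fact
	found := false
	for _, f := range s.facts {
		if f.Component != component || f.Key != key {
			continue
		}
		// On equal timestamps the later-appended fact wins.
		if !found || !f.At.Before(best.At) {
			best, found = f, true
		}
	}
	return best, found
}

// Len reports how many observations have been recorded.
func (s *FactStore) Len() int { return len(s.facts) }
```

Appending a second `error_rate` observation for the same component leaves the first one in the log; `Latest` returns the newer of the two, so reasoning always sees current state while the history stays intact.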

model author       writes system.model.yaml once, calmly, knows the system
on-call engineer   mgtt incident start
                   mgtt diagnose
                   [structured root-cause report]
                   mgtt incident end

Cognitive load belongs at authoring time, not incident time.

Implementation: Go. Module path github.com/mgt-tool/mgtt. Binary: mgtt.


Reference index

Every schema, format, and CLI surface lives in a reference page:

Topic                                             Reference
Model file (system.model.yaml)                    Model Schema
Hand-authored scenario files (scenarios/*.yaml)   Scenario Schema
Generated chain sidecar (scenarios.yaml)          scenarios.yaml Reference
Provider manifest (manifest.yaml)                 Manifest Schema
Stdlib primitives + provider-contributed types    Type Catalog
How the engine picks probes and terminates        Engine Reference
Every CLI subcommand and flag                     CLI Reference
Runtime configuration ($MGTT_HOME, env vars)      Configuration
Provider image capabilities (needs: vocabulary)   Image Capabilities
Provider registry format                          Registry Reference

Conceptual overviews — read these before the reference pages if you're new:

  • How It Works — model, facts, providers, the engine's place
  • Simulation — model drift detection, regression harness, design-time validation
  • Troubleshooting — the 3am flagship workflow

Provider authors — start here instead:


MCP Service

mgtt exposes its constraint engine as an MCP service, callable by LLMs and AI agents. CLI and MCP are equal consumers.

Tools

mgtt://tools/plan          run constraint engine, return failure path tree
mgtt://tools/probe         run a probe, append fact, return updated tree
mgtt://tools/fact/add      add a manual fact, return updated tree
mgtt://tools/ls/components list components and current status
mgtt://tools/ls/facts      list facts for a component

Tool Schemas

plan

{
  "input": {
    "component":  "string (optional — defaults to outermost)",
    "from_fact":  "string (optional — e.g. 'error_rate=0.94')"
  },
  "output": {
    "incident":    "string",
    "entry_point": "string",
    "state":       "string",
    "paths": [{
      "id":         "string",
      "components": ["string"],
      "hypothesis": "string",
      "eliminated": "boolean",
      "reason":     "string (if eliminated)"
    }],
    "suggested_probe": {
      "component":  "string",
      "fact":       "string",
      "eliminates": ["string"],
      "cost":       "low | medium | high",
      "access":     "string",
      "command":    "string"
    }
  }
}
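A client consuming the plan tool might mirror the schema above with Go structs. The sketch below is one possible mapping, not part of the mgtt SDK: the type names and the `DecodePlan` helper are hypothetical, and treating `suggested_probe` as possibly absent is an assumption the schema does not confirm.

```go
package main

import "encoding/json"

// Path mirrors one entry of "paths" in the plan output.
type Path struct {
	ID         string   `json:"id"`
	Components []string `json:"components"`
	Hypothesis string   `json:"hypothesis"`
	Eliminated bool     `json:"eliminated"`
	Reason     string   `json:"reason,omitempty"` // present only if eliminated
}

// SuggestedProbe mirrors "suggested_probe" in the plan output.
type SuggestedProbe struct {
	Component  string   `json:"component"`
	Fact       string   `json:"fact"`
	Eliminates []string `json:"eliminates"`
	Cost       string   `json:"cost"` // "low", "medium", or "high"
	Access     string   `json:"access"`
	Command    string   `json:"command"`
}

// PlanOutput mirrors the "output" object of the plan tool.
type PlanOutput struct {
	Incident   string `json:"incident"`
	EntryPoint string `json:"entry_point"`
	State      string `json:"state"`
	Paths      []Path `json:"paths"`
	// Pointer so a missing suggestion decodes to nil (assumption).
	SuggestedProbe *SuggestedProbe `json:"suggested_probe,omitempty"`
}

// DecodePlan parses a plan response and also returns the IDs of
// the failure paths that have not been eliminated yet.
func DecodePlan(data []byte) (PlanOutput, []string, error) {
	var out PlanOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return PlanOutput{}, nil, err
	}
	var live []string
	for _, p := range out.Paths {
		if !p.Eliminated {
			live = append(live, p.ID)
		}
	}
	return out, live, nil
}
```

Returning the live path IDs next to the decoded struct keeps the common question ("what is still possible?") one call away without hiding the full tree.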

probe

{
  "input": {
    "component": "string",
    "fact":      "string (optional — all facts if omitted)"
  },
  "output": {
    "fact":              "string",
    "value":             "any",
    "collector":         "string",
    "at":                "ISO8601",
    "paths_remaining":   "integer",
    "paths_eliminated":  "integer",
    "updated_plan":      "plan output (full)"
  }
}

fact/add

{
  "input": {
    "component": "string",
    "key":       "string",
    "value":     "any",
    "at":        "ISO8601 (optional)",
    "note":      "string (optional)"
  },
  "output": {
    "appended":      "boolean",
    "updated_plan":  "plan output (full)"
  }
}
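On the request side, the optional `at` and `note` fields can be left out of the payload entirely. The following sketch shows one way to build a fact/add input in Go. `FactAddInput` and `EncodeFactAdd` are hypothetical names, and rejecting an empty `component` or `key` is this sketch's reading of the schema (fields not marked optional are treated as required), not a documented rule.

```go
package main

import (
	"encoding/json"
	"errors"
)

// FactAddInput mirrors the "input" object of the fact/add tool.
type FactAddInput struct {
	Component string `json:"component"`
	Key       string `json:"key"`
	Value     any    `json:"value"`
	At        string `json:"at,omitempty"`   // ISO 8601, optional
	Note      string `json:"note,omitempty"` // optional
}

// EncodeFactAdd validates the fields this sketch assumes are required
// and returns the JSON payload, omitting optional fields left empty.
func EncodeFactAdd(in FactAddInput) (string, error) {
	if in.Component == "" || in.Key == "" {
		return "", errors.New("component and key are required")
	}
	b, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

With only `component`, `key`, and `value` set, the encoded payload contains exactly those three fields, in that order.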

Autonomy Modes

observe       AI sees facts and paths, surfaces to human
              never calls probe or fact/add autonomously

assist        AI runs probe when cost == low AND access is read-only
              surfaces to human when cost is medium or high, or access is write
              default mode

autonomous    AI drives the full loop, human gets report at end
              not recommended for production systems
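The three modes reduce to a small gate in front of the probe tool. The sketch below encodes the rules as stated above. The `Mode` type and `MayRunProbe` function are hypothetical, and passing read-only access as a boolean is an assumption, since the schema only says `access` is a string.

```go
package main

// Mode is the autonomy level granted to an AI consumer.
type Mode string

const (
	Observe    Mode = "observe"
	Assist     Mode = "assist"
	Autonomous Mode = "autonomous"
)

// DefaultMode is assist, as stated in the mode list.
const DefaultMode = Assist

// MayRunProbe reports whether the AI may run a probe without asking a human.
// cost is "low", "medium", or "high"; readOnly says whether the probe's
// access is read-only (how that is derived from "access" is assumed).
func MayRunProbe(mode Mode, cost string, readOnly bool) bool {
	switch mode {
	case Autonomous:
		// AI drives the full loop.
		return true
	case Assist:
		// Only cheap, read-only probes run unattended.
		return cost == "low" && readOnly
	default:
		// Observe, and any unknown mode, never acts on its own.
		return false
	}
}
```

Falling through to `false` for unrecognised modes is a deliberate choice in this sketch: a misconfigured mode should surface to a human, not run probes.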

Design Principles

  • Zero cognitive load at incident time. The on-call engineer runs mgtt diagnose and reads a structured report. All system knowledge lives in the model, authored calmly beforehand.
  • Simple until explicit. Defaults cover 90% of cases. Namespacing and overrides exist for the other 10%.
  • Pecking order is the single resolution rule. Types, facts, probes, and data types all resolve the same way: first provider wins.
  • State is observed, not declared. Component states derive from facts automatically. Engineers never set or advance state.
  • Stdlib is primitives only. Higher-level types belong in providers.
  • Credentials belong to the environment. mgtt never stores, manages, or transmits credentials. Same model as Terraform.
  • Providers are self-contained and external. Each provider lives in its own git repository. Providers depend only on the mgtt provider SDK.
  • AI friendly, not AI dependent. MCP makes mgtt callable by any LLM. The constraint engine reasons — the AI drives the loop.
  • Append only. The fact store is a record, not a scratchpad.
  • Derive, don't persist. The state machine and failure path tree are computed fresh. Only observations and the current position are stored.
  • Engine is pure. The constraint engine has no I/O, no probe execution, no credential access, no filesystem operations. It takes a model, providers, and facts as input and returns a failure path tree. The same engine is callable from the CLI, simulation runner, and MCP service — only the source of facts differs.
  • Guided, not automated. mgtt tells you what to check next and why. Human or AI decides whether to check it.
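The pecking-order principle in the list above ("first provider wins") can be sketched as a single ordered lookup. This is an illustration under assumptions, not mgtt's resolver: the `Provider` type and `Resolve` function are hypothetical, and "first" is taken to mean the order in which providers are listed. Namespacing and overrides are left out.

```go
package main

// Provider is a hypothetical stand-in for a loaded provider:
// a name plus the set of type names it contributes.
type Provider struct {
	Name  string
	Types map[string]bool
}

// Resolve returns the name of the first provider, in list order,
// that contributes typeName. Later providers offering the same
// name are shadowed.
func Resolve(providers []Provider, typeName string) (string, bool) {
	for _, p := range providers {
		if p.Types[typeName] {
			return p.Name, true
		}
	}
	return "", false
}
```

Because the rule is a plain first-match scan, the same function shape would serve facts, probes, and data types; only the set being searched changes.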