@karencfv
Contributor

karencfv commented Jan 13, 2026

Adds the ability to enable the sled agent health monitor on simulated systems. This is already useful for various types of testing and will continue to be.

Disabled

# Configuration toml file
enabled = false

$ cargo xtask omicron-dev run-all
<...>
omicron-dev: sled agent API:         http://[::1]:56577
<...>
$ curl -H "api-version: 14.0.0"  http://[::1]:56577/inventory | jq .health_monitor
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21274  100 21274    0     0  7654k      0 --:--:-- --:--:-- --:--:-- 10.1M
{
  "smf_services_in_maintenance": {
    "ok": {
      "services": [],
      "errors": [],
      "time_of_status": null
    }
  }
}
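
For anyone who wants to script this check rather than use curl, here is a minimal Python sketch (not part of this PR) that sends the same request using only the standard library. The function names `build_inventory_request` and `fetch_health_monitor` are hypothetical. The `/inventory` path, the `api-version: 14.0.0` header, and the `health_monitor` key are taken from the command above.

```python
import json
import urllib.request


def build_inventory_request(base_url, api_version="14.0.0"):
    """Build the GET request that the curl command above sends."""
    return urllib.request.Request(
        f"{base_url}/inventory",
        headers={"api-version": api_version},
    )


def fetch_health_monitor(base_url, api_version="14.0.0"):
    """Return the `health_monitor` object from the inventory response."""
    request = build_inventory_request(base_url, api_version)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Equivalent to piping the response through `jq .health_monitor`.
        return json.load(response)["health_monitor"]
```

The port changes on every run, so the base URL has to be copied from the `omicron-dev: sled agent API:` line, for example `fetch_health_monitor("http://[::1]:56577")`.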

With fake health monitor results

# Configuration toml file

enabled = false

[sim_health_checks.smf_services_in_maintenance.ok]
services = [
    { fmri = "svc:/system/fake-service-1:default", zone = "oxz_fake_zone_1" },
    { fmri = "svc:/network/fake-service-2:default", zone = "oxz_fake_zone_2" },
    { fmri = "svc:/application/fake-service-3:default", zone = "global" }
]

errors = []

time_of_status = "2026-04-12T23:20:50.52Z"

$ cargo xtask omicron-dev run-all --health-monitor-config sled-agent/tests/configs/health_monitor_sim_unhealthy.toml
<...>
omicron-dev: sled agent API:         http://[::1]:64707
<...>
$ curl -H "api-version: 14.0.0"  http://[::1]:64707/inventory | jq .health_monitor
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21505  100 21505    0     0  8932k      0 --:--:-- --:--:-- --:--:-- 10.2M
{
  "smf_services_in_maintenance": {
    "ok": {
      "services": [
        {
          "fmri": "svc:/system/fake-service-1:default",
          "zone": "oxz_fake_zone_1"
        },
        {
          "fmri": "svc:/network/fake-service-2:default",
          "zone": "oxz_fake_zone_2"
        },
        {
          "fmri": "svc:/application/fake-service-3:default",
          "zone": "global"
        }
      ],
      "errors": [],
      "time_of_status": "2026-04-12T23:20:50.520Z"
    }
  }
}
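
If a test needs a different set of fake services, a config in this layout could be generated instead of written by hand. The following is only a sketch under that assumption: `render_sim_health_config` is a hypothetical helper, not something this PR adds, and only the key names and table path come from the example above. It relies on `json.dumps` to quote strings, which produces valid TOML basic strings for plain ASCII FMRIs and zone names.

```python
import json


def render_sim_health_config(services, time_of_status, enabled=False):
    """Render a health monitor config in the TOML layout shown above.

    `services` is a list of (fmri, zone) pairs; `time_of_status` is an
    RFC 3339 timestamp string that is written out unchanged.
    """
    quote = json.dumps  # double-quoted, escaped string
    lines = [
        f"enabled = {'true' if enabled else 'false'}",
        "",
        "[sim_health_checks.smf_services_in_maintenance.ok]",
    ]
    if services:
        # One inline table per service, comma-separated as in the example.
        entries = ",\n".join(
            f"    {{ fmri = {quote(fmri)}, zone = {quote(zone)} }}"
            for fmri, zone in services
        )
        lines += ["services = [", entries, "]"]
    else:
        lines.append("services = []")
    lines += ["errors = []", f"time_of_status = {quote(time_of_status)}"]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed via `--health-monitor-config`, as in the command above.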

Enabled

# Configuration toml file
enabled = true

$ cargo xtask omicron-dev run-all --health-monitor-config sled-agent/tests/configs/health_monitor_sim_enabled.toml
<...>
omicron-dev: sled agent API:         http://[::1]:59351
<...>
$ curl -H "api-version: 14.0.0"  http://[::1]:59351/inventory | jq .health_monitor
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21418  100 21418    0     0  8900k      0 --:--:-- --:--:-- --:--:-- 10.2M
{
  "smf_services_in_maintenance": {
    "ok": {
      "services": [
        {
          "fmri": "svc:/site/fake-service2:default",
          "zone": "global"
        },
        {
          "fmri": "svc:/site/fake-service:default",
          "zone": "global"
        }
      ],
      "errors": [],
      "time_of_status": "2026-01-22T06:41:03.279150883Z"
    }
  }
}
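
For tests that need to assert on this output, a small helper along these lines might be enough. It is a sketch, not code from this PR: `services_in_maintenance` is a hypothetical name, and the only response shape it understands is the `ok` variant shown above. Anything else raises an error, since the shape of a failed check does not appear in this description.

```python
import json


def services_in_maintenance(inventory):
    """Return (zone, fmri) pairs reported by the health monitor."""
    check = inventory["health_monitor"]["smf_services_in_maintenance"]
    if "ok" not in check:
        # The non-ok shape is not shown in this PR, so refuse to guess.
        raise ValueError(f"unexpected health check result: {check!r}")
    return [(svc["zone"], svc["fmri"]) for svc in check["ok"]["services"]]


# Sample trimmed from the output above, wrapped in the `health_monitor`
# key that the `jq .health_monitor` filter selects.
SAMPLE = json.loads("""
{
  "health_monitor": {
    "smf_services_in_maintenance": {
      "ok": {
        "services": [
          {"fmri": "svc:/site/fake-service2:default", "zone": "global"},
          {"fmri": "svc:/site/fake-service:default", "zone": "global"}
        ],
        "errors": [],
        "time_of_status": "2026-01-22T06:41:03.279150883Z"
      }
    }
  }
}
""")

for zone, fmri in services_in_maintenance(SAMPLE):
    print(f"{zone}: {fmri}")
```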

Closes: #9517

@davepacheco
Collaborator

Cool. Does this cause the simulated sled agent to look at the actual SMF state wherever it's running? Wouldn't it be more useful to allow the reported state to be customized directly?

@karencfv
Contributor Author

> Does this cause the simulated sled agent to look at the actual SMF state wherever it's running?

Yes, but that was the use case I've been having 😄.

> Wouldn't it be more useful to allow the reported state to be customized directly?

That would be really useful too. I wonder if it's possible to support either one. Do you think it would be relatively straightforward to do that?

@karencfv
Contributor Author

karencfv commented Jan 22, 2026

@davepacheco, I've changed the approach. It's now possible to inject fake data via a config file. Let me know what you think!

I've updated the PR description to show how this now works.

Development

Successfully merging this pull request may close these issues.

Include health monitor information for testing
