Prometheus with HashiCorp Nomad#

Run an OpenTelemetry Collector on each Nomad client (or wherever Nomad’s HTTP API is reachable). It scrapes Nomad’s built-in Prometheus-format metrics and remote-writes them to your Ametnes Prometheus endpoint. Note that Nomad serves metrics at /v1/metrics with format=prometheus, not on a separate /metrics port.

Prerequisites#

  • Prometheus provisioned on Ametnes Platform.
  • At least one Nomad client is registered (nomad node status shows a ready node).
  • Nomad telemetry enabled (step 1 below), after which Nomad exposes metrics at http://127.0.0.1:4646/v1/metrics?format=prometheus.
  • The Docker task driver available on the Nomad clients, if you use the OpenTelemetry Collector job below.

1. Enable Nomad Prometheus metrics#

In the Nomad agent configuration (for example under /etc/nomad.d/):

telemetry {
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

Restart Nomad, then on the host verify:

curl -sS 'http://127.0.0.1:4646/v1/metrics?format=prometheus' | head
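The response is Prometheus exposition text: `# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples. A minimal Python sketch of that shape (the sample lines below are illustrative stand-ins, not actual Nomad output, and the parser ignores edge cases such as commas inside label values):

```python
# Minimal parse of Prometheus exposition-format lines into (name, labels, value).
# Simplified: does not handle escaped quotes or commas inside label values.
import re

SAMPLE = """\
# HELP nomad_client_allocations_blocked Number of blocked allocations
# TYPE nomad_client_allocations_blocked gauge
nomad_client_allocations_blocked{datacenter="dc1",node_id="abc123"} 0
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

def parse_exposition(text):
    """Yield (metric_name, labels_dict, float_value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group("labels"):
            for pair in m.group("labels").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        yield m.group("name"), labels, float(m.group("value"))

print(list(parse_exposition(SAMPLE)))
```

If the curl above shows lines like the sample, telemetry is working and the collector will be able to scrape them.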

2. OpenTelemetry Collector job (Nomad + Docker)#

Run otel/opentelemetry-collector-contrib, scrape Nomad locally, and remote-write to Ametnes Prometheus.

Important details:

  • Use network_mode = "host" so 127.0.0.1:4646 refers to the host’s Nomad HTTP API, not the container’s loopback.
  • Render the collector config with a Nomad template block (for example to local/config.yaml).
  • In the template body, escape OpenTelemetry’s ${env:VAR} as $${env:VAR} so Nomad’s HCL parser does not consume ${...}.
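The escaping rule in the last bullet can be illustrated with a tiny simulation. This mimics the `$${` → `${` escape that the HCL parser applies when rendering the job file; it is not Nomad's actual parser:

```python
def unescape_hcl_dollars(s: str) -> str:
    """Simulate the HCL escape: a literal '$${' in the job file
    is emitted as '${' in the rendered template content."""
    return s.replace("$${", "${")

# What the collector ultimately reads from local/config.yaml:
rendered = unescape_hcl_dollars("username: $${env:PRW_USER}")
print(rendered)  # the collector sees ${env:PRW_USER} and resolves it itself
```

Without the extra `$`, Nomad would try to interpolate `${env:PRW_USER}` itself and fail, instead of passing it through to the collector.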

Example job sketch:

job "otel-collector" {
  datacenters = ["dc1"]
  type        = "service"

  group "otel" {
    task "otelcontrib" {
      driver = "docker"

      config {
        image        = "otel/opentelemetry-collector-contrib:<version>"
        network_mode = "host"
        args         = ["--config=/local/config.yaml"]
      }

      template {
        destination = "local/config.yaml"
        data        = <<EOH
extensions:
  basicauth/prw:
    client_auth:
      username: $${env:PRW_USER}
      password: $${env:PRW_PASSWORD}

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: nomad
          scrape_interval: 15s
          static_configs:
            - targets: ["127.0.0.1:4646"]
          metrics_path: "/v1/metrics"
          params:
            format: ["prometheus"]

processors:
  batch: {}

exporters:
  prometheusremotewrite:
    endpoint: https://<ametnes-prometheus-endpoint>/api/v1/write
    auth:
      authenticator: basicauth/prw

service:
  extensions: [basicauth/prw]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
EOH
      }

      env {
        PRW_USER       = "<username>"
        PRW_PASSWORD   = "<password>"  # prefer Nomad Variables / Vault in production
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}
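The scrape entry in the template above resolves to the same URL verified in step 1. A small sketch of how Prometheus composes the target, metrics_path, and params into the effective scrape URL:

```python
from urllib.parse import urlencode, urlunsplit

def scrape_url(target, metrics_path, params, scheme="http"):
    """Build the effective scrape URL the way Prometheus composes
    scheme + target + metrics_path + query parameters."""
    query = urlencode([(k, v) for k, vals in params.items() for v in vals])
    return urlunsplit((scheme, target, metrics_path, query, ""))

url = scrape_url("127.0.0.1:4646", "/v1/metrics", {"format": ["prometheus"]})
print(url)  # http://127.0.0.1:4646/v1/metrics?format=prometheus
```

This is why network_mode = "host" matters: the target 127.0.0.1:4646 must resolve to the host's Nomad API from inside the collector.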

Deploy and inspect:

nomad job run otel-collector.nomad
nomad job status otel-collector
nomad alloc logs -f <alloc-id> otelcontrib

Replace https://<ametnes-prometheus-endpoint>/api/v1/write and PRW_USER / PRW_PASSWORD with the endpoint and credentials from your Ametnes Platform Prometheus resource.

3. Verify metrics in Prometheus (read API)#

Query remote Ametnes Prometheus:

curl -sS -G \
  -u '<user>:<password>' \
  --data-urlencode 'query=nomad_client_allocations_blocked' \
  'https://<ametnes-prometheus-endpoint>/api/v1/query' | jq .

Success looks like "status":"success" with a non-empty result, and typical labels such as job="nomad", instance="127.0.0.1:4646", plus node_id, datacenter, and related labels as emitted by Nomad.
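A quick programmatic check of that response shape (the sample payload below is an illustrative stand-in for a real query response):

```python
import json

# Illustrative sample of a successful /api/v1/query instant-query response.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "nomad_client_allocations_blocked",
                        "job": "nomad", "instance": "127.0.0.1:4646"},
             "value": [1700000000.0, "0"]}
        ]
    }
})

def has_data(response_text: str) -> bool:
    """Return True when the query succeeded and matched at least one series."""
    body = json.loads(response_text)
    return body.get("status") == "success" and bool(body.get("data", {}).get("result"))

print(has_data(SAMPLE_RESPONSE))
```

A "success" status with an empty result list means the write path may be up but no nomad_* series have arrived yet; check the collector logs in that case.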

HashiCorp Nomad Dashboard#

Import the community HashiCorp Nomad dashboard JSON file. You need a Grafana Prometheus data source pointed at your Ametnes Prometheus instance before the dashboard can query metrics.

  1. Add Prometheus as a data source (skip if you already completed this): in Grafana go to Connections → Data sources → Add data source → Prometheus. Set URL to your Ametnes Prometheus base URL (no trailing slash), enable Basic auth with the User and Password from your Prometheus resource, then Save & test until the health check succeeds. For the full sequence, see Connect Grafana to Prometheus.

  2. Go to Dashboards → New → Import.
  3. Paste the dashboard JSON content (or upload the JSON file) in the import screen.
  4. When prompted, choose the Prometheus data source you added above.
  5. Click Import.

Adjust the dashboard’s variables or job selectors if your remote-write labels differ from the dashboard defaults.

For a more composable and repeatable process, Infrastructure as Code (IaC) is usually a better option.

Grafana’s Terraform provider can create the Prometheus data source and the dashboard, but it does not replicate the UI import step where you pick a data source from a dropdown. The dashboard JSON must already reference a data source that exists in Grafana (usually by uid in each panel’s datasource block). In practice you either:

  • Set a known uid on grafana_data_source and edit the downloaded JSON once so panels use that UID, or
  • Keep your exported JSON unchanged and use Terraform’s replace() to substitute the UID string embedded in the file with the UID of the data source you manage (shown below).
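The second approach is a literal substring swap: Terraform's replace() behaves like the Python equivalent below (the panel fragment and both UIDs are hypothetical examples, not values from a real export):

```python
import json

# Hypothetical exported panel fragment: community dashboard exports typically
# embed the exporting instance's datasource UID in each panel.
dashboard_json = json.dumps({
    "panels": [
        {"title": "Blocked allocations",
         "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"}}
    ]
})

def swap_uid(config_json: str, embedded_uid: str, managed_uid: str) -> str:
    """Literal substring replacement, matching Terraform's replace() semantics."""
    return config_json.replace(embedded_uid, managed_uid)

patched = swap_uid(dashboard_json, "PBFA97CFB590B2093", "ametnes-prometheus")
print(json.loads(patched)["panels"][0]["datasource"]["uid"])  # ametnes-prometheus
```

Because the replacement is literal, inspect the downloaded JSON first and confirm the embedded UID string does not also appear in unrelated fields.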

Store the dashboard JSON in your repository (for example dashboards/nomad-15764.json) and manage Grafana resources with Terraform:

terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 3.0"
    }
  }
}

provider "grafana" {
  url  = var.grafana_url
  auth = "${var.grafana_user}:${var.grafana_password}"
}

resource "grafana_data_source" "prometheus" {
  uid  = var.prometheus_datasource_uid
  type = "prometheus"
  name = "Ametnes Prometheus"
  url  = var.prometheus_url

  basic_auth            = true
  basic_auth_username   = var.prometheus_user
  secure_json_data_encoded = jsonencode({
    basicAuthPassword = var.prometheus_password
  })
}

resource "grafana_dashboard" "nomad" {
  depends_on = [grafana_data_source.prometheus]

  config_json = replace(
    file("${path.module}/dashboards/nomad-15764.json"),
    var.nomad_dashboard_embedded_prometheus_uid,
    grafana_data_source.prometheus.uid
  )
  overwrite = true
}

variable "grafana_url" {
  type = string
}

variable "grafana_user" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true
}

variable "prometheus_url" {
  type = string
}

variable "prometheus_user" {
  type = string
}

variable "prometheus_password" {
  type      = string
  sensitive = true
}

variable "prometheus_datasource_uid" {
  type        = string
  description = "Stable UID for the Prometheus data source in Grafana (panels in the dashboard JSON should end up referencing this value)."
  default     = "ametnes-prometheus"
}

variable "nomad_dashboard_embedded_prometheus_uid" {
  type        = string
  description = "The Prometheus datasource `uid` string already present in the downloaded dashboard JSON (inspect the file; community exports often repeat the same UID in each panel)."
}

Other workloads#

For application metrics that use a standard Prometheus scrape path (not Nomad’s /v1/metrics), add another scrape_configs entry in the collector (or a separate agent) with the correct metrics_path and targets.
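For example, a sketch of an additional entry added alongside the existing nomad job in the collector's scrape_configs (job_name, target, and port are placeholders for your workload):

```yaml
# Illustrative extra scrape job for an application exposing a standard
# Prometheus endpoint; adjust job_name, path, and target to your workload.
- job_name: my-app
  scrape_interval: 15s
  metrics_path: /metrics
  static_configs:
    - targets: ["127.0.0.1:9090"]
```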

Validation checklist#

  • curl to 127.0.0.1:4646 shows nomad_* series locally.
  • Remote Ametnes Prometheus /api/v1/query returns data for nomad_client_allocations_blocked.
  • The imported Nomad dashboard shows panels after label/job alignment.