Observability and operations

Status: draft — content will evolve as procedures are finalized.

Monitoring

  • Track queue depths for the Zuul scheduler, Zuul executors, and Nodepool; alert on long queue times or node exhaustion.
  • Watch registry storage growth and schedule garbage-collection windows. Keep enough headroom for at least two full build sets.
  • Monitor mirror freshness by comparing package counts and repository snapshots against expected baselines.
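
The mirror-freshness and registry-headroom checks above can be sketched as two small functions. This is a minimal illustration, not an established tool: the function names, the 5% tolerance, and the idea of feeding in per-repository package counts as plain dictionaries are all assumptions made for the example.

```python
def stale_repos(baseline: dict[str, int], observed: dict[str, int],
                tolerance: float = 0.05) -> list[str]:
    """Return repos that are missing or whose package count deviates
    from the baseline by more than `tolerance` (a relative fraction).

    The 5% default is an illustrative assumption, not a recommended value.
    """
    flagged = []
    for repo, expected in sorted(baseline.items()):
        actual = observed.get(repo)
        if actual is None:
            # Repository expected by the baseline but absent from the mirror.
            flagged.append(repo)
            continue
        # Relative deviation; max(..., 1) guards against a zero baseline.
        deviation = abs(actual - expected) / max(expected, 1)
        if deviation > tolerance:
            flagged.append(repo)
    return flagged


def has_build_headroom(free_bytes: int, build_set_bytes: int,
                       build_sets: int = 2) -> bool:
    """True if free registry storage covers `build_sets` full build sets."""
    return free_bytes >= build_sets * build_set_bytes
```

For example, `stale_repos({"base": 1000, "updates": 200}, {"base": 1010, "updates": 150})` flags only `updates`, since its count is 25% off the baseline. How the counts are collected (for instance, from repository metadata) is left open here.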

Common issues and fixes

  • Mirror drift: snapshot mirrors regularly and pin digests in kolla-build.conf to avoid surprise updates.
  • Disk pressure: builders need ample space in /var/lib/docker (Docker) or /var/lib/containers (Podman); size disks so that more than 60 GB stays free during builds.
  • TLS/CA errors: distribute internal CA bundles to all Nodepool images; set REQUESTS_CA_BUNDLE and SSL_CERT_FILE in jobs.
  • Slow pushes: throttle concurrent builds with Zuul semaphores and ensure registry servers have adequate IOPS.
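
A builder job could run a preflight check for the disk-pressure and TLS/CA items above before starting a build. The sketch below is one possible shape, under stated assumptions: the `preflight` function is hypothetical, "GB" is taken as 10^9 bytes, and a CA variable is treated as valid only if it names an existing file.

```python
import os
import shutil
from typing import Mapping

GB = 10**9  # assumption: decimal gigabytes


def preflight(storage_path: str, min_free_gb: float = 60,
              env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Disk pressure: require more than `min_free_gb` free on the
    # filesystem that holds the container storage directory.
    free_gb = shutil.disk_usage(storage_path).free / GB
    if free_gb <= min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free under {storage_path}; "
            f"need more than {min_free_gb} GB"
        )

    # TLS/CA errors: both variables should point at a readable bundle.
    for name in ("REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(value):
            problems.append(f"{name} points to a missing file: {value}")

    return problems
```

Calling `preflight("/var/lib/docker")` early in a job and failing when the list is non-empty surfaces these issues before a long build starts rather than partway through it.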

Log and artifact retention

  • Centralize Zuul and Nodepool logs in object storage, with lifecycle rules that meet compliance needs.
  • Retain BOMs and manifest lists for each promoted tag to make rollback and audits straightforward.
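
One way to confirm that every promoted tag still has its BOM and manifest list is a periodic audit over the object-store listing. The sketch below assumes a hypothetical key layout of `<tag>/bom.json` and `<tag>/manifest-list.json`; the real naming scheme is not specified here, so adjust `REQUIRED_ARTIFACTS` and the key format to match.

```python
# Hypothetical artifact file names; not taken from any real layout.
REQUIRED_ARTIFACTS = ("bom.json", "manifest-list.json")


def missing_artifacts(promoted_tags: list[str],
                      stored_keys: list[str]) -> dict[str, list[str]]:
    """Map each promoted tag to the required artifacts it lacks.

    Tags with everything present are omitted from the result.
    """
    stored = set(stored_keys)
    missing = {}
    for tag in promoted_tags:
        absent = [name for name in REQUIRED_ARTIFACTS
                  if f"{tag}/{name}" not in stored]
        if absent:
            missing[tag] = absent
    return missing
```

An empty result means rollback and audit material is complete for all promoted tags. Any entry in the result names a tag whose retention has a gap.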

Periodic maintenance

  • Rotate registry credentials and tokens; automate injection via Zuul secrets.
  • Rebuild Nodepool images periodically to include updated CA bundles and tooling versions.
  • Review Zuul pipelines quarterly to ensure promotion gates still reflect business requirements.
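
The maintenance items above are easier to keep on schedule if something reports what is overdue. As a minimal sketch, assuming each item's last completion date is recorded somewhere (the `overdue` function, the item names, and the 90-day threshold in the usage line are illustrative assumptions):

```python
from datetime import date, timedelta


def overdue(last_done: dict[str, date], max_age_days: int,
            today: date) -> list[str]:
    """Return the names of items last completed more than
    `max_age_days` days before `today`, sorted alphabetically."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, done in last_done.items() if done < cutoff)
```

For instance, `overdue({"registry-token": date(2024, 1, 1), "nodepool-image": date(2024, 6, 20)}, 90, date(2024, 6, 30))` reports only `registry-token`, since the image was rebuilt within the window.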