Observability and operations
Status: draft — content will evolve as procedures are finalized.
Monitoring
- Track Zuul scheduler, executor, and Nodepool queue depths; alert on long queue times or node exhaustion.
- Watch registry storage growth and schedule garbage-collection windows. Keep enough headroom for at least two full build sets.
- Monitor mirror freshness by comparing package counts and repository snapshots against expected baselines.
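The mirror-freshness comparison above can be sketched as a small drift check. This is a minimal illustration, not an established tool: the `MirrorCheck` type, the function names, and the 5% tolerance are hypothetical, and how the baseline and observed package counts are collected is left to the deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorCheck:
    repo: str
    expected: int  # baseline package count (assumed to come from a known-good snapshot)
    observed: int  # package count currently seen on the mirror


def drifted(check: MirrorCheck, tolerance: float = 0.05) -> bool:
    """Return True when the observed count deviates from the baseline by more than `tolerance`."""
    if check.expected <= 0:
        # No usable baseline: flag it so someone reviews the mirror by hand.
        return True
    deviation = abs(check.observed - check.expected) / check.expected
    return deviation > tolerance


def stale_mirrors(checks, tolerance: float = 0.05):
    """Return the names of repositories whose package counts drifted past the tolerance."""
    return [c.repo for c in checks if drifted(c, tolerance)]
```

A monitoring job could feed the result of `stale_mirrors` into whatever alerting channel is already in place; an empty list means every mirror is within tolerance.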
Common issues and fixes
- Mirror drift: snapshot mirrors regularly and pin digests in `kolla-build.conf` to avoid surprise updates.
- Disk pressure: builders need ample space in `/var/lib/docker` or `/var/lib/containers`; size disks to keep >60GB free during builds.
- TLS/CA errors: distribute internal CA bundles to all Nodepool images; set `REQUESTS_CA_BUNDLE` and `SSL_CERT_FILE` in jobs.
- Slow pushes: throttle concurrent builds with Zuul semaphores and ensure registry servers have adequate IOPS.
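The disk-pressure guidance above can be enforced with a pre-build check along these lines. This is a sketch under assumptions: the function names are hypothetical, and 60 GB is read as decimal gigabytes; adjust the constant if the intent is GiB.

```python
import shutil

# 60 GB of free space, the headroom figure from the list above (decimal GB assumed).
MIN_FREE_BYTES = 60 * 10**9


def free_bytes(path: str) -> int:
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def has_build_headroom(path: str, min_free: int = MIN_FREE_BYTES, probe=free_bytes) -> bool:
    """Return True when the filesystem holding `path` has more than `min_free` bytes free.

    `probe` is injectable so the check can be exercised without a real disk.
    """
    return probe(path) > min_free
```

A job could call `has_build_headroom("/var/lib/docker")` or `has_build_headroom("/var/lib/containers")` before starting a build and fail early rather than run out of space partway through.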
Log and artifact retention
- Centralize Zuul and Nodepool logs to object storage with lifecycle rules that meet compliance needs.
- Retain BOMs and manifest lists for each promoted tag to make rollback and audits straightforward.
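As an illustration of lifecycle rules for centralized logs, the sketch below builds an expiration configuration in the dictionary shape used by the S3 bucket lifecycle API. It assumes an S3-compatible object store, which the text above does not specify; the prefixes, retention periods, and function names are hypothetical, and other stores (for example Swift) express expiry differently.

```python
def lifecycle_rule(rule_id: str, prefix: str, expire_days: int) -> dict:
    """Build one S3-style rule that expires objects under `prefix` after `expire_days`."""
    if expire_days < 1:
        raise ValueError("expire_days must be at least 1")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": expire_days},
    }


def lifecycle_configuration(retention: dict) -> dict:
    """Map {prefix: retention_days} to a lifecycle configuration with one rule per prefix."""
    rules = [
        lifecycle_rule("expire-" + prefix.strip("/"), prefix, days)
        for prefix, days in sorted(retention.items())
    ]
    return {"Rules": rules}
```

For example, `lifecycle_configuration({"zuul/": 90, "nodepool/": 30})` yields two rules. BOMs and manifest lists for promoted tags would live under a prefix that is deliberately left out of the mapping so they are never expired.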
Periodic maintenance
- Rotate registry credentials and tokens; automate injection via Zuul secrets.
- Rebuild Nodepool images periodically to include updated CA bundles and tooling versions.
- Review Zuul pipelines quarterly to ensure promotion gates still reflect business requirements.
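One way to keep these recurring tasks from slipping is a small overdue check that a periodic job can run. This is a sketch only: the task names and the 30-day interval are assumptions, since the list above says "periodically" without fixing a cadence, and where the last-completed dates are stored is left open.

```python
from datetime import date, timedelta


def overdue(last_done: dict, max_age: timedelta, today: date) -> list:
    """Return the sorted names of tasks last completed more than `max_age` before `today`."""
    return sorted(
        name for name, done in last_done.items() if today - done > max_age
    )


# Hypothetical record of when each maintenance task was last completed.
last_done = {
    "registry-token": date(2024, 1, 1),
    "nodepool-image": date(2024, 3, 1),
}
due = overdue(last_done, timedelta(days=30), today=date(2024, 3, 15))
```

With these sample dates only `registry-token` is overdue, because the Nodepool image was rebuilt 14 days before the check.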