Air-Gapped Implementation 3 — Zuul Stack — VM2 (v2.0.0)

This document is the operator runbook for bringing up the Zuul stack on VM2 using Docker Compose:

  • ZooKeeper
  • Zuul scheduler / web / merger / executor
  • nodepool-launcher
  • PostgreSQL (Zuul DB, if configured)

It also covers the required bootstrap steps:

  • Shipping the Zuul git repository (offline) to use zuul/tools/zk-ca.sh
  • Generating ZooKeeper TLS material into the compose certs/ directory
  • Creating SSH keys + known_hosts for:
    • Gerrit (SSH port 29418)
    • VM3 runner container (SSH port 23389)

Tenant layout and pipeline strategy

  • Configuration repositories: keep a dedicated config repo (for example, zuul-config) that stores tenants, pipelines, and shared job definitions in zuul.d/. Project repos contain job-specific playbooks and roles (kolla-config, ansible-ops, or forks of OpenStack services). Use a separate secrets repo if your security model requires isolated storage for private data.

  • Example tenant fragment:

    - tenant:
        name: openstack-airgap
        source:
          gerrit:
            config-projects:
              - zuul-config
            untrusted-projects:
              - kolla-config
              - ansible-ops
    
  • Pipeline strategy: use the standard check, gate, and promote pipelines to handle registry pushes. Add a periodic pipeline for scheduled rebuilds that pick up CVE fixes. Use project-templates for repeated patterns (shared base jobs, log publishing, registry auth handlers), and set queue names explicitly for long-running build jobs.

  • Repository hygiene: mirror upstream OpenStack and Kolla repos into your control plane and pin branches to the target release. Keep Ansible roles/playbooks close to the owning projects to simplify code reviews and secrets scoping, and enforce strong review rules before merging changes to shared pipelines.


1) What must already exist (hard prerequisites)

1.1 VM-to-VM connectivity (must match network.md)

From VM2:

  • VM2 → VM1: TCP/8081,8082,8083
  • VM2 → VM3: TCP/23389
  • Local Gerrit on VM2: TCP/29418 reachable (either via VM2 IP/FQDN or localhost)
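
The connectivity requirements above can be probed in one pass. The following is a minimal sketch, assuming nc is installed on VM2; check_ports is a hypothetical helper name, and VM1_HOST is a hypothetical variable that this runbook does not define elsewhere.

```shell
# check_ports HOST PORT...: probe each TCP port with nc and report the result.
# Returns non-zero if at least one port is unreachable.
check_ports() {
  host="$1"; shift
  failed=0
  for port in "$@"; do
    if nc -z -w 3 "${host}" "${port}" >/dev/null 2>&1; then
      echo "OK   ${host}:${port}"
    else
      echo "FAIL ${host}:${port}"
      failed=1
    fi
  done
  return "${failed}"
}

# Usage (VM1_HOST is a hypothetical variable; set it to VM1's IP or FQDN):
#   check_ports "${VM1_HOST}" 8081 8082 8083
#   check_ports "${RUNNER_HOST}" 23389
```

Reporting every port, instead of stopping at the first failure, makes it easier to compare the result against network.md in a single run.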

1.2 Software on VM2

  • Docker Engine installed and running
  • Docker Compose plugin available (docker compose version works)

1.3 Required directories on VM2

You must have a stable directory containing the Zuul compose stack. Recommended:

  • /opt/zuul-compose

This directory MUST contain:

  • docker-compose.yml
  • etc_zuul/
  • etc_nodepool/
  • playbooks/ (must include wait-to-start-certs.sh)
  • zk/ (must include zoo.cfg)
  • certs/ (will be created/populated)

2) Variables you must set (edit once)

Run these on VM2 (copy/paste, then edit values):

# Where your Zuul compose directory lives on VM2
export ZUUL_COMPOSE_DIR="/opt/zuul-compose"

# Where the Zuul git repository is placed on VM2 (shipped offline)
export ZUUL_SRC_DIR="/opt/src/zuul"

# Gerrit SSH endpoint (as used by Zuul and for known_hosts pinning)
export GERRIT_SSH_HOST="<VM2_IP_OR_DNS>"     # FQDN or IP resolving to VM2
export GERRIT_SSH_PORT="29418"
export GERRIT_SSH_USER="<gerrit_zuul_user>"  # service user in Gerrit

# VM3 runner container SSH endpoint (as reached from VM2)
export RUNNER_HOST="<VM3_IP_OR_DNS>"
export RUNNER_SSH_PORT="23389"
export RUNNER_USER="<runner_user>"           # must match nodepool.yaml

# ZooKeeper TLS name to be used for cert generation (per your environment)
# Value used in this environment:
export ZK_TLS_NAME="examples_zk_1.examples_default"
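
Because the block above is meant to be pasted and then edited, it is easy to leave a <...> placeholder in place. A small guard such as the following sketch can catch that before later steps fail in less obvious ways. The helper name require_set is hypothetical.

```shell
# require_set NAME VALUE: fail if VALUE is empty or still contains a
# <placeholder> marker from the template above.
require_set() {
  case "$2" in
    "")    echo "ERROR: $1 is empty" >&2; return 1 ;;
    *"<"*) echo "ERROR: $1 still contains a placeholder: $2" >&2; return 1 ;;
  esac
}

# Usage, after exporting the variables above:
#   require_set GERRIT_SSH_HOST "${GERRIT_SSH_HOST:-}"
#   require_set GERRIT_SSH_USER "${GERRIT_SSH_USER:-}"
#   require_set RUNNER_HOST     "${RUNNER_HOST:-}"
#   require_set RUNNER_USER     "${RUNNER_USER:-}"
```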

3) Fail-fast preflight checks (run before doing anything else)

set -euo pipefail

# 1) Compose and docker availability
docker version >/dev/null
docker compose version >/dev/null

# 2) Required directories/files exist
test -d "${ZUUL_COMPOSE_DIR}"
test -f "${ZUUL_COMPOSE_DIR}/docker-compose.yml"
test -d "${ZUUL_COMPOSE_DIR}/etc_zuul"
test -d "${ZUUL_COMPOSE_DIR}/etc_nodepool"
test -d "${ZUUL_COMPOSE_DIR}/playbooks"
test -f "${ZUUL_COMPOSE_DIR}/playbooks/wait-to-start-certs.sh"
test -d "${ZUUL_COMPOSE_DIR}/zk"
test -f "${ZUUL_COMPOSE_DIR}/zk/zoo.cfg"

# 3) VM2 can reach VM3 runner container SSH
nc -vz "${RUNNER_HOST}" "${RUNNER_SSH_PORT}"

# 4) docker-compose file is valid YAML for docker compose
cd "${ZUUL_COMPOSE_DIR}"
docker compose config >/dev/null
echo "Preflight OK"
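
The fail-fast checks stop at the first missing path. When preparing a new host it can be more convenient to see everything that is missing at once. The sketch below is one way to do that; missing_paths is a hypothetical helper, and the list of entries mirrors the required layout from Section 1.3 (certs/ is left out because it is created later).

```shell
# missing_paths BASE_DIR: print every required entry that is absent.
# Prints nothing and returns 0 when the layout is complete.
missing_paths() {
  base="$1"
  rc=0
  for rel in docker-compose.yml etc_zuul etc_nodepool \
             playbooks/wait-to-start-certs.sh zk/zoo.cfg; do
    if [ ! -e "${base}/${rel}" ]; then
      echo "MISSING: ${base}/${rel}"
      rc=1
    fi
  done
  return "${rc}"
}

# Usage:
#   missing_paths "${ZUUL_COMPOSE_DIR}" || echo "Fix the layout before continuing"
```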

4) Offline dependency: ship the Zuul git repository to VM2 (required)

We rely on Zuul tooling to generate ZooKeeper certs:

  • zuul/tools/zk-ca.sh

4.1 Place the repo on VM2

The repository must exist at:

  • ${ZUUL_SRC_DIR}

Validate:

test -d "${ZUUL_SRC_DIR}"
test -f "${ZUUL_SRC_DIR}/tools/zk-ca.sh"

5) Ensure all container images are present locally on VM2 (required)

Because the environment is offline, you must ensure the referenced images already exist locally (loaded from tar, or available via your internal registry).

5.1 List images used by the compose

cd "${ZUUL_COMPOSE_DIR}"
docker compose config | awk '/image:/{print $2}' | sort -u

5.2 Verify each image exists locally

For each image printed above, run:

docker image inspect <IMAGE_NAME> >/dev/null

If any inspect fails, the stack will fail at runtime. Load the missing image tar(s) and retry.
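
Steps 5.1 and 5.2 can be combined into a single pipeline. The following is a sketch under the same assumptions as above (the awk filter from 5.1 yields one image name per line); missing_images is a hypothetical helper name.

```shell
# missing_images: read image names on stdin (one per line) and print those
# that "docker image inspect" cannot find locally. Returns non-zero if any
# image is missing.
missing_images() {
  rc=0
  while IFS= read -r image; do
    [ -n "${image}" ] || continue
    if ! docker image inspect "${image}" >/dev/null 2>&1 </dev/null; then
      echo "${image}"
      rc=1
    fi
  done
  return "${rc}"
}

# Usage:
#   cd "${ZUUL_COMPOSE_DIR}"
#   docker compose config | awk '/image:/{print $2}' | sort -u | missing_images
```

An empty result means every referenced image is already present on VM2.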

5.3 Deploy Gerrit on VM2 (Docker Compose)

Gerrit must be running before you perform the SSH checks in Section 7. Follow the dedicated Gerrit runbook for the Docker Compose service definition, startup commands, and health checks.


6) Generate ZooKeeper certificates using Zuul tooling (required)

Your stack gates startup using playbooks/wait-to-start-certs.sh, and expects certs under:

  • ${ZUUL_COMPOSE_DIR}/certs (mounted as /var/certs)

6.1 Prepare cert directory

mkdir -p "${ZUUL_COMPOSE_DIR}/certs"
chmod 700 "${ZUUL_COMPOSE_DIR}/certs"

6.2 Generate certs (required command format)

cd "${ZUUL_SRC_DIR}"
./tools/zk-ca.sh "${ZUUL_COMPOSE_DIR}/certs" "${ZK_TLS_NAME}"

6.3 Sanity check: cert directory must not be empty

ls -la "${ZUUL_COMPOSE_DIR}/certs"

If the directory is empty: stop and fix ZUUL_SRC_DIR and/or the shipped Zuul repo.
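
The visual check in 6.3 can also be scripted so that it stops an automated run. This is a minimal sketch that only verifies that at least one file was produced; it does not validate the certificates themselves. The helper name certs_present is hypothetical.

```shell
# certs_present DIR: succeed only if DIR exists and contains at least one
# regular file (searched recursively, in case the tool creates subdirectories).
certs_present() {
  [ -d "$1" ] || return 1
  [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Usage:
#   certs_present "${ZUUL_COMPOSE_DIR}/certs" || echo "No cert files found"
```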


7) SSH keys and known_hosts bootstrap (required)

This step prevents interactive SSH prompts and eliminates the most common Zuul bring-up failures.

7.1 Create SSH directories used by container mounts (host paths)

Docker compose mounts SSH material from these host paths:

  • /var/lib/zuul/ssh (used by merger as /var/lib/zuul/.ssh)
  • /var/lib/nodepool/ssh (used by nodepool-launcher and executor)

Create them:

sudo mkdir -p /var/lib/zuul/ssh
sudo mkdir -p /var/lib/nodepool/ssh
sudo chmod 700 /var/lib/zuul/ssh /var/lib/nodepool/ssh

7.2 Create the Gerrit Zuul service user (one-time in Gerrit UI)

  1. In Gerrit, create a dedicated service account for Zuul (for example, zuul).
  2. Add the public SSH key generated below (Section 7.3) to that account:
    • /var/lib/zuul/ssh/id_ed25519_gerrit.pub
  3. Grant this user the minimum required permissions on the following projects:
    • openstack/project-config
    • openstack/kolla

The user must be able to:

  • Read repository contents.
  • Vote on the labels required by your pipelines (for example, Verified, Code-Review, Workflow).
  • Submit changes when your pipelines expect Zuul to do so.

Apply these permissions in Gerrit (e.g., via project access settings or an ACL change) before starting Zuul. This prevents the merger from failing on initial connections.

7.3 Generate Gerrit SSH key (used by Zuul Merger)

sudo test -f /var/lib/zuul/ssh/id_ed25519_gerrit || \
  sudo ssh-keygen -t ed25519 -N '' -f /var/lib/zuul/ssh/id_ed25519_gerrit

sudo chmod 600 /var/lib/zuul/ssh/id_ed25519_gerrit
sudo chmod 644 /var/lib/zuul/ssh/id_ed25519_gerrit.pub

Required one-time manual action in Gerrit: Add the public key to the Gerrit service user:

  • /var/lib/zuul/ssh/id_ed25519_gerrit.pub

7.4 Generate Nodepool SSH key (used to SSH into VM3 runner container)

sudo test -f /var/lib/nodepool/ssh/id_ed25519_nodepool || \
  sudo ssh-keygen -t ed25519 -N '' -f /var/lib/nodepool/ssh/id_ed25519_nodepool

sudo chmod 600 /var/lib/nodepool/ssh/id_ed25519_nodepool
sudo chmod 644 /var/lib/nodepool/ssh/id_ed25519_nodepool.pub

Required one-time manual action on VM3 runner container: Append this public key to the runner user’s authorized_keys:

  • /var/lib/nodepool/ssh/id_ed25519_nodepool.pub → /home/${RUNNER_USER}/.ssh/authorized_keys (inside the runner container)
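
Appending the key by hand more than once leaves duplicate lines in authorized_keys. The sketch below shows one way to make the append idempotent. It is meant to run inside the runner container after the public key has been copied there; append_key_once and the /tmp path in the usage comment are hypothetical.

```shell
# append_key_once PUBKEY_FILE AUTH_KEYS_FILE: append the key only if the exact
# line is not already present, so repeated runs do not create duplicates.
append_key_once() {
  key="$(cat "$1")"
  touch "$2"
  if ! grep -qxF -e "${key}" "$2"; then
    printf '%s\n' "${key}" >> "$2"
  fi
  chmod 600 "$2"
}

# Usage (the /tmp location of the copied key is an assumption):
#   append_key_once /tmp/id_ed25519_nodepool.pub \
#     "/home/${RUNNER_USER}/.ssh/authorized_keys"
```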

7.5 Generate known_hosts (pin host keys; no prompts)

To avoid duplicates and stale keys, rebuild known_hosts files deterministically:

# Reset known_hosts for both contexts (merger and nodepool)
sudo rm -f /var/lib/zuul/ssh/known_hosts /var/lib/nodepool/ssh/known_hosts
sudo touch /var/lib/zuul/ssh/known_hosts /var/lib/nodepool/ssh/known_hosts
sudo chmod 644 /var/lib/zuul/ssh/known_hosts /var/lib/nodepool/ssh/known_hosts

# Pin Gerrit SSH host key
sudo ssh-keyscan -p "${GERRIT_SSH_PORT}" -H "${GERRIT_SSH_HOST}" | sudo tee -a /var/lib/zuul/ssh/known_hosts >/dev/null
sudo ssh-keyscan -p "${GERRIT_SSH_PORT}" -H "${GERRIT_SSH_HOST}" | sudo tee -a /var/lib/nodepool/ssh/known_hosts >/dev/null

# Pin runner container SSH host key
sudo ssh-keyscan -p "${RUNNER_SSH_PORT}" -H "${RUNNER_HOST}" | sudo tee -a /var/lib/nodepool/ssh/known_hosts >/dev/null
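
If ssh-keyscan cannot reach a host, it can leave known_hosts without the expected entry, and the problem only surfaces later as a host key verification error. Because the entries are hashed with -H, they cannot be checked with grep; ssh-keygen -F can look them up instead. The following is a sketch with hypothetical helper names kh_name and kh_has_entry.

```shell
# kh_name HOST PORT: print the name under which OpenSSH records the host in
# known_hosts: plain HOST for port 22, otherwise [HOST]:PORT.
kh_name() {
  if [ "$2" = "22" ]; then
    printf '%s\n' "$1"
  else
    printf '[%s]:%s\n' "$1" "$2"
  fi
}

# kh_has_entry FILE HOST PORT: succeed if FILE holds a key for HOST:PORT.
# ssh-keygen -F also matches hashed entries such as those written with -H.
kh_has_entry() {
  [ -n "$(ssh-keygen -F "$(kh_name "$2" "$3")" -f "$1" 2>/dev/null)" ]
}

# Usage:
#   sudo sh -c '...'  # or run as root, since the files live under /var/lib
#   kh_has_entry /var/lib/zuul/ssh/known_hosts "${GERRIT_SSH_HOST}" "${GERRIT_SSH_PORT}"
#   kh_has_entry /var/lib/nodepool/ssh/known_hosts "${RUNNER_HOST}" "${RUNNER_SSH_PORT}"
```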

7.6 Non-interactive SSH validation (must pass before starting Zuul)

# VM2 -> runner container
sudo ssh -p "${RUNNER_SSH_PORT}" -i /var/lib/nodepool/ssh/id_ed25519_nodepool \
  -o StrictHostKeyChecking=yes \
  -o UserKnownHostsFile=/var/lib/nodepool/ssh/known_hosts \
  "${RUNNER_USER}@${RUNNER_HOST}" 'id && uname -a'

# VM2 -> Gerrit (requires Gerrit user to contain the public key)
sudo ssh -p "${GERRIT_SSH_PORT}" -i /var/lib/zuul/ssh/id_ed25519_gerrit \
  -o StrictHostKeyChecking=yes \
  -o UserKnownHostsFile=/var/lib/zuul/ssh/known_hosts \
  "${GERRIT_SSH_USER}@${GERRIT_SSH_HOST}" gerrit version

Do not proceed until both commands succeed.


8) Start the Zuul stack (VM2)

cd "${ZUUL_COMPOSE_DIR}"
docker compose up -d

Wait briefly and check status:

sleep 5
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

9) Post-start verification (must pass)

9.1 Zuul Web is reachable (local)

# Replace localhost if Zuul web is published on a different address
curl -sSf http://localhost:9000/ >/dev/null
echo "Zuul web OK"
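
Zuul web may need longer than a few seconds to come up, so a single curl right after startup can fail even when the stack is healthy. A bounded retry loop is one way to handle that. In this sketch, retry is a hypothetical helper and ZUUL_WEB_URL is a hypothetical variable holding the Zuul web address for your environment.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds or
# ATTEMPTS runs have failed. Returns 0 on success, 1 when attempts run out.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "${n}" -ge "${attempts}" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "${delay}"
  done
  return 0
}

# Usage (ZUUL_WEB_URL is an assumption; up to 30 tries, 2 seconds apart):
#   retry 30 2 curl -sSf "${ZUUL_WEB_URL}" >/dev/null && echo "Zuul web OK"
```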

9.2 ZooKeeper is not blocked by cert wait script

cd "${ZUUL_COMPOSE_DIR}"
docker compose logs --no-color --tail=200 zk

Expected:

  • ZooKeeper starts normally (no continuous “waiting for certs” loops).

9.3 No crash-looping services

cd "${ZUUL_COMPOSE_DIR}"
for s in zk zuul-scheduler zuul-merger zuul-web zuul-executor nodepool-launcher pgsql; do
  echo "=== ${s} ==="
  docker compose ps "${s}" || true
done

If any service is restarting, go to Section 10.
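
Scanning the status column by eye works for a handful of services, but the check can also be filtered. The sketch below reads the same name and status fields that the docker ps command in Section 8 prints and keeps only containers whose status begins with "Restarting". A crash-looping container can briefly show "Up" between restarts, so it is worth running the check more than once. The helper name restarting_containers is hypothetical.

```shell
# restarting_containers: read "NAME<TAB>STATUS" lines on stdin and print the
# names whose status starts with "Restarting".
restarting_containers() {
  awk -F '\t' '$2 ~ /^Restarting/ { print $1 }'
}

# Usage:
#   docker ps --format '{{.Names}}\t{{.Status}}' | restarting_containers
```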

9.4 Nodepool launcher not failing SSH to runner

cd "${ZUUL_COMPOSE_DIR}"
docker compose logs --no-color --tail=200 nodepool-launcher

Expected:

  • No repeated SSH failures to VM3:23389.

10) Deterministic fixes (apply in this order)

10.1 ZooKeeper stuck waiting for certs

Symptoms:

  • zk logs show wait-to-start-certs.sh looping.

Fix:

  1. Ensure certs exist on host:

    ls -la "${ZUUL_COMPOSE_DIR}/certs"

  2. Ensure certs are visible inside container:

    cd "${ZUUL_COMPOSE_DIR}"
    docker compose exec -T zk sh -lc 'ls -la /var/certs'

  3. Re-run cert generation:

    cd "${ZUUL_SRC_DIR}"
    ./tools/zk-ca.sh "${ZUUL_COMPOSE_DIR}/certs" "${ZK_TLS_NAME}"

  4. Restart ZooKeeper:

    cd "${ZUUL_COMPOSE_DIR}"
    docker compose restart zk

10.2 Merger cannot fetch from Gerrit (auth / known_hosts)

Symptoms:

  • zuul-merger logs show SSH failures or host key verification errors.

Fix:

  1. Confirm Gerrit has the public key:

    • /var/lib/zuul/ssh/id_ed25519_gerrit.pub is installed for ${GERRIT_SSH_USER}
  2. Re-run the Gerrit SSH validation (Section 7.6).

  3. Restart merger:

    cd "${ZUUL_COMPOSE_DIR}"
    docker compose restart zuul-merger
    

10.3 Executor/Nodepool cannot SSH into runner container

Symptoms:

  • nodepool-launcher logs show repeated SSH connection/auth failures to VM3:23389.

Fix:

  1. Confirm VM2 can reach port 23389:

    nc -vz "${RUNNER_HOST}" "${RUNNER_SSH_PORT}"
    
  2. Confirm runner authorized_keys contains nodepool public key.

  3. Re-run runner SSH validation (Section 7.6).
  4. Restart nodepool-launcher and executor:

    cd "${ZUUL_COMPOSE_DIR}"
    docker compose restart nodepool-launcher zuul-executor

10.4 Postgres container runs but Zuul cannot connect

Symptoms:

  • Zuul services log DB connection errors.

Fix checklist:

  • Confirm pgsql is healthy:

    cd "${ZUUL_COMPOSE_DIR}"
    docker compose logs --no-color --tail=200 pgsql

  • Confirm the DB credentials and host in your etc_zuul/zuul.conf match the compose DB settings:

    • DB host should typically be pgsql (the Compose service name on the Docker network), not localhost.

Restart Zuul scheduler after config changes:

cd "${ZUUL_COMPOSE_DIR}"
docker compose restart zuul-scheduler
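
The DB host check in 10.4 can be scripted. The sketch below assumes that zuul.conf sets the connection through a dburi line in URI form (scheme://user:password@host/dbname) and that the credentials do not contain an "@" character; adjust it if your configuration differs. The helper name dburi_host is hypothetical.

```shell
# dburi_host FILE: print the host part of the first dburi line in FILE.
# Assumes a URI of the form scheme://[user[:password]@]host[:port]/dbname.
dburi_host() {
  sed -n 's|^[[:space:]]*dburi[[:space:]]*=[[:space:]]*[^:]*://\([^@]*@\)\{0,1\}\([^:/?]*\).*|\2|p' "$1" | head -n 1
}

# Usage:
#   host="$(dburi_host "${ZUUL_COMPOSE_DIR}/etc_zuul/zuul.conf")"
#   [ "${host}" = "pgsql" ] || echo "WARNING: DB host is '${host}', expected pgsql"
```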