
GPU Worker Setup

Connect a GPU server, cloud VM, or local workstation to the CQ Hub as a job worker. This is the foundation of GPU Anywhere — zero config, any OS, encrypted relay.

Architecture

```
Your laptop              CQ Hub (cloud)          Worker (GPU/CPU)
───────────              ──────────────          ────────────────
cq hub submit  ────────► job queue        ◄────  cq serve
(code snapshot +         (distributes)           (polls queue,
 job spec)                                        runs job,
                                                  uploads results)
```

Workers are stateless — no project config needed on the worker machine. The job carries everything: code snapshot, environment variables, and artifact declarations.
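
For orientation, a submitted job's payload might look roughly like the sketch below. The field names here are hypothetical, invented for illustration; the real schema is internal to CQ.

```yaml
# Hypothetical job payload — field names are illustrative, not the real schema
job_id: job-7f3a
snapshot: sha256:9c1e4b7d    # content-addressed code snapshot in Drive (illustrative)
command: python train.py
env:
  WANDB_MODE: offline
requires_gpu: true
artifacts:
  - checkpoints/*.pt
```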

3-Step Quick Start

Step 1: Install CQ on the Worker Machine

```sh
curl -fsSL https://raw.githubusercontent.com/PlayIdea-Lab/cq/main/install.sh | sh
```

This works on Linux (x86_64, ARM64), macOS, and Windows/WSL2. Docker and NVIDIA Container Toolkit are detected and configured automatically if present.

Step 2: Authenticate

```sh
cq auth login    # GitHub OAuth — use the same account as your laptop
```

Or, for headless machines (no browser):

```sh
cq auth login --device    # Device code flow — enter code on another device
```

Step 3: Start

```sh
cq serve    # Starts Hub worker + MCP + relay + cron in one process
```

The worker is now connected. Jobs submitted from your laptop arrive automatically.

What cq serve Starts

cq serve is the all-in-one entry point. It replaces running individual components separately.

| Component | Included |
|---|---|
| Hub worker (job polling) | Yes |
| MCP server | Yes |
| Relay (NAT traversal) | Yes |
| Cron scheduler | Yes |
| pg_notify real-time | Yes (when cloud.direct_url is set) |

Run as a Service

```sh
cq serve start      # Start worker in background
cq serve enable     # Auto-start on boot (systemd/launchd/Task Scheduler)
systemctl status cq-worker
```

Check logs:

```sh
journalctl -fu cq-worker
```

Manual systemd unit (if you prefer):

```ini
[Unit]
Description=CQ Hub Worker
After=network.target docker.service

[Service]
User=ubuntu
SupplementaryGroups=docker
WorkingDirectory=/opt/gpu-worker
ExecStart=/usr/local/bin/cq serve
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

macOS (launchd)

```sh
cq serve start      # Start worker in background
cq serve enable     # Register launchd plist for auto-start
```

Docker Compose

```sh
curl -sSL https://github.com/PlayIdea-Lab/cq/releases/latest/download/gpu-worker.tar.gz | tar xz

cat > .env <<EOF
C5_HUB_URL=https://<hub-host>:8585
C5_API_KEY=sk-worker-<your-key>
EOF

docker compose up -d
docker compose logs -f
```

Kubernetes

CQ workers run natively in K8s. The official container image is published to ghcr.io on every release.

```sh
docker pull ghcr.io/playidea-lab/cq-gpu-worker:latest
docker pull ghcr.io/playidea-lab/cq-gpu-worker:v1.58-cuda12.8
```

Deployment manifest (GPU worker with health probes):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cq-gpu-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cq-gpu-worker
  template:
    metadata:
      labels:
        app: cq-gpu-worker
    spec:
      containers:
        - name: worker
          image: ghcr.io/playidea-lab/cq-gpu-worker:latest
          env:
            - name: C5_API_KEY
              valueFrom:
                secretKeyRef:
                  name: cq-secrets
                  key: api-key
          ports:
            - containerPort: 8081
          startupProbe:
            httpGet:
              path: /startup
              port: 8081
            failureThreshold: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
            periodSeconds: 15
            timeoutSeconds: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8081
            periodSeconds: 10
          resources:
            limits:
              nvidia.com/gpu: "1"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

Health probes (v1.58+):

| Endpoint | Purpose | 200 when |
|---|---|---|
| /startup | Startup | Worker initialization complete |
| /healthz | Liveness | Heartbeat file updated within 60s |
| /readyz | Readiness | Worker can accept new jobs |
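
The liveness check is essentially a heartbeat-file freshness test. A minimal sketch of the same idea in shell (the file path is hypothetical; the real worker implements this internally):

```sh
# Liveness as heartbeat freshness: healthy if the file was touched in the last 60s.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-/tmp/cq-heartbeat}"
touch "$HEARTBEAT_FILE"   # the worker loop would do this on every iteration

now=$(date +%s)
# GNU stat first, BSD stat as fallback
mtime=$(stat -c %Y "$HEARTBEAT_FILE" 2>/dev/null || stat -f %m "$HEARTBEAT_FILE")
age=$(( now - mtime ))

if [ "$age" -le 60 ]; then
  echo "healthy"
else
  echo "stale (${age}s old)"
fi
```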

Graceful shutdown: On SIGTERM, the worker returns the in-progress job to the Hub queue before exiting (within 10s). Set terminationGracePeriodSeconds: 30 in the pod spec.
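
The shutdown behavior can be pictured as a signal trap: on SIGTERM, hand the in-flight job back, then exit cleanly. A rough shell sketch, where requeue_job is a hypothetical stand-in for the worker's internal "return to queue" call:

```sh
# Sketch of graceful shutdown: trap SIGTERM and requeue the in-flight job.
CURRENT_JOB="job-123"

requeue_job() {   # hypothetical stand-in for "return job to the Hub queue"
  echo "requeued: $1"
}

handle_term() {
  requeue_job "$CURRENT_JOB"
  # a real worker would now exit 0 so the supervisor sees a clean shutdown
}
trap handle_term TERM

# Simulate the signal Kubernetes sends at pod termination:
kill -TERM $$
```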

Image tags: v{version}-cuda{cuda_version} (e.g., v1.58-cuda12.8). Use latest for the most recent stable build.

Helm chart support is planned for a future release. For now, use the manifest above or kustomize.


Real-Time Job Delivery

By default, workers poll for jobs every 30 seconds. For sub-second delivery, configure a direct database connection:

```yaml
# ~/.c4/config.yaml
cloud:
  direct_url: "postgresql://..."    # Direct Supabase connection string
```

With direct_url, the worker uses PostgreSQL LISTEN 'new_job' — jobs arrive instantly.

Submitting Jobs

From your laptop, in Claude Code:

```sh
# MCP tool
cq_hub_submit(command="python train.py")
```

Or from the terminal:

```sh
cq hub submit --run "python train.py"
```

CQ snapshots the current directory to Drive (content-addressable, automatic dedup) and posts the job to the Hub. No Git required.
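
"Content-addressable" means each blob is stored under the hash of its bytes, so resubmitting unchanged code uploads nothing new. A toy sketch of the idea; the store layout here is invented for illustration, not CQ's actual Drive format:

```sh
# Toy content-addressable store: a blob's key is the SHA-256 of its contents,
# so identical snapshots map to the same key and are stored only once.
STORE=$(mktemp -d)

put_blob() {
  hash=$(sha256sum "$1" | cut -d' ' -f1)
  if [ -e "$STORE/$hash" ]; then
    echo "dedup hit: $hash"
  else
    cp "$1" "$STORE/$hash"
    echo "stored: $hash"
  fi
}

printf 'print("hello")\n' > train.py
put_blob train.py    # first submit: stored
put_blob train.py    # unchanged resubmit: dedup hit, no new upload
```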

GPU Detection

Workers automatically detect GPU capabilities:

  • If nvidia-smi is found, the worker registers as GPU-capable
  • Jobs with requires_gpu: true are only routed to GPU workers
  • If nvidia-smi is not found, the worker starts in CPU-only mode (no action needed)
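
The detection step above amounts to probing for nvidia-smi on PATH, roughly like this sketch (the MODE variable is illustrative; the query flag is standard nvidia-smi):

```sh
# Rough sketch of capability detection: GPU mode if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
  MODE="gpu"
  GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n1)
else
  MODE="cpu-only"
fi
echo "worker mode: $MODE"
```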

Routing Jobs to Specific Workers

By worker ID

```sh
cq hub submit --target worker-abc123 python train.py
```

By capability

```sh
cq hub submit --capability cuda python train.py
```

By tags

```sh
cq hub submit --tags gpu,a100 python train.py
```

Declare tags in caps.yaml on the worker:

```yaml
tags:
  - gpu
  - a100
  - datacenter-us
```
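
Routing by tags is effectively a subset check: a worker is eligible only if every tag the job requests appears in its caps.yaml tag list. A small pure-shell sketch of that check (illustrative, not CQ's scheduler code):

```sh
# A worker matches when every tag requested by the job is in the worker's tag set.
WORKER_TAGS="gpu a100 datacenter-us"

matches() {          # usage: matches "gpu,a100"
  for tag in $(echo "$1" | tr ',' ' '); do
    case " $WORKER_TAGS " in
      *" $tag "*) ;;            # tag present: keep checking
      *) return 1 ;;            # any missing tag makes the worker ineligible
    esac
  done
  return 0
}

matches "gpu,a100" && echo "eligible"
matches "gpu,h100" || echo "not eligible"
```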

Monitoring

```sh
cq hub workers              # Active workers
cq hub workers --all        # Include offline workers
cq hub list                 # Recent jobs
cq hub status <job_id>      # Job status
cq hub watch <job_id>       # Live job output
cq hub log <job_id>         # Job logs
cq hub summary              # Hub stats
```

Maintenance

Remove zombie workers

Workers offline for 24+ hours are pruned automatically. Manual cleanup:

```sh
cq hub workers prune              # Remove offline workers
cq hub workers prune --dry-run    # Preview
```

Version gate

If the Hub requires a minimum worker version:

```sh
cq update               # Update binary
cq hub worker start     # Restart worker
```

Authentication Reference

| Method | How |
|---|---|
| Session (default) | cq auth login — stored at ~/.c4/session.json, used automatically |
| API key | export C5_API_KEY=sk-worker-<key> |
| Device code | cq auth login --device — for headless machines |

Key prefixes:

| Prefix | Scope |
|---|---|
| sk-worker-* | Poll and complete jobs only |
| sk-user-* | Submit and query jobs only |
| (none) | Full access |
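
The scopes above could be derived from a simple prefix match. An illustrative sketch only; the scope names are invented here, not CQ's internal terms:

```sh
# Illustrative only: map an API key's prefix to its scope, per the table above.
key_scope() {
  case "$1" in
    sk-worker-*) echo "poll+complete" ;;
    sk-user-*)   echo "submit+query" ;;
    *)           echo "full" ;;
  esac
}

key_scope "sk-worker-abc123"   # poll+complete
```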

Troubleshooting

| Symptom | Fix |
|---|---|
| nvidia-smi not found | Worker runs in CPU-only mode automatically — no action needed |
| Auth error | Re-run cq auth login or cq auth login --device |
| Worker shows offline | Run `ps aux \| grep cq` to confirm the process is running, then restart with cq serve |
| Job stuck | Check cq hub log <job_id> and the worker logs |
| --non-interactive needed in CI | Pass the --non-interactive flag to cq hub worker init |
| WSL2 relay drops | CQ sets SO_KEEPALIVE automatically — no config needed. Use cq serve (not cq hub worker start), which includes the keepalive-aware relay |

Next Steps

  • Knowledge Loop — accumulate experiment results into reusable AI knowledge
  • Tiers — understand Free/Pro/Team feature sets