Architecture Reference

CQ is a GPU orchestration platform — Device Anywhere, Anytime, Anything. Every AI conversation becomes permanent knowledge, quality gates enforce code integrity, and an E2E encrypted relay enables remote GPU training from any machine, on any network. This document describes the core components.


System Overview

+--------------------+          +---------------------------------+
| Local (Thin Agent) |   JWT    | Cloud (Supabase)                |
|                    |<-------->|                                 |
| Hands:             |          | Brain:                          |
|  +- Files / Git    |          |  +- Tasks (Postgres)            |
|  +- Build / Test   |          |  +- Knowledge (pgvector)        |
|  +- LSP analysis   |          |  +- LLM Proxy (Edge Fn)         |
|  +- MCP bridge     |          |  +- Quality Gates               |
|                    |          |  +- Hub (distributed jobs)      |
| Service (cq serve) |   WSS    |                                 |
|  +- Relay ---------+--------->| Relay (Fly.io)                  |
|  +- EventBus       |          |  +- NAT traversal               |
|  +- Token refresh  |          |                                 |
+--------------------+          | Remote AI Workspace (CF Worker) |
                                |  +- OAuth 2.1 MCP proxy         |
Any AI (ChatGPT,   --- MCP ---->|  +- Knowledge record/search     |
 Claude, Gemini)                |  +- Session summary             |
                                +---------------------------------+

free:   Everything local (SQLite + your API key)
pro:    AI workspace in cloud + E2E encrypted relay (login + serve)
team:   Pro + shared GPU workers + autonomous research loop

Deployment Tiers

| Tier | Data SSOT | LLM | Setup |
|------|-----------|-----|-------|
| Free | Local SQLite | User's API key | config.yaml required |
| Pro | Supabase (cloud-primary) | PI Lab LLM Proxy | cq auth login + cq serve |
| Team | Supabase (cloud-primary) | PI Lab LLM Proxy | Pro + shared GPU workers |
  • Cloud failure falls back to SQLite (read-only)
  • ~70 tools cloud-primary, ~48 tools require local (files/git/build)
  • Remote AI Workspace: ChatGPT/Claude/Gemini connect via OAuth MCP (no local install needed)

Go MCP Server (c4-core/)

The primary MCP server. Serves 217 tools via stdio transport.

Claude Code -> Go MCP Server (stdio, 217 tools)
                +-> Go native (28): state, tasks, files, git, validation, config
                +-> Go + SQLite (13): spec, design, checkpoint, artifact, lighthouse
                +-> Soul/Persona/Twin (10): soul_evolve, persona_learn, twin_record, ...
                +-> LLM Gateway (3): llm_call, llm_providers, llm_costs
                +-> CDP + WebMCP (5): cdp_run, webmcp_discover, web_fetch, ...
                +-> Drive (6): upload, download, list, delete, info, mkdir
                +-> File Index (2): fileindex_search, fileindex_status
                +-> Session (3): session_index, session_summarize, session_snapshot
                +-> Memory (1): memory_import
                +-> Relay (2): cq_workers, cq_relay_call
                +-> Knowledge (13): record, search, distill, ingest, sync, publish, ...
                +-> Hub Client (19, conditional): job, worker, DAG, artifact, cron
                +-> Worker Standby (3, Hub): standby, complete, shutdown
                +-> C7 Observe (4, build tag): metrics, logs, trace, status
                +-> C6 Guard (5, build tag): check, audit, policy, deny
                +-> C8 Gate (6, build tag): webhook, schedule, slack, github, ...
                +-> EventSink (1) + HubPoller (1)
                +-> JSON-RPC proxy (10) -> Python Sidecar

Tool Tiering

  • Core (40 tools): Always loaded, immediate availability
  • Extended (177 tools): Loaded on demand, available after initialization
  • Conditional: Hub tools require serve.hub.enabled: true; C7/C6/C8 require build tags

Package Structure

c4-core/
+-- cmd/c4/           # CLI (cobra) + MCP server entry point
+-- internal/
    +-- mcp/          # Registry + stdio transport
    |   +-- apps/     # MCP Apps ResourceStore + embedded widget HTML
    |   +-- handlers/ # Per-tool handlers
    +-- bridge/       # Python sidecar manager (JSON-RPC/TCP, lazy start)
    +-- task/         # TaskStore (SQLite, Memory, Supabase)
    +-- state/        # State machine (INIT -> COMPLETE)
    +-- worker/       # Worker manager + survivability (watchdog, safeGo)
    +-- validation/   # Validation runner (go test, pytest, cargo test auto-detect)
    +-- config/       # Config manager (YAML, env, economic presets)
    +-- cloud/        # Auth (OAuth), CloudStore, HybridStore, TokenProvider
    +-- hub/          # Hub REST+WS client (26 tools)
    +-- daemon/       # Local job scheduler (GPU-aware)
    +-- eventbus/     # C3 EventBus v4 (gRPC, WS bridge, DLQ, filter v2)
    +-- knowledge/    # Knowledge (FTS5 + Vector + Embedding + Sync)
    +-- research/     # Research iteration store
    +-- c2/           # Workspace/Profile/Persona + webcontent
    +-- drive/        # Drive client (TUS resumable upload)
    +-- fileindex/    # Cross-device file search
    +-- session/      # Session tracking + LLM summarizer
    +-- memory/       # ChatGPT/Claude session import pipeline
    +-- relay/        # WebSocket relay client (auto-restart)
    +-- llm/          # LLM Gateway (Anthropic, OpenAI, Gemini, Ollama)
    +-- cdp/          # Chrome DevTools Protocol + WebMCP
    +-- observe/      # C7 Observe (c7_observe build tag)
    +-- guard/        # C6 Guard (c6_guard build tag)
    +-- gate/         # C8 Gate (c8_gate build tag)

Build and Install

bash
# Build + install (CRITICAL -- always use make install)
cd c4-core && make install

# Tests
cd c4-core && go test ./...

# Environment diagnostics
cq doctor

Worker Survivability (v1.44-v1.48)

Workers are designed to self-heal from crashes, network failures, and overload without operator intervention.

OS Watchdog

Workers register as system services with a --watchdog flag. The OS service manager (systemd/launchd) restarts the process automatically on exit.

ExecStart=/usr/local/bin/cq serve --watchdog
Restart=always
RestartSec=5

safeGo

All goroutines are launched via safeGo, a wrapper that recovers from panics and logs them to the ring buffer instead of crashing the process.

safeGo(func() {
    // goroutine body -- panics are caught, logged, never crash the process
})
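A self-contained sketch of such a wrapper. The recover/log shape follows the description above; the ring-buffer integration is omitted and the demo in `main` is illustrative:

```go
package main

import (
	"fmt"
	"log"
)

// safeGo launches fn in a goroutine and recovers any panic so a single
// misbehaving task cannot crash the worker process. Minimal sketch; the
// real wrapper also records the panic into the crash ring buffer.
func safeGo(fn func()) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("safeGo recovered: %v", r)
			}
		}()
		fn()
	}()
}

func main() {
	done := make(chan struct{})
	safeGo(func() {
		defer close(done) // runs while the panic unwinds, before recover
		panic("simulated worker crash")
	})
	<-done
	fmt.Println("process survived the panic")
}
```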

Heartbeat Circuit Breaker

Workers send periodic heartbeats to Supabase. If the heartbeat fails repeatedly, the circuit breaker opens and the worker enters a reconnect loop rather than continuing to fail silently.

Heartbeat tick
    |-- success -> reset failure count
    |-- failure -> increment count
    +-- count >= threshold -> circuit open -> exponential backoff -> retry

Crash Log Collection

A fixed-size RingBuffer captures the last N log lines in memory. On crash or panic recovery, UploadCrashLog ships the buffer to Supabase for post-mortem analysis.

429 Adaptive Backoff

When the Hub or LLM Gateway returns HTTP 429, the worker reads the Retry-After header and backs off for exactly that duration instead of using a fixed retry interval.

Relay WebSocket Auto-Restart

The relay WebSocket connection monitors itself. On disconnect (network drop, relay restart), it automatically reconnects with exponential backoff — no cq serve restart required.


Knowledge Loop (v1.40-v1.48)

The Knowledge Loop learns from user corrections and experiment results, promoting stable patterns into shared knowledge.

User correction / explicit feedback
    |
    v
PreferenceLedger (count per preference)
    |
    v
GrowthMetrics (session corrections + trend)
    |
    v
RulePromoter
    |-- count >= 3 -> promote to hint (CLAUDE.md hint section)
    |-- count >= 5 -> promote to rule (.claude/rules/<topic>.md)
    |
    v
GlobalPromoter (depersonalized)
    +-> community knowledge pool (shared with all users)
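The promotion thresholds in the loop above reduce to a count-to-level mapping. A minimal sketch, with an illustrative function name:

```go
package main

import "fmt"

// promote maps a preference occurrence count to its promotion level,
// following the thresholds above (3 -> hint, 5 -> rule).
func promote(count int) string {
	switch {
	case count >= 5:
		return "rule" // written to .claude/rules/<topic>.md
	case count >= 3:
		return "hint" // appended to the CLAUDE.md hint section
	default:
		return "ledger" // stays in the PreferenceLedger
	}
}

func main() {
	for _, c := range []int{1, 3, 5} {
		fmt.Println(c, "->", promote(c))
	}
}
```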

Components

| Component | Role |
|-----------|------|
| PreferenceLedger | Tracks each user preference with occurrence count |
| GrowthMetrics | Per-session correction count + multi-session trend |
| RulePromoter | Graduates hints to rules at count thresholds (3 → hint, 5 → rule) |
| GlobalPromoter | Strips personal context and publishes patterns to community knowledge |

Output Files

  • Hints and rules are written directly to CLAUDE.md and .claude/rules/
  • Rules become active immediately on next Claude Code session load
  • GlobalPromoter output feeds cq_knowledge_publish for cross-user sharing

TUI Dashboard (v1.44-v1.46)

Three terminal UI commands built on BubbleTea.

cq jobs

Full-featured job monitor with:

  • Detail panel: side panel showing job spec, logs, and metrics for the selected job
  • Adaptive multi-row charts: metric charts that expand rows based on terminal height
  • Compare mode: select two jobs and diff their metrics side by side

cq workers

Worker Connection Board — shows all registered workers with status, affinity scores, last heartbeat, and current job assignments.

cq dashboard

Unified board menu — entry point that routes to jobs, workers, or project status view.

cq dashboard
    +-- [j] Jobs monitor     (cq jobs)
    +-- [w] Workers board    (cq workers)
    +-- [s] Project status   (cq status)

Session Intelligence (v1.39-v1.41)

/done vs /exit Split

| Command | Capture depth | Use when |
|---------|---------------|----------|
| /done | Full — structured summary, knowledge extraction, persona update | Completing real work |
| /exit | Light — minimal metadata only | Abandoning or quick close |

Summarization Prompts

/done uses deeper prompts designed to extract actionable knowledge:

  • Decisions made and rationale
  • Patterns discovered
  • Problems encountered and how they were resolved
  • Next steps with concrete starting point

Fallback Handling

  • Global DB fallback: if Supabase is unavailable, session summary writes to local SQLite
  • LLM failure metadata: if the summarization LLM call fails, the raw session text is stored with llm_failed=true for retry on next connection

MCP Apps (Widget System)

When a tool is called with format=widget, the response includes _meta.ui.resourceUri. The MCP client fetches the HTML via resources/read and renders it in a sandboxed iframe.

Tool call (format=widget)
  -> handler returns {data: {...}, _meta: {ui: {resourceUri: "ui://cq/..."}}}
  -> client calls resources/read("ui://cq/...")
  -> ResourceStore returns embedded HTML
  -> client renders in sandboxed iframe

| Widget URI | Tool | Description |
|------------|------|-------------|
| ui://cq/dashboard | c4_dashboard | Project status summary |
| ui://cq/job-progress | c4_job_status | Job progress |
| ui://cq/job-result | c4_job_summary | Job results |
| ui://cq/experiment-compare | c4_experiment_search | Experiment comparison |
| ui://cq/task-graph | c4_task_graph | Task dependency graph |
| ui://cq/nodes-map | c4_nodes_map | Connected nodes map |
| ui://cq/knowledge-feed | cq_knowledge_search | Knowledge feed |
| ui://cq/cost-tracker | c4_llm_costs | LLM cost tracker |
| ui://cq/test-results | cq_run_validation | Test results |
| ui://cq/git-diff | c4_diff_summary | Git diff viewer |
| ui://cq/error-trace | c4_error_trace | Error trace viewer |

Knowledge System

4-layer pipeline: every task decision becomes searchable knowledge for future tasks.

Plan (knowledge_search) -> Task DoD (Rationale) -> Worker (knowledge_context injected)
     ^                                                       |
pattern_suggest <- distill <- autoRecordKnowledge <- Worker complete (handoff)

  • FTS5: Full-text search on all knowledge records
  • pgvector: OpenAI 768-dim embeddings (or Ollama 768-dim nomic-embed-text)
  • 3-way RRF: Ranked fusion of FTS + vector + popularity scores
  • Auto-distill: Triggered by /finish when knowledge count >= 5
  • Cloud sync: Local SQLite <-> Supabase pgvector sync
  • Cross-project: cq_knowledge_publish / cq_knowledge_pull for sharing
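The 3-way RRF step above fuses the three ranked lists with the standard Reciprocal Rank Fusion score, sum of 1/(k + rank) across lists. A sketch assuming k = 60 (the conventional constant; the actual value used is not stated here):

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses several ranked result lists with Reciprocal Rank Fusion:
// score(d) = sum over lists of 1/(k + rank(d)), ranks starting at 1.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	fts := []string{"doc1", "doc2", "doc3"} // FTS5 ranking
	vec := []string{"doc2", "doc1", "doc4"} // vector ranking
	pop := []string{"doc2", "doc3", "doc1"} // popularity ranking
	fmt.Println(rrf(60, fts, vec, pop))     // doc2 first: top-ranked in two lists
}
```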

Knowledge handoff (cq_submit)

Workers submit structured handoff with their task:

json
{
  "summary": "What was implemented",
  "files_changed": ["src/feature.go"],
  "discoveries": ["pattern X works better than Y"],
  "concerns": ["edge case Z not handled"],
  "rationale": "Why approach A was chosen"
}

This is auto-parsed and recorded as knowledge for future workers.


Hub (Distributed Execution)

The Hub is a distributed job queue backed by Supabase PostgreSQL. Workers pull jobs via a lease model.

Developer (laptop)
  +-- cq_job_submit(spec, routing={tags: ["gpu"]}) -->+
                                                      |
                                    Supabase: hub_jobs INSERT
                                              | pg_notify('new_job')
                                              v
                                    Worker (remote GPU server)
                                      +- ClaimJob (lease)
                                      +- Execute
                                      +- Upload artifacts
                                      +- CompleteJob

DAG Pipelines

c4_hub_dag_create (nodes + edges)
    |
    v (topological sort -> root nodes auto-submitted)
Worker completes node -> advance_dag -> next layer released
    |
    v
All nodes complete -> DAG complete event
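The "next layer released" step reduces to finding nodes whose dependencies are all complete. An illustrative sketch (edge representation and function name are assumptions):

```go
package main

import "fmt"

// nextReady returns DAG nodes whose parents are all done and which are
// not themselves done. edges maps child -> parents.
func nextReady(edges map[string][]string, done map[string]bool) []string {
	var ready []string
	for node, parents := range edges {
		if done[node] {
			continue
		}
		ok := true
		for _, p := range parents {
			if !done[p] {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, node)
		}
	}
	return ready
}

func main() {
	edges := map[string][]string{
		"preprocess": {},
		"train":      {"preprocess"},
		"evaluate":   {"train"},
	}
	fmt.Println(nextReady(edges, map[string]bool{}))                    // roots: [preprocess]
	fmt.Println(nextReady(edges, map[string]bool{"preprocess": true})) // [train]
}
```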

Worker Affinity

Workers are automatically routed based on affinity scores:

affinity_score = project_match * 10 + tag_match * 3 + recency * 2 + success_rate * 5

View affinity scores: cq hub workers (shows AFFINITY column).
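The formula above translates directly to code. A sketch, assuming match terms are 0/1 and recency and success rate are normalized to [0, 1] (the actual input scaling is not stated here):

```go
package main

import "fmt"

// affinityScore implements the routing formula:
// project_match*10 + tag_match*3 + recency*2 + success_rate*5
func affinityScore(projectMatch, tagMatch int, recency, successRate float64) float64 {
	return float64(projectMatch)*10 + float64(tagMatch)*3 + recency*2 + successRate*5
}

func main() {
	// Same project, matching tag, recently seen, 90% success rate:
	fmt.Println(affinityScore(1, 1, 1.0, 0.9)) // 19.5
}
```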


Relay (NAT Traversal)

The relay enables external MCP clients to reach local workers through NAT.

External MCP client (Cursor / Codex / Gemini CLI)
    | HTTPS (MCP over HTTP)
    v
cq-relay.fly.dev  [Go relay server]
    ^ WSS (outbound, worker connects first)
cq serve  [local / cloud worker]
    |
    v
Go MCP Server (stdio) + Python Sidecar

Authentication flow:

  1. cq auth login -> Supabase Auth -> JWT issued + relay URL auto-configured
  2. cq serve starts -> relay WSS connection (Authorization: Bearer JWT)
  3. Relay verifies token, registers worker tunnel
  4. External client -> https://cq-relay.fly.dev/<worker-id> -> relay -> WSS -> worker

The relay WebSocket auto-restarts on disconnect (see Worker Survivability above).


EventBus (C3)

gRPC UDS daemon with WebSocket bridge. 18 event types. 78 tests.

EventBus (gRPC UDS)
    |-- Rules engine (YAML routing)
    |-- DLQ (dead letter queue)
    |-- WebSocket bridge (external subscribers)
    +-- HMAC-SHA256 webhook delivery
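HMAC-SHA256 delivery means each webhook body ships with a signature the subscriber recomputes from a shared secret. A sketch of both sides; the signature encoding (hex) and header conventions are assumptions:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the HMAC-SHA256 signature sent with each delivery.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares in constant time.
func verify(secret, body []byte, sig string) bool {
	expected, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), expected)
}

func main() {
	body := []byte(`{"type":"task.completed","task_id":"t-42"}`)
	sig := sign([]byte("webhook-secret"), body)
	fmt.Println(verify([]byte("webhook-secret"), body, sig)) // true
}
```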

Event types:

| Category | Events |
|----------|--------|
| Tasks | task.completed, task.updated, task.blocked, task.created |
| Checkpoints | checkpoint.approved, checkpoint.rejected |
| Reviews | review.changes_requested |
| Validation | validation.passed, validation.failed |
| Knowledge | knowledge.recorded, knowledge.searched |
| Hub | hub.job.completed, hub.job.failed, hub.worker.started, hub.worker.offline |
| Observability | tool.called (C7), guard.denied (C6) |

Remote AI Workspace (Cloudflare Worker)

OAuth 2.1 MCP proxy. Any AI (ChatGPT, Claude web, Gemini) can access CQ knowledge without local install.

Tools exposed via Remote AI Workspace:

| Tool | Description |
|------|-------------|
| cq_knowledge_record | AI proactively saves knowledge (5-condition trigger in tool description) |
| cq_knowledge_search | Vector + FTS + ilike 3-stage fallback search |
| c4_session_summary | Capture complete session summary on conversation end |
| cq_status | Read current project state |

Python Sidecar (c4/)

Go MCP server delegates 10 tools to Python via JSON-RPC/TCP (lazy-started).

Go MCP Server -- JSON-RPC/TCP --> Python Sidecar (10 tools)
                                    +-> LSP (7): find_symbol, get_overview, replace_body,
                                    |          insert_before/after, rename, find_refs
                                    |          (Python/JS/TS only -- Go/Rust: use cq_search_for_pattern)
                                    +-> Doc (2): parse_document, extract_text
                                    +-> Onboard (1): cq_onboard
  • Lazy Start: Sidecar starts only on first proxy tool call
  • Graceful fallback: If Python/uv unavailable, LSP/Doc tools are disabled (not a crash)

State Machine

INIT -> DISCOVERY -> DESIGN -> PLAN -> EXECUTE <-> CHECKPOINT -> REFINE -> POLISH -> COMPLETE
                                          |
                                          +-> HALTED (resumable)

| State | Meaning |
|-------|---------|
| INIT | Project created, no tasks yet |
| DISCOVERY | Gathering requirements (c4-plan Phase 1) |
| DESIGN | Architecture decisions (c4-plan Phase 2) |
| PLAN | Tasks created, ready to execute |
| EXECUTE | Workers active, tasks being claimed |
| CHECKPOINT | Phase gate reached, review in progress |
| HALTED | Execution paused, resumable with /run |
| COMPLETE | All tasks done, ready for /finish |
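The diagram above is a fixed transition graph, which can be sketched as a lookup table. The map encoding and helper name are illustrative:

```go
package main

import "fmt"

// transitions encodes the legal state graph: EXECUTE <-> CHECKPOINT,
// with HALTED reachable from EXECUTE and resumable back into it.
var transitions = map[string][]string{
	"INIT":       {"DISCOVERY"},
	"DISCOVERY":  {"DESIGN"},
	"DESIGN":     {"PLAN"},
	"PLAN":       {"EXECUTE"},
	"EXECUTE":    {"CHECKPOINT", "HALTED"},
	"CHECKPOINT": {"EXECUTE", "REFINE"},
	"HALTED":     {"EXECUTE"},
	"REFINE":     {"POLISH"},
	"POLISH":     {"COMPLETE"},
}

// canTransition reports whether moving from one state to another is legal.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("EXECUTE", "HALTED")) // true
	fmt.Println(canTransition("INIT", "COMPLETE"))  // false: no skipping phases
}
```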

Security: Permission Hook

Two-layer gate on all tool use and shell execution:

PreToolUse event
    |
    v
c4-gate.sh (pattern match)
    |-- allow_patterns -> immediate allow
    |-- model mode -> Haiku API decision
    |-- block_patterns -> block (with audit log)
    +-- fallback -> built-in safe patterns

PermissionRequest event
    |
    v
c4-permission-reviewer.sh (Haiku classification)

Configuration (.c4/config.yaml):

yaml
permission_reviewer:
  enabled: true
  mode: hook        # "hook" (regex only) or "model" (Haiku API)
  auto_approve: true
  allow_patterns: []
  block_patterns: []
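The layered decision order in the flow above — allow patterns short-circuit, block patterns deny, everything else falls through — can be sketched as follows. The `decide` function and its return values are illustrative, not the hook's actual API:

```go
package main

import (
	"fmt"
	"regexp"
)

// decide applies the allow/block pattern layers. Allow patterns win
// first; block patterns deny next; anything unmatched falls through to
// the built-in safe defaults (or a Haiku decision in "model" mode).
func decide(cmd string, allow, block []string) string {
	for _, p := range allow {
		if regexp.MustCompile(p).MatchString(cmd) {
			return "allow"
		}
	}
	for _, p := range block {
		if regexp.MustCompile(p).MatchString(cmd) {
			return "block"
		}
	}
	return "fallback"
}

func main() {
	allow := []string{`^go test`, `^git status`}
	block := []string{`rm -rf`, `curl .*\| *sh`}
	fmt.Println(decide("go test ./...", allow, block)) // allow
	fmt.Println(decide("rm -rf /", allow, block))      // block
}
```

In a production hook the patterns would be compiled once at config load rather than per call; the loop here keeps the sketch short.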

Supabase Schema (Key Tables)

| Table | Purpose |
|-------|---------|
| c4_tasks | Task queue (state, assignments, commit SHAs) |
| c4_documents | Knowledge records (content, embeddings, FTS) |
| c4_projects | Project registry (owners, settings) |
| hub_jobs | Distributed job queue (spec, status, lease) |
| hub_workers | Registered workers (capabilities, affinity) |
| hub_dags | DAG pipeline definitions |
| hub_cron_schedules | Cron job definitions |
| c4_drive_files | Drive file metadata (hash, URLs, versions) |
| c4_datasets | Dataset registry with content-addressable versioning |
| c1_messages | Inter-session and messaging channel messages |
| notification_channels | Telegram/Dooray notification configs |

52 migrations, RLS policies on all user-facing tables, pgvector extension for embeddings.


Skills (v1.48)

42 skills across research, ML, data, and orchestration domains. Skills are loaded from .claude/skills/ and invoked via /skill-name.

| Category | Count | Examples |
|----------|-------|----------|
| Research & Experiment | 8 | c9-loop, c9-survey, research-loop, experiment-workflow |
| ML / Data Science | 13 | eda-profiler, pytorch-model-builder, transfer-learning-expert |
| Orchestration | 9 | plan, run, finish, quick, pi |
| Dev Quality | 7 | tdd-cycle, spec-first, debugging, company-review |
| Other | 5 | release, incident-response, standby, pdf |

Skills interact with CQ MCP tools directly — a skill is a structured prompt that drives the MCP server, not a separate binary.