Routing System Architecture

Tier 2 | Deep technical documentation for model routing
Hub: README.md | Full Architecture: ARCHITECTURE.md

Overview

The routing system selects a CLI/model for each task through a multi-stage pipeline of filtering, scoring, ranking, and bandit selection. The full executed order in composite-router-stages.ts:runPipeline is:

Task
  → Budget                              (filter — eliminate over-budget CLIs)
  → Scoring (parallel)                  (ConfidenceCascade, CapabilityMatch,
                                         KnnRouting, DistilledRule,
                                         ResourceStrategy, ZeroRouter, Preference)
  → QualityConstraint                   (constraint-first; can short-circuit, #1686)
  → CategoryOverride                    (CATEGORY_CHAIN_OVERRIDES per category;
                                         can short-circuit on sensitive cats,
                                         #2414/#2417)
  → TOPSIS                              (rank, with stage-score-adjusted profiles
                                         and performance-floor penalties, #1354/#1401)
  → LinUCB                              (bandit selection from ranked candidates)
  → PerfFloorOverride                   (reject LinUCB pick if CLI < 50% success at
                                         ≥20 samples; promote TOPSIS top, #1790)
  → Latency                             (record per-CLI latency for feedback loop)
  → Selected Model

The simpler legacy “Budget → ZeroRouter → Preference → TOPSIS → LinUCB” 5-stage diagram pre-dated #755/#1350/#2414. The constraint and category-override stages can short-circuit routing without ever reaching TOPSIS — omitting them gives the wrong mental model when debugging “why was my model rejected?” (#2947).
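
As an illustration of how a late stage can overturn an earlier pick, the PerfFloorOverride step in the diagram can be sketched as follows. This is a minimal sketch, not the code in composite-router-stages.ts: `CliStats`, `applyPerfFloor`, and the constant names are hypothetical, and only the two thresholds (50% success, 20 samples) come from the diagram.

```typescript
type CliName = 'claude' | 'gemini' | 'codex' | 'opencode';

// Hypothetical per-CLI outcome counters; the real stats shape may differ.
interface CliStats {
  readonly successes: number;
  readonly samples: number;
}

const MIN_SAMPLES = 20; // from the diagram: ">= 20 samples"
const SUCCESS_FLOOR = 0.5; // from the diagram: "< 50% success"

// Keeps the LinUCB pick unless it is below the performance floor,
// in which case the TOPSIS top-ranked CLI is promoted instead.
function applyPerfFloor(
  linucbPick: CliName,
  topsisTop: CliName,
  stats: ReadonlyMap<CliName, CliStats>
): CliName {
  const s = stats.get(linucbPick);
  // Too little evidence: leave the bandit's choice alone.
  if (s === undefined || s.samples < MIN_SAMPLES) return linucbPick;
  const successRate = s.successes / s.samples;
  return successRate < SUCCESS_FLOOR ? topsisTop : linucbPick;
}
```

With 5 successes in 25 samples the pick is replaced by the TOPSIS top candidate; with only 10 samples it is kept regardless of success rate.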

Use CompositeRouter.route(task) — do NOT directly instantiate stage routers.

CompositeRouter Pipeline

Chains multiple routers in sequence for intelligent model selection.

interface ICompositeRouter {
  route(task: CliTask): Promise<Result<CompositeRoutingDecision, CompositeRoutingError>>;
  getStats(): CompositeRouterStats;
  invalidateCaches(): void;
}

interface CompositeRoutingDecision {
  readonly cliName: 'claude' | 'gemini' | 'codex' | 'opencode';
  readonly reason: string;
  readonly confidence: number;
  readonly topsisScore?: number;
  readonly linucbExploration?: number;
  readonly alternatives: readonly ('claude' | 'gemini' | 'codex' | 'opencode')[];
  readonly stagesExecuted: readonly string[];
}
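
A caller might consume the decision roughly as sketched below. The `Result` shape (`ok` / `value` / `error`), the minimal `CliTask` and `CompositeRoutingError` fields, and the helper `describeRoute` are assumptions made for illustration; the decision interface is trimmed to the fields used here.

```typescript
// Assumed Result shape; the project's actual Result type may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type CliName = 'claude' | 'gemini' | 'codex' | 'opencode';

// Trimmed copy of CompositeRoutingDecision (optional score fields omitted).
interface CompositeRoutingDecision {
  readonly cliName: CliName;
  readonly reason: string;
  readonly confidence: number;
  readonly alternatives: readonly CliName[];
  readonly stagesExecuted: readonly string[];
}

// Hypothetical minimal task and error shapes, for illustration only.
interface CliTask {
  readonly prompt: string;
}
interface CompositeRoutingError {
  readonly message: string;
}

interface RouterLike {
  route(task: CliTask): Promise<Result<CompositeRoutingDecision, CompositeRoutingError>>;
}

// Summarizes a routing decision, including which stages actually ran.
async function describeRoute(router: RouterLike, task: CliTask): Promise<string> {
  const result = await router.route(task);
  if (!result.ok) return `routing failed: ${result.error.message}`;
  const d = result.value;
  return `${d.cliName} (${d.confidence.toFixed(2)}) via ${d.stagesExecuted.join(' > ')}`;
}
```

Logging `stagesExecuted` is useful when a constraint or category override short-circuits the pipeline, because the list then ends before TOPSIS.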

Stage 1: Task Analysis

Profiles tasks before routing:

Characteristic	Derived From	Impact
`reasoningComplexity`	Keywords (“design”, “architect”)	Boosts Claude quality score
`contextRequired`	0.25 tokens/char + 500 tokens/file	Filters by context window
`codeGeneration`	Keywords (“implement”, “write”)	Boosts Codex score
`budgetSensitive`	Keywords (“quick”, “simple”)	Prioritizes Gemini
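
The table above can be read as a small profiling function. The sketch below is an assumption about shape, not the actual profiler: it treats each characteristic as a boolean flag (the real system may use scores), and `TaskInput`, `TaskProfileSketch`, and `profileTask` are hypothetical names. The keyword lists and the token estimate come from the table.

```typescript
// Hypothetical input shape: a prompt plus the number of attached files.
interface TaskInput {
  readonly prompt: string;
  readonly fileCount: number;
}

interface TaskProfileSketch {
  readonly reasoningComplexity: boolean;
  readonly codeGeneration: boolean;
  readonly budgetSensitive: boolean;
  readonly contextRequired: number; // estimated tokens
}

const TOKENS_PER_CHAR = 0.25; // from the table
const TOKENS_PER_FILE = 500; // from the table

// Case-insensitive substring match against a keyword list.
function hasKeyword(prompt: string, keywords: readonly string[]): boolean {
  const lower = prompt.toLowerCase();
  return keywords.some((k) => lower.includes(k));
}

function profileTask(task: TaskInput): TaskProfileSketch {
  const estimate = task.prompt.length * TOKENS_PER_CHAR + task.fileCount * TOKENS_PER_FILE;
  return {
    reasoningComplexity: hasKeyword(task.prompt, ['design', 'architect']),
    codeGeneration: hasKeyword(task.prompt, ['implement', 'write']),
    budgetSensitive: hasKeyword(task.prompt, ['quick', 'simple']),
    contextRequired: Math.ceil(estimate),
  };
}
```

For a 20-character prompt with two files, the estimate is 20 × 0.25 + 2 × 500 = 1005 tokens.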

Stage 2: Budget Filter

Enforces token/cost/latency constraints:

interface BudgetConstraint {
  readonly maxTokens?: number;
  readonly maxCostUsd?: number;
  readonly maxLatencyMs?: number;
}

Stage 3: TOPSIS Ranking

Multi-criteria ranking by relative closeness to the ideal solution, favoring Pareto-optimal candidates:

Criterion	Weight	Direction	Description
Quality	50%	Maximize	Reasoning + code generation
Cost	30%	Minimize	$/token estimate
Latency	20%	Minimize	Response time
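
A textbook TOPSIS ranking with these weights might look like the sketch below. It is a generic version of the technique, not the code in topsis-router.ts: it omits the stage-score-adjusted profiles and performance-floor penalties mentioned in the overview, and `Candidate` and `topsisRank` are hypothetical names.

```typescript
interface Candidate {
  readonly name: string;
  readonly quality: number; // higher is better
  readonly cost: number; // lower is better
  readonly latency: number; // lower is better
}

type Criterion = 'quality' | 'cost' | 'latency';
const CRITERIA: readonly Criterion[] = ['quality', 'cost', 'latency'];
const WEIGHTS: Record<Criterion, number> = { quality: 0.5, cost: 0.3, latency: 0.2 };
const MAXIMIZE: Record<Criterion, boolean> = { quality: true, cost: false, latency: false };

function topsisRank(candidates: readonly Candidate[]): { name: string; score: number }[] {
  // 1. Vector-normalize each criterion column, then apply its weight.
  const weighted = candidates.map((x) =>
    CRITERIA.map((c) => {
      const norm = Math.sqrt(candidates.reduce((s, y) => s + y[c] ** 2, 0)) || 1;
      return (x[c] / norm) * WEIGHTS[c];
    })
  );
  // 2. Ideal and anti-ideal points, respecting each criterion's direction.
  const column = (j: number): number[] => weighted.map((row) => row[j] ?? 0);
  const ideal = CRITERIA.map((c, j) =>
    MAXIMIZE[c] ? Math.max(...column(j)) : Math.min(...column(j))
  );
  const anti = CRITERIA.map((c, j) =>
    MAXIMIZE[c] ? Math.min(...column(j)) : Math.max(...column(j))
  );
  // 3. Relative closeness: distance to anti-ideal over total distance.
  const dist = (row: readonly number[], ref: readonly number[]): number =>
    Math.sqrt(row.reduce((s, v, j) => s + (v - (ref[j] ?? 0)) ** 2, 0));
  return candidates
    .map((x, i) => {
      const row = weighted[i] ?? [];
      const dPlus = dist(row, ideal);
      const dMinus = dist(row, anti);
      const total = dPlus + dMinus;
      return { name: x.name, score: total === 0 ? 0 : dMinus / total };
    })
    .sort((a, b) => b.score - a.score);
}
```

A candidate that is best on every criterion scores 1, and one that is worst on every criterion scores 0; everything else falls in between.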

Stage 4: LinUCB Learning

Contextual bandit learns from outcomes:

// 6D context vector
const context = {
  taskComplexity: 0.8, // Normalized 0-1
  contextLengthNormalized: 0.3, // Tokens / max context
  isCodeTask: true,
  isReasoningTask: false,
  budgetUtilization: 0.2, // % of budget used
  timePressure: 0.0, // Deadline proximity
};

// UCB score calculation
UCB = E[reward | context] + alpha * sqrt(uncertainty);
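
For reference, the standard disjoint LinUCB score for one arm is the estimated reward plus an exploration bonus, theta·x + alpha·sqrt(xᵀA⁻¹x). The sketch below implements that textbook form and keeps A⁻¹ current with a Sherman-Morrison update. It is an assumption that linucb-bandit.ts follows this form; `LinUcbArm`, `encodeContext`, and the 0/1 encoding of the boolean fields are hypothetical.

```typescript
interface RoutingContext {
  taskComplexity: number;
  contextLengthNormalized: number;
  isCodeTask: boolean;
  isReasoningTask: boolean;
  budgetUtilization: number;
  timePressure: number;
}

// Assumed encoding: booleans become 0/1 so the context is a 6D numeric vector.
function encodeContext(c: RoutingContext): number[] {
  return [
    c.taskComplexity,
    c.contextLengthNormalized,
    c.isCodeTask ? 1 : 0,
    c.isReasoningTask ? 1 : 0,
    c.budgetUtilization,
    c.timePressure,
  ];
}

const dot = (a: readonly number[], b: readonly number[]): number =>
  a.reduce((s, v, i) => s + v * (b[i] ?? 0), 0);

class LinUcbArm {
  private aInv: number[][]; // inverse of A = I + sum of x x^T
  private b: number[]; // sum of reward * x

  constructor(dim: number) {
    this.aInv = Array.from({ length: dim }, (_, i) =>
      Array.from({ length: dim }, (_, j) => (i === j ? 1 : 0))
    );
    this.b = new Array<number>(dim).fill(0);
  }

  private applyInverse(x: readonly number[]): number[] {
    return this.aInv.map((row) => dot(row, x));
  }

  // UCB = theta . x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
  ucb(x: readonly number[], alpha: number): number {
    const theta = this.applyInverse(this.b);
    const uncertainty = dot(x, this.applyInverse(x));
    return dot(theta, x) + alpha * Math.sqrt(Math.max(uncertainty, 0));
  }

  // Sherman-Morrison rank-1 update of A^-1 after observing (x, reward).
  update(x: readonly number[], reward: number): void {
    const aInvX = this.applyInverse(x);
    const denom = 1 + dot(x, aInvX);
    this.aInv = this.aInv.map((row, i) =>
      row.map((v, j) => v - ((aInvX[i] ?? 0) * (aInvX[j] ?? 0)) / denom)
    );
    this.b = this.b.map((v, i) => v + reward * (x[i] ?? 0));
  }
}
```

A router would keep one arm per CLI, call `ucb(encodeContext(ctx), alpha)` on each, and pick the highest score; a larger `alpha` favors less-tried CLIs.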

Task Router Interface

Routes tasks to optimal CLI based on capability matching.

interface ITaskRouter {
  route(task: Task): Promise<Result<ICliAdapter, RoutingError>>;
  routeWithDetails(task: Task): Promise<Result<RoutingDecision, RoutingError>>;
}

interface RoutingDecision {
  readonly adapter: ICliAdapter;
  readonly confidence: number; // 0-1 routing confidence
  readonly reason: string; // Why this CLI was chosen
  readonly alternatives: readonly ICliAdapter[];
  readonly decisionTimeMs: number;
}

type CliName = 'claude' | 'gemini' | 'codex' | 'opencode';
type CliTransport = 'mcp' | 'subprocess';

Budget Router (IBudgetRouter)

Budget-constrained routing using the PILOT pattern (arXiv:2508.21141).

interface IBudgetRouter {
  getSessionBudget(): SessionBudget;
  updateBudget(usage: { tokens?: number; costUsd?: number }): void;
  resetBudget(): void;
  checkBudget(task: CliTask, constraint?: BudgetConstraint): BudgetRoutingResult;
  routeWithBudget(
    task: CliTask,
    budget?: BudgetConstraint
  ): Promise<Result<BudgetRoutingResult, BudgetExceededError>>;
  executeWithBudget(
    task: CliTask,
    budget?: BudgetConstraint
  ): Promise<Result<CliResponse & { budgetAfter: SessionBudget }, CliError>>;
}

Budget Thresholds

Level	Usage	Action
Info	50%	Log usage
Warning	75%	Warn user
Critical	90%	Suggest task simplification
Hard	100%	Reject task
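
The thresholds map naturally onto a small classification function. This sketch assumes each threshold is an inclusive lower bound and adds a hypothetical `'ok'` level for usage under 50%; `BudgetLevel` and `budgetLevel` are illustrative names.

```typescript
type BudgetLevel = 'ok' | 'info' | 'warning' | 'critical' | 'hard';

// Classifies usage against the thresholds in the table above.
function budgetLevel(used: number, budget: number): BudgetLevel {
  if (budget <= 0) return 'hard'; // no budget configured: treat as exhausted
  const usage = used / budget;
  if (usage >= 1.0) return 'hard'; // reject task
  if (usage >= 0.9) return 'critical'; // suggest task simplification
  if (usage >= 0.75) return 'warning'; // warn user
  if (usage >= 0.5) return 'info'; // log usage
  return 'ok';
}
```

The same function can be applied separately to the token budget and the cost budget, taking the more severe of the two levels.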

Session Budget

interface SessionBudget {
  readonly tokenBudget: number; // Default: 1M tokens
  readonly costBudgetUsd: number; // Default: $10
  readonly tokensUsed: number;
  readonly costUsed: number;
  readonly resetAt: number; // Epoch ms
}

Circuit Breaker (ICircuitBreaker)

Prevents cascading failures with configurable thresholds.

interface ICircuitBreaker {
  execute<T>(operation: () => Promise<T>): Promise<T>;
  getState(): CircuitState; // 'closed' | 'open' | 'half_open'
  recordFailure(category: FailureCategory): void;
  recordSuccess(): void;
  reset(): void;
  getSnapshot(): CircuitBreakerSnapshot;
}

State Transitions

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: failures >= threshold
    Open --> HalfOpen: timeout elapsed
    HalfOpen --> Closed: success
    HalfOpen --> Open: failure

Configuration

circuitBreaker:
  failureThreshold: 5 # Failures before open
  successThreshold: 2 # Successes to close from half-open
  timeout: 30000 # ms before half-open
  rollingWindow: 60000 # ms for failure counting
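
To show how the four settings interact, here is a minimal state machine driven by them. It is a sketch under stated assumptions, not the code in circuit-breaker.ts: `SketchBreaker` and `BreakerConfig` are hypothetical, failure categories are ignored, and the clock is injectable so the behavior can be tested without waiting.

```typescript
type CircuitState = 'closed' | 'open' | 'half_open';

interface BreakerConfig {
  failureThreshold: number;
  successThreshold: number;
  timeout: number; // ms before half-open
  rollingWindow: number; // ms for failure counting
}

class SketchBreaker {
  private state: CircuitState = 'closed';
  private failures: number[] = []; // timestamps of recent failures
  private halfOpenSuccesses = 0;
  private openedAt = 0;
  private readonly cfg: BreakerConfig;
  private readonly now: () => number;

  constructor(cfg: BreakerConfig, now: () => number = Date.now) {
    this.cfg = cfg;
    this.now = now;
  }

  getState(): CircuitState {
    // Open -> HalfOpen once the timeout has elapsed.
    if (this.state === 'open' && this.now() - this.openedAt >= this.cfg.timeout) {
      this.state = 'half_open';
      this.halfOpenSuccesses = 0;
    }
    return this.state;
  }

  recordFailure(): void {
    const t = this.now();
    const s = this.getState();
    if (s === 'open') return; // already tripped
    if (s === 'half_open') {
      this.trip(t); // HalfOpen -> Open on any failure
      return;
    }
    // Closed: count only failures inside the rolling window.
    this.failures = this.failures.filter((f) => t - f < this.cfg.rollingWindow);
    this.failures.push(t);
    if (this.failures.length >= this.cfg.failureThreshold) this.trip(t);
  }

  recordSuccess(): void {
    if (this.getState() !== 'half_open') return;
    this.halfOpenSuccesses += 1;
    // HalfOpen -> Closed after successThreshold successes.
    if (this.halfOpenSuccesses >= this.cfg.successThreshold) {
      this.state = 'closed';
      this.failures = [];
    }
  }

  private trip(t: number): void {
    this.state = 'open';
    this.openedAt = t;
  }
}
```

With the values above, five failures within 60 seconds open the circuit, it becomes half-open 30 seconds later, and two successes in a row close it again.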

CLI Detection Cache (ICliDetectionCache)

Caches CLI health check results with TTL and invalidation.

interface ICliDetectionCache {
  get(cliName: CliName): Promise<CliHealthResult | undefined>;
  set(cliName: CliName, result: CliHealthResult): Promise<void>;
  invalidate(cliName: CliName): void;
  invalidateAll(): void;
  getStats(): CacheStats;
  onInvalidate(listener: (cliName: CliName) => void): () => void;
}

interface CliHealthResult {
  readonly available: boolean;
  readonly version?: string;
  readonly checkedAt: number;
  readonly error?: string;
}

Cache TTL Strategy

Scenario	TTL	Rationale
Available	5 minutes	Stable, reduce checks
Unavailable	30 seconds	Retry quickly after failure
Version change	Immediate	Capabilities may differ
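
The TTL rules can be expressed as a single freshness check. This is a sketch: `isFresh` and the constant names are hypothetical, and passing the currently observed version as an argument is an assumption about how version changes are detected.

```typescript
interface CliHealthResult {
  readonly available: boolean;
  readonly version?: string;
  readonly checkedAt: number; // epoch ms
  readonly error?: string;
}

const TTL_AVAILABLE_MS = 5 * 60 * 1000; // 5 minutes
const TTL_UNAVAILABLE_MS = 30 * 1000; // 30 seconds

// Returns true if a cached entry may still be served at time `now`.
function isFresh(entry: CliHealthResult, now: number, currentVersion?: string): boolean {
  // Version change: invalidate immediately, capabilities may differ.
  if (
    currentVersion !== undefined &&
    entry.version !== undefined &&
    entry.version !== currentVersion
  ) {
    return false;
  }
  const ttl = entry.available ? TTL_AVAILABLE_MS : TTL_UNAVAILABLE_MS;
  return now - entry.checkedAt < ttl;
}
```

The asymmetric TTLs mean a healthy CLI is re-probed rarely, while a failed one is retried within half a minute.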

Token Counter (ITokenCounter)

Universal token counting across model providers.

interface ITokenCounter {
  count(text: string): Promise<TokenCountResult>;
  countMessages(messages: Message[]): Promise<TokenCountResult>;
  getMaxTokens(): number;
  getProvider(): TokenCounterProvider;
}

type TokenCounterProvider = 'tiktoken' | 'anthropic' | 'heuristic';

Provider Selection

Provider	Accuracy	Speed	Use Case
`tiktoken`	High	Fast	OpenAI models
`anthropic`	Exact	Medium	Claude models
`heuristic`	±10%	Instant	Quick estimates

Capacity Monitor (ICapacityMonitor)

Tracks rate limits across model providers.

interface ICapacityMonitor {
  updateFromHeaders(provider: string, headers: Headers): void;
  getCapacity(provider: string): CapacityInfo | null;
  onLowCapacity(callback: LowCapacityCallback): () => void;
  setLowCapacityThreshold(threshold: number): void;
  getTimeUntilReset(provider: string): number | null;
}

interface CapacityInfo {
  readonly remainingTokens: number;
  readonly remainingRequests: number;
  readonly resetTime: Date | null;
  readonly utilizationPercent: number;
}

Rate Limit Headers

Provider	Token Header	Request Header
Anthropic	`anthropic-ratelimit-tokens-*`	`anthropic-ratelimit-requests-*`
OpenAI	`x-ratelimit-*-tokens`	`x-ratelimit-*-requests`
Google	`x-goog-api-*`	`x-goog-api-*`
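
As an example of turning headers into capacity figures, the sketch below reads the OpenAI-style headers `x-ratelimit-limit-tokens`, `x-ratelimit-remaining-tokens`, and `x-ratelimit-remaining-requests`. The function name, the `HeaderReader` type (a structural subset of the Fetch `Headers` API), and the choice to derive `utilizationPercent` from tokens alone are assumptions, not the behavior of capacity-monitor.ts.

```typescript
// Structural subset of the Fetch Headers API, so a real Headers object also fits.
interface HeaderReader {
  get(name: string): string | null;
}

interface CapacitySketch {
  readonly remainingTokens: number;
  readonly remainingRequests: number;
  readonly utilizationPercent: number;
}

function parseOpenAiCapacity(headers: HeaderReader): CapacitySketch | null {
  // Reads a numeric header, returning null when absent or malformed.
  const num = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw.trim() === '') return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  const limit = num('x-ratelimit-limit-tokens');
  const remainingTokens = num('x-ratelimit-remaining-tokens');
  const remainingRequests = num('x-ratelimit-remaining-requests');
  if (limit === null || remainingTokens === null || remainingRequests === null) return null;
  if (limit <= 0) return null;
  return {
    remainingTokens,
    remainingRequests,
    // Assumed definition: share of the token limit already consumed.
    utilizationPercent: (1 - remainingTokens / limit) * 100,
  };
}
```

Returning `null` for missing or malformed headers lets the caller keep the previous capacity reading instead of overwriting it with a guess.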

Work Balancer (IWorkBalancer)

Distributes parallel tasks across available CLIs.

interface IWorkBalancer {
  balance(tasks: TaskProfile[]): Promise<BalanceResult>;
  queueTask(task: TaskProfile): void;
  getQueueDepth(): number;
  clearQueue(): void;
}

interface BalanceResult {
  assignments: Map<string, CliName>;
  unassigned: string[];
  reasoning: Record<string, ScoreBreakdown>;
}

Balancing Algorithm

1. Capacity check: Filter CLIs with available capacity
2. Task match: Score CLI capabilities vs task requirements
3. Load balance: Distribute evenly with affinity hints
4. Fallback: Queue tasks if all CLIs at capacity
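
The four steps above can be sketched as a greedy assignment loop. Everything here is illustrative: `TaskSketch`, `CliSlot`, `balanceSketch`, the string-based capability matching, and the 0.1 load penalty are hypothetical, and affinity hints are left out.

```typescript
type CliName = 'claude' | 'gemini' | 'codex' | 'opencode';

// Hypothetical task and CLI shapes for illustration.
interface TaskSketch {
  readonly id: string;
  readonly needs: readonly string[];
}
interface CliSlot {
  readonly name: CliName;
  readonly capabilities: readonly string[];
  readonly capacity: number; // max concurrent tasks
}

const LOAD_PENALTY = 0.1; // assumed weight that nudges work toward idle CLIs

function balanceSketch(
  tasks: readonly TaskSketch[],
  clis: readonly CliSlot[]
): { assignments: Map<string, CliName>; unassigned: string[] } {
  const assignments = new Map<string, CliName>();
  const unassigned: string[] = [];
  const load = new Map<CliName, number>();

  for (const task of tasks) {
    let best: CliSlot | undefined;
    let bestScore = -Infinity;
    for (const cli of clis) {
      const used = load.get(cli.name) ?? 0;
      if (used >= cli.capacity) continue; // 1. capacity check
      const match = task.needs.filter((n) => cli.capabilities.includes(n)).length; // 2. task match
      const score = match - used * LOAD_PENALTY; // 3. load balance
      if (score > bestScore) {
        best = cli;
        bestScore = score;
      }
    }
    if (best === undefined) {
      unassigned.push(task.id); // 4. fallback: caller queues these
      continue;
    }
    assignments.set(task.id, best.name);
    load.set(best.name, (load.get(best.name) ?? 0) + 1);
  }
  return { assignments, unassigned };
}
```

Tasks left in `unassigned` correspond to the fallback step and would be queued until capacity frees up.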

Feedback Integration (IFeedbackIntegration)

Connects routing decisions to outcomes for closed-loop learning.

interface IFeedbackIntegration {
  recordRoutingDecision(decision: CompositeRoutingDecision): string;
  recordOutcome(routingId: string, outcome: TaskOutcome): void;
  getRoutingStats(cliName: CliName): RoutingOutcomeStats;
  exportFeedback(): FeedbackExport;
}

interface TaskOutcome {
  readonly success: boolean;
  readonly latencyMs: number;
  readonly tokensUsed?: number;
  readonly errorCategory?: string;
}

interface RoutingOutcomeStats {
  readonly totalRoutings: number;
  readonly successRate: number;
  readonly avgLatencyMs: number;
  readonly avgTokens: number;
}

Reward Computation

reward = success * 0.5 + (1 - retries / max) * 0.3 + coherence * 0.2;
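
A typed version of this formula is sketched below. It reads the retry denominator as a maximum retry count and assumes `coherence` is a score in [0, 1]; neither field appears in `TaskOutcome`, so `RewardInput` and `computeReward` are hypothetical. Clamping is added so the result stays within [0, 1].

```typescript
// Hypothetical input; retries, maxRetries, and coherence are not part of TaskOutcome.
interface RewardInput {
  readonly success: boolean;
  readonly retries: number;
  readonly maxRetries: number;
  readonly coherence: number; // assumed to be in [0, 1]
}

const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// reward = success * 0.5 + (1 - retries / max) * 0.3 + coherence * 0.2
function computeReward(o: RewardInput): number {
  const successTerm = o.success ? 1 : 0;
  // With no retries allowed, the retry term counts as fully earned.
  const retryTerm = o.maxRetries > 0 ? 1 - clamp01(o.retries / o.maxRetries) : 1;
  return successTerm * 0.5 + retryTerm * 0.3 + clamp01(o.coherence) * 0.2;
}
```

A first-try success with full coherence earns the maximum reward, while a failure that exhausted its retries with zero coherence earns none.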

CLI Debugging

# Dry-run routing for a task
nexus-agents routing-audit "Implement a sorting algorithm" --format=json

# Output shows:
# - Task profile analysis
# - Budget filter results
# - TOPSIS scores per CLI
# - LinUCB selection with UCB scores
# - Feature importance analysis

# Show bandit statistics
nexus-agents routing-audit "task" --bandit-stats

Configuration

routing:
  enableBudgetFilter: true # Stage 2 on/off
  enableTopsisRanking: true # Stage 3 on/off
  enableLinUCBSelection: true # Stage 4 on/off

  budget:
    tokenBudget: 1000000 # Session token limit
    costBudgetUsd: 10.0 # Session cost limit
    resetIntervalMs: 3600000 # 1 hour reset

  topsis:
    qualityWeight: 0.5
    costWeight: 0.3
    latencyWeight: 0.2

  linucb:
    alpha: 1.0 # Exploration parameter

Difficulty Estimation

Tier-routing difficulty estimation is done by the ZeroRouter (see “Source Files” below). The composite-router consumes decision.difficulty / decision.tier from ZeroRouter for fast/balanced/powerful tier selection.

History note (#2940): an alternate DAAOEstimator (VAE-inspired, arXiv:2509.11079) was prototyped under Issue #334 and exported from cli-adapters/index.ts, but #334 was ultimately implemented via ZeroRouter, not DAAO. The DAAO surface was retired in #2940; see that issue for the full removal scope. If a true alternate difficulty estimator with different feature weights returns, reintroduce it alongside its wiring stage in the same PR.

Source Files

File	Purpose
`src/cli-adapters/composite-router.ts`	Main routing pipeline
`src/cli-adapters/budget-router.ts`	Budget enforcement
`src/cli-adapters/topsis-router.ts`	Multi-criteria ranking
`src/cli-adapters/linucb-bandit.ts`	Contextual bandit
`src/cli-adapters/zero-router.ts`	Difficulty estimation
`src/cli-adapters/circuit-breaker.ts`	Fault tolerance
`src/cli-adapters/cli-detection-cache.ts`	Health check caching
`src/context/token-counter.ts`	Token counting
`src/adapters/capacity-monitor.ts`	Rate limit tracking
`src/learning/feedback-integration.ts`	Outcome learning
`src/cli/routing-audit.ts`	Debug CLI command

Research Sources

Technique	Paper	Paper-Reported Metrics (not measured on this system)
PILOT Budget Routing	arXiv:2508.21141	Budget-constrained routing
TOPSIS Multi-Criteria	arXiv:2509.07571	31.46% cost reduction (paper benchmark)
IPR Quality Routing	arXiv:2509.06274	43.9% cost reduction (paper benchmark)
RouteLLM Preference	arXiv:2406.18665	2x cost reduction (paper benchmark)
SATER Confidence	arXiv:2510.05164	50%+ cost reduction, 80% latency reduction (paper)

Related Documents

Memory System: MEMORY_SYSTEM.md
Agent System: AGENT_SYSTEM.md
Full Architecture: ARCHITECTURE.md
