Agents Frame is a four-stage pipeline. Every entry point (REST, MCP, CLI) funnels into the same engine, so behavior is identical regardless of how you call it.
intent ──► detectLang ──► embed ──► recall ──► rerank ──► selected
           (en/zh)        (Gemini   (pgvector  (ELO-      framework
                          1536d)    cosine)    weighted)  + promptMd

1. Language detection

The router auto-detects whether your intent is English or Chinese (the two languages currently supported). You can override with the lang field. Every framework has a parallel bilingual prompt, so switching language switches the whole experience — not just the surface text.
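The detector itself isn't documented, but the idea can be sketched in a few lines: classify by the ratio of CJK characters in the intent, with the `lang` field taking precedence. The regex range and the 0.2 threshold here are illustrative assumptions, not the router's actual rule.

```typescript
type Lang = 'en' | 'zh';

// Classify an intent as English or Chinese by CJK character ratio.
// An explicit `lang` override always wins. The 0.2 threshold is an
// illustrative assumption, not the shipped heuristic.
function detectLang(intent: string, lang?: Lang): Lang {
  if (lang) return lang;
  const cjk = (intent.match(/[\u4e00-\u9fff]/g) ?? []).length;
  const nonSpace = [...intent].filter((c) => !/\s/.test(c)).length;
  return nonSpace > 0 && cjk / nonSpace > 0.2 ? 'zh' : 'en';
}
```

Because detection is per-request, mixed-language agents can pin `lang` explicitly and skip the heuristic entirely.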

2. Embedding

Intents are embedded with Google’s gemini-embedding-001 model at 1536 dimensions. Embeddings live close to the database (Vertex AI in the same region as Neon Postgres us-east-1) to keep round-trip latency in the low hundreds of milliseconds.
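For reference, a request body for pinned-dimensionality embedding might look like the sketch below. Field names follow the public Generative Language API's `embedContent` shape; Vertex AI's request format differs slightly, so treat this as an assumption and verify against your endpoint's docs.

```typescript
// Build an embedContent request body for gemini-embedding-001 at 1536 dims.
// Field names follow the public Generative Language API; the Vertex AI
// payload shape differs, so check the docs for your actual endpoint.
function buildEmbedRequest(intent: string) {
  return {
    model: 'models/gemini-embedding-001',
    content: { parts: [{ text: intent }] },
    outputDimensionality: 1536, // must match the pgvector column width
  };
}
```

Pinning `outputDimensionality` matters: the value has to match the vector column's declared width, or inserts and similarity queries will fail.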

3. Recall

The embedded intent is matched against FrameworkEmbedding rows in Postgres using pgvector cosine similarity. We return the top recallK frameworks (default 16) — wide enough to leave room for the re-ranker, narrow enough to stay cheap.
pgvector is the only vector store in the loop. There’s no separate ANN service, no embedding cache layer, no third-party retrieval hop. Fewer moving parts, fewer places to debug.
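Under the hood, pgvector's `<=>` operator computes cosine distance (1 minus cosine similarity), so ordering ascending by distance returns the most similar frameworks first. A sketch of the recall query, with table and column names assumed from the `FrameworkEmbedding` model mentioned above:

```typescript
// Sketch of the recall SQL. `<=>` is pgvector's cosine-distance operator;
// 1 - distance recovers cosine similarity for the score column.
// Table/column names are assumptions based on the FrameworkEmbedding model.
function recallQuery(recallK = 16): string {
  return `
    SELECT "frameworkVersionId", slug,
           1 - (embedding <=> $1::vector) AS score  -- cosine similarity
    FROM "FrameworkEmbedding"
    ORDER BY embedding <=> $1::vector               -- ascending distance
    LIMIT ${recallK}
  `;
}
```

Keeping recall as a single indexed SQL query is what makes the "no separate ANN service" claim hold: the database does the nearest-neighbor work in place.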

4. Rerank

Each candidate carries an ELO score updated from real user feedback (UserFeedback table). The final score blends cosine similarity with the framework’s win-rate, so frameworks that actually help float to the top over time. The top-1 candidate becomes selected; the rest are returned as candidates so you can inspect alternatives or build your own selection UI.
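The exact blend isn't specified. One plausible sketch maps ELO to an expected win-rate with the standard logistic formula and mixes it linearly with similarity; the 0.7/0.3 weights and the 1500 baseline below are assumptions, not the shipped values.

```typescript
type Candidate = { slug: string; similarity: number; elo: number };

// Standard ELO expected score against a 1500-rated baseline opponent.
const winRate = (elo: number) => 1 / (1 + 10 ** ((1500 - elo) / 400));

// Blend cosine similarity with the feedback-derived win-rate.
// The 0.7/0.3 split is an illustrative assumption.
function rerank(candidates: Candidate[], wSim = 0.7): Candidate[] {
  const score = (c: Candidate) =>
    wSim * c.similarity + (1 - wSim) * winRate(c.elo);
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

With a blend like this, two frameworks with identical similarity are separated purely by feedback history, which is how helpful frameworks "float to the top over time."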

Output shape

type RouteResult = {
  selected: {
    slug: string;
    frameworkVersionId: string;  // pin this for reproducibility
    promptMd: string;            // the actual prompt to use
  };
  candidates: Array<{ slug: string; score: number }>;
  detectedLang: 'en' | 'zh';
  latencyMs: number;
};
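Given that shape, building your own selection logic on top of `candidates` is straightforward. A sketch of a hypothetical helper that prefers the router's pick but falls back to the best-scoring candidate outside an exclusion list (for example, frameworks the agent already tried):

```typescript
type RouteResult = {
  selected: { slug: string; frameworkVersionId: string; promptMd: string };
  candidates: Array<{ slug: string; score: number }>;
  detectedLang: 'en' | 'zh';
  latencyMs: number;
};

// Hypothetical helper: keep the router's pick unless it is excluded,
// then fall back to the highest-scoring remaining candidate.
function pickFramework(result: RouteResult, exclude: Set<string>): string {
  if (!exclude.has(result.selected.slug)) return result.selected.slug;
  const alt = [...result.candidates]
    .filter((c) => !exclude.has(c.slug))
    .sort((a, b) => b.score - a.score)[0];
  return alt ? alt.slug : result.selected.slug;
}
```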

Stateless by design

The router holds no conversation state. Each /v1/think call is a pure function of (intent, lang, topK, recallK, category, excludeSlugs) plus the framework library version. Your agent keeps its own memory; Agents Frame just answers a single question — which framework? — and answers it the same way every time.

What’s next

The same pipeline will scale from 5 frameworks to 30 without any contract change. Future versions will add categories (decision, debugging, planning, communication), cross-lingual recall, and optional RAG-style enrichment from your private framework set.