1. Language detection
The router auto-detects whether your intent is English or Chinese (the two languages currently supported). You can override with the lang
field. Every framework has a parallel bilingual prompt, so switching
language switches the whole experience — not just the surface text.
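The detect-then-override pattern can be sketched in a few lines. This is a minimal illustration, not the router's actual detector: the CJK-range heuristic and the function names are assumptions; only the behavior of an explicit lang field winning over auto-detection comes from the text above.

```python
def detect_lang(intent: str) -> str:
    # Naive heuristic (assumed, not the real detector): any CJK
    # ideograph means Chinese, otherwise English.
    if any("\u4e00" <= ch <= "\u9fff" for ch in intent):
        return "zh"
    return "en"


def resolve_lang(intent: str, lang=None) -> str:
    # An explicit lang field always overrides auto-detection.
    return lang or detect_lang(intent)
```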
2. Embedding
Intents are embedded with Google's gemini-embedding-001 model at
1536 dimensions. Embeddings live close to the database (Vertex AI in
the same us-east-1 region as Neon Postgres) to keep round-trip latency
in the low hundreds of milliseconds.
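One practical detail when using reduced dimensions: per Google's embedding docs, gemini-embedding-001 vectors truncated below the full 3072 dimensions are not unit-length, so they should be re-normalized before cosine comparisons. A minimal sketch (the helper name is ours, not part of the service):

```python
import math


def normalize(vec):
    # Re-scale a truncated embedding to unit length so cosine
    # similarity downstream behaves as expected.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```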
3. Recall
The embedded intent is matched against FrameworkEmbedding rows in
Postgres using pgvector cosine similarity. We return the top
recallK frameworks (default 16) — wide enough to leave room for the
re-ranker, narrow enough to stay cheap.
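The recall step amounts to a cosine top-K query. Here is an in-memory stand-in, assuming rows are (slug, embedding) pairs; in production this is a single pgvector query (roughly ORDER BY embedding <=> :query LIMIT :recall_k, with the exact schema an assumption on our part):

```python
import math


def cosine(a, b):
    # The similarity metric behind pgvector's cosine distance operator.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def recall(query, rows, recall_k=16):
    # rows: list of (slug, embedding). Default of 16 matches the
    # documented recallK default.
    ranked = sorted(rows, key=lambda r: cosine(query, r[1]), reverse=True)
    return [slug for slug, _ in ranked[:recall_k]]
```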
pgvector is the only vector store in the loop. There’s no separate
ANN service, no embedding cache layer, no third-party retrieval
hop. Fewer moving parts, fewer places to debug.
4. Rerank
Each candidate carries an ELO score updated from real user feedback (UserFeedback table). The final score blends cosine
similarity with the framework’s win-rate, so frameworks that
actually help float to the top over time.
The top-1 candidate becomes selected; the rest are returned as
candidates so you can inspect alternatives or build your own
selection UI.
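The blend of similarity and feedback can be sketched as a weighted score. The 0.3 weight and the win-rate formula here are illustrative assumptions; the text only states that cosine similarity is mixed with an ELO-derived win-rate and that the top-1 becomes selected:

```python
def blended_score(cos_sim, wins, losses, weight=0.3):
    # Hypothetical blend: weight is assumed, not documented.
    total = wins + losses
    win_rate = wins / total if total else 0.5
    return (1 - weight) * cos_sim + weight * win_rate


def rerank(cands):
    # cands: list of (slug, cos_sim, wins, losses).
    ranked = sorted(
        cands,
        key=lambda c: blended_score(c[1], c[2], c[3]),
        reverse=True,
    )
    return {
        "selected": ranked[0][0],
        "candidates": [c[0] for c in ranked[1:]],
    }
```

With this blend, a slightly less similar framework with a strong feedback record can overtake a more similar one, which is the "frameworks that actually help float to the top" behavior described above.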
Output shape
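The original output example is not preserved in this copy. A plausible shape, inferred only from the fields the text above mentions (selected and candidates); every field name beyond those two, and all values, are hypothetical:

```python
# Illustrative only: "selected" and "candidates" come from the prose;
# slugs and scores here are invented placeholders.
example_response = {
    "selected": {"slug": "example-framework", "score": 0.91},
    "candidates": [
        {"slug": "another-framework", "score": 0.88},
    ],
}
```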
Stateless by design
The router holds no conversation state. Each /v1/think call is a
pure function of (intent, lang, topK, recallK, category, excludeSlugs)
plus the framework library version. Your agent keeps its own memory;
Agents Frame just answers a single question — which framework? —
and answers it the same way every time.
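Because the response is a pure function of the request tuple plus the library version, identical requests can be cached client-side by key. A sketch, where the "v1" library version and the non-recallK defaults are assumptions (the doc only documents recallK's default of 16):

```python
import json


def cache_key(intent, lang="auto", top_k=3, recall_k=16,
              category=None, exclude_slugs=()):
    # Serialize the full request tuple deterministically; sorting
    # exclude_slugs and keys makes equal requests hash identically.
    payload = {
        "intent": intent,
        "lang": lang,
        "topK": top_k,
        "recallK": recall_k,
        "category": category,
        "excludeSlugs": sorted(exclude_slugs),
        "libVersion": "v1",  # assumed placeholder
    }
    return json.dumps(payload, sort_keys=True)
```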