RecompAI

In Development
Cloudflare Workers · React 19 · Workers AI · D1 · R2 · Vectorize · Capacitor

Overview

RecompAI is an AI-powered body recomposition coaching platform that replaces the fragmented stack of calorie counters, supplement trackers, and generic fitness apps with a single system that understands your full health context. Users log meals via text, photo, or voice; manage peptide, TRT, and supplement protocols with dose escalation tracking; log workouts and body metrics; and get personalized coaching through natural conversation — all from one interface.

The technical ambition behind RecompAI is running a sophisticated multi-agent AI system entirely on Cloudflare’s edge infrastructure. No GPU servers, no traditional cloud backend. Every AI inference call, every database query, every image analysis runs on Workers, D1, R2, Vectorize, and Workers AI. The project has become a deep exploration of what’s possible — and what breaks — when you build production AI features on constrained edge compute.

The codebase is ~275K lines of TypeScript across 323+ files, shipped over 9 major milestones since January 2026.

Architecture

Multi-Agent Routing

The core of the system is a multi-agent dispatch architecture that replaced an earlier monolithic AI layer. Every incoming message flows through a lightweight LLM router (Llama 3.1 8B) that classifies intent into one of four domains: meal, protocol, body-metrics+workout, or general. High-confidence single-domain messages route directly to a specialist agent with scoped tools and a domain-specific system prompt. Multi-domain or ambiguous messages route to a general coordinator that fans out to specialists in parallel and synthesizes their responses.
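In sketch form, the dispatch decision looks something like the following (the confidence threshold and domain names are illustrative, not the actual implementation):

```typescript
// Illustrative dispatch sketch: the router model's classification decides whether
// a message goes straight to one specialist or to the coordinator fan-out.
type Domain = "meal" | "protocol" | "body" | "general";

interface RouteDecision {
  domain: Domain;
  confidence: number; // 0..1, as returned by the router model
}

// Hypothetical threshold: below this, fall back to the coordinator.
const DIRECT_ROUTE_THRESHOLD = 0.8;

function pickTarget(decision: RouteDecision): "coordinator" | Domain {
  if (decision.domain === "general") return "coordinator";
  if (decision.confidence < DIRECT_ROUTE_THRESHOLD) return "coordinator";
  return decision.domain; // high-confidence single-domain -> specialist agent
}
```

The key property is that ambiguity always degrades toward the coordinator, which can fan out, rather than toward a single specialist that might answer with the wrong tools.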

This decomposition was driven by a practical problem: a single model with access to all 20+ tools selected tools unreliably. Scoping each specialist to only its own domain’s tools (the meal specialist sees only meal tools, the protocol specialist only protocol tools) dramatically improved tool-calling accuracy and reduced token usage per request.

The coordinator itself calls no domain tools — it delegates to specialist runner functions and synthesizes results into one natural coaching response. This separation prevents the coordinator from accidentally logging data or modifying protocols when it should only be aggregating information.

Five Models, Five Jobs

Not every task needs the same model. RecompAI uses five different models, each selected for a specific capability:

  • Llama 3.1 8B — Intent classification router. Fast, cheap, and accurate enough for a 4-class classification task. Also handles onboarding Q&A where no tool calling is needed.
  • Llama 3.2 11B Vision — Food photo analysis. The only multi-modal model in the stack. Identifies food items from photos, estimates portions, and returns structured macro data.
  • Llama 4 Scout 17B — Primary tool-calling model for specialist agents. Promoted over the 70B model after testing revealed that the larger model had nondeterministic tool sequencing and weak hypothetical rejection. Scout is faster and more reliable at following tool-calling instructions.
  • GLM 4.7 Flash — Fallback model when the primary fails or is unavailable.
  • BGE Base (bge-base-en-v1.5) — Embedding model for semantic food search via Vectorize.

Model selection is configurable per-task through environment variables, so swapping models requires a config change rather than a code deploy.
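A minimal sketch of that env-driven selection, assuming the variable names shown here (the model IDs are current Workers AI names and may differ from the actual config):

```typescript
// Sketch of per-task model resolution from environment config.
// Variable names and defaults are illustrative.
interface ModelEnv {
  MODEL_ROUTER?: string;
  MODEL_VISION?: string;
  MODEL_TOOLS?: string;
  MODEL_EMBED?: string;
}

const DEFAULTS = {
  router: "@cf/meta/llama-3.1-8b-instruct",
  vision: "@cf/meta/llama-3.2-11b-vision-instruct",
  tools: "@cf/meta/llama-4-scout-17b-16e-instruct",
  embed: "@cf/baai/bge-base-en-v1.5",
} as const;

function resolveModel(env: ModelEnv, task: keyof typeof DEFAULTS): string {
  const override = {
    router: env.MODEL_ROUTER,
    vision: env.MODEL_VISION,
    tools: env.MODEL_TOOLS,
    embed: env.MODEL_EMBED,
  }[task];
  return override ?? DEFAULTS[task]; // swap models with a config change, not a deploy
}
```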

Context Assembly

Every chat request assembles a rich context payload before calling the AI model. The system loads the user’s profile, today’s meals with running macro totals, the last 20 chat messages, active protocols with dose schedules, recent workouts, and optionally relevant past context from Vectorize semantic search.

Context loading is configurable to avoid waste. A meal logging request loads meal-scoped context (no workout history, no chat archive), reducing memory usage by 60-80% compared to loading everything. An analytics query loads date-ranged data. A quick profile check loads almost nothing. This matters on Workers, where execution time and memory are constrained.
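The scoping idea can be sketched as a plan object per request type (scope names and fields here are illustrative, not the real schema):

```typescript
// Sketch of scope-based context loading: each request type gets a plan that
// says which context pieces to fetch. Field names are illustrative.
type ContextScope = "meal" | "analytics" | "profile" | "full";

interface ContextPlan {
  profile: boolean;
  todayMeals: boolean;
  chatHistory: number; // how many recent messages to load
  protocols: boolean;
  workouts: boolean;
  semanticRecall: boolean;
}

function planContext(scope: ContextScope): ContextPlan {
  switch (scope) {
    case "meal": // meal logging: no workout history, no chat archive
      return { profile: true, todayMeals: true, chatHistory: 5, protocols: false, workouts: false, semanticRecall: false };
    case "analytics": // date-ranged data, no chat archive
      return { profile: true, todayMeals: true, chatHistory: 0, protocols: true, workouts: true, semanticRecall: false };
    case "profile": // quick check loads almost nothing
      return { profile: true, todayMeals: false, chatHistory: 0, protocols: false, workouts: false, semanticRecall: false };
    case "full":
      return { profile: true, todayMeals: true, chatHistory: 20, protocols: true, workouts: true, semanticRecall: true };
  }
}
```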

The Tool-Calling Problem

The most technically interesting part of RecompAI is the set of guardrails engineered around LLM tool calling. This is where the real lessons live.

Why Cloudflare’s runWithTools Doesn’t Work

Cloudflare provides a runWithTools utility for tool-calling loops. It has a critical flaw: when the model returns a text response after processing tool results, runWithTools discards that response and makes another ai.run call for a “final response.” This second call often produces worse or wrong output because it lacks the context of the tool results the model just processed.

RecompAI replaces runWithTools with a custom tool loop that returns the model’s response as soon as it stops calling tools, without making redundant extra calls. The custom loop also passes max_tokens properly (Workers AI defaults to ~256 tokens when unset, causing truncation) and sets temperature: 0 with top_k: 1 for deterministic tool calling — the default temperature: 0.6 causes the model to randomly choose between calling tools and narrating what it would do.
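The shape of such a loop, with the model function injected so it can be exercised without Workers AI (all names and message shapes here are a sketch, not the project's actual code):

```typescript
// Sketch of a custom tool loop replacing runWithTools: return the model's text
// the moment it stops calling tools, with explicit max_tokens and temperature.
interface ToolCall { name: string; arguments: Record<string, unknown>; }
interface ModelResponse { response?: string; tool_calls?: ToolCall[]; }
type ModelFn = (messages: unknown[], opts: Record<string, unknown>) => Promise<ModelResponse>;
type ToolImpl = (args: Record<string, unknown>) => Promise<unknown>;

async function runToolLoop(
  model: ModelFn,
  messages: unknown[],
  tools: Record<string, ToolImpl>,
  maxRounds = 5,
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    const res = await model(messages, {
      max_tokens: 1024, // Workers AI truncates around ~256 tokens when unset
      temperature: 0,   // deterministic: no coin-flip between acting and narrating
      top_k: 1,
    });
    if (!res.tool_calls?.length) {
      // No redundant "final response" call: the model's own text is the answer.
      return res.response ?? "";
    }
    for (const call of res.tool_calls) {
      const impl = tools[call.name];
      const result = impl ? await impl(call.arguments) : { error: "unknown tool" };
      messages.push({ role: "tool", name: call.name, content: JSON.stringify(result) });
    }
  }
  return "";
}
```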

Nudge-Retry Logic

Even with low temperature, Llama 4 Scout sometimes narrates tool actions as text instead of actually calling the tools. “I’ll log that meal for you” — but no log_meal tool call appears. The custom tool loop detects this on round 0: if the model returns text without tool calls on an action-phrased message, it appends a nudge message (“Please use the available tools to complete this action. Do not describe what you would do — call the tools directly.”) and retries once. This single retry fixes the majority of narration failures.
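The narration check behind the nudge can be sketched as two pattern tests (both regexes and the exact trigger conditions here are illustrative):

```typescript
// Sketch of narration detection for nudge-retry: on round 0, text came back
// with no tool calls on an action-phrased message. Patterns are illustrative.
const ACTION_PATTERN = /\b(log|add|record|track|update|save)\b/i;
const NARRATION_PATTERN = /\b(i'?ll|i will|i'?m going to|let me)\b/i;

const NUDGE =
  "Please use the available tools to complete this action. " +
  "Do not describe what you would do -- call the tools directly.";

function needsNudge(round: number, userMessage: string, modelText: string, toolCallCount: number): boolean {
  return round === 0
    && toolCallCount === 0
    && ACTION_PATTERN.test(userMessage)
    && NARRATION_PATTERN.test(modelText);
}
```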

Duplicate Call Detection

Models can get stuck in loops, calling the same tool with the same arguments repeatedly. The tool loop tracks call signatures (tool name + serialized arguments) across rounds and breaks the loop when it detects a duplicate, falling back to a final response generation without tools.
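A minimal sketch of signature tracking (key-sorting is my addition so argument order does not defeat the check):

```typescript
// Sketch of duplicate-call detection: signature = tool name + serialized args.
function callSignature(name: string, args: Record<string, unknown>): string {
  // Sort keys so {a:1, b:2} and {b:2, a:1} produce the same signature.
  const sorted = Object.keys(args).sort().map((k) => `${k}=${JSON.stringify(args[k])}`);
  return `${name}(${sorted.join(",")})`;
}

function isDuplicate(seen: Set<string>, name: string, args: Record<string, unknown>): boolean {
  const sig = callSignature(name, args);
  if (seen.has(sig)) return true; // break the loop; finish without tools
  seen.add(sig);
  return false;
}
```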

USDA-First Guardrail

A code-level guardrail blocks log_meal from executing unless search_usda has been called first in the same request. This prevents the model from logging meals with hallucinated macro values. The guardrail returns a synthetic error to the model explaining what it needs to do: “You must call search_usda for each food item BEFORE calling log_meal.” The model then self-corrects on the next round.

This guardrail exists because prompt instructions alone are insufficient — models will occasionally skip the search step and estimate macros from training data, especially under token pressure or when the food item is common enough that the model “knows” the answer.
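The gate itself is small; a sketch of its shape (the result type and tracking mechanism are assumptions, the error message is from the guardrail described above):

```typescript
// Sketch of the USDA-first gate: log_meal is blocked until search_usda has
// been called in the same request. Shapes are illustrative.
interface ToolResult { ok: boolean; error?: string; }

function gateLogMeal(calledThisRequest: Set<string>): ToolResult | null {
  if (!calledThisRequest.has("search_usda")) {
    // Returned to the model as a tool result; it self-corrects next round.
    return {
      ok: false,
      error: "You must call search_usda for each food item BEFORE calling log_meal.",
    };
  }
  return null; // no objection: let log_meal execute
}
```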

Hypothetical Intent Filtering

Users ask questions like “What would happen if I ate 200g of protein?” or “How much protein is in a chicken breast?” These are questions, not logging requests. But models with available write tools will eagerly log data on hypothetical messages.

The solution is surgical: a regex-based detector identifies hypothetical/question intent, and when triggered, the system strips write tools from the tool schema before the AI call. The model literally cannot call log_meal or log_weight because those tools don’t exist in its schema for that request. This is more reliable than prompt instructions because it removes the possibility rather than asking the model to exercise restraint.
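In sketch form (the regex, the write-tool list, and the schema shape are illustrative):

```typescript
// Sketch of hypothetical-intent filtering: detect question/what-if phrasing
// and strip write tools from the schema before the AI call.
const HYPOTHETICAL = /\b(what (would|if|happens)|how (much|many)|should i|could i)\b/i;

const WRITE_TOOLS = new Set(["log_meal", "log_weight", "log_workout", "log_dose"]);

interface ToolSchema { name: string; description: string; }

function toolsForMessage(message: string, allTools: ToolSchema[]): ToolSchema[] {
  if (!HYPOTHETICAL.test(message)) return allTools;
  // The model cannot call a tool that isn't in its schema for this request.
  return allTools.filter((t) => !WRITE_TOOLS.has(t.name));
}
```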

Destructive Operation Gate

Archive and delete operations (like removing a protocol or deleting a dose record) are blocked on round 0 of the tool loop. The guardrail forces the model to describe what the operation will do and ask for user confirmation before it can execute. On subsequent rounds (after the user has confirmed), the operations are allowed through.
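A sketch of that gate (the destructive-tool list and result shape are illustrative):

```typescript
// Sketch of the destructive-operation gate: archives/deletes are blocked on
// round 0 and allowed through on later rounds once the user has confirmed.
const DESTRUCTIVE = new Set(["archive_protocol", "delete_protocol", "delete_dose"]);

function allowDestructive(toolName: string, round: number): { allowed: boolean; reason?: string } {
  if (!DESTRUCTIVE.has(toolName)) return { allowed: true };
  if (round === 0) {
    return {
      allowed: false,
      reason: "Describe what this operation will do and ask the user to confirm first.",
    };
  }
  return { allowed: true }; // subsequent rounds: confirmation has happened
}
```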

Food Search System

Meal logging depends on accurate food identification. RecompAI runs a dual-search system combining FTS5 full-text search against a downloaded USDA food database in D1 with Vectorize semantic search, both executing in parallel.

FTS5 handles exact and prefix matches — “chicken breast” finds chicken breast entries directly. Vectorize handles semantic queries — “something to keep my food safe from bears” would find bear canisters if this were a product catalog. For food, semantic search catches natural language descriptions that keyword matching misses.

Results are merged, deduplicated by FDC ID, and scored. Foods found in both searches get a boosted score. Brand detection adds another layer: queries like “Kraft mac and cheese” trigger brand-boosted BM25 weighting that heavily favors brand_name and brand_owner column matches, with a 100ms timeout that falls back gracefully to standard search if the brand path is slow.
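The merge-and-boost step can be sketched like this (the boost factor and hit shape are illustrative; only the dedupe-by-FDC-ID and both-searches boost are from the description above):

```typescript
// Sketch of the dual-search merge: dedupe by FDC ID, boost foods found by
// both FTS5 and Vectorize, then sort by score. Weights are illustrative.
interface FoodHit { fdcId: number; name: string; score: number; }

function mergeResults(fts: FoodHit[], semantic: FoodHit[], bothBoost = 1.5): FoodHit[] {
  const merged = new Map<number, FoodHit>();
  for (const hit of fts) merged.set(hit.fdcId, { ...hit });
  for (const hit of semantic) {
    const existing = merged.get(hit.fdcId);
    if (existing) {
      // Found by both searches: take the better score and boost it.
      existing.score = Math.max(existing.score, hit.score) * bothBoost;
    } else {
      merged.set(hit.fdcId, { ...hit });
    }
  }
  return [...merged.values()].sort((a, b) => b.score - a.score);
}
```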

Compound Knowledge Base

The protocol domain is backed by a 37-compound knowledge base covering TRT, GLP-1 agonists, peptides, and ancillaries. Each compound has structured data (clinical dose ranges, canonical units, half-life, common protocols, FDA ranges) stored in KV, plus deep-research narrative content stored as markdown files in R2 and auto-indexed via Cloudflare AutoRAG for semantic retrieval.

When a user asks about a compound, a pre-flight search scores the message against the knowledge base. High relevance scores inject the compound knowledge into the specialist’s context with explicit instructions to use only the provided knowledge and not training data (which may be outdated for rapidly evolving compounds). Low scores trigger fuzzy matching to suggest similar compound names.

Smart dose validation catches input errors before they corrupt protocol history: range checks against the knowledge base, 5x deviation detection from the user’s recent doses, and unit consistency checks (mg vs mcg vs IU).
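A sketch of the three checks together (the compound data shape and warning wording are illustrative):

```typescript
// Sketch of smart dose validation: clinical range check, 5x deviation from
// recent doses, and unit consistency (mg vs mcg vs IU).
interface CompoundInfo { unit: "mg" | "mcg" | "iu"; minDose: number; maxDose: number; }

function validateDose(
  dose: number,
  unit: string,
  compound: CompoundInfo,
  recentDoses: number[],
): string[] {
  const warnings: string[] = [];
  if (unit.toLowerCase() !== compound.unit) {
    warnings.push(`Unit mismatch: expected ${compound.unit}, got ${unit}.`);
  }
  if (dose < compound.minDose || dose > compound.maxDose) {
    warnings.push(`Dose outside clinical range ${compound.minDose}-${compound.maxDose} ${compound.unit}.`);
  }
  if (recentDoses.length > 0) {
    const avg = recentDoses.reduce((a, b) => a + b, 0) / recentDoses.length;
    if (dose > avg * 5 || dose < avg / 5) {
      warnings.push(`Dose deviates more than 5x from your recent average (${avg.toFixed(1)}).`);
    }
  }
  return warnings;
}
```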

Security

Input sanitization strips prompt injection patterns (instruction manipulation, role-playing attempts, template injection, code blocks, dangerous Unicode) before any message reaches the AI. A scoring system classifies injection risk as low, medium, or high — high-risk messages are blocked entirely and return a safe fallback response. Rate limiting is applied per-endpoint with configurable thresholds. JWT authentication uses the Web Crypto API with token rotation.
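The risk-scoring idea can be sketched as weighted pattern hits (the patterns, weights, and thresholds here are illustrative, not the actual detector):

```typescript
// Sketch of injection-risk scoring: pattern hits accumulate a score that
// maps to low/medium/high. Patterns and thresholds are illustrative.
const INJECTION_PATTERNS: Array<{ re: RegExp; weight: number }> = [
  { re: /ignore (all )?(previous|prior) instructions/i, weight: 5 }, // instruction manipulation
  { re: /you are now|pretend to be|act as/i, weight: 3 },            // role-play attempts
  { re: /\{\{.*\}\}|\$\{.*\}/, weight: 2 },                          // template injection
  { re: /```/, weight: 1 },                                          // code blocks
];

function injectionRisk(message: string): "low" | "medium" | "high" {
  const score = INJECTION_PATTERNS.reduce(
    (sum, p) => (p.re.test(message) ? sum + p.weight : sum), 0);
  if (score >= 5) return "high";   // blocked entirely, safe fallback returned
  if (score >= 2) return "medium";
  return "low";
}
```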

Custom ESLint rules enforce SQL injection prevention and domain boundary integrity at lint time, catching violations before they reach production.

Eval Framework

A Claude-as-judge evaluation framework validates AI behavior across test cases covering all four domains. Each test case specifies expected routing, required and forbidden tool calls, and a rubric with weighted criteria. The eval runner executes the full dispatch pipeline against each case, and a Claude judge scores the response on a multi-dimensional rubric.

This catches regressions that unit tests miss — like a model upgrade that routes correctly but produces worse coaching responses, or a prompt change that improves one domain while degrading another.
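A sketch of the tool-usage half of a test case (field names are illustrative, and the Claude judge-scoring step is omitted):

```typescript
// Sketch of an eval case and the deterministic tool-call check it drives.
// The rubric scoring by a judge model happens separately.
interface EvalCase {
  message: string;
  expectedDomain: "meal" | "protocol" | "body" | "general";
  requiredTools: string[];
  forbiddenTools: string[];
}

function checkToolUsage(testCase: EvalCase, calledTools: string[]): string[] {
  const failures: string[] = [];
  for (const t of testCase.requiredTools) {
    if (!calledTools.includes(t)) failures.push(`missing required tool: ${t}`);
  }
  for (const t of testCase.forbiddenTools) {
    if (calledTools.includes(t)) failures.push(`called forbidden tool: ${t}`);
  }
  return failures;
}
```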

Tech Stack

  • Cloudflare Workers — All API routes, AI inference orchestration, marketing site SSR, and frontend serving from a single Worker entry point.
  • D1 — User data, meals, protocols, chat history, body metrics, workouts, USDA food database with FTS5 indexing. Optimized indexes support context loading patterns.
  • R2 — Food photos, progress photos, compound knowledge base documents, marketing assets.
  • KV — Pre-computed weekly summaries, recent conversation snapshots, protocol status cache, structured compound data (37 compounds with Zod-validated schemas).
  • Vectorize — Two shared indexes: recomp-food-names for semantic food search and recomp-chat-summaries for coaching context retrieval. Per-user metadata filtering.
  • Workers AI — Five models for five jobs (see architecture section). All inference at the edge.
  • React 19 — Frontend PWA with Vite and Tailwind CSS.
  • Capacitor 8 — Native iOS wrapper with camera integration for meal photos, push notifications for supplement reminders, and offline meal draft support.
  • Resend + React Email — Transactional emails (verification, password reset) and automated weekly coaching digests triggered by Worker cron.

Outcomes

  • Multi-agent architecture decomposing a monolithic AI layer into 6 bounded domain modules with specialist agents and a coordinator
  • Custom tool-calling loop replacing Cloudflare’s runWithTools with nudge-retry, duplicate detection, and code-level guardrails
  • Five-model orchestration matching model capabilities to task requirements for optimal cost, speed, and reliability
  • Dual-search food system combining FTS5 and Vectorize with brand detection for accurate meal logging against a full USDA database
  • 37-compound knowledge base with AutoRAG semantic search, KV-backed structured data, and smart dose validation
  • Claude-as-judge eval framework catching behavioral regressions across routing, tool usage, and response quality
  • 275K+ lines of TypeScript shipped across 9 milestones in under 3 months, running entirely on Cloudflare’s edge infrastructure