Run distributed LLM evaluations with remote rollout processing and automatic trace collection. No manual trace management required.
📖 New to Remote Rollout Processing? Read the complete Remote Rollout Processor Tutorial first - this repository serves as the working example for that tutorial.
```bash
pip install eval-protocol
```

Set up your Fireworks API key for model inference:
```bash
export FIREWORKS_API_KEY="your_fireworks_key"
```

Start the Python server:
```bash
python -m remote_server
```

In another terminal, run the evaluation test:
```bash
pytest quickstart.py -vs
```

Start the TypeScript server:
```bash
cd typescript-server
pnpm install
pnpm run dev
```

In another terminal, run the same evaluation test:
```bash
pytest quickstart.py -vs
```

- `/init` triggers one rollout: Eval Protocol makes a POST `/init` request with the row payload and correlation metadata to our server on `http://127.0.0.1:3000`, which triggers the rollout (in this case, a simple chat completion asking "What is the capital of France?").
- Send logs via `FireworksTracingHttpHandler`: Our server emits structured logs tagged with the rollout's correlation fields.
- Send chat completions and store as trace: Our chat completion calls are recorded as traces in Fireworks.
- Once the rollout finishes, pull the full trace and evaluate: Eval Protocol polls Fireworks for a completion signal, then pulls the trace back and scores it.
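The `/init` step above can be sketched with the standard library alone. This is a minimal, hypothetical handler, not the actual Eval Protocol schema: the `rollout_id` field and the acknowledgement body are illustrative, and a real server would launch the chat completion in the background, tagged with the correlation fields so Fireworks can tie logs and traces back to the row.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RolloutHandler(BaseHTTPRequestHandler):
    """Hypothetical /init endpoint: accepts a rollout request and acks it."""

    def do_POST(self):
        if self.path != "/init":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # A real server would kick off the rollout here (e.g. the chat
        # completion) and emit correlated logs; we just acknowledge.
        body = json.dumps({
            "status": "accepted",
            "rollout_id": payload.get("rollout_id", "unknown"),
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


def serve(port=3000):
    """Start the sketch server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), RolloutHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The real servers in this repo follow the same shape: accept the POST, return immediately, and let the rollout run asynchronously while Eval Protocol polls for completion.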
After running the test, start the local UI server:
```bash
ep logs
```
At http://localhost:8000, you'll see results like:

