Implement Evaluator Versions #402

dphuang2 · 2026-01-08T00:25:29Z

Note

Implements versioned evaluator workflow and streamlines platform integration.

- Switches evaluator upload to version-based APIs: `evaluator_versions.create`, `get_upload_endpoint`, `validate_upload`; `create_evaluation` now returns `(result, version_id)`
- Adds centralized `create_fireworks_client()` with `FIREWORKS_EXTRA_HEADERS` support; migrates SDK calls across evaluation, RFT, platform secrets, and dataset upload
- New `ep create evj` command to create Evaluation Jobs with auto arg generation and local validation hooks
- Introduces reusable secrets flow (`cli_commands/secrets.py`) with selection and double-confirm overrides; integrated into `upload`, `create rft`, and `create evj`
- Refactors CLI utils: shared evaluator resolution, local pytest validation, upload-and-poll for ACTIVE version
- Improves SQLite robustness with connect/create-table retry wrappers
- Updates auth: dotenv discovery/loading helpers and verify base selection
- Bumps `fireworks-ai` to 1.0.0a22; adds comprehensive tests for secrets, client factory, evaluator versioning, CLI flows; VS Code launch example added and `.gitignore` adjusted

Written by Cursor Bugbot for commit 37f4856.
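
The note mentions connect/create-table retry wrappers for SQLite without showing them. A minimal sketch of that idea follows, assuming the retries target `sqlite3.OperationalError` (for example "database is locked") and use exponential backoff. The function names, retry counts, and delays are hypothetical and are not taken from the PR.

```python
import sqlite3
import time


def connect_with_retry(path, retries=5, base_delay=0.05, connect=sqlite3.connect):
    """Open a SQLite connection, retrying when OperationalError is raised.

    Hypothetical helper: `connect` is injectable so the retry path can be tested.
    """
    last_error = None
    for attempt in range(max(1, retries)):
        try:
            return connect(path)
        except sqlite3.OperationalError as exc:
            last_error = exc
            # Back off exponentially before the next attempt.
            time.sleep(base_delay * (2 ** attempt))
    raise last_error


def create_table_with_retry(conn, ddl, retries=5, base_delay=0.05):
    """Run a CREATE TABLE statement, retrying when OperationalError is raised."""
    last_error = None
    for attempt in range(max(1, retries)):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(ddl)
            return
        except sqlite3.OperationalError as exc:
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

Injecting `connect` keeps the wrapper testable without having to provoke a real lock.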

- Introduced a new `fireworks_client.py` module to centralize Fireworks SDK client creation.
- Updated CLI and evaluation modules to use the new `create_fireworks_client` function instead of direct instantiation of the Fireworks class.
- Enhanced handling of API key, account ID, base URL, and extra headers through environment variables.
- Added tests for the new Fireworks client factory to ensure proper functionality and configuration.
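
To illustrate how a client factory might gather its configuration, here is a rough sketch that only builds keyword arguments and does not touch the Fireworks SDK. Only `FIREWORKS_EXTRA_HEADERS` is named in this PR. The other variable names, the JSON encoding of the headers, and the `default_headers` key are assumptions made for the example.

```python
import json
import os


def build_client_kwargs(env=None, **overrides):
    """Collect client settings from the environment; explicit arguments win.

    Hypothetical helper: the real create_fireworks_client() may differ.
    """
    env = os.environ if env is None else env
    kwargs = {
        "api_key": env.get("FIREWORKS_API_KEY"),        # assumed variable name
        "account_id": env.get("FIREWORKS_ACCOUNT_ID"),  # assumed variable name
        "base_url": env.get("FIREWORKS_API_BASE"),      # assumed variable name
    }
    raw_headers = env.get("FIREWORKS_EXTRA_HEADERS")
    if raw_headers:
        # Assumption: headers are encoded as a JSON object of string pairs.
        headers = json.loads(raw_headers)
        if not isinstance(headers, dict):
            raise ValueError("FIREWORKS_EXTRA_HEADERS must be a JSON object")
        kwargs["default_headers"] = {str(k): str(v) for k, v in headers.items()}
    # Explicit overrides take precedence over environment values.
    kwargs.update({k: v for k, v in overrides.items() if v is not None})
    # Drop unset values so the SDK's own defaults can apply.
    return {k: v for k, v in kwargs.items() if v is not None}
```

Keeping this logic in one function is what lets every CLI command pick up the same headers and base URL.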

- Added functionality to load environment variables from .env.dev or .env as a fallback when the auth module is imported.
- Updated the API key verification process to allow explicit base URL handling, defaulting to dev.api.fireworks.ai if not provided.
- Removed redundant environment variable loading code from platform_api module.
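
The `.env.dev`-then-`.env` fallback can be sketched with the standard library alone. This shows only the discovery step, assuming both files are looked up in a single directory. The function name is hypothetical, and the actual loading in the PR is done through dotenv helpers.

```python
from pathlib import Path


def find_dotenv_path(search_dir):
    """Return the first existing file among .env.dev and .env, or None.

    Hypothetical helper: .env.dev is preferred, .env is the fallback.
    """
    base = Path(search_dir)
    for name in (".env.dev", ".env"):
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None
```

Returning an explicit path (rather than searching parent directories) avoids picking up an unrelated `.env` file.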

- Introduced functionality to create evaluator versions using parameters such as commit hash, entry point, and requirements.
- Updated the upload endpoint call to utilize the newly created evaluator version ID instead of a hardcoded test version ID.
- Added error handling for missing evaluator version ID in the response to ensure robustness during code uploads.

eval_protocol/cli.py

update to latest once SDK is published with changes

- Implemented a try-except block to handle APIStatusError during evaluator creation.
- Added logic to check for existing evaluators and retrieve the existing one if a conflict occurs (status code 409).
- Enhanced logging for better traceability of evaluator creation process.
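
The create-or-fetch pattern on a 409 conflict can be sketched as below. `StatusError` is a hypothetical stand-in for the SDK's `APIStatusError`, and `create` and `get` are injected callables rather than real SDK methods.

```python
class StatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def create_or_get(create, get, evaluator_id):
    """Create an evaluator; on a 409 conflict, fetch the existing one instead.

    Returns (evaluator, created), where created is False if it already existed.
    """
    try:
        return create(evaluator_id), True
    except StatusError as exc:
        if exc.status_code != 409:
            # Anything other than a conflict is a genuine failure.
            raise
        return get(evaluator_id), False
```

Re-raising every status other than 409 keeps authentication and server errors visible instead of masking them as "already exists".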

eval_protocol/cli_commands/create_rft.py

eval_protocol/evaluation.py

eval_protocol/auth.py

cursor · 2026-01-14T00:05:01Z

eval_protocol/evaluation.py

                logger.warning(f"Code upload failed (evaluator created but code not uploaded): {upload_error}")
                # Don't fail - evaluator is created, just code upload failed
+                # Return None for version_id since upload failed
+                return result, None


Tar file not cleaned up when upload fails

Medium Severity

When the upload process fails after the tar.gz file is created at tar_path, the exception handler returns (result, None) without removing the tar file. The cleanup at lines 347-349 is only reached on the success path. This leaves a {dir_name}.tar.gz file in the user's working directory after every failed upload attempt, which could be large and confusing.

Additional Locations (1)

eval_protocol/evaluation.py#L252-L255
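
One possible fix for the leftover archive is to move the cleanup into a `finally` block so that it runs on both the success and failure paths. The sketch below uses a hypothetical `upload` callable and function name. It is not the code in `evaluation.py`.

```python
import os
import tarfile


def upload_with_cleanup(source_dir, tar_path, upload):
    """Archive source_dir to tar_path, call upload(tar_path), always delete the archive.

    Assumes tar_path lies outside source_dir so the archive does not include itself.
    """
    try:
        with tarfile.open(tar_path, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))
        return upload(tar_path)
    finally:
        # Runs whether upload() returned or raised, so no .tar.gz is left behind.
        if os.path.exists(tar_path):
            os.remove(tar_path)
```

A caller that wants the existing "warn and return `(result, None)`" behavior can still catch the exception around this call. The archive is gone by the time the handler runs.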

eval_protocol/cli_commands/upload.py

eval_protocol/cli_commands/create_rft.py

- Added `upload_and_ensure_evaluator` function to handle evaluator uploads and ensure the latest version is ACTIVE.
- Updated `create_evj_command` and `create_rft_command` to utilize the new upload function.
- Removed redundant polling logic from `create_rft.py` and `create_evj.py`, centralizing it in the new utility function.
- Adjusted tests to mock the new upload function correctly.
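
The polling half of "upload and ensure ACTIVE" might look roughly like the sketch below. Only the `ACTIVE` state comes from this PR. The failure state name, timeout, interval, and function name are assumptions, and `get_state` stands in for whatever SDK call returns the version's state.

```python
import time


def wait_for_active(get_state, timeout=300.0, interval=2.0,
                    failure_states=("FAILED",), sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "ACTIVE", a failure state, or time runs out."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return state
        if state in failure_states:
            # Assumed terminal state: stop polling instead of waiting for the timeout.
            raise RuntimeError(f"evaluator version entered state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"evaluator version still {state} after {timeout}s")
        sleep(interval)
```

Centralizing this loop in one helper is what allows `create rft` and `create evj` to share the same status messages and timeout behavior.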

eval_protocol/auth.py

eval_protocol/cli_commands/create_rft.py

- Implemented functions to check for existing secrets and confirm overrides before uploading to Fireworks.
- Enhanced user interaction with double confirmation for overriding existing secrets, including fallback for non-interactive environments.
- Updated the upload command to handle new and existing secrets separately, ensuring proper management during uploads.

- Added `handle_secrets_upload` function to `create_evj.py`, `create_rft.py`, and `upload.py` for managing secrets with double verification for existing entries.
- Streamlined the upload process by consolidating secret management logic, enhancing user interaction during uploads.
- Removed redundant secret loading functions from `upload.py` to improve code clarity and maintainability.
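
The double-confirmation flow for existing secrets could be sketched as follows. The prompts, the "type the name to confirm" second step, and the non-interactive default of skipping the override are all assumptions. The PR's `handle_secrets_upload` may behave differently.

```python
def split_secrets(local_secrets, existing_names):
    """Partition local secrets into (new, already_existing) dictionaries."""
    new = {k: v for k, v in local_secrets.items() if k not in existing_names}
    existing = {k: v for k, v in local_secrets.items() if k in existing_names}
    return new, existing


def confirm_override(secret_name, ask=input, interactive=True, assume_yes=False):
    """Ask twice before overriding an existing secret.

    Hypothetical helper: in non-interactive mode no prompt is shown and the
    override only proceeds if assume_yes is set.
    """
    if not interactive:
        return assume_yes
    first = ask(f"Secret '{secret_name}' already exists. Override? [y/N] ").strip().lower()
    if first not in ("y", "yes"):
        return False
    # Second confirmation: require the exact secret name to guard against reflexive "y".
    second = ask(f"Type '{secret_name}' to confirm the override: ").strip()
    return second == secret_name
```

Handling new and existing secrets separately means new secrets can be uploaded without any prompt, while each override is confirmed on its own.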

eval_protocol/cli_commands/create_evj.py

- Replaced manual parsing of .env files with `dotenv_values()` for improved handling of comments, quotes, and multi-line values.
- Updated `load_secrets_from_env_file` function to return a filtered dictionary of environment variables, enhancing code clarity and maintainability.

cursor · 2026-01-16T01:13:37Z

eval_protocol/cli.py


    rft_parser.add_argument("--yes", "-y", action="store_true", help="Non-interactive mode")
    rft_parser.add_argument("--dry-run", action="store_true", help="Print planned SDK call without sending")
-    rft_parser.add_argument("--force", action="store_true", help="Overwrite existing evaluator with the same ID")


Missing --env-file argument for rft and evj commands

Medium Severity

Both create_rft_command and create_evj_command access env_file via getattr(args, "env_file", None) and pass it to handle_secrets_upload, but neither rft_parser nor evj_parser defines the --env-file CLI argument. Users cannot specify a custom env file path for these commands, and the code always receives None. The upload command correctly defines this argument but it's missing from the create subcommands.

Additional Locations (2)

eval_protocol/cli_commands/create_rft.py#L539-L545

eval_protocol/cli_commands/create_evj.py#L171-L178
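
The fix suggested by this comment amounts to registering `--env-file` on both subparsers. The sketch below rebuilds a small stand-alone parser to show the effect. The parser layout and help text are assumptions and do not mirror `eval_protocol/cli.py`.

```python
import argparse


def build_parser():
    """Build a reduced `ep create {rft,evj}` parser that accepts --env-file."""
    parser = argparse.ArgumentParser(prog="ep")
    commands = parser.add_subparsers(dest="command")
    create = commands.add_parser("create")
    create_commands = create.add_subparsers(dest="create_command")
    for name in ("rft", "evj"):
        sub = create_commands.add_parser(name)
        sub.add_argument("--yes", "-y", action="store_true", help="Non-interactive mode")
        # Without this argument, getattr(args, "env_file", None) is always None.
        sub.add_argument("--env-file", default=None, help="Path to a .env file with secrets")
    return parser


args = build_parser().parse_args(["create", "rft", "--env-file", "secrets.env"])
print(args.env_file)
```

Because argparse converts `--env-file` to the attribute `env_file`, the existing `getattr(args, "env_file", None)` calls would start receiving the user's value without further changes.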

dphuang2 added 5 commits January 7, 2026 13:07

remove launch.json

d465a89

Add .vscode/launch.json to .gitignore

348bb58

cursor bot reviewed Jan 8, 2026

eval_protocol/cli.py (outdated, resolved)

dphuang2 added 6 commits January 8, 2026 15:20

test

3dbcd59

REVERT this later

532e071

update to latest once SDK is published with changes

Merge branch 'main' into dhuang/dxe-478-implement-evaluator-versions

5e7a5fa

fix mock tests

060d72c

Support EP_REMOTE_API_KEY

ea08062

cursor bot reviewed Jan 9, 2026

eval_protocol/cli_commands/create_rft.py (outdated, resolved)

Merge branch 'main' into dhuang/dxe-478-implement-evaluator-versions

f246087

cursor bot reviewed Jan 10, 2026

eval_protocol/evaluation.py (resolved)

dphuang2 added 8 commits January 12, 2026 10:28

include launch.json.backup

6b53ac1

rename to .example and add docker run extra arg

ec0c8ca

use ignore-docker by default

fc036f5

delete backup

4566584

ignore-docker by default in dev

f103b69

Refactor evaluator function calls to use Fireworks directly for method signature introspection, avoiding unnecessary API requests during help invocations.

9c3e417

use in-flight SDK version

ea673f4

Enhance evaluator handling by returning version ID on creation and updating polling functions to target specific evaluator versions. Refactor related CLI commands and tests to accommodate these changes, ensuring clearer status messages and improved error handling.

26fbc2d

cursor bot reviewed Jan 13, 2026

eval_protocol/auth.py (resolved)

dphuang2 added 3 commits January 13, 2026 15:52

update

4702307

use published a22 of fireworks-ai

9d1bc74

uv lock

3314bec

cursor bot reviewed Jan 14, 2026

dphuang2 added 2 commits January 13, 2026 16:25

Refactor dotenv handling in auth module and integrate environment variable loading into local test command. Introduced functions to find and retrieve values from .env files, enhancing configuration management for Docker tests.

66f191a

add create rft launch configuration

165afe1

dphuang2 added 2 commits January 13, 2026 16:30

Refactor dotenv handling in auth module and integrate environment variable loading into local test command. Introduced functions to find and retrieve values from .env files, enhancing configuration management for Docker tests.

838c7a5

Merge branch 'pass-dot-env-to-docker-container' into dhuang/dxe-478-implement-evaluator-versions

71599e6

cursor bot reviewed Jan 14, 2026

eval_protocol/cli_commands/upload.py (resolved)

dphuang2 added 2 commits January 14, 2026 10:20

actually not necessary for local test since local-test mounts the workspace so it should include the .env file

0144c9f

increase sql retries

c8774a6

cursor bot reviewed Jan 14, 2026

eval_protocol/cli_commands/create_rft.py (resolved)

dphuang2 added 7 commits January 14, 2026 12:15

Refactor dotenv loading to use explicit paths in CLI and API modules

2076f0a

- improving environment variable management and preventing conflicts with other .env files.

Merge branch 'main' into dhuang/dxe-478-implement-evaluator-versions

8acdc35

Refactor dotenv loading to use explicit paths in CLI and API modules

432a649

- improving environment variable management and preventing conflicts with other .env files.

Merge branch 'ensure-explicit-dotenv' into dhuang/dxe-478-implement-evaluator-versions

ab04086

"ep create evj"

3c2db59

use SDK for Dataset API calls

17eb18f

cursor bot reviewed Jan 15, 2026

eval_protocol/auth.py (resolved)

eval_protocol/cli_commands/create_rft.py (resolved)

dphuang2 added 5 commits January 15, 2026 15:22

handle existing secrets with caution

2f88428

Remove unused _to_pyargs_nodeid function from upload.py to enhance code clarity and maintainability.

a2165fb

increase sql retries

1445d75

cursor bot reviewed Jan 15, 2026

eval_protocol/cli_commands/create_evj.py (resolved)

dphuang2 added 4 commits January 15, 2026 15:50

make connection more robust

d4a445b

Merge branch 'increase-sql-retries' into dhuang/dxe-478-implement-evaluator-versions

b3adfee

passes

37f4856

cursor bot reviewed Jan 16, 2026

dphuang2 closed this Jan 16, 2026
