# coding-benchmark

Here is 1 public repository matching this topic...

redush-com / FluxCodeBench

FluxCodeBench — coding benchmark for evaluating LLM agents on multi-phase programming tasks with hidden requirements.

Topics: python, benchmark, machine-learning, code-generation, evaluation-framework, ai-agents, llm, llm-evaluation, agent-evaluation, coding-benchmark

Updated Jan 18, 2026
Python
