👋 Welcome to RefineBench — a comprehensive evaluation library for testing refinement capabilities of language models across multiple settings and domains. To reproduce the full results reported in ...
Abstract: Recently, researchers have proposed many multi-agent frameworks for function-level code generation, which aim to improve software development productivity by automatically generating ...
Evaluation allows us to assess how a given model is performing against a set of specific tasks. This is done by running a set of standardized benchmark tests against the model. Running evaluation ...
Abstract: The quality of modern software relies heavily on the effective use of static code analysis tools. To improve their usefulness, these tools should be evaluated using a framework that ...