Eval Function Python Program Code

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

👋 Welcome to RefineBench — a comprehensive evaluation library for testing refinement capabilities of language models across multiple settings and domains. To reproduce the full results reported in ...

IEEE

AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation

Abstract: Recently, researchers have proposed many multi-agent frameworks for function-level code generation, which aim to improve software development productivity by automatically generating ...

GitHub

Python Library for Evaluation

Evaluation allows us to assess how a given model is performing against a set of specific tasks. This is done by running a set of standardized benchmark tests against the model. Running evaluation ...

IEEE

Evaluating Python Static Code Analysis Tools Using FAIR Principles

Abstract: The quality of modern software relies heavily on the effective use of static code analysis tools. To improve their usefulness, these tools should be evaluated using a framework that ...
