We have open-sourced our framework for evaluating large language models. It provides facilities for prompt engineering, tool use, multi-turn dialogue, and model-graded evaluation.