Centralized Evaluation Framework (CEF)
CEF is a command-line interface (CLI) framework designed to provide a centralized solution for evaluating AI features at GitLab.
[[TOC]]
Overview
The CEF allows developers to:
- Run evaluations on GitLab AI features (such as Duo Chat and Code Suggestions)
- Generate performance metrics and comparison reports
- Test and validate prompt engineering changes
Requirements
CEF relies on LangSmith for evaluation tracking and experiment management.
What is LangSmith?
LangSmith is a platform developed by LangChain for debugging, testing, evaluating, and monitoring LLM applications. GitLab uses LangSmith for several critical functions:
- Evaluation Management: It provides tools for measuring the quality of AI outputs through custom metrics.
- Visualization: LangSmith offers detailed visualizations of evaluation metrics and LLM execution traces, making it easier to understand and optimize AI feature behavior.
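As an illustration of what a custom metric can look like, here is a minimal sketch in plain Python. It is not CEF's code and does not call LangSmith. The function names, the `answer` field, and the exact-match rule are assumptions made for this example. The returned dictionary with `key` and `score` follows the shape that LangSmith custom evaluators commonly return.

```python
from statistics import mean


def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Score 1.0 when the normalized answer equals the reference, else 0.0.

    The "answer" field name is hypothetical; real datasets may use other keys.
    """
    predicted = str(outputs.get("answer", "")).strip().lower()
    expected = str(reference_outputs.get("answer", "")).strip().lower()
    return {"key": "exact_match", "score": 1.0 if predicted == expected else 0.0}


def mean_score(results: list) -> float:
    """Average the "score" values of a list of evaluator results."""
    return mean(result["score"] for result in results) if results else 0.0
```

A metric written this way stays easy to unit test because it depends only on its inputs, not on a live model or an API key.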
Getting Started
Prerequisites
- Install `mise` following these instructions.
- Ensure you have Git installed and configured.
Installation
- The first time you install `mise`, you should trust `mise.toml` to avoid warnings.
- From the current working directory, run `mise install`.
- Run `make install` to install dependencies.
Environment Setup
- Make a copy of `.env.example` to a new file called `.env`: `cp .env.example .env`
- Edit the `.env` file to include your API keys and tokens:
  - `LANGCHAIN_API_KEY`: For LangSmith integration
  - `ANTHROPIC_API_KEY`: For Claude model access
  - `GITLAB_PRIVATE_TOKEN`: For GitLab API access
  - `GITLAB_BASE_URL`: GitLab instance URL (default: `http://localhost:3000`)
  - And other API keys as needed
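Before running an evaluation, it can help to confirm that the `.env` file is filled in. The following is a minimal sketch, not part of CEF. The helper names are hypothetical, the parser handles only simple `KEY=VALUE` lines, and treating the three keys and tokens as required is an assumption based on the list above.

```python
# Variable names come from the list above; marking them as required is an assumption.
REQUIRED_VARS = (
    "LANGCHAIN_API_KEY",
    "ANTHROPIC_API_KEY",
    "GITLAB_PRIVATE_TOKEN",
)
DEFAULT_GITLAB_BASE_URL = "http://localhost:3000"


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict) -> list:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


def gitlab_base_url(values: dict) -> str:
    """Fall back to the documented default when GITLAB_BASE_URL is unset."""
    return values.get("GITLAB_BASE_URL") or DEFAULT_GITLAB_BASE_URL
```

For example, calling `missing_vars(parse_env_file(open(".env").read()))` returns the names that still need a value, and an empty list means every key checked here is set.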
Documentation
Comprehensive documentation is available in the doc directory:
- Evaluation Scenarios: Instructions for evaluating specific GitLab AI features
- Datasets: Guide for creating, managing, and using datasets
- Evaluators: Information about evaluation metrics and methods
Evaluation Runner
CEF works in conjunction with the Evaluation Runner. The Evaluation Runner enables developers to run CEF without any local setup, providing a standardized environment for evaluations. This is particularly useful when you:
- Want to compare results between different branches
- Need a consistent environment for reproducible evaluations
- Want to integrate evaluations into your CI/CD workflow
- Want to avoid installing and configuring CEF and its dependencies locally
For detailed instructions, see the Evaluation Runner documentation.
Contributing
We welcome contributions to the Centralized Evaluation Framework! Please see our CONTRIBUTING.md guide for information on how to get started.