Centralized Evaluation Framework (CEF)

CEF is a command-line interface (CLI) framework that provides a centralized way to evaluate AI features at GitLab.

Published Jan 14, 2026

[[TOC]]

Overview

CEF lets developers:

  • Run evaluations on GitLab AI features (such as Duo Chat and Code Suggestions)
  • Generate performance metrics and comparison reports
  • Test and validate prompt engineering changes

Requirements

CEF relies on LangSmith for evaluation tracking and experiment management.

What is LangSmith?

LangSmith is a platform developed by LangChain for debugging, testing, evaluating, and monitoring LLM applications. GitLab uses LangSmith for the following:

  • Evaluation management: LangSmith provides tools for measuring the quality of AI outputs with custom metrics.
  • Visualization: LangSmith offers detailed visualizations of evaluation metrics and LLM execution traces, which make it easier to understand and optimize AI feature behavior.

Getting Started

Prerequisites

  1. Install mise by following the mise installation instructions.
  2. Ensure you have Git installed and configured.

Installation

  1. If you have just installed mise for the first time, trust this repository's mise.toml to avoid warnings.
  2. From the repository root, run mise install.
  3. Run make install to install dependencies.
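
The steps above can be sketched as a single shell function. This is a minimal illustration, not part of CEF: the function name `install_cef` is hypothetical, `mise trust` is assumed to be the command meant by "trust mise.toml", and `make` is checked alongside the listed prerequisites only because step 3 calls it.

```shell
# Hypothetical helper: nothing runs until install_cef is called.
install_cef() {
  # Stop early if a tool the steps depend on is not on PATH.
  for tool in git mise make; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      return 1
    fi
  done

  # Run the three documented steps in order, stopping at the first failure.
  mise trust && mise install && make install
}
```

Call `install_cef` from the directory that contains mise.toml. Because the steps are chained with `&&`, a failed `mise install` prevents `make install` from running against a half-configured toolchain.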

Environment Setup

  1. Copy .env.example to a new file named .env:

    cp .env.example .env
    
  2. Edit the .env file to include your API keys and tokens:

    • LANGCHAIN_API_KEY: For LangSmith integration
    • ANTHROPIC_API_KEY: For Claude model access
    • GITLAB_PRIVATE_TOKEN: For GitLab API access
    • GITLAB_BASE_URL: GitLab instance URL (default: http://localhost:3000)
    • Any other API keys your evaluations need
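
Before running an evaluation, it can help to confirm that the keys listed above are actually filled in. The sketch below is one way to do that. The function name `check_env` is hypothetical, and it assumes .env uses plain `NAME=value` lines. GITLAB_BASE_URL is left out of the check because a default is documented for it.

```shell
# Hypothetical helper: report documented variables that are missing or empty in an env file.
check_env() {
  env_file="${1:-.env}"
  missing=0
  for name in LANGCHAIN_API_KEY ANTHROPIC_API_KEY GITLAB_PRIVATE_TOKEN; do
    # Set means: a NAME= line whose value, after an optional opening quote,
    # starts with a character that is neither a quote nor whitespace.
    if ! grep -Eq "^${name}=[\"']?[^\"'[:space:]]" "$env_file"; then
      echo "not set: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_env` from the directory that holds .env, or pass a path as the first argument. It prints one line per unset variable and returns a non-zero status if any are missing, so it can gate a script.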

Documentation

Comprehensive documentation is available in the doc directory:

  • Evaluation Scenarios: Instructions for evaluating specific GitLab AI features
  • Datasets: Guide for creating, managing, and using datasets
  • Evaluators: Information about evaluation metrics and methods

Evaluation Runner

CEF works with the Evaluation Runner, which lets developers run CEF without any local setup in a standardized environment. This is particularly useful when you:

  • Want to compare results between different branches
  • Need a consistent environment for reproducible evaluations
  • Want to integrate evaluations into your CI/CD workflow
  • Want to avoid installing and configuring CEF and its dependencies locally

For detailed instructions, see the Evaluation Runner documentation.

Contributing

We welcome contributions to the Centralized Evaluation Framework! Please see our CONTRIBUTING.md guide for information on how to get started.
