Centralized Evaluation Framework (CEF)

CEF is a command-line interface (CLI) framework designed to provide a centralized solution for evaluating AI features at GitLab.

Published Jan 14, 2026

[[TOC]]

Overview

CEF allows developers to:

- Run evaluations on GitLab AI features, such as Duo Chat and Code Suggestions
- Generate performance metrics and comparison reports
- Test and validate prompt engineering changes

Requirements

CEF relies on LangSmith for evaluation tracking and experiment management.

What is LangSmith?

LangSmith is a platform developed by LangChain for debugging, testing, evaluating, and monitoring LLM applications. GitLab uses LangSmith for two main purposes:

- Evaluation Management: it provides tools for measuring the quality of AI outputs through custom metrics.
- Visualization: it offers detailed visualizations of evaluation metrics and LLM execution traces, making it easier to understand and optimize AI feature behavior.

Getting Started

Prerequisites

- Install mise by following the mise installation instructions.
- Ensure Git is installed and configured.
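Before installing, it can help to confirm that both tools are on your `PATH`. The following is a minimal sketch; `require_cmds` is a hypothetical helper, not something CEF ships, and it only checks that each command can be found, not that Git is configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CEF: print each named command that is
# not on PATH and return a non-zero status if any are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    # command -v succeeds only if the shell can locate the command.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_cmds mise git` returns zero when both tools are available and otherwise names the missing one on stderr.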

Installation

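- The first time you use mise with this project, trust `mise.toml` (for example, by running `mise trust`) to avoid untrusted-configuration warnings.
- From the repository root, run `mise install` to install the tools defined in `mise.toml`.
- Run `make install` to install the project dependencies.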

Environment Setup

Copy `.env.example` to a new file named `.env`:
```shell
cp .env.example .env
```
Edit `.env` to add your API keys and tokens:
- `LANGCHAIN_API_KEY`: for LangSmith integration
- `ANTHROPIC_API_KEY`: for Claude model access
- `GITLAB_PRIVATE_TOKEN`: for GitLab API access
- `GITLAB_BASE_URL`: the GitLab instance URL (default: `http://localhost:3000`)
- Any other API keys your evaluations require
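To catch a missing key before running an evaluation, a small shell function can scan `.env`. This is a sketch under stated assumptions: `check_env_file` is a hypothetical helper (not part of CEF), it expects plain `VAR=value` lines, and it treats only `LANGCHAIN_API_KEY`, `ANTHROPIC_API_KEY`, and `GITLAB_PRIVATE_TOKEN` as required, since `GITLAB_BASE_URL` has a default.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CEF: verify that an env file sets the
# assumed-required keys to non-empty values. Expects plain VAR=value lines
# (no leading "export"); a quoted empty value such as VAR="" is not detected.
check_env_file() {
  local file="${1:-.env}" var missing=0
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # GITLAB_BASE_URL is not checked because it defaults to http://localhost:3000.
  for var in LANGCHAIN_API_KEY ANTHROPIC_API_KEY GITLAB_PRIVATE_TOKEN; do
    # Require at least one character after the "=".
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_env_file` with no argument checks `.env` in the current directory, prints each missing key, and returns a non-zero status if any are absent.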

Documentation

Further documentation is available in the `doc` directory:

- Evaluation Scenarios: instructions for evaluating specific GitLab AI features
- Datasets: a guide to creating, managing, and using datasets
- Evaluators: information about evaluation metrics and methods

Evaluation Runner

CEF works in conjunction with the Evaluation Runner. The Evaluation Runner enables developers to run CEF without any local setup, providing a standardized environment for evaluations. This is particularly useful when you:

- Want to compare results between different branches
- Need a consistent environment for reproducible evaluations
- Want to integrate evaluations into your CI/CD workflow
- Want to avoid installing and configuring CEF and its dependencies locally

For detailed instructions, see the documentation.

Contributing

We welcome contributions to the Centralized Evaluation Framework! Please see our CONTRIBUTING.md guide for information on how to get started.
