ABC Project (AI Benchmark Cluster)

[![Pipeline Status](https://gitlab.com/ai9804501/abc/badges/main/pipeline.svg)](https://gitlab.com/ai9804501/abc/-/pipelines)

Published Jan 14, 2026

Overview

ABC (AI Benchmark Cluster) is an LLM benchmarking platform that evaluates AI models against human educational standards. It tests models across multiple subjects and educational levels, from elementary school to PhD, and uses Ollama for model execution.

Key Features

  • Educational Level Benchmarking: Compare LLM performance against four levels:
    • 5th Grade Level
    • High School Level
    • Master's Level
    • PhD Level
  • Subject Areas:
    • Mathematics
    • Computer Science
    • Problem Solving
    • General Reasoning
    • Grammar
    • Creative Writing
  • Automated Documentation: Self-generating performance reports and analysis through GitLab CI/CD pipelines
  • Pass/Fail Grading: Objective evaluation criteria for each educational level
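
As a rough illustration of per-level pass/fail grading, the sketch below compares a score against a pass mark for each educational level. The threshold values and the names `PASS_THRESHOLDS` and `grade` are hypothetical; they are not taken from the ABC source and only show the general idea.

```python
# Hypothetical pass marks per educational level (fraction of answers correct).
# These numbers are illustrative assumptions, not ABC's actual criteria.
PASS_THRESHOLDS: dict[str, float] = {
    "5th_grade": 0.60,
    "high_school": 0.70,
    "masters": 0.80,
    "phd": 0.90,
}


def grade(level: str, correct: int, total: int) -> str:
    """Return "PASS" or "FAIL" for a model's score at one educational level."""
    if total <= 0:
        raise ValueError("total must be positive")
    if level not in PASS_THRESHOLDS:
        raise KeyError(f"unknown level: {level}")
    score = correct / total
    return "PASS" if score >= PASS_THRESHOLDS[level] else "FAIL"


if __name__ == "__main__":
    print(grade("high_school", 15, 20))  # 0.75 is above the 0.70 pass mark
    print(grade("phd", 15, 20))          # 0.75 is below the 0.90 pass mark
```

Keeping the thresholds in one table makes the criteria easy to audit and change without touching the grading logic.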

Directory Structure

abc/
├── docs/          # Documentation and benchmark results
│   ├── results/      # Auto-generated benchmark results
│   ├── analysis/     # Performance analysis reports
│   └── comparisons/  # Educational level comparisons
├── src/          # Source code
│   ├── analysis/     # Analysis and metrics
│   ├── benchmarking/ # Core benchmarking system
│   ├── costs/        # Resource usage tracking
│   ├── database/     # Results storage
│   ├── pipeline/     # CI/CD pipeline integration
│   ├── runner/       # Ollama model runners
│   └── testing/      # Test suites by subject
├── tests/        # Test framework
└── templates/    # Report templates

Requirements

Development Environment

  • WSL (Windows Subsystem for Linux)
  • Python 3.12 or higher with pyenv and uv
  • Ollama
  • Docker & Docker Compose
  • GitLab Runner (for CI/CD)
  • glab CLI tool
  • kubectl and helm for Kubernetes deployments

Environment Validation

Run the environment check script to verify your setup:

./scripts/check_dev.sh

This script will validate the installation of all required tools and provide installation instructions for any missing components.
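
The contents of `check_dev.sh` are not shown here, but the kind of check it describes can be sketched in Python with `shutil.which`. The command names in `REQUIRED_TOOLS` are assumptions derived from the requirements list above; the real script may check different names or also verify versions.

```python
import shutil

# Command names are assumptions based on the requirements list;
# the real check_dev.sh may look for different names or versions.
REQUIRED_TOOLS = [
    "python3", "pyenv", "uv", "ollama", "docker",
    "gitlab-runner", "glab", "kubectl", "helm",
]


def find_missing(tools: list[str]) -> list[str]:
    """Return the tools that are not found on PATH, preserving input order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```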

Installation

The recommended way to run ABC is with Docker Compose, which ensures a consistent environment and consistent dependencies across all platforms.

  1. Clone the repository:
git clone https://gitlab.com/ai9804501/abc.git
cd abc
  2. Build and start services:
docker-compose up -d

Manual Installation (Alternative)

  1. Clone the repository:
git clone https://gitlab.com/ai9804501/abc.git
cd abc
  2. Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
  3. Ensure Python 3.12 or higher is installed:
python3 --version  # Should output Python 3.12.x or later
  4. Install uv:
pip install uv
  5. Create a virtual environment and install dependencies:
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

Running Benchmarks

  1. Run benchmarks:
docker-compose exec app python -m src.pipeline.cli run-benchmarks

Manual Method

  1. Start Ollama service:
ollama serve
  2. Pull required models:
ollama pull llama2
# Add other models as needed
  3. Run benchmarks:
python -m src.pipeline.cli run-benchmarks

Benchmark Reports

Reports are automatically generated in the GitLab CI pipeline and can be found in:

  • Pipeline artifacts under docs/results/
  • Project wiki (auto-updated)
  • Generated site at pages/benchmarks/

Sample Report Structure

  • Overall Performance Summary
  • Educational Level Comparisons
  • Subject-Specific Analysis
  • Pass/Fail Statistics
  • Resource Usage Metrics
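
To make the pass/fail statistics part of that structure concrete, here is a minimal sketch that aggregates individual results into per-level pass rates and renders them as plain text. The `Result` fields and both function names are hypothetical; ABC's actual report templates and data model may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark outcome; the field names are hypothetical."""
    level: str
    subject: str
    passed: bool


def pass_rates(results: list[Result]) -> dict[str, float]:
    """Return the fraction of passed results per educational level."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.level] += 1
        passes[r.level] += int(r.passed)
    return {level: passes[level] / totals[level] for level in totals}


def render_summary(results: list[Result]) -> str:
    """Render a plain-text pass/fail summary, one line per level."""
    lines = ["Pass/Fail Statistics"]
    for level, rate in pass_rates(results).items():
        lines.append(f"  {level}: {rate:.0%} passed")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Result("High School", "Mathematics", True),
        Result("High School", "Grammar", False),
        Result("PhD", "Mathematics", False),
    ]
    print(render_summary(sample))
```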

Contributing

  1. Create a new branch:
git checkout -b feature/your-feature-name
  2. Run tests:
pytest
  3. Submit a merge request

DevOps Setup

CI/CD Pipeline

The project uses GitLab CI/CD with the following stages:

  1. Setup: Prepares the Python environment
  2. Test: Runs unit and integration tests
  3. Benchmark: Executes model benchmarks
  4. Analyze: Processes benchmark results
  5. Document: Generates documentation and updates wiki
  6. Build: Creates Docker images
  7. Deploy: Deploys to Kubernetes environments
  8. Cleanup: Manages environment resources

Kubernetes Deployment

The application can be deployed to Kubernetes using Helm:

  1. Configure kubectl context:
kubectl config use-context your-cluster-context
  2. Deploy to staging:
# Kubernetes deployment configuration has been removed
# Please refer to Docker Compose for deployment

Note: Kubernetes deployment configuration has been removed from this project. Please use Docker Compose for deployment as described above.

GitLab Configuration

Required GitLab CI/CD variables:

  • KUBE_CONFIG: Base64 encoded kubeconfig file
  • CI_REGISTRY_USER: GitLab registry username
  • CI_REGISTRY_PASSWORD: GitLab registry password
  • GITLAB_TOKEN: Token for wiki updates
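
A job can fail early with a clear message if any of these variables is missing. The sketch below shows one way to do that, and to decode the Base64 kubeconfig, using only the standard library. The function names are hypothetical and this is not ABC's own pipeline code.

```python
import base64
import binascii
import os

# Variable names come from the list above.
REQUIRED_VARS = ["KUBE_CONFIG", "CI_REGISTRY_USER", "CI_REGISTRY_PASSWORD", "GITLAB_TOKEN"]


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def decode_kubeconfig(encoded: str) -> str:
    """Decode a Base64-encoded kubeconfig; raise ValueError if it is malformed."""
    try:
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("KUBE_CONFIG is not valid Base64-encoded text") from exc


if __name__ == "__main__":
    missing = missing_vars(dict(os.environ))
    print("Missing variables:", missing if missing else "none")
```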

Pre-commit Hooks

Install pre-commit hooks to ensure code quality:

uv pip install pre-commit
pre-commit install

This will run linters and formatters before each commit.

License

MIT License. See the LICENSE file for details.
