GKE-ENKI-GitLab-agent

GKE-ENKI-GitLab-agent is a project to manage and configure GitLab Kubernetes Agent for the ENKI Google Cloud cluster and to configure all required cluster tools for production, monitoring, and backup.

Published Jan 14, 2026


Contents

▪️ Future roadmap
▪️ Installed cluster components (dependent order)
▪️ CI/CD for automated deployment and maintenance
▪️ Some useful kubectl commands
▪️ Kubernetes cluster configuration
▪️ GitLab Kubernetes agent installation
▪️ Tearing down and reinstalling the agent

Future roadmap

  • Fully integrate GitLab Kubernetes Agent for GitOps as an alternative to using GitLab Runner and Helm.

    This integration requires that the agent mature enough to handle sequenced YAML deployments and that it operate with cluster-wide admin privileges.

  • Consider adding the following:

    • User billing and tracking (using Kubecost)
    • Runbooks for JupyterLab (notebook-based) GitOps using Rubix/Nurtch
    • Cloudwatch integration
    • Elastic Container Service
  • Investigate the Google Cloud Run serverless platform.

    Port the Knative Geobarometer and MELTS web services to Cloud Run to remove their dependence on the Kubernetes cluster.

Installed cluster components (dependent order)

  1. GitLab Kubernetes Agent

    Entity that attaches a GKE cluster to this repository (configuration notes below).

  2. GitLab Runner

    GitLab Runner allows CI jobs to run on the cluster in privileged mode, so that jobs can execute kubectl and helm commands to perform GitOps tasks using YAML files stored in this repository. In effect, the runner provides the functionality of Google Cloud Shell, or of a desktop gcloud/kubectl connection, through GitLab CI.

  3. Kubernetes NGINX Ingress Controller

    The ingress controller is utilized to expose endpoints of services to external ports. There are multiple ingress controllers operating on the cluster. This one is used to expose Grafana and Kasten K10 endpoints. Another is built into JupyterHub to expose that endpoint.

  4. Cert Manager

    Used by the ingress controller to acquire and attach TLS certificates to ingress external endpoints so that ports can support https and encrypted traffic.

  5. Prometheus and Grafana (exposed at https://cluster.enki-portal.org/)

    The Kube Prometheus stack (with Grafana) monitors the cluster and exposes metrics at an external endpoint so that cluster performance can be assessed.

  6. Google Cloud Storage

    Storage independent of the Kubernetes cluster that is utilized for backups of cluster resources. The backup service (Kasten K10) is capable of restoring and migrating the cluster using this independent storage.

  7. Kasten K10 (exposed at https://k10.enki-portal.org/k10/)

    Backup, restoration, and migration tool for Kubernetes

  8. JupyterHub

    Service that hosts the ENKI server. JupyterHub exposes single-user pods that host the ThermoEngine Docker container image with a JupyterLab user interface. It also allocates and maintains access to user-based persistent storage.

    1. Testing installation

      This installation is for testing options and configuring possible upgrades to the production server. For cost reasons, it is normally not running.

    2. Production installation

      This installation is the production server exposed at https://server.enki-portal.org/ .

  9. Knative web services

    Service to expose stateless, scalable web services. These services should probably be moved outside the cluster and exposed using the Google Cloud Run serverless platform. See Future roadmap above.

  10. MySQL (exposed at http://mysql.enki-portal.org:3306/ )

    Database server that currently holds the LEPR/TraceDs as well as some smaller databases (Stixrude, Berman, Inforex, etc.) that are used by cluster apps.

CI/CD for automated deployment and maintenance

The .gitlab-ci.yml YAML file performs a number of functions:

  • Deploys manifests using GitLab Kubernetes Agent to perform GitOps tasks
  • Runs helm and kubectl jobs on the cluster to perform GitOps tasks
  • Functions as the downstream pipeline for related projects that generate content related to the cluster (see the GitLab project https://gitlab.com/ENKI-portal/jupyterhub_custom)

Some useful kubectl commands

  • Commands for managing namespaces and their resources:
    kubectl create ns gitlab-runner
    kubectl delete all --all -n {namespace}
    
  • Get GitLab usernames associated with persistent storage volumes:
    kubectl --namespace jhub describe persistentvolumeclaims | grep "hub.jupyter.org/username"
    
  • Restart the hub from Google Cloud Shell so that it picks up changes from ENKI-portal/jupyterhub_custom (for example, an amended login page):
    helm upgrade --cleanup-on-fail jhub jupyterhub/jupyterhub --version=1.1.3 --namespace jhub --reuse-values
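
The grep in the username command above prints whole lines from the describe output, so a user can appear more than once. As a minimal sketch, the helper below reduces that output to one username per line. The function name extract_hub_usernames is hypothetical, and the sketch assumes the key appears either as a label (key=value) or as an annotation (key: value) in the describe output.

```shell
# Reads `kubectl describe persistentvolumeclaims` output on stdin and prints
# each distinct value of the hub.jupyter.org/username key once.
# Matches the label form (key=value) and the annotation form (key: value).
extract_hub_usernames() {
  sed -n 's|.*hub\.jupyter\.org/username[=:][[:space:]]*\([^[:space:]]*\).*|\1|p' | sort -u
}

# Usage (requires kubectl access to the jhub namespace):
#   kubectl --namespace jhub describe persistentvolumeclaims | extract_hub_usernames
```

If the label and the annotation spell a username differently, both spellings are printed.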
    

Kubernetes cluster configuration

The following Google Cloud setup instructions are from the Zero to JupyterHub document https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/google/step-zero-gcp.html, as found in October 2021.

  1. Using Google Cloud Shell, install kubectl and helm using gcloud after enabling the Kubernetes Engine API.
  2. Create a managed Kubernetes cluster with a default node pool:
    gcloud container clusters create \
      --machine-type n1-standard-2 \
      --enable-autoscaling \
      --max-nodes=6 \
      --min-nodes=2 \
      --zone <compute zone from the list linked below> \
      --cluster-version latest \
      <CLUSTERNAME>
    
    • <CLUSTERNAME> is enkiserver
    • <compute zone from the list linked below> is us-west1-a
  3. Elevate the user Google Cloud account for administrative functions:
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user=<GOOGLE-EMAIL-ACCOUNT>
    
    • <GOOGLE-EMAIL-ACCOUNT> is the email address of the Google Cloud account owner
  4. Create a node pool for users:
    gcloud beta container node-pools create user-pool \
      --machine-type n1-standard-2 \
      --num-nodes 0 \
      --enable-autoscaling \
      --min-nodes 0 \
      --max-nodes 6 \
      --node-labels hub.jupyter.org/node-purpose=user \
      --node-taints hub.jupyter.org_dedicated=user:NoSchedule \
      --zone us-west1-a \
      --cluster <CLUSTERNAME>
    

After you complete these steps, two node pools are up and running. The default node pool is used to run cluster-wide apps, while the tainted user node pool is used to launch nodes for single-user Jupyter pods. Six nodes in the user pool should be able to accommodate about 100 users doing small-scale ENKI-related modeling.
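
The sizing claim above can be checked with a rough calculation. This is a back-of-the-envelope sketch, not a measurement: the node and user counts come from the text, while the n1-standard-2 figures (2 vCPUs, 7.5 GB of memory) are taken from Google's published machine types and are an assumption added here. The result is an upper bound, because system pods and kubelet reservations also consume node resources.

```shell
# Rough per-user resource budget for a full user pool.
NODES=6               # maximum size of the user pool (from the text)
USERS=100             # approximate concurrent users (from the text)
NODE_MILLICPU=2000    # assumption: n1-standard-2 has 2 vCPUs
NODE_MEM_MB=7680      # assumption: n1-standard-2 has 7.5 GB (7.5 * 1024 MB)

# Ceiling division: users on the busiest node when spread evenly.
USERS_PER_NODE=$(( (USERS + NODES - 1) / NODES ))
CPU_PER_USER=$(( NODE_MILLICPU / USERS_PER_NODE ))
MEM_PER_USER=$(( NODE_MEM_MB / USERS_PER_NODE ))

echo "users per node: ${USERS_PER_NODE}"
echo "millicores per user: ${CPU_PER_USER}"
echo "memory per user (MB): ${MEM_PER_USER}"
```

Under these assumptions each node carries about 17 users, leaving roughly 117 millicores and 451 MB of memory per user, which fits the "small-scale modeling" qualifier.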

GitLab Kubernetes Agent installation

The following instructions are from the GitLab document https://docs.gitlab.com/ee/user/clusters/agent/#set-up-the-kubernetes-agent-server, as found in October 2021.

  1. Create a config.yaml file in the repository at .gitlab/agents/primary-agent with the contents:

    gitops:
      manifest_projects:
      - id: "enki-portal/gke-enki-gitlab-agent"
        paths:
        - glob: 'generated-manifests/**/*.{yaml,yml,json}'
        inventory_policy: adopt_if_no_inventory
    
    • The ID is the repository name that contains the manifest files (this repository).
    • The glob is altered from the default suggestion to look only at YAML and JSON files in the generated-manifests folder and its subfolders.
    • The inventory_policy is changed from the default suggestion to allow the agent to inherit the management of applications that are already running on the cluster when their YAML manifests are added to the generated-manifests file hierarchy.

    Multiple manifest projects can be defined; support for private manifest repositories is planned.

    Currently the agent repository must be public; support for associating the agent with a group is planned.

  2. Create the agent in GitLab (Infrastructure > Kubernetes clusters) and generate a secret token. Assign this token to a pipeline environment variable (Settings > CI/CD > Variables) with the name GITLAB_AGENT_TOKEN. Make sure that the value is protected and masked in order to keep it hidden in pipeline logs.

  3. In Google Cloud Shell, execute the following to create a namespace for the agent:

    kubectl create ns gitlab-kubernetes-agent
    

    Then install the agent, with the appropriate token value substituted for <GITLAB_AGENT_TOKEN>:

    docker run --pull=always --rm \
        registry.gitlab.com/gitlab-org/cluster-integration/gitlab-agent/cli:stable generate \
        --agent-token=<GITLAB_AGENT_TOKEN> \
        --kas-address=wss://kas.gitlab.com \
        --agent-version stable \
        --namespace gitlab-kubernetes-agent | kubectl apply -f -
    
  4. Grant the GitLab agent service account the cluster-admin role so that it can create secrets, pods, config maps, etc. in arbitrary cluster namespaces. First, in Google Cloud Shell, list the bindings that currently reference the service account:

    kubectl get rolebindings,clusterrolebindings --all-namespaces  \
        -o custom-columns='KIND:kind,NAMESPACE:metadata.namespace,NAME:metadata.name,SERVICE_ACCOUNTS:subjects[?(@.kind=="ServiceAccount")].name' | grep gitlab-agent
    

    Note that this critical step is missing from the GitLab documentation. The command gives the output:

    ClusterRoleBinding   <none>          cilium-alert-read                                      gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-read-all                           gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-write-all                          gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-read-binding                              gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-write-binding                             gitlab-agent
    
  5. Create the cluster-admin binding, then confirm that it exists:

    kubectl create clusterrolebinding gitlab-agent-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=default:gitlab-agent
    kubectl get clusterrolebinding | grep gitlab-agent
    

    The second command gives output such as the following:

    gitlab-agent-cluster-admin-binding                     ClusterRole/cluster-admin                                          12s
    gitlab-agent-gitops-read-all                           ClusterRole/gitlab-agent-gitops-read-all                           162d
    gitlab-agent-gitops-write-all                          ClusterRole/gitlab-agent-gitops-write-all                          162d
    gitlab-agent-read-binding                              ClusterRole/gitlab-agent-read                                      162d
    gitlab-agent-write-binding                             ClusterRole/gitlab-agent-write                                     162d
    

The agent is now installed.
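
Before relying on the agent, it can help to confirm that its pods actually started. The following is a minimal sketch of such a check. The function name all_pods_running is hypothetical, and the sketch assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE), so that STATUS is the third field.

```shell
# Succeeds only if stdin (output of `kubectl get pods --no-headers`) lists at
# least one pod and every pod's STATUS column (third field) is "Running".
all_pods_running() {
  awk '{ n++; if ($3 != "Running") bad++ } END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get pods -n gitlab-kubernetes-agent --no-headers | all_pods_running \
#     && echo "agent pods are running"
```

An empty namespace counts as a failure, which catches the case where the install command was piped to kubectl but created nothing.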

Tearing down and reinstalling the agent

GitLab does not automate this process, and it requires care. Reinstalling the agent is occasionally necessary because the agent tolerates errors in YAML manifests poorly and can become unresponsive.

Follow this procedure in Google Cloud Shell:

  1. Delete all resources associated with the agent in its namespace:
    kubectl delete all --all -n gitlab-kubernetes-agent
    
  2. Delete the namespace:
    kubectl delete ns gitlab-kubernetes-agent
    
  3. Delete the agent's leftover resources in the default namespace: the inventory Config Map that the agent uses to track managed installations, and its token secret (neither is removed automatically with the agent's namespace resources):
    1. In the Google Cloud Platform console, choose Kubernetes Engine > Configuration from the upper left menu.
    2. In the default namespace, delete the Config Map named inventory-nnn, where nnn is a string of numbers and dashes.
    3. In the default namespace, delete the secret gitlab-agent-token-nnn, where nnn is an arbitrary hexadecimal number.
  4. Reinstall the agent by following the installation instructions above, using the same authorization token.
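
Step 3 can also be done from Google Cloud Shell instead of the console. The sketch below only prints the delete commands so that they can be reviewed before anything is removed. The function names are hypothetical, and the name prefixes (inventory- and gitlab-agent-token-) are taken from the step above; adjust them if the cluster uses different names.

```shell
# Filters `kubectl get ... -o name` output (one "kind/name" per line) down to
# the agent's inventory Config Map and token secret.
select_agent_leftovers() {
  grep -E '^(configmap/inventory-|secret/gitlab-agent-token-)' || true
}

# Prints one delete command per leftover resource instead of running it.
print_leftover_deletes() {
  select_agent_leftovers | while IFS= read -r resource; do
    echo "kubectl delete -n default ${resource}"
  done
}

# Usage (requires kubectl access to the cluster):
#   kubectl get configmap,secret -n default -o name | print_leftover_deletes
```

Printing rather than executing is a deliberate choice here: the default namespace may hold unrelated resources, and a reviewed list is safer than a prefix match applied blindly.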
