btdt: "been there, done that"
btdt is a tool for flexible caching of files in CI pipelines. As a simple CLI program, it is agnostic to the CI platform and can be integrated into various pipelines.
This tool is still under active development and not yet feature complete. See below for details.
Example: caching node_modules
CACHE_KEY=node-modules-$(btdt hash package-lock.json)
btdt restore --cache path/to/cache --keys $CACHE_KEY node_modules
if [ $? -ne 0 ]; then
    npm ci
    btdt store --cache path/to/cache --keys $CACHE_KEY node_modules
fi
Examples for specific CI platforms can be found in the documentation (see below).
Documentation
The main user guide and documentation are located at https://jgosmann.github.io/btdt. The API documentation can be found on docs.rs.
Motivation
I was annoyed that there isn't a good (self-hosted) caching solution for Jenkins and Tekton, similar to the cache for GitHub Actions. Also, it seemed that it shouldn't be that hard to implement a caching solution. So I put my money where my mouth is. In particular, I didn't see any reason why caching should be tied to a specific CI platform by implementing it as a plugin for that platform. To me, it seems that this problem is solvable with a CLI tool that can be integrated into any pipeline.
Regarding Jenkins, I know of two caching plugins and I have my quarrels with both of them:
- Job Cacher deletes the complete cache once it reaches the maximum size. This is inefficient; I prefer to delete the least recently used cache entries until the limit is met. The plugin also does not share the cache between different build jobs, which severely limits its usefulness in certain scenarios. We also had some other constraints that made it impossible to use this plugin, whereas a CLI tool could have been integrated.
- jenkins-pipeline-cache-plugin requires S3 API compatible storage, which excludes some other use cases. It also doesn't seem to provide a way to limit the cache size.
Regarding Tekton, a few suggestions are made in their blog. But I don't think those are perfect:
- Caching layers in a container registry imposes a dependency order on your cached layers. This might be fine if invalidating one cache/layer implies that all subsequent caches/layers are also invalidated. But if you have two orthogonal caches, you must decide on an order, and there is always one case where one of the caches is invalidated needlessly.
- Caching on a persistent disk does not, as far as I understand, allow for multiple caches to be stored without additional tooling. If you have builds that require different caches, you might end up overwriting caches constantly. btdt provides tooling to have multiple separate caches.
State of development
A basic version of btdt that can be used in some scenarios is working.
I still intend to implement the following features:
- A server for storing caches remotely.
- Compression of the cache (to reduce the amount of data transferred).
- Hashing multiple files in a stable way for the cache key.
- A templating system for cache keys, such that btdt hash doesn't need to be called, but a cache key in the form of cache-key-${hashFiles('**/package-lock.json')} can be used directly.
- Potentially, configuration via environment variables and/or configuration files.
- Potentially, using S3 compatible APIs as storage backend.
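Until stable multi-file hashing lands in btdt itself, a composite key can be approximated with standard Unix tools. A sketch (this construction is our own workaround, not a btdt feature; paths are illustrative):

```shell
# Create a small demo tree with two lockfiles (illustrative paths).
mkdir -p /tmp/btdt-multihash/a /tmp/btdt-multihash/b
printf 'lock-a\n' > /tmp/btdt-multihash/a/package-lock.json
printf 'lock-b\n' > /tmp/btdt-multihash/b/package-lock.json

# Sort the file list so the result does not depend on filesystem order,
# hash each file, then hash the list of digests to get one stable key.
COMBINED=$(cd /tmp/btdt-multihash \
  && find . -name package-lock.json -type f | sort | xargs sha256sum \
  | sha256sum | cut -d' ' -f1)
echo "cache-key-$COMBINED"
```

The sort step is what makes the key stable: find may return files in any order, but the digest list is always hashed in the same order.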
Installation
There are multiple ways to install btdt. Choose the method below that best fits your needs.
Currently, only Unix-like systems (Linux, macOS) are supported.
Pre-compiled binaries
You can download pre-compiled binaries from the GitHub Releases page.
The archive contains a single executable binary, btdt. You might want to place it in your $PATH for easy access.
Docker images
Docker images are available on Docker Hub.
This allows you to run btdt directly without installing it on your system:
docker run jgosmann/btdt btdt --help
However, you will have to mount the directories with the cache and the files to cache into the container. This can be done with the --mount or --volume option.
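For example, a restore invocation with a host cache directory and the project directory mounted into the container might look like this (the paths and cache key are illustrative):

```shell
docker run --rm \
  --volume /srv/btdt-cache:/cache \
  --volume "$PWD":/work \
  --workdir /work \
  jgosmann/btdt btdt restore --cache /cache --keys my-cache-key node_modules
```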
The images use Semantic Versioning tags. For example, jgosmann/btdt:0.1 refers to the latest v0.1.x image.
Build from source using Rust
If you have Rust installed, you can build btdt from source using cargo:
cargo install btdt
Getting Started
This guide will show you the general steps of using btdt to cache files, in particular the installed dependencies of a package manager such as npm. If you are looking to integrate btdt into a CI pipeline, you might also want to check out the CI-specific integration guides.
Determining cache keys
Usually, you will have a file that completely specifies your project's dependencies and their versions. For example, in the JavaScript/NPM ecosystem, this is the package-lock.json file.
As long as it doesn't change, the installed dependencies will be the same and could be cached.
Thus, the primary cache key should be based on this file.
We can use the btdt hash command to generate a cache key from the file:
CACHE_KEY=cache-key-$(btdt hash package-lock.json)
This calculates a cryptographic hash over the file contents and appends it to the string cache-key-. The result could look something like cache-key-f3dd7a501dd93486194e752557585a1996846b9a6df16e76f104e81192b0039f. If the package-lock.json file changes, the hash will change as well, and the cache key will be different.
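The mechanism can be illustrated with plain sha256sum (btdt's exact hash algorithm is not assumed here, only that the key is derived from the file contents):

```shell
mkdir -p /tmp/cache-key-demo && cd /tmp/cache-key-demo
printf '{"lockfileVersion": 3}\n' > package-lock.json
KEY1="cache-key-$(sha256sum package-lock.json | cut -d' ' -f1)"

# Any change to the file contents yields a completely different key,
# so stale caches are never restored for an updated lockfile.
printf '{"lockfileVersion": 3, "extra": true}\n' > package-lock.json
KEY2="cache-key-$(sha256sum package-lock.json | cut -d' ' -f1)"
echo "$KEY1"
echo "$KEY2"
```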
Trying to restore the cache
Before installing the dependencies, e.g. with npm ci, we can try to restore them from the cache:
btdt restore --cache path/to/cache --keys $CACHE_KEY node_modules
RESTORE_EXIT_CODE=$?
npm will install the dependencies into node_modules, so we are using this as the target directory. Furthermore, we store the exit code because it comes in handy in the next step. It will be 0 if the cache was restored successfully from the first given key, and non-zero otherwise. (Use the --success-rc-on-any-key flag to return a zero exit code no matter which key was used to restore the cache.)
Installing dependencies and storing the cache
If the cache could not be restored, we install the dependencies with npm ci and then store the installed dependencies in the cache:
if [ $RESTORE_EXIT_CODE -ne 0 ]; then
    npm ci # Install dependencies
    btdt store --cache path/to/cache --keys $CACHE_KEY node_modules
fi
Using multiple cache keys
You can specify multiple cache keys to provide a fallback mechanism. During the restore operation, the keys are tried in order. This allows you to use a cache that might not contain the exact dependencies required but can still speed up the installation if most of them are present.
With npm, the usage of multiple cache keys could look like this:
CACHE_KEY=cache-key-$(btdt hash package-lock.json)
btdt restore --cache path/to/cache --keys "$CACHE_KEY,fallback" node_modules
RESTORE_EXIT_CODE=$?
npm ci
if [ $RESTORE_EXIT_CODE -ne 0 ]; then
    btdt store --cache path/to/cache --keys "$CACHE_KEY,fallback" node_modules
fi
This will also store the latest cached dependencies under the key fallback. This cache entry will be used if no more specific cache entry is found.
Cleanup
To prevent the cache from growing indefinitely, you might want to clean up old cache entries from time to time, for example to only keep cache entries accessed within the last seven days and limit the cache size to at most 10 GiB:
btdt clean --cache path/to/cache --max-age 7d --max-size 10GiB
Overview
While the Getting Started guide provides a high-level overview of how to use btdt, this section contains guides for specific CI systems:
If you have experience with a CI system that is not covered here, please consider contributing a guide.
Tekton
This guide explains how to use btdt in a Tekton pipeline. It uses the Docker images, so that no changes to the images of your tasks are necessary. Of course, you could also install btdt within the respective task images, which might simplify the integration a bit.
Provide a Persistent Volume Claim as workspace to the pipeline run
To use btdt in a Tekton pipeline, you need to provide a Persistent Volume Claim (PVC) for the cache. This PVC should be provided as an actual persistentVolumeClaim in the PipelineRun, not a volumeClaimTemplate. Otherwise, you will get a fresh volume on each pipeline run, making the cache useless.
An example PipelineRun could look like this:
# PipelineRun template, e.g. as part of your trigger
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: my-pipeline-run-$(uid)
spec:
  params:
    # ...
  pipelineRef:
    name: my-tekton-pipeline
  workspaces:
    - name: cache
      persistentVolumeClaim:
        claimName: my-tekton-cache
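The referenced PVC has to exist before the first run. A minimal manifest could look like this (access mode and storage size are placeholders to adapt to your cluster):

```yaml
# pvc_my-tekton-cache.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-tekton-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```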
With the default Tekton settings (at the time of writing), only a single PVC can be mounted into a task. Thus, if you are already using a PVC for your task (likely to check out your source code repository), you will have to store the cache on this PVC as well. Alternatively, you can disable the affinity assistant to be able to mount multiple PVCs into a task. Run kubectl edit configmap feature-flags to edit the configuration.
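Assuming a default Tekton installation in the tekton-pipelines namespace, the key to set is disable-affinity-assistant (newer Tekton releases replace this with the coschedule flag):

```yaml
# kubectl edit configmap feature-flags -n tekton-pipelines
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"
```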
In the following, we assume this second setup. If you are using a single PVC, you will have to adjust the paths accordingly.
Provide the cache workspace to the task
To be able to use the cache in a task, the cache workspace needs to be provided:
# pipeline.yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: my-tekton-pipeline
spec:
  workspaces:
    - name: git-sources
    - name: cache
  tasks:
    - name: run-tests
      taskRef:
        name: run-tests
        kind: Task
      workspaces:
        - name: git-sources
          workspace: git-sources
        - name: cache
          workspace: cache
Use the cache in a task
You must declare the cache workspace in the task, so that it can be used by the individual steps:
# task_run-tests.yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: run-tests
spec:
  steps:
    # ...
  workspaces:
    - name: git-sources
      description: Provides the workspace with the cloned repository.
    - name: cache
      description: Provides the btdt cache.
Restore the cache
Now you can add a step to restore the cache at the beginning of the task. Here, we try to restore a node_modules directory:
# task_run-tests.yaml
spec:
  # ...
  steps:
    - name: restore-cache
      image: jgosmann/btdt:0.1
      workingDir: $(workspaces.git-sources.path)
      onError: continue
      script: |
        #!/bin/sh
        CACHE_KEY=node-modules-$(btdt hash package-lock.json)
        echo "Cache key: $CACHE_KEY"
        btdt restore --cache $(workspaces.cache.path) --keys $CACHE_KEY node_modules
Install dependencies only on cache miss
Depending on what you are caching, you might want to run some commands only on a cache miss to generate the files that would be cached. For example, to install NPM dependencies only if the cache could not be restored:
# task_run-tests.yaml
spec:
  # ...
  steps:
    # try restore
    - name: run-tests
      image: node
      workingDir: $(workspaces.git-sources.path)
      script: |
        #!/bin/sh
        if [ $(cat $(steps.step-restore-cache.exitCode.path)) -eq 0 ]; then
          echo "Cache restore succeeded, skipping npm ci"
        else
          npm ci
        fi
        # run tests, build, etc.
Note that if you are using fallback keys, you will always want to run npm ci to ensure that the dependencies are installed correctly.
Store the cache
For the cache to provide a benefit, we need to fill it when a cache miss occurs. This requires an additional step after the files to cache have been generated (e.g. by running npm ci):
# task_run-tests.yaml
spec:
  # ...
  steps:
    # try restore
    # install dependencies/generate files to cache
    - name: store-cache
      image: jgosmann/btdt:0.1
      workingDir: $(workspaces.git-sources.path)
      script: |
        #!/bin/sh
        if [ $(cat $(steps.step-restore-cache.exitCode.path)) -eq 0 ]; then
          echo "Cache restore succeeded, skipping cache store"
          exit 0
        fi
        CACHE_KEY=node-modules-$(btdt hash package-lock.json)
        echo "Cache key: $CACHE_KEY"
        btdt store --cache $(workspaces.cache.path) --keys $CACHE_KEY node_modules
Example of complete task
When putting all of this together, your task definition will look something like this:
# task_run-tests.yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: run-tests
spec:
  steps:
    - name: restore-cache
      image: jgosmann/btdt:0.1
      workingDir: $(workspaces.git-sources.path)
      onError: continue
      script: |
        #!/bin/sh
        CACHE_KEY=node-modules-$(btdt hash package-lock.json)
        echo "Cache key: $CACHE_KEY"
        btdt restore --cache $(workspaces.cache.path) --keys $CACHE_KEY node_modules
    - name: run-tests
      image: node
      workingDir: $(workspaces.git-sources.path)
      script: |
        #!/bin/sh
        if [ $(cat $(steps.step-restore-cache.exitCode.path)) -eq 0 ]; then
          echo "Cache restore succeeded, skipping npm ci"
        else
          npm ci
        fi
        # run tests etc.
    - name: store-cache
      image: jgosmann/btdt:0.1
      workingDir: $(workspaces.git-sources.path)
      script: |
        #!/bin/sh
        if [ $(cat $(steps.step-restore-cache.exitCode.path)) -eq 0 ]; then
          echo "Cache restore succeeded, skipping cache store"
          exit 0
        fi
        CACHE_KEY=node-modules-$(btdt hash package-lock.json)
        echo "Cache key: $CACHE_KEY"
        btdt store --cache $(workspaces.cache.path) --keys $CACHE_KEY node_modules
  workspaces:
    - name: git-sources
      description: Provides the workspace with the cloned repository.
    - name: cache
      description: Provides the btdt cache.
Cleanup
To prevent the cache from growing indefinitely, you should configure a regular cleanup:
Clean task
# task_cache-clean.yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: cache-clean
spec:
  steps:
    - name: cache-clean
      image: jgosmann/btdt:0.1
      script: |
        #!/bin/sh
        btdt clean --cache $(workspaces.cache.path) --max-age 7d --max-size 10GiB
  workspaces:
    - name: cache
      description: Provides the btdt cache.
Clean pipeline
# pipeline_cache-clean.yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: cache-clean-pipeline
spec:
  workspaces:
    - name: cache
  params:
    - name: runid
      type: string
  tasks:
    - name: cache-clean
      taskRef:
        name: cache-clean
        kind: Task
      workspaces:
        - name: cache
          workspace: cache
Cron trigger
# trigger_cache-clean.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-clean-schedule
spec:
  schedule: '@hourly'
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: cache-clean-trigger
              image: curlimages/curl
              command: [ '/bin/sh', '-c' ]
              args: [ "curl --header \"Content-Type: application/json\" --data '{}' el-cache-clean-listener.default.svc.cluster.local:8080" ]
          restartPolicy: Never
---
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: cache-clean-listener
spec:
  triggers:
    - name: cache-clean-trigger
      interceptors: [ ]
      template:
        spec:
          resourcetemplates:
            - apiVersion: tekton.dev/v1beta1
              kind: PipelineRun
              metadata:
                name: cache-clean-$(uid)
              spec:
                pipelineRef:
                  name: cache-clean-pipeline
                params:
                  - name: runid
                    value: $(uid)
                workspaces:
                  - name: cache
                    persistentVolumeClaim:
                      claimName: my-tekton-cache