E.g. Debian with Curl, Ubuntu Build etc.
Latest commit: 9b00794f2a, 2026-05-28 22:42:13 +12:00: Bump fastapi to 0.136.3 and require starlette>=1.0.1

Generic Docker Images

Reusable Docker base images published to forge.jde.nz/public/.

CI requires Forgejo runners labeled gpu and gpubig.

Available Images

pytorch-gpu

PyTorch + CUDA base image for NVIDIA GPU workloads. Supports all NVIDIA GPUs from GTX 1060 to RTX 5090.

GPU Compute Capabilities: 6.1 (GTX 10xx), 7.5 (RTX 20xx), 8.0 (A100), 8.6 (RTX 30xx, RTX A6000/A40), 8.9 (RTX 40xx), 9.0 (H100), 10.0 (data-center Blackwell such as B200). RTX 50xx and RTX PRO 6000 Blackwell cards report compute capability 12.0 (sm_120).

Includes:

  • CUDA 12.8.1 runtime
  • PyTorch 2.7+ with torchaudio
  • transformers, accelerate, safetensors
  • Flash Attention 2 (compiled for all target architectures)
  • FFmpeg, libsndfile, sox (audio processing)
  • Python 3 venv at /opt/venv

Usage:

FROM forge.jde.nz/public/pytorch-gpu:latest

RUN pip install my-project-specific-packages
COPY my_app/ /app/
CMD ["python", "/app/server.py"]
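
Before layering an application on top, it can help to confirm that a container started from this image actually sees the GPU. The following is a minimal sketch and not part of this repository. It assumes the NVIDIA Container Toolkit is installed on the host (needed for --gpus all) and that the image has no entrypoint that intercepts the command. The function name and the DOCKER override are illustrative.

```shell
# Run a one-line PyTorch check inside the image.
# DOCKER is a hypothetical override: set DOCKER=echo to print the
# command instead of running it.
gpu_smoke_test() {
    image="${1:-forge.jde.nz/public/pytorch-gpu:latest}"
    ${DOCKER:-docker} run --rm --gpus all "$image" \
        python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
}
```

Calling gpu_smoke_test prints the PyTorch version followed by whether CUDA is usable; DOCKER=echo gpu_smoke_test shows the command line without starting a container.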

Architecture: x86_64 only (CUDA)

Size: ~8-10GB

orthrus-base

PyTorch + Flash Attention 2 + transformers 5.8+ base image for Orthrus-Qwen3 models with diffusion-mode speculative decoding.

Includes:

  • CUDA 12.8.1 runtime
  • PyTorch 2.x with CUDA 12.8
  • transformers >= 5.8.0, accelerate >= 1.13.0
  • Flash Attention 2 (compiled for Ampere+ architectures: sm_80, sm_86, sm_89, sm_90, sm_100)
  • FastAPI + uvicorn
  • Python 3 venv at /opt/venv

Usage:

FROM forge.jde.nz/public/orthrus-base:latest
COPY server.py /app/
CMD ["python3", "/app/server.py"]

Architecture: x86_64 only (CUDA, Ampere+ GPUs)

clabtree-imagetools-base

Pre-built base image for clabtree-imagetools. Layers all heavy ML deps on top of pytorch-gpu.

Includes: torchvision, transformers 4.x (pinned below 5.0 for Flux2KleinPipeline compat), diffusers from git (Flux2KleinPipeline), torchao, onnxruntime-gpu, insightface, gfpgan, realesrgan, opencv-python-headless, pillow-heif

Purpose: Avoids 3+ minute pip builds on every imagetools deploy. The consumer image just copies app code on top.

Architecture: x86_64 only (CUDA, GPU runner only)

clabtree-api-base

Pre-built base image for clabtree-api. Bundles all heavy Python dependencies so API deploys only need to copy app code.

Includes:

  • PyTorch CPU (no CUDA — API runs on CPU-only server)
  • pipecat-ai with voice pipeline extras (silero, webrtc, openai, smart-turn)
  • LiveKit agents + plugins, sherpa-onnx, LocalVQE neural AEC (compiled from source)
  • fastembed (ONNX embeddings)
  • weasyprint + pandoc (PDF generation)
  • ffmpeg, audio/graphics system libraries
  • All other API Python deps (fastapi, httpx, aiosqlite, etc.)
  • All models pre-baked so the API image and runtime download nothing: livekit turn-detector EOU models (via download-files), sherpa STT/TTS, LocalVQE GGUF, and the BAAI/bge-small-en-v1.5 fastembed model. Cache dirs are pinned (HF_HOME, FASTEMBED_CACHE_PATH) so baked models resolve at runtime.

Purpose: Avoids per-deploy HuggingFace fetches. The consumer image (clabtree-api) is pure app code: FROM base, then COPY . . to add the source.
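
One way to sanity-check the pinned cache directories is to confirm, inside a container derived from this base, that both variables are set and point at existing directories. This is a sketch under that assumption: the function name is hypothetical, and it checks only the directories, not the model files inside them.

```shell
# Verify that the cache variables named above are set and that each
# points at an existing directory. Returns non-zero on any problem.
check_cache_dirs() {
    rc=0
    for pair in "HF_HOME=${HF_HOME:-}" "FASTEMBED_CACHE_PATH=${FASTEMBED_CACHE_PATH:-}"; do
        var=${pair%%=*}   # text before the first "="
        dir=${pair#*=}    # text after the first "="
        if [ -z "$dir" ]; then
            echo "$var is not set"
            rc=1
        elif [ ! -d "$dir" ]; then
            echo "$var points at a missing directory: $dir"
            rc=1
        fi
    done
    return "$rc"
}
```

A silent, zero exit status means both directories resolve; any message points at the variable to fix.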

Architecture: x86_64 only

accelerated_base

Media processing base image with Intel QuickSync / VA-API + FFmpeg hardware acceleration.

Features:

  • Debian trixie (Debian 13) — ships the Intel iHD media driver 25.x, which supports modern Intel iGPUs including Arrow Lake-S. (Ubuntu 22.04's iHD 22.x did not, so VA-API hardware encode failed on Arrow Lake.)
  • FFmpeg (Debian 7.1, VA-API enabled) + FFmpeg dev libraries
  • Intel QuickSync / VA-API via intel-media-va-driver-non-free (x86_64)
  • Python venv at /opt/venv (on PATH) — consumer images pip install directly, no PEP 668 friction
  • OpenCV (headless), numpy, Pillow, imageio; non-root appuser

Usage: layer your app on top and pass --device /dev/dri at runtime for hardware encode/decode:

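FROM forge.jde.nz/public/accelerated_base:latest
# Switch to root for the install step; the base image provides the non-root appuser
USER root
RUN pip install --no-cache-dir my-deps
COPY app/ /app/
# Drop back to the non-root user for runtime
USER appuser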
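
The runtime flag mentioned above can be wrapped in a small helper so that the same command line also works on hosts without a DRI device. This is an illustrative sketch: the function name, the DRI_DIR override, and the image name my-media-app are all hypothetical.

```shell
# Print the docker run device flag only when the DRI directory exists.
# DRI_DIR is a hypothetical override for testing; it defaults to /dev/dri.
dri_flags() {
    dir="${DRI_DIR:-/dev/dri}"
    if [ -d "$dir" ]; then
        printf '%s %s\n' --device "$dir"
    fi
}

# Example (my-media-app is a placeholder image name):
#   docker run --rm $(dri_flags) my-media-app
```

On a host without /dev/dri the helper prints nothing, so the container starts without the device and the application has to fall back to software encode/decode on its own.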

Architecture: x86_64 (Intel QuickSync) + arm64 (VA-API libraries only — no Intel iGPU driver, software fallback)

Note: the previous Ubuntu CUDA-11.8 NVENC libs (libnvidia-encode-515 etc.) were removed — they were unused by consumers. Use a CUDA base image if you need NVIDIA NVENC.

nemotron-speech-blackwell

Pre-built base image for clabtree-parakeet-asr on Blackwell GPUs (RTX 50xx, sm_120/sm_121). Contains everything from pipecat-ai/nemotron-january-2026 Dockerfile.unified except the app code layer (Phase 11).

Includes: PyTorch from source (CUDA 13.0/13.1), torchaudio, NeMo ASR+TTS, vLLM, llama.cpp, triton, mamba-ssm — all compiled for Blackwell (sm_120 x86_64 / sm_121 arm64).

Purpose: Avoids the 2-3 hour build on every deploy. Consumer image (clabtree-parakeet-asr) just layers the app code on top (COPY src/ + uv pip install -e .).

Architecture: x86_64 + arm64 (CUDA, GPU runner only)

nemotron-speech-ada

Pre-built base image for clabtree-parakeet-asr on Ada Lovelace GPUs (RTX 40xx, sm_89). ASR only — no vLLM, llama.cpp, or TTS. Pre-built PyTorch cu128 wheels replace the 2-3 hour from-source build in the Blackwell variant.

Includes: PyTorch + torchaudio (pre-built cu128 wheels), NeMo ASR, triton.

Purpose: Same as nemotron-speech-blackwell, but for Ada Lovelace. Build time is roughly 20-30 minutes.

Architecture: x86_64 only (CUDA, GPU runner only)
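
Choosing between the two nemotron-speech bases comes down to the GPU's compute capability. The sketch below maps the values named in these two sections (8.9 for Ada Lovelace, 12.0 and 12.1 for Blackwell) to an image reference. The function name is hypothetical, and the nvidia-smi query in the comment assumes a driver recent enough to support the compute_cap field.

```shell
# Map a compute capability string to the matching nemotron-speech base.
nemotron_base_for() {
    case "$1" in
        8.9)
            echo "forge.jde.nz/public/nemotron-speech-ada:latest" ;;
        12.0|12.1)
            echo "forge.jde.nz/public/nemotron-speech-blackwell:latest" ;;
        *)
            echo "no nemotron-speech base listed for compute capability $1" >&2
            return 1 ;;
    esac
}

# Example, reading the capability of the first GPU on the host:
#   cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   nemotron_base_for "$cc"
```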

comfyui-cuda

Headless ComfyUI API server with CUDA support. Pre-installs custom nodes for Wan 2.2 GGUF video generation.

Includes:

  • ComfyUI (latest from git)
  • PyTorch with CUDA 12.8
  • ComfyUI-GGUF (city96) — GGUF quantized model loading
  • ComfyUI-WanMoeKSampler (stduhpf) — auto HighNoise/LowNoise step splitting
  • ComfyUI-VideoHelperSuite (Kosinkadink) — video output handling
  • FFmpeg

Usage:

FROM forge.jde.nz/public/comfyui-cuda:latest
COPY workflow.json /comfyui/
COPY app/ /app/

Mount models at runtime via -v /data/models:/comfyui/models/diffusion_models etc.
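
When several model directories need mounting, the -v flags can be generated rather than typed out. This is a sketch only: the function name is hypothetical, and it assumes a host layout of one subdirectory per model type under a common root, which may not match your setup.

```shell
# Print one "-v host:container" pair per model subdirectory.
# The <root>/<subdir> host layout and the subdirectory names passed in
# are assumptions for illustration; paths containing spaces are not handled.
model_mounts() {
    root="$1"
    shift
    for sub in "$@"; do
        printf '%s %s/%s:/comfyui/models/%s\n' -v "$root" "$sub" "$sub"
    done
}

# Example (requires the NVIDIA Container Toolkit for --gpus all):
#   docker run --rm --gpus all \
#       $(model_mounts /data/models diffusion_models vae) \
#       forge.jde.nz/public/comfyui-cuda:latest
```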

Architecture: x86_64 only (CUDA)

debian-curl

Minimal Debian image with curl. Multi-arch.

Registry

Images are available at:

  • forge.jde.nz/public/<image_name>:latest — multi-arch manifest (or x86_64 for GPU images)
  • forge.jde.nz/public/<image_name>:latest-x86_64 — x86_64 specific
  • forge.jde.nz/public/<image_name>:latest-aarch64 — arm64 specific (non-GPU only)
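
The tag scheme above can be turned into a small lookup keyed on the host architecture. A minimal sketch follows; the function name is hypothetical, and falling back to the plain latest tag for unrecognised architectures is a choice made here, not something the registry defines.

```shell
# Build an image reference from an image name and an architecture.
# The architecture defaults to the output of `uname -m`.
image_ref() {
    name="$1"
    arch="${2:-$(uname -m)}"
    case "$arch" in
        x86_64|amd64)  suffix="-x86_64" ;;
        aarch64|arm64) suffix="-aarch64" ;;
        *)             suffix="" ;;   # unknown: use the plain latest tag
    esac
    printf 'forge.jde.nz/public/%s:latest%s\n' "$name" "$suffix"
}

# Example:
#   docker pull "$(image_ref debian-curl)"
```

Pulling the plain latest tag is usually enough, since Docker resolves a multi-arch manifest to the host platform; the explicit suffix is mainly useful when pinning one architecture.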

Building

Images are automatically built and published by CI on push to main.

To build locally: ./build.sh

To publish: ./publish.sh
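
The two scripts are not reproduced here. For a one-off build of a single image, a manual equivalent might look like the sketch below. It assumes each image is built from a file named Dockerfile.<image_name> in the repository root and tagged under forge.jde.nz/public/. The function name and the DOCKER override are illustrative, and the real build.sh may do more (multi-arch builds, extra build arguments).

```shell
# Build one image by name from the current directory.
# DOCKER is a hypothetical override: set DOCKER=echo for a dry run.
build_image() {
    name="$1"
    ${DOCKER:-docker} build \
        -f "Dockerfile.$name" \
        -t "forge.jde.nz/public/$name:latest" \
        .
}

# Example:
#   build_image debian-curl
```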