Description
System Info
Platform: Hugging Face Spaces (Docker / CPU Basic Tier)
Base Image: python:3.10-slim (Debian Bookworm)
Goal: Deploy OpenAI-compatible server for a GGUF model.
I am unable to deploy llama-cpp-python in a CPU-only Docker environment (Hugging Face Spaces). I have attempted three distinct installation methods, all of which fail.
Describe the solution you'd like
Is there a recommended Docker pattern for Debian-based, CPU-only deployments that avoids the musl wheel issue without requiring a full source compilation?
Describe alternatives you've considered
Attempt 1: Pre-built Wheels (Architecture Mismatch)

I attempted to install using the CPU-specific extra index URL to avoid compilation time.
Dockerfile:

```dockerfile
RUN apt-get update && apt-get install -y build-essential cmake gcc
RUN pip install llama-cpp-python \
    --prefer-binary \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
Error: the installer pulls a wheel linked against musl, causing a runtime crash on glibc-based Debian:

```
OSError: libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory
```
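For context on Attempt 1: the libc a wheel targets is encoded in its platform tag, so the mismatch can be spotted from the filename alone before installing. A minimal sketch (the wheel filename and version below are hypothetical):

```shell
# Distinguish musl vs. glibc wheels by their platform tag.
# Wheel filenames encode the target libc:
#   ...-musllinux_1_2_x86_64.whl   -> Alpine/musl
#   ...-manylinux_2_17_x86_64.whl  -> Debian/Ubuntu glibc
check_wheel_libc() {
  case "$1" in
    *musllinux*) echo "musl" ;;
    *manylinux*) echo "glibc" ;;
    *)           echo "unknown" ;;
  esac
}

# Hypothetical filename matching what Attempt 1 pulled:
check_wheel_libc "llama_cpp_python-0.3.2-cp310-cp310-musllinux_1_2_x86_64.whl"  # prints "musl"
```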
Attempt 2: Official Docker Image

I attempted to use the official image to avoid installation steps entirely.

Dockerfile:

```dockerfile
FROM ghcr.io/abetlen/llama-cpp-python:latest
...
```

Error: the server fails to start, unable to load the shared library (likely due to GPU/CUDA dependencies in :latest or pathing issues):

```
FileNotFoundError: Shared library with base name 'llama' not found
```
Attempt 3: Build from Source

I attempted to build from source to ensure the correct architecture.

Dockerfile:

```dockerfile
ENV CMAKE_ARGS="-DGGML_CUDA=off"
RUN pip install llama-cpp-python --no-cache-dir
```
Error: the build hits the Hugging Face Spaces hard timeout during the wheel-building phase:

```
Building wheel for llama-cpp-python (pyproject.toml): started
... [process times out]
```
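The workaround I am currently leaning toward, as an untested sketch: build the wheel outside the Space (locally or in CI, on a glibc x86_64 host matching the Space), commit the artifact to the repo, and install it instead of compiling at build time. The `wheels/` path and the build command in the comment are my assumptions, not an official pattern:

```dockerfile
FROM python:3.10-slim
# Wheel built elsewhere (e.g. locally or in CI) with:
#   CMAKE_ARGS="-DGGML_CUDA=off" pip wheel llama-cpp-python --wheel-dir ./wheels
COPY wheels/ /wheels/
RUN pip install --no-index --find-links=/wheels llama-cpp-python
```

If something like this is the recommended approach, guidance on which platform/ABI to build against for the Spaces CPU tier would be appreciated.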