
[Deployment Issue] Installation deadlock on Hugging Face Spaces (CPU): wheels fail (musl/glibc mismatch) and source builds time out #2118

@Po-Hsuan-Huang


System Info

Platform: Hugging Face Spaces (Docker / CPU Basic Tier)

Base Image: python:3.10-slim (Debian Bookworm)

Goal: Deploy an OpenAI-compatible server for a GGUF model.

Description

I am unable to deploy llama-cpp-python in a CPU-only Docker environment (Hugging Face Spaces). I have tried three different installation methods, and all of them fail.

Describe the solution you'd like

Is there a recommended Docker pattern for Debian-based, CPU-only deployments that avoids the musl wheel issue but does not require a full source compilation?

Describe alternatives you've considered

Attempt 1: Pre-built wheels (libc mismatch)

I tried installing from the CPU-specific extra index URL to avoid the compilation time.

Dockerfile:

```dockerfile
RUN apt-get update && apt-get install -y build-essential cmake gcc
RUN pip install llama-cpp-python \
    --prefer-binary \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

Error: The installer pulls a wheel that is linked against musl libc, so loading it fails at runtime on Debian, which uses glibc:

```
OSError: libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory
```
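A quick way to confirm this libc mismatch from inside the container is sketched below. It is a minimal diagnostic using only the standard library, assuming the compiled libraries sit somewhere under the installed `llama_cpp` package directory (the exact layout may vary between releases). `platform.libc_ver()` reports the interpreter's C library, and a musl-linked shared object names `libc.musl-<arch>.so.1` in its dependency list, so that string appears verbatim in the file. The helper names are hypothetical, not part of llama-cpp-python.

```python
from __future__ import annotations

import importlib.util
import platform
from pathlib import Path


def host_libc() -> str:
    """Describe the C library the running interpreter is linked against."""
    # platform.libc_ver() returns ("glibc", "<version>") on glibc systems such
    # as Debian; it returns empty strings when it cannot identify the libc.
    name, version = platform.libc_ver()
    return f"{name} {version}".strip() or "unknown"


def references_musl(shared_object: bytes) -> bool:
    """True if the file bytes name the musl libc or the musl dynamic loader."""
    # A library linked against musl lists libc.musl-<arch>.so.1 as a
    # dependency, so the name is stored in the ELF string table.
    return b"libc.musl-" in shared_object or b"ld-musl-" in shared_object


def find_musl_libraries(root: Path) -> list[Path]:
    """Shared objects under `root` that depend on musl."""
    # "*.so*" matches both libfoo.so and versioned names such as libfoo.so.1
    return [
        path
        for path in sorted(root.rglob("*.so*"))
        if path.is_file() and references_musl(path.read_bytes())
    ]


def llama_cpp_package_dir() -> Path | None:
    """Locate the installed llama_cpp package without importing it."""
    # find_spec on a top-level name does not execute the package, so this
    # still works when `import llama_cpp` raises the OSError above.
    spec = importlib.util.find_spec("llama_cpp")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    print("interpreter libc:", host_libc())
    package_dir = llama_cpp_package_dir()
    if package_dir is None:
        print("llama_cpp is not installed")
    else:
        for lib in find_musl_libraries(package_dir):
            print("musl-linked:", lib)
```

If the interpreter reports glibc while any library under the package is flagged as musl-linked, the wheel was built for a different libc than the base image provides.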

Attempt 2: Official Docker image

I tried the official image to avoid the installation steps entirely.

Dockerfile:

```dockerfile
FROM ghcr.io/abetlen/llama-cpp-python:latest...
```

Error: The server fails to start because it cannot load the shared library (likely due to GPU/CUDA dependencies in the :latest image, or a library path issue).

```
FileNotFoundError: Shared library with base name 'llama' not found
```
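The error says no library with base name `llama` was found. A minimal sketch for checking what the image actually contains is shown below, assuming the conventional platform naming (`libllama.so` on Linux, `libllama.dylib` on macOS, `llama.dll` on Windows) rather than llama-cpp-python's exact search rules. The function names are hypothetical, and searching the interpreter's `platlib` directory is an illustrative choice of root.

```python
import sys
import sysconfig
from pathlib import Path


def candidate_names(base_name: str, platform_name: str = sys.platform) -> set[str]:
    """Conventional shared-library file names for `base_name` on a platform."""
    if platform_name == "win32":
        return {f"{base_name}.dll", f"lib{base_name}.dll"}
    if platform_name == "darwin":
        return {f"lib{base_name}.dylib", f"lib{base_name}.so"}
    # Linux and other Unix-like systems use lib<name>.so
    return {f"lib{base_name}.so"}


def find_shared_library(
    base_name: str, root: Path, platform_name: str = sys.platform
) -> list[Path]:
    """Files under `root` named like the library, including versioned .so.N names."""
    names = candidate_names(base_name, platform_name)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and (path.name in names or any(path.name.startswith(n + ".") for n in names))
    )


if __name__ == "__main__":
    # platlib is where pip installs packages that contain compiled code
    site_packages = Path(sysconfig.get_paths()["platlib"])
    matches = find_shared_library("llama", site_packages)
    if matches:
        for path in matches:
            print(path)
    else:
        print(f"no llama shared library found under {site_packages}")
```

If nothing is found, the library is missing from where this interpreter installs packages; if it is found but loading still fails, running `ldd` on the file would list unresolved dependencies, such as CUDA runtime libraries.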

Attempt 3: Build from source

I tried building from source to ensure a matching architecture.

Dockerfile:

```dockerfile
ENV CMAKE_ARGS="-DGGML_CUDA=off"
RUN pip install llama-cpp-python --no-cache-dir
```

Error: The build exceeds the Hugging Face Spaces build timeout (a hard limit) during the wheel-building phase.

```
Building wheel for llama-cpp-python (pyproject.toml): started
... [build times out]
```
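One way to keep the compilation out of the Space's build step is sketched below, under assumptions: compile the wheel once with `pip wheel` on a machine without the time limit (for example a local Docker build or a CI job) that uses the same `python:3.10-slim` base image and CPU architecture, then copy the resulting `.whl` into the Space and install it with pip. The helper only assembles the commands and environment. `CMAKE_ARGS` and `-DGGML_CUDA=off` come from Attempt 3; adding `-DGGML_NATIVE=OFF` is an assumption intended to stop the binary from being tuned to the build machine's CPU. The function names are hypothetical.

```python
import os
import shlex
import sys
from pathlib import Path


def prebuild_wheel_command(
    wheel_dir: Path, package: str = "llama-cpp-python"
) -> tuple[list[str], dict[str, str]]:
    """Command and environment for compiling the package into a reusable wheel."""
    cmd = [
        sys.executable, "-m", "pip", "wheel", package,
        "--no-deps",  # build only this package, not its dependencies
        "--wheel-dir", str(wheel_dir),
    ]
    env = dict(os.environ)
    # CPU-only flag from Attempt 3, plus GGML_NATIVE=OFF (assumption) so the
    # wheel is not specialized to the build machine's CPU features.
    env["CMAKE_ARGS"] = "-DGGML_CUDA=off -DGGML_NATIVE=OFF"
    return cmd, env


def install_wheel_command(wheel_file: Path) -> list[str]:
    """Command for installing the prebuilt wheel inside the Space."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", str(wheel_file)]


if __name__ == "__main__":
    # Print the build command instead of running it; the build itself is the
    # slow step and belongs on the machine without the time limit.
    cmd, env = prebuild_wheel_command(Path("wheels"))
    print("CMAKE_ARGS=" + shlex.quote(env["CMAKE_ARGS"]), shlex.join(cmd))
```

The compile time does not go away; it moves to a place with no hard limit. Whether the prebuilt wheel loads in the Space depends on matching the Python version, base image, and architecture, which is why the sketch assumes the same `python:3.10-slim` image on both sides.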
