diff --git a/README.rst b/README.rst
index 55be0e583f..211964e7e1 100644
--- a/README.rst
+++ b/README.rst
@@ -308,6 +308,37 @@ Troubleshooting
      cd transformer_engine
      pip install -v -v -v --no-build-isolation .
 
+**Problems using UV or Virtual Environments:**
+
+1. **Import Error:**
+
+   * **Symptoms:** Cannot import ``transformer_engine``
+   * **Solution:** Ensure your UV environment is active and that you have used ``uv pip install --no-build-isolation `` instead of a regular pip install to your system environment.
+
+2. **cuDNN Sublibrary Loading Failed:**
+
+   * **Symptoms:** Errors at runtime with ``CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED``
+   * **Solution:** This can occur when TE is built against the container's system installation of cuDNN, but pip packages inside the virtual environment pull in pip packages for ``nvidia-cudnn-cu12/cu13``. To resolve this, when building TE from source please specify the following environment variables to point to the cuDNN in your virtual environment.
+
+
+   .. code-block:: bash
+
+      export CUDNN_PATH=$(pwd)/.venv/lib/python3.12/site-packages/nvidia/cudnn
+      export CUDNN_HOME=$CUDNN_PATH
+      export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$LD_LIBRARY_PATH
+
+3. **Building Wheels:**
+
+   * **Symptoms:** Regular TE installs work correctly but UV wheel builds fail at runtime.
+   * **Solution:** Ensure that ``uv build --wheel --no-build-isolation -v`` is used during the wheel build as well as the pip installation of the wheel. Use ``-v`` for verbose output to verify that TE is not pulling in a mismatching version of PyTorch or JAX that differs from the UV environment's version.
+
+**JAX-specific Common Issues and Solutions:**
+
+1. **FFI Issues:**
+
+   * **Symptoms:** ``No registered implementation for custom call to for platform CUDA``
+   * **Solution:** Ensure ``--no-build-isolation`` is used during installation. If pre-building wheels, ensure that the wheel is both built and installed with ``--no-build-isolation``. See "Problems using UV or Virtual Environments" above if using UV.
+
 .. troubleshooting-end-marker-do-not-remove
 
 Breaking Changes
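
Note on the cuDNN entry: the ``export`` lines hardcode ``.venv/lib/python3.12``, which only matches environments that use that directory name and Python version. The sketch below is not part of the patch. It shows one possible way to derive the same three exports from whichever interpreter is currently active. The helper names ``find_venv_cudnn`` and ``export_lines`` are hypothetical, and the sketch assumes the cuDNN pip package places its files under ``site-packages/nvidia/cudnn``, as the path in the entry suggests.

```python
import sysconfig
from pathlib import Path


def find_venv_cudnn(site_packages=None):
    """Return the nvidia/cudnn directory under site-packages, or None if absent.

    Hypothetical helper: assumes the cuDNN pip package installs into
    <site-packages>/nvidia/cudnn, as in the troubleshooting entry.
    """
    if site_packages is not None:
        roots = [Path(site_packages)]
    else:
        # Check both install locations of the active interpreter; they are
        # usually the same directory but can differ on some distributions.
        paths = sysconfig.get_paths()
        roots = [Path(paths["purelib"]), Path(paths["platlib"])]
    for root in roots:
        candidate = root / "nvidia" / "cudnn"
        if candidate.is_dir():
            return candidate
    return None


def export_lines(cudnn_dir):
    """Build the three export statements from the troubleshooting entry."""
    return [
        f"export CUDNN_PATH={cudnn_dir}",
        "export CUDNN_HOME=$CUDNN_PATH",
        "export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$LD_LIBRARY_PATH",
    ]


if __name__ == "__main__":
    cudnn_dir = find_venv_cudnn()
    if cudnn_dir is None:
        print("# no nvidia/cudnn directory found in this environment's site-packages")
    else:
        print("\n".join(export_lines(cudnn_dir)))
```

Run with the virtual environment's interpreter, the script prints shell statements that can be pasted or evaluated before building TE from source. If it prints only the comment line, the environment has no pip-installed cuDNN and the mismatch described in the entry likely does not apply.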