31 changes: 31 additions & 0 deletions README.rst
@@ -308,6 +308,37 @@ Troubleshooting
cd transformer_engine
pip install -v -v -v --no-build-isolation .

**Problems using UV or Virtual Environments:**

1. **Import Error:**

* **Symptoms:** Cannot import ``transformer_engine``
* **Solution:** Ensure your UV environment is active and that TE was installed into it with ``uv pip install --no-build-isolation <te_pypi_package_or_wheel_or_source_dir>`` rather than with a regular ``pip install`` into the system environment.

2. **cuDNN Sublibrary Loading Failed:**

* **Symptoms:** Errors at runtime with ``CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED``
* **Solution:** This can occur when TE is built against the container's system installation of cuDNN, but dependencies installed inside the virtual environment pull in a pip-packaged cuDNN (``nvidia-cudnn-cu12`` or ``nvidia-cudnn-cu13``). To resolve this, when building TE from source, set the following environment variables to point to the cuDNN in your virtual environment.


Comment on lines +322 to +323

style: extra blank line - RST should have only one blank line before code blocks (see lines 305-306 for consistent formatting)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

.. code-block:: bash

export CUDNN_PATH=$(pwd)/.venv/lib/python3.12/site-packages/nvidia/cudnn

style: hardcoded Python version may not work for all users - consider using a generic placeholder like pythonX.Y or explaining users should adjust this

export CUDNN_HOME=$CUDNN_PATH
export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$LD_LIBRARY_PATH

3. **Building Wheels:**

* **Symptoms:** Regular TE installs work correctly but UV wheel builds fail at runtime.
* **Solution:** Build the wheel with ``uv build --wheel --no-build-isolation -v``, and also pass ``--no-build-isolation`` when installing the resulting wheel with pip. The ``-v`` flag enables verbose output, which lets you verify that TE is not pulling in a version of PyTorch or JAX that differs from the one in the UV environment.
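
The cuDNN variables in item 2 can be set without hardcoding the Python minor version. The following is a minimal sketch, not part of TE: ``find_venv_cudnn`` is a hypothetical helper name, and it assumes the usual ``lib/pythonX.Y/site-packages/nvidia/cudnn`` layout inside the virtual environment.

```shell
# Hypothetical helper: print the cuDNN directory inside a virtual environment
# without hardcoding the Python minor version.
find_venv_cudnn() {
    venv_dir="${1:-.venv}"
    # The glob matches lib/python3.11, lib/python3.12, and so on.
    for candidate in "$venv_dir"/lib/python*/site-packages/nvidia/cudnn; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no nvidia/cudnn directory found under $venv_dir" >&2
    return 1
}

# Export the variables only if a cuDNN directory was actually found.
# Assumes the virtual environment lives in ./.venv.
if CUDNN_PATH="$(find_venv_cudnn "$(pwd)/.venv")"; then
    export CUDNN_PATH
    export CUDNN_HOME="$CUDNN_PATH"
    # Append the previous LD_LIBRARY_PATH only when it is non-empty.
    export LD_LIBRARY_PATH="$CUDNN_PATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```

If several Python versions exist under the same environment, the helper returns the first match, so check the printed path before building.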

**JAX-specific Common Issues and Solutions:**

1. **FFI Issues:**

* **Symptoms:** ``No registered implementation for custom call to <some_te_ffi> for platform CUDA``
* **Solution:** Ensure ``--no-build-isolation`` is used during installation. If pre-building wheels, ensure that the wheel is both built and installed with ``--no-build-isolation``. See "Problems using UV or Virtual Environments" above if using UV.
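
The build-then-install sequence described above could be scripted roughly as follows. This is a sketch under stated assumptions: ``run`` and ``build_and_install_te`` are hypothetical helper names, the script is called from the TE source directory with the UV environment active, and the wheel is written to ``dist/``, which is the default output directory of ``uv build``.

```shell
# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build the wheel and install it, disabling build isolation in both steps.
build_and_install_te() {
    # -v prints verbose output so the PyTorch/JAX versions used can be inspected.
    run uv build --wheel --no-build-isolation -v || return 1
    # Assumes the built wheel lands in ./dist.
    for wheel in dist/*.whl; do
        run uv pip install --no-build-isolation "$wheel" || return 1
    done
}
```

Running ``DRY_RUN=1 build_and_install_te`` prints the commands without executing them, which is a cheap way to confirm that ``--no-build-isolation`` appears in both steps.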

.. troubleshooting-end-marker-do-not-remove

Breaking Changes