2 changes: 2 additions & 0 deletions community/ai-vws-sizing-advisor/CHANGELOG.md
@@ -2,6 +2,8 @@
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [2.4] - 2026-01-13

Added the architecture diagram to the README.

## [2.3] - 2026-01-08

81 changes: 24 additions & 57 deletions community/ai-vws-sizing-advisor/README.md
@@ -6,13 +6,13 @@

<p align="center">
<strong>RAG-powered vGPU sizing recommendations for AI Virtual Workstations</strong><br>
Powered by NVIDIA NeMo and Nemotron models
</p>

<p align="center">
<a href="https://docs.nvidia.com/vgpu/toolkits/sizing-advisor/latest/intro.html">Official Documentation</a> •
<a href="#demo">Demo</a> •
<a href="#deployment">Quick Start</a> •
<a href="#deployment">Deployment</a> •
<a href="./CHANGELOG.md">Changelog</a>
</p>

@@ -27,7 +27,7 @@ AI vWS Sizing Advisor is a RAG-powered tool that helps you determine the optimal
This tool leverages **NVIDIA Nemotron models** for intelligent sizing recommendations:

- **[Llama-3.3-Nemotron-Super-49B](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1)** — Powers the RAG backend for intelligent conversational sizing guidance
- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations (FP8 optimized)
- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations
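
To sanity-check API access to the backend model before deploying anything, a minimal sketch is shown below. It assumes the OpenAI-compatible NVIDIA Build endpoint and the model ID listed on build.nvidia.com; the Sizing Advisor's own internal calls may differ.

```bash
# Hypothetical standalone check: query the Nemotron backend model through
# the NVIDIA Build API (OpenAI-compatible). The model ID is assumed from
# build.nvidia.com and is independent of how the RAG server calls it.
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer ${NGC_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
        "messages": [{"role": "user", "content": "Suggest a vGPU profile for a 7B-parameter LLM inference workload."}],
        "max_tokens": 256
      }'
```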

### Key Capabilities

@@ -42,6 +42,16 @@ The tool differentiates between RAG and inference workloads by accounting for em

---

## Application Workflow

The architecture follows a three-phase pipeline: document ingestion, RAG-based profile suggestion, and local deployment verification.

<p align="center">
<img src="deployment_examples/architecture_diagram.png" alt="Application Architecture" width="900">
</p>

---

## Demo

### Configuration Wizard
@@ -62,69 +72,32 @@ Validate your configuration by deploying a vLLM container locally and comparing

---

## Prerequisites

### Hardware
- **GPU:** NVIDIA RTX Pro 6000 Blackwell Server Edition, L40S, L40, L4, or A40
- **GPU Memory:** 24 GB minimum
- **System RAM:** 32 GB recommended
- **Storage:** 50 GB free space
## Deployment

### Software
- **OS:** Ubuntu 22.04 LTS
- **NVIDIA GPU Drivers:** Version 535+
**Requirements:** Ubuntu 22.04 LTS • NVIDIA GPU (L40S/L40/L4/A40, 24GB+ VRAM) • Driver 535+ • 32GB RAM • 50GB storage
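
To confirm the host meets these requirements before installing anything, a quick check (the first command ships with the driver, the rest are standard Ubuntu tools):

```bash
# GPU model, driver version (needs 535+), and total VRAM
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# Free disk space in the current directory and total system RAM in GB
df -h . && free -g
```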

**Quick Install:**
```bash
# Install Docker and npm
# 1. Install dependencies (skip if already installed)
sudo apt update && sudo apt install -y git curl docker.io npm

# Add user to docker group (recommended) OR set socket permissions
sudo usermod -aG docker $USER && newgrp docker
# OR: sudo chmod 666 /var/run/docker.sock

# Verify installations
git --version && docker --version && npm --version && curl --version

# Test GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```

> **Note:** Docker must be at `/usr/bin/docker` (verified in `deploy/compose/docker-compose-rag-server.yaml`). User must be in docker group or have socket permissions.
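
A quick way to confirm both conditions from the note above:

```bash
# Docker binary location expected by the compose file
command -v docker                      # should print /usr/bin/docker
# Current user is in the docker group, or the socket is accessible
id -nG | grep -qw docker && echo "in docker group" || ls -l /var/run/docker.sock
```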

### API Keys
- **NVIDIA Build API Key** (Required) — [Get your key](https://build.nvidia.com/settings/api-keys)
- **HuggingFace Token** (Optional) — [Create token](https://huggingface.co/settings/tokens) for gated models

---

## Deployment

**1. Clone and navigate:**
```bash
# 2. Clone and navigate
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/ai-vws-sizing-advisor
```

**2. Set NGC API key:**
```bash
# 3. Set API key (get yours at https://build.nvidia.com/settings/api-keys)
export NGC_API_KEY="nvapi-your-key-here"
echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
```

**3. Start backend services:**
```bash
# 4. Start backend (first run takes 3-5 min)
./scripts/start_app.sh
```
This automatically starts all backend services (Milvus, ingestion, RAG server). First startup takes 3-5 minutes.
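
To verify the backend came up before starting the frontend, a minimal check (exact container names depend on the compose files, so none are assumed here):

```bash
# Running containers with status and published ports; the Milvus,
# ingestion, and RAG server containers should all show "Up"
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
```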

**4. Start frontend (in new terminal):**
```bash
cd frontend
npm install
npm run dev
# 5. Start frontend (in new terminal)
cd frontend && npm install && npm run dev
```

> **Note:** A [HuggingFace token](https://huggingface.co/settings/tokens) is required for local deployment testing with gated models (e.g., Llama).
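
If you do run the local deployment test with a gated model, export the token before starting the backend. The `HF_TOKEN` variable name below is the common Hugging Face convention, not something this project documents, so check the project's scripts for the exact name they read.

```bash
# Assumed variable name (standard Hugging Face convention); placeholder value
export HF_TOKEN="hf_your-token-here"
```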

---

## Usage
@@ -197,10 +170,4 @@ curl -X POST -F "file=@./vgpu_docs/your-document.pdf" http://localhost:8082/v1/i

Licensed under the Apache License, Version 2.0.

Models governed by [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf) and [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).

---

**Version:** 2.3 (January 2026) — See [CHANGELOG.md](./CHANGELOG.md)

**Support:** [GitHub Issues](https://github.com/NVIDIA/GenerativeAIExamples/issues) | [NVIDIA Forums](https://forums.developer.nvidia.com/) | [Official Docs](https://docs.nvidia.com/vgpu/toolkits/sizing-advisor/latest/intro.html)