diff --git a/community/ai-vws-sizing-advisor/CHANGELOG.md b/community/ai-vws-sizing-advisor/CHANGELOG.md
index 9fae02c9..04c8a982 100644
--- a/community/ai-vws-sizing-advisor/CHANGELOG.md
+++ b/community/ai-vws-sizing-advisor/CHANGELOG.md
@@ -2,6 +2,8 @@
 All notable changes to this project will be documented in this file.
 The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
 
+## [2.4] - 2026-01-13
+Added the architecture diagram to the README.
 ## [2.3] - 2026-01-08
 
diff --git a/community/ai-vws-sizing-advisor/README.md b/community/ai-vws-sizing-advisor/README.md
index b1283035..e3360a8a 100644
--- a/community/ai-vws-sizing-advisor/README.md
+++ b/community/ai-vws-sizing-advisor/README.md
@@ -6,13 +6,13 @@

 RAG-powered vGPU sizing recommendations for AI Virtual Workstations
 
-Powered by NVIDIA NeMo™ and Nemotron models
+Powered by NVIDIA NeMo and Nemotron models
 
 Official Documentation •
 Demo •
-Quick Start •
+Deployment •
 Changelog

@@ -27,7 +27,7 @@ AI vWS Sizing Advisor is a RAG-powered tool that helps you determine the optimal
 
 This tool leverages **NVIDIA Nemotron models** for intelligent sizing recommendations:
 
 - **[Llama-3.3-Nemotron-Super-49B](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1)** — Powers the RAG backend for intelligent conversational sizing guidance
-- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations (FP8 optimized)
+- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations
 
 ### Key Capabilities
 
@@ -42,6 +42,16 @@ The tool differentiates between RAG and inference workloads by accounting for em
 
 ---
 
+## Application Workflow
+
+The architecture follows a three-phase pipeline: document ingestion, RAG-based profile suggestion, and local deployment verification.
+
+<div align="center">
+  <img src="deployment_examples/architecture_diagram.png" alt="Application Architecture">
+</div>
+
+---
+
 ## Demo
 
 ### Configuration Wizard
 
@@ -62,69 +72,32 @@ Validate your configuration by deploying a vLLM container locally and comparing
 
 ---
 
-## Prerequisites
-
-### Hardware
-- **GPU:** NVIDIA RTX Pro 6000 Blackwell Server Edition, L40S, L40, L4, or A40
-- **GPU Memory:** 24 GB minimum
-- **System RAM:** 32 GB recommended
-- **Storage:** 50 GB free space
+## Deployment
 
-### Software
-- **OS:** Ubuntu 22.04 LTS
-- **NVIDIA GPU Drivers:** Version 535+
+**Requirements:** Ubuntu 22.04 LTS • NVIDIA GPU (L40S/L40/L4/A40, 24GB+ VRAM) • Driver 535+ • 32GB RAM • 50GB storage
 
-**Quick Install:**
 ```bash
-# Install Docker and npm
+# 1. Install dependencies (skip if already installed)
 sudo apt update && sudo apt install -y docker.io npm
-
-# Add user to docker group (recommended) OR set socket permissions
 sudo usermod -aG docker $USER && newgrp docker
-# OR: sudo chmod 666 /var/run/docker.sock
-
-# Verify installations
-git --version && docker --version && npm --version && curl --version
-
-# Test GPU access in Docker
-docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
-```
-
-> **Note:** Docker must be at `/usr/bin/docker` (verified in `deploy/compose/docker-compose-rag-server.yaml`). User must be in docker group or have socket permissions.
-
-### API Keys
-- **NVIDIA Build API Key** (Required) — [Get your key](https://build.nvidia.com/settings/api-keys)
-- **HuggingFace Token** (Optional) — [Create token](https://huggingface.co/settings/tokens) for gated models
-
----
-
-## Deployment
-
-**1. Clone and navigate:**
-```bash
+
+# 2. Clone and navigate
 git clone https://github.com/NVIDIA/GenerativeAIExamples.git
 cd GenerativeAIExamples/community/ai-vws-sizing-advisor
-```
 
-**2. Set NGC API key:**
-```bash
+# 3. Set API key (get yours at https://build.nvidia.com/settings/api-keys)
 export NGC_API_KEY="nvapi-your-key-here"
 echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
-```
 
-**3. Start backend services:**
-```bash
+# 4. Start backend (first run takes 3-5 min)
 ./scripts/start_app.sh
-```
 
-This automatically starts all backend services (Milvus, ingestion, RAG server). First startup takes 3-5 minutes.
-
-**4. Start frontend (in new terminal):**
-```bash
-cd frontend
-npm install
-npm run dev
+# 5. Start frontend (in new terminal)
+cd frontend && npm install && npm run dev
 ```
 
+> **Note:** A [HuggingFace token](https://huggingface.co/settings/tokens) is required for local deployment testing with gated models (e.g., Llama).
+
 ---
 
 ## Usage
 
@@ -197,10 +170,4 @@ curl -X POST -F "file=@./vgpu_docs/your-document.pdf" http://localhost:8082/v1/i
 
 Licensed under the Apache License, Version 2.0.
 
-Models governed by [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf) and [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).
-
----
-
-**Version:** 2.3 (January 2026) — See [CHANGELOG.md](./CHANGELOG.md)
-
-**Support:** [GitHub Issues](https://github.com/NVIDIA/GenerativeAIExamples/issues) | [NVIDIA Forums](https://forums.developer.nvidia.com/) | [Official Docs](https://docs.nvidia.com/vgpu/toolkits/sizing-advisor/latest/intro.html)
\ No newline at end of file
+Models governed by [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf) and [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).
\ No newline at end of file
diff --git a/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png b/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png
new file mode 100644
index 00000000..48453519
Binary files /dev/null and b/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png differ
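
A reviewer exercising the three-phase pipeline this patch documents (ingestion → RAG-based profile suggestion → local deployment verification) might run a smoke test along the following lines. This is a sketch, not part of the patch: port 8082 and the GPU check come from the README hunks above, but `/v1/ingest` is a hypothetical completion of the route truncated at `/v1/i` in the Usage hunk, and the RAG server port 8081 and `/v1/chat` route are placeholders for whatever `./scripts/start_app.sh` actually exposes.

```bash
# End-to-end smoke test sketch; routes/ports not visible in the diff are guesses.

# Sanity: confirm Docker can reach the GPU before local verification
# (command carried over from the old README's prerequisite check).
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

# Phase 1, ingestion: upload a sizing document on port 8082.
# NOTE: "/v1/ingest" is a hypothetical completion of the truncated "/v1/i..." route.
curl -X POST -F "file=@./vgpu_docs/your-document.pdf" \
  http://localhost:8082/v1/ingest

# Phase 2, RAG suggestion: ask the backend for a profile recommendation.
# NOTE: port 8081 and "/v1/chat" are placeholders, not the documented API.
curl -X POST http://localhost:8081/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "Recommend a vGPU profile for a 30B model at FP8"}'
```

Phase 3 (local verification) is then performed through the tool itself, which deploys a vLLM container locally and compares measured performance against the suggested profile, as described in the Demo section of the README.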