diff --git a/community/ai-vws-sizing-advisor/CHANGELOG.md b/community/ai-vws-sizing-advisor/CHANGELOG.md
index 9fae02c9..04c8a982 100644
--- a/community/ai-vws-sizing-advisor/CHANGELOG.md
+++ b/community/ai-vws-sizing-advisor/CHANGELOG.md
@@ -2,6 +2,8 @@
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
+## [2.4] - 2026-01-13
+Added the architecture diagram ("Application Workflow" section) to the README.
## [2.3] - 2026-01-08
diff --git a/community/ai-vws-sizing-advisor/README.md b/community/ai-vws-sizing-advisor/README.md
index b1283035..e3360a8a 100644
--- a/community/ai-vws-sizing-advisor/README.md
+++ b/community/ai-vws-sizing-advisor/README.md
@@ -6,13 +6,13 @@
RAG-powered vGPU sizing recommendations for AI Virtual Workstations
- Powered by NVIDIA NeMo™ and Nemotron models
+ Powered by NVIDIA NeMo and Nemotron models
Official Documentation •
Demo •
- Quick Start •
+ Deployment •
Changelog
@@ -27,7 +27,7 @@ AI vWS Sizing Advisor is a RAG-powered tool that helps you determine the optimal
This tool leverages **NVIDIA Nemotron models** for intelligent sizing recommendations:
- **[Llama-3.3-Nemotron-Super-49B](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1)** — Powers the RAG backend for intelligent conversational sizing guidance
-- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations (FP8 optimized)
+- **[Nemotron-3 Nano 30B](https://build.nvidia.com/nvidia/nvidia-nemotron-3-nano-30b-a3b-fp8)** — Default model for workload sizing calculations
### Key Capabilities
@@ -42,6 +42,16 @@ The tool differentiates between RAG and inference workloads by accounting for em
---
+## Application Workflow
+
+The architecture follows a three-phase pipeline: document ingestion, RAG-based profile suggestion, and local deployment verification.
+
+<div align="center">
+  <img src="./deployment_examples/architecture_diagram.png" alt="AI vWS Sizing Advisor architecture diagram"/>
+</div>
+
+---
+
## Demo
### Configuration Wizard
@@ -62,69 +72,32 @@ Validate your configuration by deploying a vLLM container locally and comparing
---
-## Prerequisites
-
-### Hardware
-- **GPU:** NVIDIA RTX Pro 6000 Blackwell Server Edition, L40S, L40, L4, or A40
-- **GPU Memory:** 24 GB minimum
-- **System RAM:** 32 GB recommended
-- **Storage:** 50 GB free space
+## Deployment
-### Software
-- **OS:** Ubuntu 22.04 LTS
-- **NVIDIA GPU Drivers:** Version 535+
+**Requirements:** Ubuntu 22.04 LTS • NVIDIA GPU (RTX Pro 6000 Blackwell Server Edition, L40S, L40, L4, or A40; 24 GB+ VRAM) • Driver 535+ • 32 GB RAM • 50 GB free storage
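+
+To confirm the GPU model, available VRAM, and driver version meet these requirements (a quick sanity check, assuming the NVIDIA driver is already installed):
+
+```bash
+# Print GPU name, total memory, and driver version
+nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
+```
+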
-**Quick Install:**
```bash
-# Install Docker and npm
+# 1. Install dependencies (skip if already installed)
sudo apt update && sudo apt install -y docker.io npm
-
-# Add user to docker group (recommended) OR set socket permissions
sudo usermod -aG docker $USER && newgrp docker
-# OR: sudo chmod 666 /var/run/docker.sock
-
-# Verify installations
-git --version && docker --version && npm --version && curl --version
-
-# Test GPU access in Docker
-docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
-```
-
-> **Note:** Docker must be at `/usr/bin/docker` (verified in `deploy/compose/docker-compose-rag-server.yaml`). User must be in docker group or have socket permissions.
-### API Keys
-- **NVIDIA Build API Key** (Required) — [Get your key](https://build.nvidia.com/settings/api-keys)
-- **HuggingFace Token** (Optional) — [Create token](https://huggingface.co/settings/tokens) for gated models
-
----
-
-## Deployment
-
-**1. Clone and navigate:**
-```bash
+# 2. Clone and navigate
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/ai-vws-sizing-advisor
-```
-**2. Set NGC API key:**
-```bash
+# 3. Set API key (get yours at https://build.nvidia.com/settings/api-keys)
export NGC_API_KEY="nvapi-your-key-here"
echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
-```
-**3. Start backend services:**
-```bash
+# 4. Start backend services: Milvus, ingestion, and RAG server (first run takes 3-5 min)
./scripts/start_app.sh
-```
-This automatically starts all backend services (Milvus, ingestion, RAG server). First startup takes 3-5 minutes.
-**4. Start frontend (in new terminal):**
-```bash
-cd frontend
-npm install
-npm run dev
+# 5. Start frontend (in new terminal)
+cd frontend && npm install && npm run dev
```
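+
+If the backend fails to detect the GPU, you can verify that Docker containers can access it (optional check; requires the NVIDIA container runtime):
+
+```bash
+# Test GPU access from inside a container
+docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
+```
+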
+> **Note:** A [HuggingFace token](https://huggingface.co/settings/tokens) is required for local deployment testing with gated models (e.g., Llama).
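+>
+> A minimal sketch of supplying the token, assuming the backend reads the standard `HF_TOKEN` environment variable used by `huggingface_hub` (check the compose files under `deploy/compose/` for the exact variable name expected):
+>
+> ```bash
+> # Assumption: HF_TOKEN is the variable the backend expects for gated model downloads
+> export HF_TOKEN="hf_your-token-here"
+> ```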
+
---
## Usage
@@ -197,10 +170,4 @@ curl -X POST -F "file=@./vgpu_docs/your-document.pdf" http://localhost:8082/v1/i
Licensed under the Apache License, Version 2.0.
-Models governed by [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf) and [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).
-
----
-
-**Version:** 2.3 (January 2026) — See [CHANGELOG.md](./CHANGELOG.md)
-
-**Support:** [GitHub Issues](https://github.com/NVIDIA/GenerativeAIExamples/issues) | [NVIDIA Forums](https://forums.developer.nvidia.com/) | [Official Docs](https://docs.nvidia.com/vgpu/toolkits/sizing-advisor/latest/intro.html)
\ No newline at end of file
+Models governed by [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf) and [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).
\ No newline at end of file
diff --git a/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png b/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png
new file mode 100644
index 00000000..48453519
Binary files /dev/null and b/community/ai-vws-sizing-advisor/deployment_examples/architecture_diagram.png differ