Conversation

@vantagewithai
Contributor

@vantagewithai vantagewithai commented Jan 7, 2026

The Diffusion Loader in ComfyUI can read the model config directly from the safetensors metadata. I’ve added the same support here, so newer models like LTX2, whose model configuration ComfyUI loads from the safetensors file header metadata, are handled correctly.

@kijai

kijai commented Jan 9, 2026

Thanks, can confirm this works, example GGUF that includes the metadata and runs with this PR:

https://huggingface.co/Kijai/LTXV2_comfy/blob/main/diffusion_models/ltx-2-19b-distilled_Q4_K_M.gguf
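(For anyone who wants to check whether a downloaded GGUF actually carries this metadata before loading it, here is a minimal sketch using the gguf Python package; it simply dumps every header key, since which keys the conversion script writes is not spelled out in this thread.)

from gguf import GGUFReader, GGUFValueType

def print_gguf_metadata(path):
    # Dump every metadata key/value pair stored in the GGUF header.
    reader = GGUFReader(path)
    for name, field in reader.fields.items():
        part = field.parts[field.data[-1]]  # last part holds the value
        if field.types and field.types[0] == GGUFValueType.STRING:
            value = bytes(part).decode("utf-8")
        else:
            value = part.tolist()  # numeric scalars/arrays are shown raw
        print(f"{name}: {value}")

print_gguf_metadata("ltx-2-19b-distilled_Q4_K_M.gguf")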

@vantagewithai
Contributor Author

vantagewithai commented Jan 9, 2026

Yes, I have them all working, both the dev and distilled versions.
https://huggingface.co/vantagewithai/LTX-2-GGUF/tree/main

@MeiYi-dev MeiYi-dev mentioned this pull request Jan 9, 2026
@Heliumrich

Thanks, can confirm this works, example GGUF that includes the metadata and runs with this PR:

https://huggingface.co/Kijai/LTXV2_comfy/blob/main/diffusion_models/ltx-2-19b-distilled_Q4_K_M.gguf

Hi, can we efficiently use a dev GGUF with the BF16 LoRA to make it distilled?
I mean in terms of memory usage; or is it better to download a distilled GGUF directly?

@vantagewithai
Contributor Author

vantagewithai commented Jan 9, 2026

You can use the distilled model directly, or run a LoRA with the dev version. In practice, 20 full steps with CFG 4 give better results than the distilled setup alone. If memory is constrained, use the distilled version. I tested the dev version with the LoRA on an RTX 3060 12GB and it ran without OOM.

Using the dev version gives you more flexibility — you can run the full 20 steps at a higher CFG (without having to download the complete model again) or run 8 steps with LoRA. This model is very fast (at least 5× faster than Wan), so even 20-step runs execute quickly.
The distilled LoRA is especially useful for upscaling, where higher CFG or more steps aren’t necessary. Overall, the best approach is to use the dev version together with the distilled LoRA.

@Heliumrich

Heliumrich commented Jan 9, 2026

You can use the distilled model directly, or run a LoRA with the dev version. In practice, 20 full steps with CFG 4 give better results than the distilled setup alone. If memory is constrained, use the distilled version. I tested the dev version with the LoRA on an RTX 3060 12GB and it ran without OOM.

Using the dev version gives you more flexibility — you can run the full 20 steps at a higher CFG (without having to download the complete model again) or run 8 steps with LoRA. This model is very fast (at least 5× faster than Wan), so even 20-step runs execute quickly. The distilled LoRA is especially useful for upscaling, where higher CFG or more steps aren’t necessary. Overall, the best approach is to use the dev version together with the distilled LoRA.

How is the distilled model better for low VRAM? I mean, it should just scale linearly; fewer steps make it faster regardless of VRAM or offloading.
With 16GB VRAM (and 32GB RAM), I will probably use Q6 with a tiny bit of offloading.

I feel like Q5_K_M or similar will degrade quality too much.
And big resolutions and/or long videos will take even more VRAM, so only Q4 would have enough headroom for 720p 5 sec or so.

So I think I’ll use the LoRA just to "learn to prompt LTX2" and mess with it, but later use only dev.

@vantagewithai
Contributor Author

vantagewithai commented Jan 9, 2026

How is the distilled model better for low VRAM? I mean, it should just scale linearly; fewer steps make it faster regardless of VRAM or offloading.
With 16GB VRAM (and 32GB RAM), I will probably use Q6 with a tiny bit of offloading.

Using the distilled version directly means you don’t need to load the extra LoRA. That’s only a minimal saving, but if you’re very tight on VRAM and RAM and are running distilled mode at only 8 steps, there’s no need to load another 7GB+ as a LoRA.

With 16GB VRAM and 32GB RAM, you can even run the full dev BF16 version. I was able to run it on 12GB; the output was better, and because of offloading the added time was not that much, so quality won over the small time increase. I was also able to generate 20 seconds @ 25fps without OOM.

@zwukong

zwukong commented Jan 9, 2026

@vantagewithai
Great work, thanks so much. And it would be even better if we could have a Gemma-3 GGUF or 4-bit like this: https://huggingface.co/unsloth/gemma-3-12b-it-qat-bnb-4bit/tree/main

@vantagewithai
Contributor Author

vantagewithai commented Jan 9, 2026

@vantagewithai Great work, thanks so much. And it would be even better if we could have a Gemma-3 GGUF or 4-bit like this: https://huggingface.co/unsloth/gemma-3-12b-it-qat-bnb-4bit/tree/main

You can use the GGUF DualCLIP Loader node from this repo to load Gemma-3 GGUF models from Unsloth or Google. This node supports loading both GGUF and safetensors.

For 4-bit quantized models, you can load them directly using ComfyUI’s built-in Dual CLIP Loader node.

@theOliviaRossi

@vantagewithai Great work, thanks so much. And it would be even better if we could have a Gemma-3 GGUF or 4-bit like this: https://huggingface.co/unsloth/gemma-3-12b-it-qat-bnb-4bit/tree/main

here you go: https://huggingface.co/mradermacher/gemma-3-12b-it-heretic-x-i1-GGUF/tree/main

@zwukong

zwukong commented Jan 9, 2026

Really? 🤔

Can ComfyUI’s built-in Dual CLIP Loader node load a folder?

The GGUF from https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main got this error:

Unexpected text model architecture type in GGUF file: 'gemma3'

@rmcc3

rmcc3 commented Jan 9, 2026

This PR keeps giving me 'VAE' object has no attribute 'latent_frequency_bins', even when using "working" workflows.

@vantagewithai
Contributor Author

vantagewithai commented Jan 9, 2026

This PR keeps giving me 'VAE' object has no attribute 'latent_frequency_bins', even when using "working" workflows.

This happens because the audio VAE can’t be loaded with ComfyUI’s internal VAE Loader. Either use Kijai’s VAE Loader (which loads from the vae folder), or use ComfyUI’s LTXV Audio VAE Loader and copy the audio VAE into the checkpoints folder (that location only applies to the LTXV Audio VAE Loader).

@zwukong

zwukong commented Jan 9, 2026

@vantagewithai thanks for your reply, but I have tried again and neither of your methods works: the int4 folder can’t be loaded by the Dual CLIP Loader, and the Gemma3 GGUF gives the architecture error with the GGUF DualCLIP Loader.

@vantagewithai
Contributor Author

Really? 🤔

Can ComfyUI’s built-in Dual CLIP Loader node load a folder?

The GGUF from https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main got this error:

Unexpected text model architecture type in GGUF file: 'gemma3'

Nope — the Dual CLIP Loader can’t load folders.

You’re right about the Gemma GGUF though — that one can be loaded using the GGUF Dual CLIP Loader node. I’ll take a look and see how it can be supported.

@Bradley-Liu

@vantagewithai I use kj's loadvaekj to load the audio VAE, but the code gives me a "'VAE' object has no attribute 'latent_frequency_bins'" error. Could you tell me how I can fix this?

@LIQUIDMIND111

Tried the GGUF q4 from kijai, and got this AMAZING RESULT, LMDAO!!!!!

LTX_2.0_i2v_00102_.mp4

@YarvixPA
Contributor

YarvixPA commented Jan 9, 2026

You should replace the "nodes.py" and "loader.py" of the custom node with the ones from this PR.

This is an example output (Q3_K_S dev model + distilled lora at 1080p) of my GGUF quant available at https://huggingface.co/QuantStack/LTX-2-GGUF

LTX-2.20GGUFs.20coming.mp4

@LIQUIDMIND111

You should replace the "nodes.py" and "loader.py" of the custom node with the ones from this PR.

This is an example output (Q3_K_S dev model + distilled lora at 1080p) of my GGUF quant available at https://huggingface.co/QuantStack/LTX-2-GGUF

LTX-2.20GGUFs.20coming.mp4

I got the files downloaded. Now, if I am using the NATIVE Comfy workflows with kijai nodes, where do I find them in Comfy? I can't see them in the search bar.

@YarvixPA
Contributor

YarvixPA commented Jan 9, 2026

You should replace the "nodes.py" and "loader.py" of the custom node with the ones from this PR.

This is an example output (Q3_K_S dev model + distilled lora at 1080p) of my GGUF quant available at https://huggingface.co/QuantStack/LTX-2-GGUF

LTX-2.20GGUFs.20coming.mp4

I got the files downloaded. Now, if I am using the NATIVE Comfy workflows with kijai nodes, where do I find them in Comfy? I can't see them in the search bar.

You should go to the custom_nodes folder > ComfyUI-GGUF folder and replace those files there.

@LIQUIDMIND111

You should replace the "nodes.py" and "loader.py" of the custom node with the ones from this PR.

This is an example output (Q3_K_S dev model + distilled lora at 1080p) of my GGUF quant available at https://huggingface.co/QuantStack/LTX-2-GGUF

LTX-2.20GGUFs.20coming.mp4

I got the files downloaded. Now, if I am using the NATIVE Comfy workflows with kijai nodes, where do I find them in Comfy? I can't see them in the search bar.

You should go to the custom_nodes folder > ComfyUI-GGUF folder and replace those files there.

ahh got it, done! let me try now a render, thanks!

@Bradley-Liu

@YarvixPA do you think using dev+lora has a better result than using distilled alone?

@vantagewithai
Contributor Author

@YarvixPA do you think using dev+lora has a better result than using distilled alone?

I tried it both ways and the results were very similar — I didn’t notice any major degradation or improvement. Since this model is quite fast, I prefer using the dev version with 20 steps and CFG 4 (without distillation) for production, and dev + distilled LoRA for prototyping.

@LostnD

LostnD commented Jan 9, 2026

Workflow? Please, anyone: I downloaded the QuantStack GGUFs! Now which workflow should I use, kijai's or the official ComfyUI LTX one?

@LostnD

LostnD commented Jan 9, 2026

I'm getting this error
image
I tried the Gemma 4-bit safetensors files (both of them from one folder) and also the Gemma fp8_e4m3fn.

@YarvixPA
Contributor

YarvixPA commented Jan 9, 2026

@YarvixPA do you think using dev+lora has a better result than using distilled alone?

No, I'm also going to upload the GGUF quants for the distilled version once I'm back. Since 'Dev' is the base, you can just apply the LoRA to it. However, Dev will always give you better quality as it's meant for higher step counts.

@guiteubeuh

Tried the GGUF q4 from kijai, and got this AMAZING RESULT, LMDAO!!!!!
LTX_2.0_i2v_00102_.mp4

I have the same issue. I downloaded the 2 files; did you fix it?

@Heliumrich

Heliumrich commented Jan 10, 2026

Can anybody here post those two files, loader.py and nodes.py, as attachments, so we can download and use them?

https://raw.githubusercontent.com/city96/ComfyUI-GGUF/5f715d6fda151d21f621d9ec801975d938332305/loader.py
https://raw.githubusercontent.com/city96/ComfyUI-GGUF/f083506720f2f049631ed6b6e937440f5579f6c7/nodes.py

Right-click, "Save target as...", and replace the same files in the ComfyUI-GGUF folder under custom_nodes.
If, for some reason, this PR gets updated further, these links won't reflect the newer changes.
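(The same thing scripted, for anyone who prefers not to click around; a small sketch that assumes it is run from the ComfyUI root folder, so that custom_nodes/ComfyUI-GGUF is the right relative path.)

import urllib.request

# Pinned raw files from this PR's branch (same links as above).
files = {
    "loader.py": "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/5f715d6fda151d21f621d9ec801975d938332305/loader.py",
    "nodes.py": "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/f083506720f2f049631ed6b6e937440f5579f6c7/nodes.py",
}

for name, url in files.items():
    # Overwrites the existing file in the custom node folder.
    urllib.request.urlretrieve(url, f"custom_nodes/ComfyUI-GGUF/{name}")
    print(f"Updated {name}")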

@kijai

kijai commented Jan 10, 2026

Hmm, don't unsloth run some tests to find which blocks are more "important" and apply some logic specific to each model? That's what they do for LLMs at least. They auto-correct in an iterative way.

Okay, on some of them below Q5 it does seem there are some mixed weights, yes.

@Heliumrich

Heliumrich commented Jan 10, 2026

I mean your quants are already pretty great and really similar, nothing would beat Nunchaku either way...
(please Kijai fork nunchaku and add support, the project is basically dead 😭 )

SVDQuant formats (W4A16/W4A4/W8A8/W4A8KV4) are so much better (and faster) than GGUF
DeepCompressor and Nunchaku projects are so slow to add new models :/

@vantagewithai
Contributor Author

Hmm, don't unsloth run some tests to find which blocks are more "important" and apply some logic specific to each model?
That's what they do for LLMs at least. They auto-correct in an iterative way.

Usually, the most important blocks in diffusion models are the first few blocks, which refine the initial latent, and the last blocks, which produce the final latent output.

But you’re right — in the case of Qwen Image Layered, I also had to quantize two middle blocks to get the best results. There’s no mathematical formula for this; you really have to test and see what works best.

@kijai

kijai commented Jan 10, 2026

I mean your quants are already pretty great and really similar, nothing would beat Nunchaku either way... (please Kijai fork nunchaku and add support, the project is basically dead 😭 )

SVDQuant formats (W4A16/W4A4/W8A8/W4A8KV4) are so much better (and faster) than GGUF

I didn't do anything special with them, just city96's script. The mixed models should technically perform better, I wish they were marked as such though, because not all of them are and for precisions such as Q6 or Q8 there's no difference.

@vantagewithai
Contributor Author

vantagewithai commented Jan 10, 2026

I mean your quants are already pretty great and really similar, nothing would beat Nunchaku either way... (please Kijai fork nunchaku and add support, the project is basically dead 😭 )

SVDQuant formats (W4A16/W4A4/W8A8/W4A8KV4) are so much better (and faster) than GGUF

Nunchaku is quite good — they use SVDQuant, a 4-bit quantization scheme that significantly improves speed. GGUF, on the other hand, follows a different architecture.

They were planning to add support for Wan and video models. I haven’t followed up recently, but at the time it seemed they were still limited to image models. I might be mistaken though — the last time I checked the Nunchaku project was at least a month ago.

Let's hope Kijai takes it over. :)

@vantagewithai
Contributor Author

vantagewithai commented Jan 10, 2026

I mean your quants are already pretty great and really similar, nothing would beat Nunchaku either way... (please Kijai fork nunchaku and add support, the project is basically dead 😭 )
SVDQuant formats (W4A16/W4A4/W8A8/W4A8KV4) are so much better (and faster) than GGUF

I didn't do anything special with them, just city96's script. The mixed models should technically perform better, I wish they were marked as such though, because not all of them are and for precisions such as Q6 or Q8 there's no difference.

I modified llama.cpp and added support for keeping 6 blocks at higher precision in the lower-quant versions.

if (arch == LLM_ARCH_LTXV) {
    if (
        (name.find("transformer_blocks.0.") != std::string::npos) ||
        (name.find("transformer_blocks.1.") != std::string::npos) ||
        (name.find("transformer_blocks.2.") != std::string::npos) ||
        // (name.find("transformer_blocks.29.") != std::string::npos) ||
        // (name.find("transformer_blocks.30.") != std::string::npos) ||
        (name.find("transformer_blocks.45.") != std::string::npos) ||
        (name.find("transformer_blocks.46.") != std::string::npos) ||
        (name.find("transformer_blocks.47.") != std::string::npos) // this should be dynamic
    ) {
        if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_S || ftype == LLAMA_FTYPE_MOSTLY_Q2_K) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M || ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_0 || ftype == LLAMA_FTYPE_MOSTLY_Q4_1) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_0 || ftype == LLAMA_FTYPE_MOSTLY_Q5_1) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_S) {
            new_type = GGML_TYPE_Q5_K;
        }
        else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
            new_type = GGML_TYPE_Q6_K;
        }
    }
}

@YarvixPA
Contributor

YarvixPA commented Jan 10, 2026

Same thing on the QuantStack GGUFs. This is something that has already been implemented in the Qwen Image quants.

+    // LTX-2: first/last block high precision for lower quants
+    if (arch == LLM_ARCH_LTXV) {
+        if (
+            (name.find("transformer_blocks.0.") != std::string::npos) ||
+            (name.find("transformer_blocks.47.") != std::string::npos) // 48 blocks total (0-47)
+        ) {
+            if (ftype == LLAMA_FTYPE_MOSTLY_Q2_K ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q3_K_S ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q4_0 ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q4_1 ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S ||
+                ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M) {
+                new_type = GGML_TYPE_Q5_K;  // Minimum Q5_K for low quants
+            }
+            else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
+                new_type = GGML_TYPE_Q6_K;
+            }
+        }
+    }

@vantagewithai
Contributor Author

vantagewithai commented Jan 10, 2026

Same thing on the QuantStack GGUFs. This is something that has already been implemented in the Qwen Image quants.

For Qwen Image Layered, I found the best results using this setup.

static bool qwen_image_needs_protection(enum llama_ftype ftype) {
    switch (ftype) {
        case LLAMA_FTYPE_MOSTLY_Q2_K:
        case LLAMA_FTYPE_MOSTLY_Q3_K_S:
        case LLAMA_FTYPE_MOSTLY_Q3_K_M:
        case LLAMA_FTYPE_MOSTLY_Q4_K_M:
        case LLAMA_FTYPE_MOSTLY_Q4_K_S:
        case LLAMA_FTYPE_MOSTLY_Q5_K_S:
        case LLAMA_FTYPE_MOSTLY_Q5_0:
        case LLAMA_FTYPE_MOSTLY_Q4_0:
        case LLAMA_FTYPE_MOSTLY_Q4_1:
            return true;
        default:
            return false;
    }
}

static bool qwen_image_force_q5(const std::string & name, int block_id) {
    // Attention Q/K projections
    if (name.find(".attn.") != std::string::npos) {
        if (name.find("q_proj") != std::string::npos ||
            name.find("k_proj") != std::string::npos ||
            name.find("to_q") != std::string::npos ||
            name.find("to_k") != std::string::npos) {
            return true;
        }
    }

    // Early MLP + modulation layers
    if (block_id >= 0 && block_id <= 5) {
        if (name.find(".img_mlp.") != std::string::npos ||
            name.find(".txt_mlp.") != std::string::npos ||
            name.find(".img_mod.") != std::string::npos ||
            name.find(".txt_mod.") != std::string::npos) {
            return true;
        }
    }

    return false;
}

if (arch == LLM_ARCH_QWEN_IMAGE) {
    if (
        (name.find("transformer_blocks.0.") != std::string::npos) ||
        (name.find("transformer_blocks.1.") != std::string::npos) ||
        (name.find("transformer_blocks.2.") != std::string::npos) ||
        (name.find("transformer_blocks.29.") != std::string::npos) ||
        (name.find("transformer_blocks.30.") != std::string::npos) ||
        (name.find("transformer_blocks.57.") != std::string::npos) ||
        (name.find("transformer_blocks.58.") != std::string::npos) ||
        (name.find("transformer_blocks.59.") != std::string::npos) // this should be dynamic
    ) {
        if (qwen_image_needs_protection(ftype)) {
            const int block_id = get_block_id(name);
            if (qwen_image_force_q5(name, block_id)) {
                new_type = GGML_TYPE_Q4_K;
            }
        }
    }
}

For LTX-2, I used the first 3 and last 3 blocks.

@Arjun-Haridasan

Arjun-Haridasan commented Jan 10, 2026

@kijai @vantagewithai I don't understand shit about what you guys are talking about (though I know a little bit of what is happening here). Can you guys help me set up a working workflow to use LTX-2 in ComfyUI on a 12 GB VRAM device?

@vantagewithai
Contributor Author

vantagewithai commented Jan 10, 2026

@kijai @vantagewithai I don't understand shit about what you guys are talking about. Can you guys help me set up a working workflow to use LTX-2 in ComfyUI on a 12 GB VRAM device?

Try this. It supports both T2V/I2V and safetensors/GGUF in one workflow. I have tested it on a 12GB VRAM and 48GB system RAM setup.

Since the PR hasn’t been merged yet, this workflow is using my own custom node based on ComfyUI-GGUF to load GGUF models. If you already have the merged PR changes on your side, you can safely replace it with the ComfyUI-GGUF node.

https://github.com/vantagewithai/Vantage-Nodes

To run I2V mode in the workflow, you’ll need to do a lot of offloading by running ComfyUI with these params. T2V mode works fine without needing to reserve VRAM.

python main.py --lowvram --reserve-vram 10

https://huggingface.co/vantagewithai/LTX-2-Split/resolve/main/Vantage-LTX2-Advanced-Workflow-GGUF-Support.json?download=true

workflow

@Arjun-Haridasan

Arjun-Haridasan commented Jan 10, 2026

I was able to run the I2V Wan 14B model with 12 GB VRAM and 32 GB RAM (though it takes 45 minutes to generate a 1080p 5-second video). I need something like this that works. But after updating, none of the workflows I have are working; everything throws one error or another... Could you help me by explaining how to set up ComfyUI so that errors can be avoided after updating?
Wan 2 2 I2V 14b GGUF + Lora

@FlowDownTheRiver

Great job! I will try this one, as my video gens with GGUF had that pixelated effect and broken audio. Hey @vantagewithai, your YouTube channel is also a really good one. Thanks for the PR!

@vantagewithai
Contributor Author

Great job! I will try this one, as my video gens with GGUF had that pixelated effect and broken audio. Hey @vantagewithai, your YouTube channel is also a really good one. Thanks for the PR!

You are most welcome! :)

@LostnD

LostnD commented Jan 10, 2026

@vantagewithai please dude, do something about the Gemma part, I have 6GB VRAM! Make the Gemma GGUF work with LTX 2; it gets stuck exactly at the Gemma part and doesn't get past it in ComfyUI!

@zwukong

zwukong commented Jan 10, 2026

@vantagewithai A Qwen Image Layered GGUF is needed. There is no quality Q3 yet (Unsloth and QuantStack are both bad, Unsloth worse). Thanks for your code. Can you provide a GGUF link?

@shimmyshimmer

shimmyshimmer commented Jan 11, 2026

@vantagewithai A Qwen Image Layered GGUF is needed. There is no quality Q3 yet (Unsloth and QuantStack are both bad, Unsloth worse). Thanks for your code. Can you provide a GGUF link?

You might've used an old version of the Qwen Image layered by Unsloth. We just updated like a day ago for dynamic quantization. Try it out and see if you still get bad performance: https://huggingface.co/unsloth/Qwen-Image-Layered-GGUF/tree/main

We're always trying to improve our formula. And we run an analysis/search to find quant configs and we are continuing to evolve methodology.

@zwukong

zwukong commented Jan 11, 2026

OK, I will try. The most useful way to check GGUF quality is Q3, I think; if Q3 is fine, the others will be very good.

@vantagewithai
Contributor Author

vantagewithai commented Jan 11, 2026

@vantagewithai A Qwen Image Layered GGUF is needed. There is no quality Q3 yet (Unsloth and QuantStack are both bad, Unsloth worse). Thanks for your code. Can you provide a GGUF link?

You might've used an old version of the Qwen Image layered by Unsloth. We just updated like a day ago for dynamic quantization. Try it out and see if you still get bad performance: https://huggingface.co/unsloth/Qwen-Image-Layered-GGUF/tree/main

We're always trying to improve our formula. And we run an analysis/search to find quant configs and we are continuing to evolve methodology.

@shimmyshimmer No, I didn’t use one from Unsloth. I always quantize the models myself. What I shared was simply the method that gave me the best results, especially in terms of layer-splitting accuracy.

@vantagewithai
Contributor Author

vantagewithai commented Jan 11, 2026

OK, I will try. The most useful way to check GGUF quality is Q3, I think; if Q3 is fine, the others will be very good.

@zwukong @shimmyshimmer @YarvixPA @kijai Unsloth puts a lot of effort into quantizing models and keeps improving them. Also, a shoutout to QuantStack, and of course Kijai for his fp8 versions — they all do great work for the community by providing high-quality quantized models for everyone. That said, for Qwen Image Layered, I’d recommend sticking with Q4_K_M or higher — even the FP8 version doesn’t perform as well as the BF16 weights.

@nizamani

I keep getting this error when trying GGUF Gemma 3:
image
I tried the Unsloth Gemma 3 GGUF as well as this one: https://huggingface.co/mradermacher/gemma-3-12b-it-heretic-x-i1-GGUF/tree/main

@JosephMillsAtWork

JosephMillsAtWork commented Jan 11, 2026

@vantagewithai thanks for this. After applying the patch and altering #398, I was able to get this "running" on my laptop with 6GB of VRAM (some audio sync issues, but hey, it's 6GB of VRAM on a laptop). I also tested on my 8GB and 16GB cards. Again, thanks for your time and the effort you put into this.

@city96 I know we all get busy with life and everything; just a friendly bump for the merge.

@city96
Owner

city96 commented Jan 11, 2026

Sorry, yeah, I've had a lot of stuff to deal with and barely have a working PC to test on, so I'm really behind on new models and issues.

Anyway, I checked out this PR. It does seem to break loading quantized text encoders as-is, since it changes the number of elements that gguf_sd_loader returns. I'll push a fix to this branch that changes it around a bit to return a dict instead; hopefully that's a better long-term solution, since we can add more stuff to it (and we no longer need the return_arch arg either). It does mean a one-time breaking change either way for any node pack that calls gguf_sd_loader directly, though I'm not sure how common that really is.

For the sake of speed, I'll merge it with those changes. If anything breaks, I'll be around for at least a few days so I can try and fix stuff faster. I'll also try to look at gemma3.
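(For third-party node packs that do call gguf_sd_loader directly, a defensive wrapper along these lines could smooth over the change; the dict key name "sd" below is only a guess at the post-merge return shape, so treat the whole thing as a sketch rather than the actual API.)

def load_gguf_sd_compat(gguf_sd_loader, path):
    # gguf_sd_loader is passed in because the real import path depends on how
    # the node pack vendors or imports ComfyUI-GGUF's loader module.
    result = gguf_sd_loader(path)
    if isinstance(result, tuple):
        return result[0]            # this PR's original form: (state_dict, extra)
    if isinstance(result, dict) and "sd" in result:
        return result["sd"]         # hypothetical key for the merged dict form
    return result                   # old behavior: plain state dict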

This should be more future proof in case we need to return other attributes in the future. Possible breaking change for anyone using `gguf_sd_loader` directly either way, though.
@city96 city96 merged commit 58625e1 into city96:main Jan 11, 2026
@vantagewithai
Contributor Author

vantagewithai commented Jan 11, 2026

@city96 Thanks a lot.

Since you’ve merged the PR and mentioned that it breaks a few things, I think this can be handled in a non-breaking way. We can add a helper function that simply checks whether the config key is present in the metadata; if it is, it returns the required metadata key or the full metadata. We do this in just this node's definition, so all other functions remain the same.

This way, the old implementation remains untouched, nothing breaks, and the new metadata support works seamlessly alongside it.
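(A minimal sketch of the kind of helper meant here, reading straight from the GGUF header with the gguf package; the key name "config" is purely illustrative, not necessarily what ComfyUI actually looks for.)

from gguf import GGUFReader, GGUFValueType

def get_config_from_metadata(path, key="config"):  # key name is illustrative only
    # Return the value of a single metadata key from the GGUF header, or None if absent.
    reader = GGUFReader(path)
    field = reader.fields.get(key)
    if field is None or not field.types or field.types[0] != GGUFValueType.STRING:
        return None
    return bytes(field.parts[field.data[-1]]).decode("utf-8")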

I also made a small local change to convert.py to add support for carrying metadata into the generated GGUF files. I think it would be better if this were parameter-driven, so could you please consider adding this functionality to convert.py?

Or, if you’re currently busy, I could add a parameter like --add-metadata and, based on that flag, call the required functions when generating the GGUF. I can then submit a separate PR for these changes.

Apart from these changes, I’ve also added support for several new, previously unrecognized models in my local copy of convert.py. I can submit a separate PR for those changes as well, if you’d like.

I added the following helper function:

def load_state_dict_with_metadata(path):
    # Load state dict and extract safetensors metadata
    if any(path.endswith(x) for x in [".ckpt", ".pt", ".bin", ".pth"]):
        state_dict = torch.load(path, map_location="cpu", weights_only=True)
        metadata = {}  # Legacy formats do not contain metadata
        for subkey in ["model", "module"]:
            if subkey in state_dict:
                state_dict = state_dict[subkey]
                break
    else:
        # Parse safetensors header for metadata
        import struct, json
        with open(path, "rb") as f:
            length = struct.unpack("<Q", f.read(8))[0]
            header = json.loads(f.read(length))
            metadata = header.get("__metadata__", {})
        
        state_dict = load_file(path)
        logging.info(f"Extracted {len(metadata)} metadata keys from safetensors")

    state_dict = strip_prefix(state_dict)
    return state_dict, metadata

Then, inside convert_file, I made the following changes:

state_dict, safetensors_metadata = load_state_dict_with_metadata(path)

# After writer creation
add_metadata_with_type(writer, safetensors_metadata)
logging.info(f"Copied {len(safetensors_metadata)} metadata keys to GGUF")

@city96
Owner

city96 commented Jan 11, 2026

@vantagewithai

Since you’ve merged the PR and mentioned that it breaks a few things, I think this can be handled in a non-breaking way.

I think the current approach makes the most sense long term, since we might need to add other returned info to the sd loader eventually, so just ripping the bandaid off and changing it once like this is likely the least painful. I checked a few custom nodes that I could think of; it shouldn't break ComfyUI-MultiGPU, and the other node packs, I think, just have a copy of the loader code instead of relying on this code directly.

I also made a small local change to convert.py to add support for carrying metadata into the generated GGUF files. I think it would be better if this were parameter-driven, so could you please consider adding this functionality to convert.py?

Yeah, I think that makes a lot of sense to have, though the convert code at the moment is a bit all over the place since half the updates are on a different branch. It'll have to be merged to master first, plus I guess a lot of the new model architectures are likely to be missing.

As a bonus, we keep the actual metadata from any model that does have it, though I guess we might want to wrap a try-catch per metadata line on the off-chance one has something weird in it that might break it. E.g. flux schnell just straight up has a base64 encoded jpeg thumbnail in the metadata. Not sure how well that gets handled.

image

@FlowDownTheRiver

FlowDownTheRiver commented Jan 11, 2026

@city96 Thanks for merging with the fix. After I tried @vantagewithai's implementation, which was working great for the LTX models, I realized it was breaking GGUF loading for the Qwen CLIPs. Now that's supported too. However, there is still a minor request I'd like you to push to the main repo if possible: PR #402 adds Gemma3 12b support, and in that same topic this file #402 (comment) was shared on top of @vantagewithai's implementation, which basically supported the LTX models and the Gemma model at the same time. So now that you have pushed this to the main repo, can you also include the Gemma support in the CLIP loaders, especially the dual CLIP loader? Then we can have everything fixed and supported.

Edit: I have seen your recent comments on that PR, so you know about the subject. Thanks for the great work to date...

city96 added a commit that referenced this pull request Jan 12, 2026
For #407 since old comfy versions don't support passing metadata (added in #399 )