Added support to load model config from Metadata. #399
Conversation
Added function to extract metadata from GGUF files.
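For reference, here is a minimal sketch of how GGUF key/value metadata can be read with the gguf Python package; the helper name and the key printed at the end are illustrative, not the exact code added in this PR.

```python
# Minimal sketch (not the PR's exact code): reading GGUF key/value metadata
# with the gguf Python package. The file name and printed key are placeholders.
import gguf

def read_gguf_metadata(path: str) -> dict:
    reader = gguf.GGUFReader(path)
    metadata = {}
    for key, field in reader.fields.items():
        if not field.types:
            continue
        if field.types[0] == gguf.GGUFValueType.STRING:
            # String values are stored as raw bytes inside field.parts
            metadata[key] = str(bytes(field.parts[field.data[-1]]), encoding="utf-8")
        elif len(field.types) == 1:
            # Scalar values (ints, floats, bools) sit in a one-element array
            metadata[key] = field.parts[field.data[-1]].tolist()[0]
    return metadata

if __name__ == "__main__":
    meta = read_gguf_metadata("model.gguf")
    print(meta.get("general.architecture"))
```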
Thanks, can confirm this works, example GGUF that includes the metadata and runs with this PR: https://huggingface.co/Kijai/LTXV2_comfy/blob/main/diffusion_models/ltx-2-19b-distilled_Q4_K_M.gguf |
Yes, I have them all working, dev and distilled versions.
Hi, can we efficiently use a dev GGUF with the BF16 LoRA to make it distilled?
You can use the distilled model directly, or run a LoRA with the dev version. In practice, 20 full steps with CFG 4 give better results than the distilled setup alone. If memory is constrained, use the distilled version. I tested the dev version with LoRA on an RTX 3060 12GB and it ran without OOM. Using the dev version gives you more flexibility: you can run the full 20 steps at a higher CFG (without having to download the complete model again) or run 8 steps with the LoRA. This model is very fast (at least 5× faster than Wan), so even 20-step runs execute quickly.
How is the distilled model better for low VRAM? I mean, it should just scale linearly: fewer steps make it faster regardless of VRAM or offloading. I feel like Q5_K_M or similar will degrade quality too much. So I think I'll use the LoRA just to "learn to prompt LTX2" and mess with it, but later use only dev.
Using the distilled version directly means you don't need to load an extra LoRA. That is only a minimal saving, but it matters if you are very tight on VRAM and RAM and running distilled mode with only 8 steps, since then there is no need to load another 7GB+ as a LoRA. With 16GB VRAM and 32GB RAM you can even run the full dev BF16 version. I was able to run it on 12GB; the output was better, and because of offloading the added time was not that much, so quality won over the small time increase. I was also able to generate 20 seconds @ 25fps without OOM.
@vantagewithai |
You can use the GGUF DualCLIP Loader node from this repo to load Gemma-3 GGUF models from Unsloth or Google. This node supports loading both GGUF and safetensors. For 4-bit quantized models, you can load them directly using ComfyUI’s built-in Dual CLIP Loader node. |
here you go: https://huggingface.co/mradermacher/gemma-3-12b-it-heretic-x-i1-GGUF/tree/main |
Really? 🤔 Can ComfyUI's built-in Dual CLIP Loader node load a folder? I tried the GGUF from https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main and got this error:
This PR keeps giving me |
This happens because the audio VAE can't be loaded using ComfyUI's internal VAE Loader. You need to either use Kijai's VAE Loader (which loads from the vae folder), or use ComfyUI's LTXV Audio VAE Loader and copy the audio VAE into the checkpoints folder (the copy is only needed for ComfyUI's LTXV Audio VAE Loader).
@vantagewithai thanks for your reply, but I have tried again and none of your methods work: the int4 folder can't be loaded by the Dual CLIP Loader, and the Gemma3 GGUF gives an architecture error in the GGUF DualCLIP Loader.
Nope, the Dual CLIP Loader can't load folders. You're right about the Gemma GGUF though; that one is meant to be loaded with the GGUF Dual CLIP Loader node. I'll take a look and see how it can be supported.
@vantagewithai I use KJ's loadvaekj to load the audio VAE, but the code gives me a "'VAE' object has no attribute 'latent_frequency_bins'" error. Could you tell me how I can fix this?
Tried the GGUF q4 from kijai, and got this AMAZING RESULT, LMDAO!!!!! LTX_2.0_i2v_00102_.mp4 |
You should replace the "nodes.py" and "loader.py" of the custom node with the ones from this PR. This is an example output (Q3_K_S dev model + distilled LoRA at 1080p) of my GGUF quant available at https://huggingface.co/QuantStack/LTX-2-GGUF LTX-2.20GGUFs.20coming.mp4
I got the files downloaded. Now, if I am using the NATIVE Comfy workflows with Kijai nodes, where do I find them in Comfy? I can't see them in the search bar.
You should go to custom_nodes (folder) > ComfyUI-GGUF (folder) and replace those files there.
Ahh got it, done! Let me try a render now, thanks!
@YarvixPA do you think using dev+lora has a better result than using distilled alone? |
I tried it both ways and the results were very similar — I didn’t notice any major degradation or improvement. Since this model is quite fast, I prefer using the dev version with 20 steps and CFG 4 (without distillation) for production, and dev + distilled LoRA for prototyping. |
Workflow? Please, anyone. I downloaded the QuantStack GGUFs! Now which workflow should I use, Kijai's or the official ComfyUI LTX one?
No, I'm also going to upload the GGUF quants for the distilled version once I'm back. Since 'Dev' is the base, you can just apply the LoRA to it. However, Dev will always give you better quality as it's meant for higher step counts. |
I have the same issue. I downloaded the 2 files; did you fix it?
https://raw.githubusercontent.com/city96/ComfyUI-GGUF/5f715d6fda151d21f621d9ec801975d938332305/loader.py Right-click, save target as..., and replace the same files in GGUF custom_nodes folder |
Okay, on some of them below Q5 it seems there are some mixed weights, yes.
I mean your quants are already pretty great and really similar, nothing would beat Nunchaku either way... SVDQuant formats (W4A16/W4A4/W8A8/W4A8KV4) are so much better (and faster) than GGUF |
Usually, the most important blocks in diffusion models are the first few blocks, which refine the initial latent, and the last blocks, which produce the final latent output. But you’re right — in the case of Qwen Image Layered, I also had to quantize two middle blocks to get the best results. There’s no mathematical formula for this; you really have to test and see what works best. |
I didn't do anything special with them, just city96's script. The mixed models should technically perform better, I wish they were marked as such though, because not all of them are and for precisions such as Q6 or Q8 there's no difference. |
Nunchaku is quite good — they use SVDQuant, a 4-bit quantization scheme that significantly improves speed. GGUF, on the other hand, follows a different architecture. They were planning to add support for Wan and video models. I haven’t followed up recently, but at the time it seemed they were still limited to image models. I might be mistaken though — the last time I checked the Nunchaku project was at least a month ago. Let's Hope Kijai takes it over. :) |
I modified llama.cpp and added support for keeping 6 blocks at a higher quantization in the lower-quant versions, gated on the architecture check `if (arch == LLM_ARCH_LTXV) { ... }`.
Same thing on the QuantStack GGUF. This is something that has already been implemented in the quants of Qwen Image.
For Qwen Image Layered, I found the best results using this setup:

```cpp
static bool qwen_image_needs_protection(enum llama_ftype ftype) {
    // ...
}

static bool qwen_image_force_q5(const std::string & name, int block_id) {
    // ...
}

if (arch == LLM_ARCH_QWEN_IMAGE) {
    // ...
}
```

For LTX-2 I used the first 3 and last 3 blocks.
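To make that heuristic concrete, here is a rough Python sketch of the idea of protecting the first and last few transformer blocks when picking per-tensor quant types; the tensor-name pattern and the cutoff of 3 are assumptions for illustration, not the logic of the actual llama.cpp patch.

```python
# Toy illustration of the "protect first/last blocks" idea discussed above.
# The tensor-name pattern ("blocks.<idx>.") and keep=3 are assumptions,
# not taken from the actual llama.cpp patch.
import re

def is_protected_block(tensor_name: str, num_blocks: int, keep: int = 3) -> bool:
    match = re.search(r"blocks\.(\d+)\.", tensor_name)
    if match is None:
        return False  # non-block tensors are handled separately
    idx = int(match.group(1))
    return idx < keep or idx >= num_blocks - keep

# Example: with 48 blocks, blocks 0-2 and 45-47 would stay at a higher quant type.
print([i for i in range(48) if is_protected_block(f"blocks.{i}.attn_q.weight", 48)])
```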
@kijai @vantagewithai I don't understand shit about what you guys are talking about (though I know a little bit of what is happening here). Can you guys help me set up a working workflow to use LTX-2 in ComfyUI on a 12 GB VRAM device?
Try this. It supports both T2V/I2V and safetensors/GGUF in one workflow. I have tested it on a 12GB VRAM and 48GB system RAM setup. Since the PR hasn't been merged yet, this workflow uses my own custom node based on ComfyUI-GGUF to load GGUF models. If you already have the merged PR changes on your side, you can safely replace it with the ComfyUI-GGUF node. https://github.com/vantagewithai/Vantage-Nodes To run I2V mode in the workflow you will need to do a lot of offloading by running ComfyUI with these params (T2V mode works fine without needing to reserve VRAM): `python main.py --lowvram --reserve-vram 10`
Great job! I will try this one as my video gens with gguf had those pixelated effect and broken audio. Hey @vantagewithai your youtube channel is also a really good one. Thanks for the PR! |
You are most welcome! :) |
@vantagewithai please, dude, do something about the Gemma, I have 6GB VRAM! Make the Gemma GGUF work with LTX-2; it gets stuck exactly at the Gemma part and doesn't get past it in ComfyUI!
@vantagewithai A Qwen Image Layered GGUF is needed. There is no quality Q3 yet (Unsloth and QuantStack are both bad, Unsloth worse). Thanks for your code. Can you provide a GGUF link?
You might've used an old version of Qwen Image Layered by Unsloth. We just updated it about a day ago for dynamic quantization. Try it out and see if you still get bad performance: https://huggingface.co/unsloth/Qwen-Image-Layered-GGUF/tree/main We're always trying to improve our formula; we run an analysis/search to find quant configs and are continuing to evolve our methodology.
OK, I will try. The most useful way to check GGUF quality is Q3, I think; if Q3 is fine, the others will be very good.
@shimmyshimmer No, I didn’t use one from Unsloth. I always quantize the models myself. What I shared was simply the method that gave me the best results, especially in terms of layer-splitting accuracy. |
@zwukong @shimmyshimmer @YarvixPA @kijai Unsloth puts a lot of effort into quantizing models and keeps improving them. Also, a shoutout to QuantStack, and of course Kijai for his fp8 versions; they all do great work for the community by providing high-quality quantized models for everyone. That said, for Qwen Image Layered, I'd recommend sticking with Q4_K_M or higher; even the FP8 version doesn't perform as well as the BF16 weights.
I keep on getting this error when trying gguf gemma 3 |
@vantagewithai thanks for this. After applying the patch and altering #398 I was able to get this "running" on my laptop with 6GB of VRAM (some audio sync issues, but hey, it's 6GB of VRAM on a laptop). I also tested on my 8GB and 16GB cards. Again, thanks for the time and effort you put into this. @city96 I know we all get busy with life and everything; just a friendly bump for the merge.
Sorry, yeah, I've had a lot of stuff to deal with and barely have a working PC to test on, so I'm really behind on new models and issues. Anyway, I checked out this PR. It does seem to break loading quantized text encoders as-is, since it changes the number of elements that `gguf_sd_loader` returns. For the sake of speed, I'll merge it with those changes. If anything breaks, I'll be around for at least a few days so I can try and fix stuff faster. I'll also try to look at gemma3.
This should be more future-proof in case we need to return other attributes later. Possible breaking change for anyone using `gguf_sd_loader` directly either way, though.
@city96 Thanks a lot. Since you've merged the PR and mentioned that it breaks a few things, I think this can be handled in a non-breaking way. We can add a helper function that simply checks whether the config key is present in the metadata; if it is, it returns the required metadata key or the full metadata. We do this in just this node's definition, so all other functions remain the same. This way, the old implementation remains untouched, nothing breaks, and the new metadata support works seamlessly alongside it. I also made a small local change to convert.py to add support for carrying metadata into the generated GGUF files. I think it would be better if this were parameter-driven, so could you please consider adding this functionality to convert.py? Or, if you're currently busy, I could add a parameter like --add-metadata and, based on that flag, call the required functions when generating the GGUF. I can then submit a separate PR for these changes. Apart from these changes, I've also added support for several new, previously unrecognized models in my local copy of convert.py. I can submit a separate PR for those changes as well, if you'd like. I added the following helper function and then made the corresponding changes inside convert_file:
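The exact snippets are not reproduced above; as a rough sketch of the approach being described, a helper could write model-config metadata from the source checkpoint into the generated GGUF as string key/value pairs, gated on the proposed --add-metadata flag. The function name, parameters, and gating here are assumptions, not the actual local changes.

```python
# Hedged sketch of the convert.py idea described above (not the actual change):
# given a dict of model-config metadata taken from the source checkpoint,
# write it into the GGUF being generated as string key/value pairs,
# only when the (assumed) --add-metadata flag is enabled.
import gguf

def add_source_metadata(writer: gguf.GGUFWriter, metadata: dict, enabled: bool) -> None:
    if not enabled:
        return
    for key, value in metadata.items():
        writer.add_string(key, str(value))
```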
@city96 Thanks for merging with the fix. After I tried @vantagewithai's implementation, which was working great for the LTX models, I realized that it was breaking the GGUF loading for the Qwen CLIPs. Now it supports that too. However, there is still a minor request I'd like you to push to the main repo if possible: PR #402 adds Gemma3 12b support, and in the same topic this file #402 (comment) was shared on top of @vantagewithai's implementation, which basically supports the LTX models and the Gemma model at the same time. So now that you have pushed this to the main repo, can you also include the Gemma support in the clip loaders, especially the dual clip loader? Then we can have it all fixed and supported. Edit: I have seen your comments lately on that PR, so you know about the subject. Thanks for the great work to this date...





The Diffusion Loader in ComfyUI can read the model config directly from the safetensors metadata. I've added the same support for GGUF files, so newer models like LTX2 are handled correctly, since ComfyUI loads the model configuration for LTX2 from the safetensors file header metadata.
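For context, a minimal sketch of what "safetensors file header metadata" refers to is shown below; the file name is a placeholder and the exact keys ComfyUI looks for are not shown.

```python
# Minimal sketch: reading the JSON header metadata of a .safetensors file,
# which is where ComfyUI finds the model config for models like LTX2.
# The file name below is a placeholder.
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # little-endian uint64 header size
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string-to-string map stored alongside the tensor index
    return header.get("__metadata__", {}) or {}

if __name__ == "__main__":
    print(list(read_safetensors_metadata("ltx2_model.safetensors").keys()))
```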