
Conversation

@scottmudge commented Jan 10, 2026

This is a concise implementation for loading the Gemma3 tokenizer, with the correct configuration values and an appropriate method of loading the tokenizer (a SentencePiece model) and the mmproj alongside the existing GGUF quants available on Hugging Face.

I've extracted the tokenizer into a standalone model, along with the mmproj. Note that these two supplementary models are required:

https://huggingface.co/smhf72/gemma-3-12b-it-extras-comfy

These two models should be placed in the models/clip directory, and they are automatically loaded when a gemma3 architecture is detected.
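
For context, here is a minimal sketch (not the PR's actual code) of what this detection-and-loading flow can look like, using the gguf and sentencepiece packages. The file names tokenizer.model and mmproj.gguf, and the function name, are assumptions:

```python
# Minimal sketch of the auto-loading flow; file names and the function
# name are assumptions, not this PR's actual identifiers.
from pathlib import Path

from gguf import GGUFReader                       # pip install gguf
from sentencepiece import SentencePieceProcessor  # pip install sentencepiece

def load_gemma3_extras(gguf_path: str, clip_dir: str):
    # Read the GGUF metadata to check the model architecture.
    reader = GGUFReader(gguf_path)
    field = reader.fields["general.architecture"]
    arch = bytes(field.parts[field.data[0]]).decode("utf-8")
    if arch != "gemma3":
        return None, None

    clip_dir = Path(clip_dir)
    spiece_path = clip_dir / "tokenizer.model"  # assumed file name
    mmproj_path = clip_dir / "mmproj.gguf"      # assumed file name

    # The tokenizer is a plain SentencePiece model, loaded directly.
    tokenizer = SentencePieceProcessor(model_file=str(spiece_path))

    # The mmproj is optional: text-only use still works without it, but
    # the model cannot embed image tokens, so only a warning is issued.
    mmproj = None
    if mmproj_path.exists():
        mmproj = GGUFReader(str(mmproj_path))
    else:
        print("warning: mmproj.gguf not found; visual input disabled")
    return tokenizer, mmproj
```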

It works as expected with these GGUFs:

and others.


EDIT:

If testing, this expects #399 to be merged first.

Commits:

- Small fixes
- Update to spiece tokenizer loading
- Switch to pathlib
@zwukong commented Jan 10, 2026

Cool, but is mmproj.gguf necessary? That PR doesn't need it.

@scottmudge (Author) commented Jan 10, 2026

Yes, it is necessary for visual reasoning. The original model distributed with the ComfyUI Gemma 3 safetensors release included the mmproj tensors.

@zwukong commented Jan 10, 2026

OK, I tried i2v without mmproj, and it seems to work.

@scottmudge (Author) commented Jan 10, 2026

Well, by all means, if you don't want it for whatever reason, don't use it. But I figured they included it for a reason. This PR should work fine without it (as in, it won't raise an exception), but it will issue a warning.

In the ComfyUI-LTX repo's example workflows, they use a prompt "enhancer" node, which feeds the input image to Gemma 3 and asks it to improve the prompt based on what the image contains.

You cannot do this without the mmproj model; the language model won't know what the image tokens are.
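
To make that concrete, here is a conceptual sketch (assumed names, not this PR's API) of what the mmproj actually does: it projects vision-encoder features into the language model's embedding space, so the image tokens mean something to Gemma 3:

```python
import torch

# Conceptual sketch: the mmproj maps vision features into the LM's
# embedding space. All names here are illustrative assumptions.
def embed_with_image(image_features: torch.Tensor,  # (num_patches, vision_dim)
                     mmproj: torch.nn.Linear,       # vision_dim -> lm_hidden_dim
                     text_embeds: torch.Tensor      # (seq_len, lm_hidden_dim)
                     ) -> torch.Tensor:
    image_embeds = mmproj(image_features)
    # Prepend the projected image tokens to the text sequence; without
    # the projector, there is nothing meaningful to put here.
    return torch.cat([image_embeds, text_embeds], dim=0)
```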

@zwukong commented Jan 10, 2026

I know what you mean; VL image-edit models and the like all need that. But the native ComfyUI CLIP node cannot return a string as its result, so VL models cannot be used very efficiently.

@scottmudge (Author) commented Jan 10, 2026

That is true; however, other custom nodes may want to make use of visual reasoning through the GGUF loader, so allowing the mmproj file to be loaded, as is already done for Qwen, made sense.

When it comes to making things easier for the user, it's generally better to include things by default, so they won't run into issues later if they suddenly need or want it (perhaps without even knowing what it is).

@Andy92j commented Jan 10, 2026

I always get this error: not enough values to unpack (expected 3, got 2).
I've tried various GGUFs (mostly the Q2 version, though).

@scottmudge (Author) commented Jan 10, 2026

Sorry, you need to merge #399 first, then merge this PR.

#399 is needed for loading LTX-2's models (not the CLIP) as GGUF anyway.
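
For anyone else hitting the error above, this is the general shape of the failure (hypothetical signatures, just to illustrate): this PR unpacks three return values from a loader that, without #399, still returns two:

```python
# Hypothetical illustration of the unpack error, not the repo's code.
def load_without_399(path):
    return "model", "config"            # old two-value return

def load_with_399(path):
    return "model", "config", "extras"  # new three-value return

try:
    model, config, extras = load_without_399("gemma3.gguf")
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 2)
```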

@frost-byte commented Jan 10, 2026

> Sorry, you need to merge #399 first, then merge this PR.
>
> #399 is needed for loading LTX-2's models (not the CLIP) as GGUF anyway.

Is #402 also needed? I was having an issue when using a Gemma GGUF with those PRs merged: the output was just garbled. I have already downloaded all of the files mentioned, including the .mmproj. I assume they just need to reside in the text_encoders folder. (I did try the fp8_e4m3n version and that did work.)

Actually, I guess the workflow that was able to run was t2v; trying i2v with fp8_e4m3n produced the same garbled output. So I must have something else incorrect.

@zwukong commented Jan 11, 2026

I don't think your method is right. The RAM cost is huge: a 7 GB Gemma 3 Q4_K_M costs more than 50 GB of RAM to load. The other PR does not have that problem.

@scottmudge closed this Jan 12, 2026
