
Conversation

alvarobartt (Member)

What does this PR do?

This PR adds support for the Gemma3 architecture (text only), running in FP32 precision on CPU, MPS, and CUDA devices.

Note

Neither FP16 nor Flash Attention is supported for Gemma3: FP16 causes weight instability that produces NaN values in the hidden states, which in turn rules out Flash Attention.

Additionally, this PR adds support for the ONNX export, whose output is the `sentence_embedding` rather than the `last_hidden_state` from the Transformer layer. Some models contain Transformer + Pooling + Dense modules, so the ONNX export packs all of them together, unlike standard Transformer + Dense ONNX exports that output only the `last_hidden_state` so that pooling can be applied separately afterwards.
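As a minimal sketch of the branching described above (all names hypothetical, not the actual crate API): if the ONNX graph already exposes a `sentence_embedding` output, it is used directly; otherwise the embedding is pooled from `last_hidden_state`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a batch of per-token hidden states: one Vec<f32> per token.
type Tensor = Vec<Vec<f32>>;

/// If the ONNX export already packs Transformer + Pooling + Dense, it exposes a
/// "sentence_embedding" output and no further pooling is needed; otherwise fall
/// back to mean-pooling the "last_hidden_state" output ourselves.
fn extract_embedding(outputs: &HashMap<String, Tensor>) -> Option<Vec<f32>> {
    if let Some(sentence) = outputs.get("sentence_embedding") {
        // Already pooled: a single row per sequence.
        return sentence.first().cloned();
    }
    let hidden = outputs.get("last_hidden_state")?;
    let (tokens, dim) = (hidden.len(), hidden.first()?.len());
    // Mean-pool across the token axis.
    let mut pooled = vec![0.0f32; dim];
    for token in hidden {
        for (acc, v) in pooled.iter_mut().zip(token) {
            *acc += *v;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= tokens as f32;
    }
    Some(pooled)
}

fn main() {
    let mut outputs: HashMap<String, Tensor> = HashMap::new();
    outputs.insert(
        "last_hidden_state".to_string(),
        vec![vec![1.0, 3.0], vec![3.0, 5.0]],
    );
    let pooled = extract_embedding(&outputs).unwrap();
    // Mean over the two token vectors.
    assert_eq!(pooled, vec![2.0, 4.0]);
    println!("{:?}", pooled);
}
```

This is only an illustration of the control flow; the real implementation operates on the crate's tensor types rather than nested `Vec`s.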

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Narsil

alvarobartt and others added 30 commits July 25, 2025 16:46
kaixuanliu and others added 11 commits August 21, 2025 09:52
Signed-off-by: Liu, Kaixuan <[email protected]>
Some models contain Dense layers after the Transformer + Pooling, which
means the ONNX export may already contain the sentence embeddings, since
the export has to bundle Transformer + Pooling + Dense. This commit adds
`outputs.contains_key("sentence_embedding")` handling to account for that.
Otherwise it would default to `DType::Float16` when built with `--features candle-cuda`.
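The dtype selection that commit fixes can be sketched as follows (hypothetical names and flags, not the crate's actual API): a CUDA build would otherwise default to FP16, but Gemma3 must be pinned to FP32 because half precision produces NaNs in its hidden states.

```rust
// Hypothetical sketch of the dtype-selection logic described above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DType {
    Float16,
    Float32,
}

/// With `--features candle-cuda` the server would otherwise default to FP16,
/// but Gemma3's weights are unstable in half precision (NaNs in the hidden
/// states), so it is forced to FP32 regardless of build features.
fn resolve_dtype(requested: Option<DType>, cuda_enabled: bool, is_gemma3: bool) -> DType {
    if is_gemma3 {
        return DType::Float32; // FP16 produces NaNs for this architecture
    }
    // Fall back to the build-dependent default when no dtype was requested.
    requested.unwrap_or(if cuda_enabled {
        DType::Float16
    } else {
        DType::Float32
    })
}

fn main() {
    assert_eq!(resolve_dtype(None, true, true), DType::Float32);
    assert_eq!(resolve_dtype(None, true, false), DType::Float16);
    println!("dtype resolution ok");
}
```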
@alvarobartt alvarobartt requested a review from Narsil September 4, 2025 14:32
@alvarobartt alvarobartt mentioned this pull request Sep 4, 2025
@alvarobartt alvarobartt merged commit abf7d42 into main Sep 4, 2025
14 checks passed
@alvarobartt alvarobartt deleted the new-model-addition branch September 4, 2025 15:09
2 participants