Conversation

rwightman (Collaborator)

Switching to RMS Norm

Fixes for correct export

Tinkering with 'mobilenetv5' details, fixing some issues with MSFA

A few tweaks and comments to example MNV5 impl

Update RmsNorm2d modules to use own 2d eager kernel instead of torch rms_norm w/ permute (sketched below, after the commit list)

Fix propagation of act_layer to RmsNormAct*, use ConvNormAct for stem instead of just Conv2d

Fixes from weights conversion

Plumbing norm_layer through to MultiQueryAttention2d

impl forward_features for Transformers compatibility

Adding forward_* APIs to MobileNetV5Encoder

cleanup

cleanup, model entrypoint rename

Large redundant with 300m

Update input size for configs

Fix stem conv layer name

fix: always norm in MSFA

Always call final MSFA norm layer

Remove some FIXMEs, fix MSFA docstring. Remove use_layer_scale and rely on layer scale values == None; not currently used in any case.

Adds a TSV describing the full params from an Orbax checkpoint
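A minimal sketch of the RmsNorm2d change mentioned in the commit list (own 2d eager kernel on NCHW tensors instead of torch rms_norm with permutes). The names `RmsNorm2dSketch` and `rms_norm_2d_via_permute` are illustrative only, not the actual timm code, and the comparison path assumes PyTorch >= 2.4 for `F.rms_norm`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RmsNorm2dSketch(nn.Module):
    """Illustrative RMSNorm over the channel dim of an NCHW tensor.

    Sketch of the idea only (normalize directly on dim=1 in eager mode),
    not the actual timm RmsNorm2d implementation.
    """

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dtype = x.dtype
        # mean of squares over channels, computed in fp32, no permutes needed
        v = x.float().pow(2).mean(dim=1, keepdim=True)
        x = x.float() * torch.rsqrt(v + self.eps)
        x = x * self.weight.view(1, -1, 1, 1)
        return x.to(dtype)


def rms_norm_2d_via_permute(x, weight, eps: float = 1e-6):
    # The permute-based alternative the commit moves away from (PyTorch >= 2.4).
    x = x.permute(0, 2, 3, 1)                       # NCHW -> NHWC
    x = F.rms_norm(x, (x.shape[-1],), weight, eps)  # normalize over channels
    return x.permute(0, 3, 1, 2)                    # NHWC -> NCHW
```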

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

rwightman merged commit 1f69a52 into main on Jun 26, 2025
26 checks passed
@JulienMaille

Cool, do you have any plans to pretrain it?

@rwightman (Collaborator, Author)

@JulienMaille possibly yes, I may do something at a 'base' or slightly smaller size as a reference.

Nearer term, I will bring over the image-encoder-only weights as timm models for fine-tuning. Currently the model def is just being used as the encoder for the full weights as loaded in transformers. There wasn't time to coordinate the other bits before release.
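For reference, a rough sketch of what encoder-only fine-tune usage might look like once those weights land in timm. The model name, pretrained flag, and 768x768 input size below are assumptions rather than confirmed release details; forward_features is the API this PR wires up:

```python
import timm
import torch

# Hypothetical model name; check timm.list_models('mobilenetv5*') once weights are released.
model = timm.create_model('mobilenetv5_300m', pretrained=False)
model.eval()

x = torch.randn(1, 3, 768, 768)  # assumed input size, per the config updates in this PR
with torch.no_grad():
    feats = model.forward_features(x)  # forward_features added for Transformers compatibility
print(feats.shape)
```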

@rwightman (Collaborator, Author)

It should be pointed out that the '300m' size is the only official Google size, as used in Gemma 3n. The 'base' definition here is my own scale-down, used for testing and validating the architecture. I did some initial epochs of pretraining, etc. as sanity checks, though I might tweak or change that model def before any final weights are trained on my end.
