
Commit ce83dda: Add support for GraniteMoeHybrid (#1426)

1 parent d6b3998

File tree: 4 files changed, +11 −0 lines

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -344,6 +344,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
 1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://huggingface.co/papers/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
+1. **[GraniteMoeHybrid](https://huggingface.co/docs/transformers/main/model_doc/granitemoehybrid)** (from IBM) released with the blog post [IBM Granite 4.0: hyper-efficient, high performance hybrid models for enterprise](https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models) by the IBM Granite team.
 1. **[Grounding DINO](https://huggingface.co/docs/transformers/model_doc/grounding-dino)** (from IDEA-Research) released with the paper [Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://huggingface.co/papers/2303.05499) by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
 1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://huggingface.co/papers/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
 1. **[Helium](https://huggingface.co/docs/transformers/main/model_doc/helium)** (from the Kyutai Team) released with the blog post [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.
```

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions

```diff
@@ -58,6 +58,7 @@
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
 1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://huggingface.co/papers/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
+1. **[GraniteMoeHybrid](https://huggingface.co/docs/transformers/main/model_doc/granitemoehybrid)** (from IBM) released with the blog post [IBM Granite 4.0: hyper-efficient, high performance hybrid models for enterprise](https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models) by the IBM Granite team.
 1. **[Grounding DINO](https://huggingface.co/docs/transformers/model_doc/grounding-dino)** (from IDEA-Research) released with the paper [Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://huggingface.co/papers/2303.05499) by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
 1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://huggingface.co/papers/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
 1. **[Helium](https://huggingface.co/docs/transformers/main/model_doc/helium)** (from the Kyutai Team) released with the blog post [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.
```

src/configs.js

Lines changed: 1 addition & 0 deletions

```diff
@@ -119,6 +119,7 @@ function getNormalizedConfig(config) {
         case 'olmo2':
         case 'mobilellm':
         case 'granite':
+        case 'granitemoehybrid':
         case 'cohere':
         case 'mistral':
         case 'starcoder2':
```
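The new `case 'granitemoehybrid':` falls through to the same normalization branch already shared by `granite`, `cohere`, `mistral`, and the other decoder-only model types, so no new mapping logic is needed. A minimal sketch of that fall-through pattern, assuming illustrative key names (`num_key_value_heads`, `num_hidden_layers`, `hidden_size` are common Hugging Face config fields, but the exact mapping in transformers.js is richer than shown here):

```javascript
// Hypothetical sketch of a shared switch case normalizing config keys.
// The key names below are illustrative assumptions, not the library's
// exact mapping table.
function getNormalizedConfig(config) {
    const mapping = {};
    switch (config.model_type) {
        case 'granite':
        case 'granitemoehybrid': // falls through to the same branch
        case 'cohere':
        case 'mistral':
            // normalized name -> original config key
            mapping['num_heads'] = 'num_key_value_heads';
            mapping['num_layers'] = 'num_hidden_layers';
            mapping['hidden_size'] = 'hidden_size';
            break;
    }
    // Copy values from the raw config under the normalized names.
    const normalized = {};
    for (const [normalizedKey, originalKey] of Object.entries(mapping)) {
        normalized[normalizedKey] = config[originalKey];
    }
    return normalized;
}
```

Because the case label only falls through, adding a model type that behaves like an existing one is a one-line change, which is exactly what this commit does.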

src/models.js

Lines changed: 8 additions & 0 deletions

```diff
@@ -4653,6 +4653,12 @@ export class GraniteModel extends GranitePreTrainedModel { }
 export class GraniteForCausalLM extends GranitePreTrainedModel { }
 //////////////////////////////////////////////////
 
+//////////////////////////////////////////////////
+// GraniteMoeHybrid models
+export class GraniteMoeHybridPreTrainedModel extends PreTrainedModel { }
+export class GraniteMoeHybridModel extends GraniteMoeHybridPreTrainedModel { }
+export class GraniteMoeHybridForCausalLM extends GraniteMoeHybridPreTrainedModel { }
+//////////////////////////////////////////////////
 
 //////////////////////////////////////////////////
 // Cohere models
@@ -7841,6 +7847,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['olmo2', ['Olmo2Model', Olmo2Model]],
     ['mobilellm', ['MobileLLMModel', MobileLLMModel]],
     ['granite', ['GraniteModel', GraniteModel]],
+    ['granitemoehybrid', ['GraniteMoeHybridModel', GraniteMoeHybridModel]],
     ['cohere', ['CohereModel', CohereModel]],
     ['gemma', ['GemmaModel', GemmaModel]],
     ['gemma2', ['Gemma2Model', Gemma2Model]],
@@ -7951,6 +7958,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['olmo2', ['Olmo2ForCausalLM', Olmo2ForCausalLM]],
     ['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLM]],
     ['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
+    ['granitemoehybrid', ['GraniteMoeHybridForCausalLM', GraniteMoeHybridForCausalLM]],
     ['cohere', ['CohereForCausalLM', CohereForCausalLM]],
     ['gemma', ['GemmaForCausalLM', GemmaForCausalLM]],
     ['gemma2', ['Gemma2ForCausalLM', Gemma2ForCausalLM]],
```
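The two `Map` entries above are what make the new classes reachable through auto-class dispatch: each `model_type` string maps to a `[className, class]` pair, and the loader selects the class based on a checkpoint's `config.model_type`. A self-contained sketch of that registry pattern, using stand-in classes rather than the real ONNX-backed ones (`resolveCausalLMClass` is a hypothetical helper, not a transformers.js function):

```javascript
// Stand-in class hierarchy mirroring the shape of the real one in src/models.js.
class PreTrainedModel { }
class GraniteForCausalLM extends PreTrainedModel { }
class GraniteMoeHybridForCausalLM extends PreTrainedModel { }

// Registry: model_type string -> [class name, class], as in the commit's diff.
const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
    ['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
    ['granitemoehybrid', ['GraniteMoeHybridForCausalLM', GraniteMoeHybridForCausalLM]],
]);

// Hypothetical resolver showing how an auto class picks the concrete model
// class from a config's model_type.
function resolveCausalLMClass(modelType) {
    const entry = MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.get(modelType);
    if (!entry) {
        throw new Error(`Unsupported model type: ${modelType}`);
    }
    return entry[1]; // the class itself
}
```

With both registries updated (the bare-model map and the causal-LM map), GraniteMoeHybrid checkpoints should load through the library's usual auto-class and pipeline entry points without any further code changes.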
