Like the slow tests, other environment variables are available that are not enabled by default during testing:
- `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers.
More environment variables and additional information can be found in [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py).
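A minimal sketch of opting in to one of these gated suites from Python (the test path is illustrative, not a real location):

```python
import os

import pytest

os.environ["RUN_CUSTOM_TOKENIZERS"] = "1"  # opt in to custom tokenizer tests
pytest.main(["tests/tokenization", "-q"])  # path is illustrative
```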
`docs/source/en/attention_interface.md`
It mostly works thanks to the `mask_function`, a `Callable` in the form of [torch's `mask_mod` functions](https://pytorch.org/blog/flexattention/) that takes four indices as input and returns a boolean indicating whether that position should take part in the attention computation.
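For illustration, a minimal causal mask in this style might look like the following (the function name is ours; the four-index signature follows the flex attention convention):

```python
# A mask_mod-style callable: given batch, head, query, and key/value indices,
# return True if the query position may attend to the key/value position.
def causal_mask(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
    return kv_idx <= q_idx  # attend only to current and earlier positions
```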
If you cannot use the `mask_function` to create your mask for some reason, you can try to work around it by doing something similar to our [torch export workaround](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py).
`docs/source/en/auto_docstring.md`

There are some rules for documenting different types of arguments, and they're listed below. For example:

```
argument_name (`type`, *optional*, defaults to X):
    Description of the argument.
    This can span multiple lines.
```
* Include `type` in backticks.
* Add *optional* if the argument is not required or has a default value.
* Add "defaults to X" if it has a default value. You don't need to add "defaults to `None`" if the default value is `None`.
These arguments can also be passed to `@auto_docstring` as a `custom_args` argument, which defines the docstring block for new arguments once when they are repeated in multiple places in the modeling file.
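As a hedged sketch, a repeated custom argument might be documented once with `custom_args` like this (the import path, the bare class, and `custom_parameter` are illustrative assumptions, not taken from the docs above):

```python
from transformers.utils import auto_docstring  # import path assumed


class MyModel:
    @auto_docstring(
        custom_args="""
        custom_parameter (`int`, *optional*, defaults to 10):
            A hypothetical argument, documented once here and reusable
            wherever it appears in the modeling file.
        """
    )
    def forward(self, custom_parameter: int = 10):
        ...
```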
Before the [`Cache`] class, the cache used to be stored as a tuple of tuples of tensors. This format is dynamic because it grows as text is generated, similar to [`DynamicCache`].
The legacy format is essentially the same data structure but organized differently; the sketch after this list shows how to convert between the two.
- It's a tuple of tuples, where each inner tuple contains the key and value tensors for a layer.
- The tensors have the same shape `[batch_size, num_heads, seq_len, head_dim]`.
- The format is less flexible and doesn't support features like quantization or offloading.
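A minimal sketch of converting between the two formats with the helpers on [`DynamicCache`] (tensor sizes are illustrative):

```python
import torch
from transformers import DynamicCache

# Legacy format: one (key, value) pair per layer, each of shape
# [batch_size, num_heads, seq_len, head_dim].
legacy_cache = tuple(
    (torch.zeros(1, 8, 4, 64), torch.zeros(1, 8, 4, 64)) for _ in range(2)
)

cache = DynamicCache.from_legacy_cache(legacy_cache)  # tuple of tuples -> Cache
legacy_again = cache.to_legacy_cache()                # Cache -> tuple of tuples
```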
`docs/source/en/chat_templating.md`

Mistral-7B-Instruct uses `[INST]` and `[/INST]` tokens to indicate the start and end of user messages.
The input to `apply_chat_template` should be structured as a list of dictionaries with `role` and `content` keys. The `role` key specifies the speaker, and the `content` key contains the message. The common roles are:
- `user` for messages from the user
- `assistant` for messages from the model
- `system` for directives on how the model should act (usually placed at the beginning of the chat)
[`apply_chat_template`] takes this list and returns a formatted sequence. Set `tokenize=True` if you want to tokenize the sequence.
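A minimal sketch (the checkpoint is an example; any chat model with a template works):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Doing great, thanks!"},
    {"role": "user", "content": "Tell me a joke."},
]
# tokenize=False returns the formatted string; tokenize=True returns token ids.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # shows the [INST] ... [/INST] wrapping
```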
`docs/source/en/cursor.md`
You're now ready to set things up on the app side! In Cursor, while you can't set a new provider, you can change the endpoint for OpenAI requests in the model selection settings. First, navigate to "Settings" > "Cursor Settings" > "Models" and expand the "API Keys" collapsible. To set your `transformers serve` endpoint, follow this order:
1. Unselect ALL models in the list above (e.g. `gpt4`, ...);
2. Add and select the model you want to use (e.g. `Qwen/Qwen3-4B`);
3. Add some random text to the OpenAI API Key field. It won't be used, but it can't be empty.
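Before pointing Cursor at it, you can sanity-check the endpoint with any OpenAI-compatible client. A minimal sketch, assuming the default port `8000` and the `openai` Python package:

```python
from openai import OpenAI

# `transformers serve` speaks the OpenAI API; like in Cursor, the key is
# unused but must not be empty.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")
response = client.chat.completions.create(
    model="Qwen/Qwen3-4B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```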
Follow the recommended practices below to ensure your custom generation method works as expected; a minimal sketch follows the list.
- Feel free to reuse the logic for validation and input preparation in the original [`~GenerationMixin.generate`].
- Pin the `transformers` version in the requirements if you use any private method/attribute in `model`.
- Consider adding model validation, input validation, or even a separate test file to help users sanity-check your code in their environment.
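To make these concrete, here is a hedged sketch of a custom method's entry point; the `custom_generate/generate.py` layout and the greedy loop are our illustrative assumptions, not a prescribed implementation:

```python
# custom_generate/generate.py -- sketch of a custom decoding loop.
import torch


def generate(model, input_ids, generation_config=None, **kwargs):
    """Greedy decoding with basic input validation."""
    if input_ids.ndim != 2:
        raise ValueError("`input_ids` must have shape [batch_size, seq_len].")
    max_new_tokens = getattr(generation_config, "max_new_tokens", None) or 20
    for _ in range(max_new_tokens):
        logits = model(input_ids=input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```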
Recommended practices:
- Document input and output differences in [`~GenerationMixin.generate`].
- Add self-contained examples to enable quick experimentation.
- Describe soft requirements, such as whether the method only works well with a certain family of models.
### Finding custom generation methods
You can find all custom generation methods by [searching for their custom tag](https://huggingface.co/models?other=custom_generate), `custom_generate`. In addition to the tag, we curate two collections of `custom_generate` methods:
- [Custom generation methods - Community](https://huggingface.co/collections/transformers-community/custom-generation-methods-community-6888fb1da0efbc592d3a8ab6) -- a collection of powerful methods contributed by the community;
- [Custom generation methods - Tutorials](https://huggingface.co/collections/transformers-community/custom-generation-methods-tutorials-6823589657a94940ea02cfec) -- a collection of reference implementations for methods that were previously part of `transformers`, as well as tutorials for `custom_generate`.
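A hedged usage sketch (the repo id is hypothetical; `trust_remote_code=True` is needed to run code fetched from the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")

inputs = tokenizer("Hello,", return_tensors="pt")
output = model.generate(
    **inputs,
    custom_generate="transformers-community/some-custom-method",  # hypothetical repo id
    trust_remote_code=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```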
`docs/source/en/glossary.md`
The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension. There is a different model head for each task. For example:
* [`GPT2ForSequenceClassification`] is a sequence classification head - a linear layer - on top of the base [`GPT2Model`].
* [`ViTForImageClassification`] is an image classification head - a linear layer on top of the final hidden state of the `CLS` token - on top of the base [`ViTModel`].
* [`Wav2Vec2ForCTC`] is a language modeling head with [CTC](#connectionist-temporal-classification-ctc) on top of the base [`Wav2Vec2Model`].
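As a toy illustration of the idea (the class, names, and sizes are ours, not a `transformers` API):

```python
import torch
import torch.nn as nn


class ToyClassificationHead(nn.Module):
    """A task head: one linear layer over the base model's final hidden states."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.score = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden]; classify from the last token.
        return self.score(hidden_states[:, -1, :])
```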