Hi, I initially intended to use torch.compile to speed up inference, but an error occurred:
xe_linear.forward_new
from user code:
File "D:\miniconda3\envs\compile\Lib\site-packages\transformers\models\qwen2_5_omni\modeling_qwen2_5_omni.py", line 1838, in _forward_native
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "D:\miniconda3\envs\compile\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "D:\miniconda3\envs\compile\Lib\site-packages\ipex_llm\transformers\models\qwen2_5_omni.py", line 251, in qwen2_5_omni_attention_forward
qkv = self.qkv_proj(hidden_states)
File "D:\miniconda3\envs\compile\Lib\site-packages\ipex_llm\transformers\low_bit_linear.py", line 711, in forward
result = xe_linear.forward_new(x_2d, w, self.qtype, self.out_len)
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
When I try to use F.linear instead of self.qkv_proj, I get this error:
RuntimeError: self and mat2 must have the same dtype, but got BFloat16 and Byte
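For reference, my replacement attempt looked roughly like the sketch below (not the exact patch). Since self.qkv_proj is ipex-llm's low-bit linear module, its stored weight is the packed uint8 buffer rather than a bf16 matrix, which is presumably why the dtypes mismatch:

```python
import torch.nn.functional as F

# Sketch of my replacement attempt (not the exact patch).
# self.qkv_proj.weight here is the packed low-bit (uint8) buffer,
# so the matmul sees BFloat16 x Byte and raises the RuntimeError above.
qkv = F.linear(hidden_states, self.qkv_proj.weight)
```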
So I know there are two operations inside xe_linear.forward_new: dequantize + GEMM.
Therefore, I want to implement a custom operator for the dequantization.
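Conceptually, what I want at the call site is something like the sketch below. `dequantize_woq_int4` is my own hypothetical helper (sketched in full at the end of this issue); the `out_len` attribute name is taken from the traceback into low_bit_linear.py:

```python
import torch.nn.functional as F

def qkv_proj_compile_friendly(self, hidden_states):
    # Hypothetical replacement for xe_linear.forward_new:
    #   1) dequantize the packed woq_int4 weight back to a float matrix,
    #   2) run a plain GEMM that torch.compile / Dynamo can trace.
    w = dequantize_woq_int4(
        self.qkv_proj.weight,                # packed uint8 buffer
        out_features=self.qkv_proj.out_len,  # name taken from low_bit_linear.py traceback
        in_features=hidden_states.shape[-1],
    )
    return F.linear(hidden_states, w.to(hidden_states.dtype))
```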
Here are my questions:
- When low_bit='woq_int4', how can I get the scale parameter? The model weights contain the quantized values, and as I understand it each uint8 stores two int4 weights, with a scale after every 64 weights (32 uint8 bytes), but I'm not sure how to extract that scale. Since QK=64 and block_size_in_bytes=34, maybe each 34-byte block is 64 int4 weights plus one fp16 scale (see the sketch at the end of this issue for how I imagine the layout).
- Is the quantization range [-8, 7] or [-7, 7]?
- What is the packing sequence, i.e. in what order are the int4 values packed into the bytes (low nibble first or high nibble first)?
Can you give me some advice?
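To make the questions concrete, here is a sketch of how I currently imagine the dequantization, assuming a ggml-Q4_0-style block layout with QK=64: a 2-byte fp16 scale followed by 32 packed bytes, nibbles stored unsigned with an offset of 8 (i.e. range [-8, 7]), low nibbles first. Every one of these assumptions (scale position, offset/range, nibble order) is exactly what I'd like you to confirm or correct:

```python
import torch

QK = 64           # weights per quantization block (assumption)
BLOCK_BYTES = 34  # 2-byte fp16 scale + 32 packed bytes (assumption)

def dequantize_woq_int4(packed: torch.Tensor, out_features: int, in_features: int) -> torch.Tensor:
    """Sketch only: unpack a flat uint8 buffer into a float16 weight matrix.

    Assumed per-block layout (QK=64, 34 bytes):
      bytes [0:2)  -> fp16 scale
      bytes [2:34) -> 32 uint8 values, each holding two int4 weights
    with nibbles stored as unsigned values minus an offset of 8 ([-8, 7]),
    and the low nibbles giving the first 32 weights of the block (a guess).
    """
    blocks = packed.reshape(-1, BLOCK_BYTES)                   # (n_blocks, 34)
    scales = blocks[:, :2].contiguous().view(torch.float16)    # (n_blocks, 1) fp16 scales
    qs = blocks[:, 2:]                                         # (n_blocks, 32) packed uint8
    lo = (qs & 0x0F).to(torch.int16) - 8                       # low nibbles  (first half?)
    hi = (qs >> 4).to(torch.int16) - 8                         # high nibbles (second half?)
    w = torch.cat([lo, hi], dim=1).to(torch.float16) * scales  # (n_blocks, 64)
    return w.reshape(out_features, in_features)
```

If this layout is right, the projection itself then becomes a plain `F.linear(hidden_states, w.to(hidden_states.dtype))`, which Dynamo should be able to trace.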