
[question] questions about woq_int4 quantization #13316

@xiaohoua

Hi, at first I intended to use torch.compile to speed up inference, but an error occurred:

xe_linear.forward_new
from user code:
  File "D:\miniconda3\envs\compile\Lib\site-packages\transformers\models\qwen2_5_omni\modeling_qwen2_5_omni.py", line 1838, in _forward_native
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "D:\miniconda3\envs\compile\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\miniconda3\envs\compile\Lib\site-packages\ipex_llm\transformers\models\qwen2_5_omni.py", line 251, in qwen2_5_omni_attention_forward
    qkv = self.qkv_proj(hidden_states)
  File "D:\miniconda3\envs\compile\Lib\site-packages\ipex_llm\transformers\low_bit_linear.py", line 711, in forward
    result = xe_linear.forward_new(x_2d, w, self.qtype, self.out_len)

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

When I tried to replace self.qkv_proj with F.linear, I got this error:
RuntimeError: self and mat2 must have the same dtype, but got BFloat16 and Byte

So I understand that xe_linear.forward_new performs two operations: dequantization followed by a GEMM.
Therefore, I want to implement a custom operator for dequantization.
Here are my questions:
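For reference, the two-step split described above (dequantize, then GEMM) can be sketched in pure Python. This is a minimal illustration, not ipex-llm's kernel: `dequantize_rows`, `linear`, `dequant_linear` and the demo decoder `toy_row` are hypothetical helpers, and the per-row decoder is passed in as a parameter because the real woq_int4 layout is exactly what is being asked about.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]

def dequantize_rows(packed_rows: Sequence[bytes],
                    dequant_row: Callable[[bytes], List[float]]) -> Matrix:
    """Step 1 (dequantize): expand each packed uint8 weight row to floats."""
    return [dequant_row(row) for row in packed_rows]

def linear(x: Matrix, w: Matrix) -> Matrix:
    """Step 2 (GEMM): y[i][o] = sum_k x[i][k] * w[o][k], like F.linear without bias."""
    for xi in x:
        if len(xi) != len(w[0]):
            raise ValueError("x and w must agree on in_features")
    return [[sum(a * b for a, b in zip(xi, wo)) for wo in w] for xi in x]

def dequant_linear(x: Matrix, packed_rows: Sequence[bytes],
                   dequant_row: Callable[[bytes], List[float]]) -> Matrix:
    """Unfused reference for a 'dequantize + GEMM' op (hypothetical helper)."""
    return linear(x, dequantize_rows(packed_rows, dequant_row))

# Toy decoder for the demo only (NOT the woq_int4 layout): each byte holds
# two 4-bit values, low nibble first, stored with an offset of 8, scale 1.
def toy_row(row: bytes) -> List[float]:
    out: List[float] = []
    for b in row:
        out += [float((b & 0x0F) - 8), float((b >> 4) - 8)]
    return out

x = [[1.0, 2.0]]
packed = [bytes([0x98]), bytes([0x7A])]  # rows decode to [0, 1] and [2, -1]
y = dequant_linear(x, packed, toy_row)   # [[2.0, 0.0]]
```

In PyTorch terms, step 2 corresponds to calling `F.linear(x, w_dequant)` only after `w_dequant` has been decoded and converted to the input's dtype (BFloat16 here), which is the condition the RuntimeError above reports as violated. If the dequantizer is then wrapped as a custom operator, PyTorch 2.4+ offers `torch.library.custom_op` for registering it so torch.compile can treat it as an opaque call; whether that composes with ipex-llm's XPU kernels is untested here.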

  1. When low_bit='woq_int4', how can I get the scale parameter? The model weights contain the quantized values. As I understand it, each uint8 stores two int4 weights, and after every 64 weights (32 uint8 bytes) there is a scale. But I'm not sure how to get the scale. Since QK=64 and block_size_in_bytes=34, maybe every 34 bytes is 64 int4 weights plus an fp16 scale (32 bytes of packed weights + 2 bytes of scale = 34 bytes).
  2. Is the quantization range [-8, 7] or [-7, 7]?
  3. What is the packing order of the int4 values?

Can you give me some advice?
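The layout guessed in question 1 can be made concrete with a small reference decoder, which is handy for testing hypotheses against real weight bytes. Everything about the layout here is an assumption, not ipex-llm's documented format: 32 bytes of packed nibbles plus one little-endian fp16 scale per 34-byte block, unsigned nibbles decoded as w = (q - 8) * scale, and two candidate packing orders for question 3 (interleaved, or the split order used by ggml's Q4_0). `decode_block` and `decode_row` are hypothetical names. If ipex-llm actually stores all scales in a separate region rather than inside each block, the slicing would need to change.

```python
import struct
from typing import List

QK = 64                          # weights per block (from the question)
PACKED_BYTES = QK // 2           # 32 bytes of packed int4 values
BLOCK_BYTES = PACKED_BYTES + 2   # + one fp16 scale = 34 bytes

def decode_block(block: bytes, scale_first: bool = False,
                 interleaved: bool = True) -> List[float]:
    """Dequantize one HYPOTHETICAL 34-byte woq_int4 block.

    Assumptions (not confirmed by ipex-llm):
      * the fp16 scale is stored after the nibbles unless scale_first=True;
      * each nibble q is unsigned in [0, 15] and the weight is (q - 8) * scale;
      * interleaved=True: low nibble of byte j -> element 2j, high -> 2j + 1;
        interleaved=False: low nibble -> element j, high -> element j + 32.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    if scale_first:
        scale_bytes, packed = block[:2], block[2:]
    else:
        packed, scale_bytes = block[:PACKED_BYTES], block[PACKED_BYTES:]
    (scale,) = struct.unpack("<e", scale_bytes)  # fp16 -> Python float

    out = [0.0] * QK
    for j, b in enumerate(packed):
        lo, hi = (b & 0x0F) - 8, (b >> 4) - 8    # both in [-8, 7]
        if interleaved:
            out[2 * j], out[2 * j + 1] = lo * scale, hi * scale
        else:
            out[j], out[j + PACKED_BYTES] = lo * scale, hi * scale
    return out

def decode_row(row: bytes, **layout) -> List[float]:
    """Decode a weight row made of consecutive 34-byte blocks."""
    if len(row) % BLOCK_BYTES:
        raise ValueError("row length is not a multiple of the block size")
    weights: List[float] = []
    for start in range(0, len(row), BLOCK_BYTES):
        weights += decode_block(row[start:start + BLOCK_BYTES], **layout)
    return weights

# Demo: every byte is 0x98 (low nibble 8 -> 0, high nibble 9 -> 1), scale 0.5.
block = bytes([0x98]) * PACKED_BYTES + struct.pack("<e", 0.5)
print(decode_block(block)[:4])  # [0.0, 0.5, 0.0, 0.5]
```

On question 2: with an unsigned nibble and an offset of 8, decoding is (q - 8) * scale in both cases; [-8, 7] versus [-7, 7] only changes how the quantizer picks the scale (for example absmax / 7 never produces -8). One empirical check, under the same assumptions, is whether nibble value 0 (which decodes to -8) ever appears in real weight blocks.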
