This project successfully runs the Qwen 1.5B model.
However, when running the Qwen 7B model, inference sometimes fails at runtime with softmax overflow errors.
The model was converted with the following script:
python3 convert.py --model_id qwen/Qwen2-7B-Instruct --precision int4 --output {your_path}/Qwen2-7B-Instruct-ov --modelscope
Device: Intel Core Ultra 7 GPU
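For context on the error class: softmax overflows when a logit is large enough that exp() exceeds the representable range, which is especially easy to hit in reduced precision (e.g. fp16 on GPU). Whether that is the root cause here is an assumption; the sketch below only illustrates the arithmetic issue and the standard max-subtraction fix, not anything about the OpenVINO internals:

```python
import math

def stable_softmax(logits):
    # Naive softmax computes exp(x) directly, which overflows
    # for large x (math.exp(1000) raises OverflowError in
    # float64; fp16 overflows past ~88... even sooner, at ~11).
    # Subtracting the max keeps every exponent <= 0, so exp()
    # never overflows, and the result is mathematically identical.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# These logits overflow a naive exp() but are fine after the shift.
logits = [1000.0, 1001.0, 1002.0]
probs = stable_softmax(logits)
```

The function name `stable_softmax` is mine for illustration; inference runtimes implement the same shift internally, and a kernel that skips it (or accumulates in too narrow a dtype) shows exactly this failure mode.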