examples/gpu/llm/README.md: 4 additions & 4 deletions
@@ -2,12 +2,12 @@
 
 Here you can find examples for large language models (LLM) text generation. These scripts:
 
-> [!NOTE]
-> New Llama models like Llama3.2-1B, Llama3.2-3B and Llama3.3-7B are also supported from release v2.8.10+xpu.
+> [!NOTE]
+> New models like Qwen3-4B and Qwen3-8B are also supported from release v2.8.10+xpu.
 
 - Include inference, finetuning (LoRA) and bitsandbytes (QLoRA finetuning) examples.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
-- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
+- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
 - Cover model generation inference with low precision cases for different models with best performance and accuracy (fp16 AMP and weight only quantization)
 
 ## Environment Setup
@@ -124,7 +124,7 @@ where <br />
 
 
 <br />
-
+
 ## How To Run LLM with ipex.llm
 
 Inference and fine-tuning are supported in individual directories.
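
As a rough illustration of the fp16 inference path this README describes, here is a minimal sketch, assuming an XPU build of intel_extension_for_pytorch is installed. The model ID, prompt, and generation settings are placeholders; the repository's own example scripts remain the canonical entry points.

```python
# Minimal single-instance fp16 sketch (illustrative only, not the repository's script).
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any verified model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# ipex.llm.optimize applies the LLM-specific optimizations the README mentions
# (e.g. indirect access KV cache and fused RoPE for verified model families).
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```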

examples/gpu/llm/inference/README.md: 26 additions & 26 deletions
@@ -2,7 +2,7 @@
 
 Here you can find the inference examples for large language models (LLM) text generation. These scripts:
 
-- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other Chinese models such as GLM4-9B, Baichuan2-13B and Phi3-mini.
+- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other Chinese models such as GLM4-9B, Baichuan2-13B and Phi3-mini.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
 - Cover model generation inference with low precision cases for different models with best performance and accuracy (fp16 AMP and weight only quantization)
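
For the distributed (DeepSpeed) fp16 case mentioned in the hunk above, a minimal sketch follows, assuming a DeepSpeed build with XPU support launched through a multi-process launcher; the model ID and environment variables are placeholders, and `mp_size` may be spelled `tensor_parallel={"tp_size": ...}` on newer DeepSpeed versions.

```python
# Distributed fp16 sketch with DeepSpeed tensor parallelism (illustrative only).
import os
import torch
import deepspeed
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder
world_size = int(os.environ.get("WORLD_SIZE", "1"))
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).eval()

# Shard the model across ranks (AutoTP), then hand the sharded module to ipex.llm.
engine = deepspeed.init_inference(model, mp_size=world_size,
                                  dtype=torch.float16,
                                  replace_with_kernel_inject=False)
model = ipex.llm.optimize(engine.module.to(f"xpu:{local_rank}"),
                          dtype=torch.float16, device="xpu")
```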
@@ -11,14 +11,14 @@ Here you can find the inference examples for large language models (LLM) text ge
 
 Currently, only Transformers 4.48.3 is supported. Support for newer versions of Transformers and more models will be available in the future.
 
-| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
+| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
@@ -27,16 +27,16 @@ Currently, only support Transformers 4.48.3. Support for newer versions of Trans
 - ✅ signifies that it is supported.
 
 - A blank signifies that it is not supported yet.
-
+
 - 1: signifies that Llama-2-7b-hf is verified.
-
+
 - 2: signifies that Meta-Llama-3-8B is verified.
-
+
 - 3: signifies that Phi-3-mini-4k-instruct is verified.
 
 
 
-**Note**: The verified models mentioned above (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family) are well-supported with all optimizations like indirect access KV cache and fused ROPE. For other LLM families, we are actively working to implement these optimizations, which will be reflected in the expanded model list above.
+**Note**: The verified models mentioned above (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family) are well-supported with all optimizations like indirect access KV cache and fused ROPE. For other LLM families, we are actively working to implement these optimizations, which will be reflected in the expanded model list above.