@@ -119,7 +119,51 @@ export SHIMMY_MMAP=true
 
 ### GPU Support
 
-Currently, shimmy uses CPU-only inference. GPU support is planned for future releases.
+Shimmy supports GPU acceleration through multiple backends:
+
+#### NVIDIA CUDA Support ✅
+- **Status**: Available with the `--features llama` build flag
+- **Requirements**: NVIDIA GPU with CUDA support, CUDA toolkit installed
+- **Automatic Detection**: Models are automatically offloaded to the GPU when available
+- **Docker Support**: Use the NVIDIA runtime (`--runtime=nvidia` or `--gpus all`)
+
+#### Apple Metal Support ✅
+- **Status**: Automatic on macOS with Apple Silicon or discrete GPUs
+- **Performance**: Significant acceleration confirmed on M1/M2 and AMD Radeon Pro GPUs
+- **Detection**: Automatic, no configuration required
+
+#### CPU Fallback
+- **Status**: Always available as a fallback
+- **Performance**: Multi-threaded CPU inference for systems without GPU support
+
+#### Build Configuration
+
+To enable GPU support, build with:
+```bash
+cargo build --release --features llama
+```
+
+Or install via cargo with GPU features:
+```bash
+cargo install shimmy --features llama
+```
+
+#### Docker GPU Usage
+
+```bash
+# Use the NVIDIA runtime
+docker run --runtime=nvidia --gpus all shimmy:latest
+```
+
+Or with docker-compose:
+
+```yaml
+services:
+  shimmy:
+    runtime: nvidia
+    environment:
+      - NVIDIA_VISIBLE_DEVICES=all
+```
 
 ## Security Considerations
 
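For reference, the compose snippet in the diff can be written out as a complete `docker-compose.yml`. This is a sketch, not a file from the PR: the `shimmy:latest` image tag is an assumption about how the image is named locally, and `runtime: nvidia` requires the NVIDIA Container Toolkit on the host.

```yaml
# Hypothetical docker-compose.yml for running shimmy with NVIDIA GPU access.
# Assumes a locally built image tagged shimmy:latest; adjust to your setup.
services:
  shimmy:
    image: shimmy:latest
    runtime: nvidia            # needs the NVIDIA Container Toolkit installed
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

Run it with `docker compose up`; containers without a visible NVIDIA device fall back to the CPU path described above.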