|
8 | 8 | "source": [
|
9 | 9 | "# Visual-language assistant with MiniCPM-V and OpenVINO\n",
|
10 | 10 | "\n",
|
11 |
| - "MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built based on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency. \n", |
12 |
| - "More details about model can be found in [model card](https://huggingface.co/openbmb/MiniCPM-V-4) and original [repo](https://github.com/OpenBMB/MiniCPM-V).\n", |
13 |
| - "\n", |
| 11 | + "MiniCPM-V 4.5 is the latest and most capable model in the MiniCPM-V series. The model is built on Qwen3-8B and SigLIP2-400M with a total of 8B parameters. It exhibits a significant performance improvement over previous MiniCPM-V and MiniCPM-o models.\n", |
| 12 | + "More details about the model can be found in the [model card](https://huggingface.co/openbmb/MiniCPM-V-4_5) and the original [repo](https://github.com/OpenBMB/MiniCPM-V).\n", |
14 | 13 | "\n",
|
15 | 14 | "In this tutorial, we consider how to convert and optimize the MiniCPM-V model to create a multimodal chatbot. Additionally, we demonstrate how to apply a stateful transformation to the LLM part and model optimization techniques such as weight compression using [NNCF](https://github.com/openvinotoolkit/nncf).\n",
|
16 | 15 | "\n",
|
|
96 | 95 | },
|
97 | 96 | {
|
98 | 97 | "cell_type": "code",
|
99 |
| - "execution_count": 2, |
| 98 | + "execution_count": 1, |
100 | 99 | "id": "1534e378-1b87-4f1b-94e8-09061e960700",
|
101 | 100 | "metadata": {},
|
102 | 101 | "outputs": [],
|
|
149 | 148 | "## Select model\n",
|
150 | 149 | "[back to top ⬆️](#Table-of-contents:)\n",
|
151 | 150 | "\n",
|
| 151 | + "* **MiniCPM-V-4_5**: MiniCPM-V 4.5 is the latest and most capable model in the MiniCPM-V series. The model is built on Qwen3-8B and SigLIP2-400M with a total of 8B parameters. It exhibits a significant performance improvement over previous MiniCPM-V and MiniCPM-o models. \n", |
152 | 152 | "* **MiniCPM-V-4**: MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency. \n",
|
153 | 153 | "* **MiniCPM-V-2_6**: MiniCPM-V 2.6 is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding."
|
154 | 154 | ]
|
155 | 155 | },
|
156 | 156 | {
|
157 | 157 | "cell_type": "code",
|
158 |
| - "execution_count": 3, |
| 158 | + "execution_count": 4, |
159 | 159 | "id": "a0851b3c",
|
160 | 160 | "metadata": {
|
161 | 161 | "test_replace": {
|
162 |
| - "openbmb/MiniCPM-V-4": "katuni4ka/tiny-random-minicpmv-2_6" |
| 162 | + "openbmb/MiniCPM-V-4_5": "katuni4ka/tiny-random-minicpmv-2_6" |
163 | 163 | }
|
164 | 164 | },
|
165 | 165 | "outputs": [
|
166 | 166 | {
|
167 | 167 | "data": {
|
168 | 168 | "application/vnd.jupyter.widget-view+json": {
|
169 |
| - "model_id": "289c2574f5604076bdcd8eccabc4a14f", |
| 169 | + "model_id": "0b19f64efc1941acaa7f8c482ee5e78b", |
170 | 170 | "version_major": 2,
|
171 | 171 | "version_minor": 0
|
172 | 172 | },
|
173 | 173 | "text/plain": [
|
174 |
| - "Dropdown(description='Model:', options=('openbmb/MiniCPM-V-4', 'openbmb/MiniCPM-V-2_6'), value='openbmb/MiniCP…" |
| 174 | + "Dropdown(description='Model:', options=('openbmb/MiniCPM-V-4_5', 'openbmb/MiniCPM-V-4', 'openbmb/MiniCPM-V-2_6…" |
175 | 175 | ]
|
176 | 176 | },
|
177 |
| - "execution_count": 3, |
| 177 | + "execution_count": 4, |
178 | 178 | "metadata": {},
|
179 | 179 | "output_type": "execute_result"
|
180 | 180 | }
|
181 | 181 | ],
|
182 | 182 | "source": [
|
183 | 183 | "import ipywidgets as widgets\n",
|
184 | 184 | "\n",
|
185 |
| - "model_ids = [\"openbmb/MiniCPM-V-4\", \"openbmb/MiniCPM-V-2_6\"]\n", |
| 185 | + "model_ids = [\"openbmb/MiniCPM-V-4_5\", \"openbmb/MiniCPM-V-4\", \"openbmb/MiniCPM-V-2_6\"]\n", |
186 | 186 | "\n",
|
187 | 187 | "model_selector = widgets.Dropdown(\n",
|
188 | 188 | " options=model_ids,\n",
|
|
223 | 223 | },
|
224 | 224 | {
|
225 | 225 | "cell_type": "code",
|
226 |
| - "execution_count": null, |
| 226 | + "execution_count": 5, |
227 | 227 | "id": "82e846bb",
|
228 | 228 | "metadata": {},
|
229 | 229 | "outputs": [
|
|
242 | 242 | {
|
243 | 243 | "data": {
|
244 | 244 | "text/markdown": [
|
245 |
| - "`optimum-cli export openvino --model openbmb/MiniCPM-V-4 MiniCPM-V-4-ov --trust-remote-code --weight-format fp16 --task image-text-to-text`" |
| 245 | + "`optimum-cli export openvino --model openbmb/MiniCPM-V-4_5 MiniCPM-V-4_5-ov --trust-remote-code --weight-format fp16 --task image-text-to-text`" |
246 | 246 | ],
|
247 | 247 | "text/plain": [
|
248 | 248 | "<IPython.core.display.Markdown object>"
|
|
251 | 251 | "metadata": {},
|
252 | 252 | "output_type": "display_data"
|
253 | 253 | },
|
| 254 | + { |
| 255 | + "name": "stderr", |
| 256 | + "output_type": "stream", |
| 257 | + "text": [ |
| 258 | + "/home/ethan/intel/openvino_notebooks/openvino_env/lib/python3.10/site-packages/openvino/runtime/__init__.py:10: DeprecationWarning: The `openvino.runtime` module is deprecated and will be removed in the 2026.0 release. Please replace `openvino.runtime` with `openvino`.\n", |
| 259 | + " warnings.warn(\n" |
| 260 | + ] |
| 261 | + }, |
254 | 262 | {
|
255 | 263 | "name": "stdout",
|
256 | 264 | "output_type": "stream",
|
257 | 265 | "text": [
|
258 |
| - "WARNING:nncf:NNCF provides best results with torch==2.7.*, while current torch version is 2.5.1+cpu. If you encounter issues, consider switching to torch==2.7.*\n", |
259 | 266 | "INFO:nncf:Statistics of the bitwidth distribution:\n",
|
260 | 267 | "┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑\n",
|
261 | 268 | "│ Weight compression mode │ % all parameters (layers) │ % ratio-defining parameters (layers) │\n",
|
262 | 269 | "┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥\n",
|
263 |
| - "│ int4_sym │ 100% (225 / 225) │ 100% (225 / 225) │\n", |
| 270 | + "│ int4_sym │ 100% (253 / 253) │ 100% (253 / 253) │\n", |
264 | 271 | "┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙\n"
|
265 | 272 | ]
|
266 | 273 | },
|
267 | 274 | {
|
268 | 275 | "data": {
|
269 | 276 | "application/vnd.jupyter.widget-view+json": {
|
270 |
| - "model_id": "e5a6ec13d42f41109d029aced33475ff", |
| 277 | + "model_id": "e8e93d5dc404481ea7a248a7ce08d08d", |
271 | 278 | "version_major": 2,
|
272 | 279 | "version_minor": 0
|
273 | 280 | },
|
|
277 | 284 | },
|
278 | 285 | "metadata": {},
|
279 | 286 | "output_type": "display_data"
|
| 287 | + }, |
| 288 | + { |
| 289 | + "data": { |
| 290 | + "text/html": [ |
| 291 | + "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n" |
| 292 | + ], |
| 293 | + "text/plain": [] |
| 294 | + }, |
| 295 | + "metadata": {}, |
| 296 | + "output_type": "display_data" |
280 | 297 | }
|
281 | 298 | ],
|
282 | 299 | "source": [
|
|
341 | 358 | },
|
342 | 359 | {
|
343 | 360 | "cell_type": "code",
|
344 |
| - "execution_count": null, |
| 361 | + "execution_count": 6, |
345 | 362 | "id": "626fef57",
|
346 | 363 | "metadata": {},
|
347 |
| - "outputs": [], |
| 364 | + "outputs": [ |
| 365 | + { |
| 366 | + "data": { |
| 367 | + "application/vnd.jupyter.widget-view+json": { |
| 368 | + "model_id": "c4b32db22ceb4213aa2e7023e93756e7", |
| 369 | + "version_major": 2, |
| 370 | + "version_minor": 0 |
| 371 | + }, |
| 372 | + "text/plain": [ |
| 373 | + "Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')" |
| 374 | + ] |
| 375 | + }, |
| 376 | + "execution_count": 6, |
| 377 | + "metadata": {}, |
| 378 | + "output_type": "execute_result" |
| 379 | + } |
| 380 | + ], |
348 | 381 | "source": [
|
349 | 382 | "from notebook_utils import device_widget\n",
|
350 | 383 | "\n",
|
|
355 | 388 | },
|
356 | 389 | {
|
357 | 390 | "cell_type": "code",
|
358 |
| - "execution_count": null, |
| 391 | + "execution_count": 7, |
359 | 392 | "id": "e7af404b",
|
360 | 393 | "metadata": {},
|
361 | 394 | "outputs": [],
|
|
381 | 414 | },
|
382 | 415 | {
|
383 | 416 | "cell_type": "code",
|
384 |
| - "execution_count": 6, |
| 417 | + "execution_count": 8, |
385 | 418 | "id": "e56db20f-7cf0-4ead-b6af-8e048e61b059",
|
386 | 419 | "metadata": {},
|
387 | 420 | "outputs": [],
|
|
428 | 461 | },
|
429 | 462 | {
|
430 | 463 | "cell_type": "code",
|
431 |
| - "execution_count": 7, |
| 464 | + "execution_count": 9, |
432 | 465 | "id": "7522d730-f039-46e6-b06d-f90bd4c76f7a",
|
433 | 466 | "metadata": {
|
434 | 467 | "tags": []
|
|
450 | 483 | "<PIL.Image.Image image mode=RGB size=1000x667>"
|
451 | 484 | ]
|
452 | 485 | },
|
453 |
| - "execution_count": 7, |
| 486 | + "execution_count": 9, |
454 | 487 | "metadata": {},
|
455 | 488 | "output_type": "execute_result"
|
456 | 489 | }
|
|
466 | 499 | },
|
467 | 500 | {
|
468 | 501 | "cell_type": "code",
|
469 |
| - "execution_count": 8, |
| 502 | + "execution_count": 10, |
470 | 503 | "id": "b31e4cc5-42b3-4795-b04b-9a653228b6a4",
|
471 | 504 | "metadata": {
|
472 | 505 | "tags": []
|
|
476 | 509 | "name": "stdout",
|
477 | 510 | "output_type": "stream",
|
478 | 511 | "text": [
|
479 |
| - "The unusual aspect of this image is the cat's relaxed and vulnerable position. Typically, cats avoid exposing their bellies, which are sensitive and vulnerable areas, to potential threats. In this image, the cat is lying on its back in a cardboard box, exposing its belly and hindquarters, which is not a common sight. This behavior could indicate that the cat feels safe and comfortable in its environment, suggesting a strong bond with its owner and a sense of security in its home." |
| 512 | + "The unusual aspect of this image is that a cat is lying on its back inside a cardboard box, appearing relaxed and content. Cats are known for their curiosity and love for confined spaces, but seeing one in such a relaxed position with its belly exposed in a box is a charming and uncommon sight." |
480 | 513 | ]
|
481 | 514 | }
|
482 | 515 | ],
|
|
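With this change, the export cell's rendered command switches from `MiniCPM-V-4` to `MiniCPM-V-4_5`, and the output directory follows the same pattern (`MiniCPM-V-4_5-ov`). The snippet below is a minimal sketch of that id-to-command mapping. It assumes the output directory is always the last component of the Hugging Face model id plus an `-ov` suffix, and that the flags are exactly those in the rendered command. `build_export_command` is a hypothetical helper for illustration, not a function from the notebook or `notebook_utils`.

```python
def build_export_command(model_id: str, weight_format: str = "fp16") -> str:
    # Hypothetical helper (assumption): the output directory is the last part of
    # the Hugging Face model id with an "-ov" suffix, matching the rendered command.
    model_dir = model_id.split("/")[-1] + "-ov"
    return (
        f"optimum-cli export openvino --model {model_id} {model_dir} "
        f"--trust-remote-code --weight-format {weight_format} --task image-text-to-text"
    )


# Same ids as the updated model selector dropdown.
model_ids = ["openbmb/MiniCPM-V-4_5", "openbmb/MiniCPM-V-4", "openbmb/MiniCPM-V-2_6"]
for model_id in model_ids:
    print(build_export_command(model_id))
```

For `openbmb/MiniCPM-V-4_5`, the helper reproduces the command string shown in the updated cell output.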