add minicpm4 document (#3082)

openvino-dev-samples · web-flow · commit cd26f71aa521 · 2025-09-18T11:46:48.000+02:00
diff --git a/notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb b/notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb
@@ -223,6 +223,7 @@
     "* **tiny-llama-1b-chat** - This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens with the adoption of the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. More details about model can be found in [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)\n",
     "*  **minicpm-2b-dpo** - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in [model_card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).\n",
     "*  **minicpm3-4b** - MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct, being comparable with many recent 7B~9B models.Compared to previous generations, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. More details can be found in [model card](https://huggingface.co/openbmb/MiniCPM3-4B).\n",
+    "* **minicpm4-8b** - MiniCPM 4 is an extremely efficient edge-side large model that has undergone efficient optimization across four dimensions: model architecture, learning algorithms, training data, and inference systems, achieving ultimate efficiency improvements.[model card](https://huggingface.co/openbmb/MiniCPM4-8B).\n",
     "* **llama-3.2-1B-instruct** - 1B parameters model from LLama3.2 collection of instruction-tuned multilingual models. Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. More details can be found in [model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)\n",
     ">**Note**: run model with demo, you will need to accept license agreement. \n",
     ">You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), carefully read terms of usage and click accept button.  You will need to use an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).\n",
diff --git a/notebooks/llm-chatbot/llm-chatbot.ipynb b/notebooks/llm-chatbot/llm-chatbot.ipynb
@@ -151,6 +151,7 @@
     "* **tiny-llama-1b-chat** - This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens with the adoption of the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. More details about model can be found in [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)\n",
     "*  **minicpm-2b-dpo** - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in [model_card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).\n",
     "*  **minicpm3-4b** - MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct, being comparable with many recent 7B~9B models.Compared to previous generations, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. More details can be found in [model card](https://huggingface.co/openbmb/MiniCPM3-4B).\n",
+    "* **minicpm4-8b** - MiniCPM 4 is an extremely efficient edge-side large model that has undergone efficient optimization across four dimensions: model architecture, learning algorithms, training data, and inference systems, achieving ultimate efficiency improvements.[model card](https://huggingface.co/openbmb/MiniCPM4-8B  ).\n",
     "*  **gemma-2b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 2B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-2b-it).\n",
     ">**Note**: run model with demo, you will need to accept license agreement. \n",
     ">You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/google/gemma-2b-it), carefully read terms of usage and click accept button.  You will need to use an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).\n",