6 changes: 5 additions & 1 deletion docs/getting-started/index.mdx
@@ -4,6 +4,8 @@ sidebar_custom_props:
icon: /img/cpu.svg
---

import Newsletter from '@site/src/components/Newsletter';

# Getting started

Before you can run an LLM in production, you first need to make a few key decisions. These early choices will shape your infrastructure needs, costs, and how well the model performs for your use case.
@@ -12,4 +14,6 @@ Before you can run an LLM in production, you first need to make a few key decisions.
import DocCardList from '@theme/DocCardList';

<DocCardList />
```

<Newsletter />
6 changes: 5 additions & 1 deletion docs/inference-optimization/index.mdx
@@ -4,6 +4,8 @@ sidebar_custom_props:
icon: /img/speed.svg
---

import Newsletter from '@site/src/components/Newsletter';

# Inference optimization

Running an LLM is just the starting point. Making it fast, efficient, and scalable is where inference optimization comes into play. Whether you're building a chatbot, an agent, or any LLM-powered tool, inference performance directly impacts both user experience and operational cost.
@@ -14,4 +16,6 @@ If you're using a serverless endpoint (e.g., OpenAI API), much of this work is a
import DocCardList from '@theme/DocCardList';

<DocCardList />
```

<Newsletter />
5 changes: 4 additions & 1 deletion docs/inference-optimization/llm-inference-metrics.md
@@ -11,6 +11,7 @@ keywords:

import LinkList from '@site/src/components/LinkList';
import Button from '@site/src/components/Button';
import Newsletter from '@site/src/components/Newsletter';

# Key metrics for LLM inference

@@ -176,4 +177,6 @@ Using a serverless API can abstract away these optimizations, leaving you with l
* [Mastering LLM Techniques: Inference Optimization](https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/)
* [LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators](https://arxiv.org/pdf/2411.00136)
* [Throughput is Not All You Need](https://hao-ai-lab.github.io/blogs/distserve/)
</LinkList>

<Newsletter />
6 changes: 5 additions & 1 deletion docs/infrastructure-and-operations/index.mdx
@@ -4,6 +4,8 @@ sidebar_custom_props:
icon: /img/setting.svg
---

import Newsletter from '@site/src/components/Newsletter';

# Infrastructure and operations

LLMs don't run in isolation. They need robust infrastructure behind them, from high-performance GPUs to deployment automation and comprehensive observability. A strong model and solid inference optimization determine how well your application performs. But it’s your infrastructure platform and inference operation practices that determine how far you can scale and how reliably you can grow.
@@ -12,4 +14,6 @@ LLMs don't run in isolation. They need robust infrastructure behind them, from h
import DocCardList from '@theme/DocCardList';

<DocCardList />
```

<Newsletter />
5 changes: 4 additions & 1 deletion docs/introduction.md
@@ -12,6 +12,7 @@ keywords:
---

import Features from '@site/src/components/Features';
import Newsletter from '@site/src/components/Newsletter';

# LLM Inference Handbook

@@ -44,4 +45,6 @@ You can read it start-to-finish or treat it like a lookup table. There’s no wr

## Contributing

We welcome contributions! If you spot an error, have suggestions for improvements, or want to add new topics, please open an issue or submit a pull request on our [GitHub repository](https://github.com/bentoml/llm-inference-handbook).

<Newsletter />
6 changes: 5 additions & 1 deletion docs/llm-inference-basics/index.mdx
@@ -5,6 +5,8 @@ sidebar_custom_props:
collapsed: false
---

import Newsletter from '@site/src/components/Newsletter';

# LLM inference basics

LLM inference is where models meet the real world. It powers everything from instant chat replies to code generation, and directly impacts latency, cost, and user experience. Understanding how inference works is the first step toward building smarter, faster, and more reliable AI applications.
@@ -13,4 +15,6 @@ LLM inference is where models meet the real world. It powers everything from ins
import DocCardList from '@theme/DocCardList';

<DocCardList />
```

<Newsletter />
6 changes: 5 additions & 1 deletion docs/llm-inference-basics/what-is-llm-inference.md
@@ -7,6 +7,8 @@ keywords:
- LLM inference, AI inference, inference layer
---

import Newsletter from '@site/src/components/Newsletter';

# What is LLM inference?

LLM inference refers to using trained LLMs, such as GPT-4, Llama 4, and DeepSeek-V3, to generate meaningful outputs from user inputs, typically provided as natural language prompts. During inference, the model processes the prompt through its vast set of parameters to generate responses like text, code snippets, summaries, and translations.
@@ -69,4 +71,6 @@ Understanding LLM inference early gives you a clear edge. It helps you make smar
- **If you're a technical leader**: Inference efficiency directly affects your bottom line. A poorly optimized setup can cost 10× more in GPU hours while delivering worse performance. Understanding inference helps you evaluate vendors, make build-vs-buy decisions, and set realistic performance goals for your team.
- **If you're just curious about AI**: Inference is where the magic happens. Knowing how it works helps you separate AI hype from reality and makes you a more informed consumer and contributor to AI discussions.

For more information, see [serverless vs. self-hosted LLM inference](./serverless-vs-self-hosted-llm-inference).

<Newsletter />
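
---

The `Newsletter` component itself is not part of this diff; every changed file only imports it from `@site/src/components/Newsletter` and renders `<Newsletter />` at the end of the page. For readers unfamiliar with this Docusaurus pattern, below is a minimal, hypothetical sketch of what such a component could look like. The file path, props, and the `/api/subscribe` endpoint are assumptions for illustration, not the actual implementation in the repository.

```tsx
// src/components/Newsletter/index.tsx
// Hypothetical sketch only — the real component is not shown in this PR.
import React, {useState} from 'react';

export default function Newsletter(): JSX.Element {
  const [email, setEmail] = useState('');
  const [submitted, setSubmitted] = useState(false);

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    // Placeholder endpoint: the actual signup backend is an assumption.
    await fetch('/api/subscribe', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({email}),
    });
    setSubmitted(true);
  };

  if (submitted) {
    return <p>Thanks for subscribing!</p>;
  }

  return (
    <form onSubmit={handleSubmit}>
      <label htmlFor="newsletter-email">Subscribe for updates</label>
      <input
        id="newsletter-email"
        type="email"
        value={email}
        onChange={(e) => setEmail(e.target.value)}
        required
      />
      <button type="submit">Subscribe</button>
    </form>
  );
}
```

Because the component is imported with the `@site` alias and rendered as the last element of each MDX page, adding it to a new doc only requires the same two lines this PR adds: the import in the frontmatter section and `<Newsletter />` at the bottom.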