Commit b8278c1

Merge pull request #48 from Sherlock113/docs/add-newsletter
docs: Add newsletter sections
2 parents 0ca3e40 + 0113f29 commit b8278c1

7 files changed: +33 -7 lines changed

docs/getting-started/index.mdx

Lines changed: 5 additions & 1 deletion
@@ -4,6 +4,8 @@ sidebar_custom_props:
   icon: /img/cpu.svg
 ---
 
+import Newsletter from '@site/src/components/Newsletter';
+
 # Getting started
 
 Before you can run an LLM in production, you first need to make a few key decisions. These early choices will shape your infrastructure needs, costs, and how well the model performs for your use case.
@@ -12,4 +14,6 @@ Before you can run an LLM in production, you first need to make a few key decisi
 import DocCardList from '@theme/DocCardList';
 
 <DocCardList />
-```
+```
+
+<Newsletter />
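
For reference, the `@site/src/components/Newsletter` component that these pages now import is not part of this diff, so its actual markup is not shown here. Below is a minimal, hypothetical sketch of what such a Docusaurus component could look like; the file path, heading text, copy, and form endpoint are illustrative assumptions rather than the repository's real implementation.

```tsx
// Hypothetical sketch of src/components/Newsletter/index.tsx.
// The real component in bentoml/llm-inference-handbook may differ entirely.
import React from 'react';

export default function Newsletter() {
  return (
    <section className="newsletter-signup">
      <h3>Stay up to date</h3>
      <p>Get new LLM inference guides delivered to your inbox.</p>
      {/* Placeholder endpoint; the project's actual signup service is not part of this commit. */}
      <form action="https://example.com/subscribe" method="post">
        <input type="email" name="email" placeholder="you@example.com" required />
        <button type="submit">Subscribe</button>
      </form>
    </section>
  );
}
```

Once such a component exists, each MDX page only needs the two lines added in this commit: the `import Newsletter from '@site/src/components/Newsletter';` statement and a `<Newsletter />` tag where the signup box should render.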

docs/inference-optimization/index.mdx

Lines changed: 5 additions & 1 deletion
@@ -4,6 +4,8 @@ sidebar_custom_props:
   icon: /img/speed.svg
 ---
 
+import Newsletter from '@site/src/components/Newsletter';
+
 # Inference optimization
 
 Running an LLM is just the starting point. Making it fast, efficient, and scalable is where inference optimization comes into play. Whether you're building a chatbot, an agent, or any LLM-powered tool, inference performance directly impacts both user experience and operational cost.
@@ -14,4 +16,6 @@ If you're using a serverless endpoint (e.g., OpenAI API), much of this work is a
 import DocCardList from '@theme/DocCardList';
 
 <DocCardList />
-```
+```
+
+<Newsletter />

docs/inference-optimization/llm-inference-metrics.md

Lines changed: 4 additions & 1 deletion
@@ -11,6 +11,7 @@ keywords:
 
 import LinkList from '@site/src/components/LinkList';
 import Button from '@site/src/components/Button';
+import Newsletter from '@site/src/components/Newsletter';
 
 # Key metrics for LLM inference
 
@@ -176,4 +177,6 @@ Using a serverless API can abstract away these optimizations, leaving you with l
 * [Mastering LLM Techniques: Inference Optimization](https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/)
 * [LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators](https://arxiv.org/pdf/2411.00136)
 * [Throughput is Not All You Need](https://hao-ai-lab.github.io/blogs/distserve/)
-</LinkList>
+</LinkList>
+
+<Newsletter />

docs/infrastructure-and-operations/index.mdx

Lines changed: 5 additions & 1 deletion
@@ -4,6 +4,8 @@ sidebar_custom_props:
   icon: /img/setting.svg
 ---
 
+import Newsletter from '@site/src/components/Newsletter';
+
 # Infrastructure and operations
 
 LLMs don't run in isolation. They need robust infrastructure behind them, from high-performance GPUs to deployment automation and comprehensive observability. A strong model and solid inference optimization determine how well your application performs. But it’s your infrastructure platform and inference operation practices that determine how far you can scale and how reliably you can grow.
@@ -12,4 +14,6 @@ LLMs don't run in isolation. They need robust infrastructure behind them, from h
 import DocCardList from '@theme/DocCardList';
 
 <DocCardList />
-```
+```
+
+<Newsletter />

docs/introduction.md

Lines changed: 4 additions & 1 deletion
@@ -12,6 +12,7 @@ keywords:
 ---
 
 import Features from '@site/src/components/Features';
+import Newsletter from '@site/src/components/Newsletter';
 
 # LLM Inference Handbook
 
@@ -44,4 +45,6 @@ You can read it start-to-finish or treat it like a lookup table. There’s no wr
 
 ## Contributing
 
-We welcome contributions! If you spot an error, have suggestions for improvements, or want to add new topics, please open an issue or submit a pull request on our [GitHub repository](https://github.com/bentoml/llm-inference-handbook).
+We welcome contributions! If you spot an error, have suggestions for improvements, or want to add new topics, please open an issue or submit a pull request on our [GitHub repository](https://github.com/bentoml/llm-inference-handbook).
+
+<Newsletter />

docs/llm-inference-basics/index.mdx

Lines changed: 5 additions & 1 deletion
@@ -5,6 +5,8 @@ sidebar_custom_props:
   collapsed: false
 ---
 
+import Newsletter from '@site/src/components/Newsletter';
+
 # LLM inference basics
 
 LLM inference is where models meet the real world. It powers everything from instant chat replies to code generation, and directly impacts latency, cost, and user experience. Understanding how inference works is the first step toward building smarter, faster, and more reliable AI applications.
@@ -13,4 +15,6 @@ LLM inference is where models meet the real world. It powers everything from ins
 import DocCardList from '@theme/DocCardList';
 
 <DocCardList />
-```
+```
+
+<Newsletter />

docs/llm-inference-basics/what-is-llm-inference.md

Lines changed: 5 additions & 1 deletion
@@ -7,6 +7,8 @@ keywords:
   - LLM inference, AI inference, inference layer
 ---
 
+import Newsletter from '@site/src/components/Newsletter';
+
 # What is LLM inference?
 
 LLM inference refers to using trained LLMs, such as GPT-4, Llama 4, and DeepSeek-V3, to generate meaningful outputs from user inputs, typically provided as natural language prompts. During inference, the model processes the prompt through its vast set of parameters to generate responses like text, code snippets, summaries, and translations.
@@ -69,4 +71,6 @@ Understanding LLM inference early gives you a clear edge. It helps you make smar
 - **If you're a technical leader**: Inference efficiency directly affects your bottom line. A poorly optimized setup can cost 10× more in GPU hours while delivering worse performance. Understanding inference helps you evaluate vendors, make build-vs-buy decisions, and set realistic performance goals for your team.
 - **If you're just curious about AI**: Inference is where the magic happens. Knowing how it works helps you separate AI hype from reality and makes you a more informed consumer and contributor to AI discussions.
 
-For more information, see [serverless vs. self-hosted LLM inference](./serverless-vs-self-hosted-llm-inference).
+For more information, see [serverless vs. self-hosted LLM inference](./serverless-vs-self-hosted-llm-inference).
+
+<Newsletter />
