Commit 6940af4

docs: Fix code sample on home page (#913)
### Description

- Fix code sample on home page and move it to a separate Python file.
- Fix code sample in `docs/introduction/07_saving_data.mdx` and move it to a separate Python file.
- Rm old pydoc-markdown scripts (they live in a [separate repository](https://github.com/apify/docusaurus-plugin-typedoc-api/tree/master/packages/plugin/python-scripts/docspec-gen) now).

### Issues

- Closes: #912

### Testing

- Docs were rendered locally.

### Checklist

- [x] CI passed

1 parent 48397bd commit 6940af4

10 files changed: 91 additions, 317 deletions

Makefile

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 .PHONY: clean install-dev build publish-to-pypi lint type-check unit-tests unit-tests-cov integration-tests format check-code build-api-reference run-docs
 
-DIRS_WITH_CODE = src tests docs
+DIRS_WITH_CODE = src tests docs website
 
 # This is default for local testing, but GitHub workflows override it to a higher value in CI
 INTEGRATION_TESTS_CONCURRENCY = 1

docs/introduction/07_saving_data.mdx

Lines changed: 8 additions & 17 deletions
@@ -7,31 +7,22 @@ import ApiLink from '@site/src/components/ApiLink';
 import CodeBlock from '@theme/CodeBlock';
 
 import FinalCodeExample from '!!raw-loader!./code/07_final_code.py';
+import FirstCodeExample from '!!raw-loader!./code/07_first_code.py';
 
 A data extraction job would not be complete without saving the data for later use and processing. You've come to the final and most difficult part of this tutorial so make sure to pay attention very carefully!
 
 ## Save data to the dataset
 
-Crawlee provides a <ApiLink to="class/Dataset">`Dataset`</ApiLink> class, which acts as an abstraction over tabular storage, making it useful for storing scraping results. First, add the following import to the top of your file:
+Crawlee provides a <ApiLink to="class/Dataset">`Dataset`</ApiLink> class, which acts as an abstraction over tabular storage, making it useful for storing scraping results. To get started:
 
-```python
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
-from crawlee.storages.dataset import Dataset
-
-# ...
-```
+- Add the necessary imports: Include the <ApiLink to="class/Dataset">`Dataset`</ApiLink> and any required crawler classes at the top of your file.
+- Create a Dataset instance: Use the asynchronous <ApiLink to="class/Dataset#open">`Dataset.open`</ApiLink> constructor to initialize the dataset instance within your crawler's setup.
 
-Next, under the section where you create an instance of your crawler, create an instance of the dataset using the asynchronous constructor <ApiLink to="class/Dataset#open">`Dataset.open`</ApiLink>:
+Here's an example:
 
-```python
-# ...
-
-async def main() -> None:
-    crawler = PlaywrightCrawler()
-    dataset = await Dataset.open()
-
-    # ...
-```
+<CodeBlock className="language-python">
+{FirstCodeExample}
+</CodeBlock>
 
 Finally, instead of logging the extracted data to stdout, we can export them to the dataset:
 

docs/introduction/code/07_first_code.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.storages import Dataset
+
+# ...
+
+
+async def main() -> None:
+    crawler = PlaywrightCrawler()
+    dataset = await Dataset.open()
+
+    # ...
+
+    @crawler.router.default_handler
+    async def request_handler(context: PlaywrightCrawlingContext) -> None:
+        ...
+        # ...

pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -153,7 +153,7 @@ indent-style = "space"
     "T20", # flake8-print
     "TRY301", # Abstract `raise` to an inner function
 ]
-"**/{docs}/**" = [
+"**/{docs,website}/**" = [
     "D", # Everything from the pydocstyle
     "INP001", # File {filename} is part of an implicit namespace package, add an __init__.py
     "F841", # Local variable {variable} is assigned to but never used
@@ -192,7 +192,7 @@ timeout = 1200
 python_version = "3.9"
 plugins = ["pydantic.mypy"]
 exclude = ["project_template"]
-files = ["src", "tests"]
+files = ["src", "tests", "docs", "website"]
 check_untyped_defs = true
 disallow_incomplete_defs = true
 disallow_untyped_calls = true
@@ -215,7 +215,7 @@ ignore_missing_imports = true
 [tool.basedpyright]
 pythonVersion = "3.9"
 typeCheckingMode = "standard"
-include = ["src", "tests", "docs"]
+include = ["src", "tests", "docs", "website"]
 
 [tool.coverage.report]
 exclude_lines = [

website/generate_module_shortcuts.py

Lines changed: 14 additions & 7 deletions
@@ -1,18 +1,26 @@
 #!/usr/bin/env python3
 
+from __future__ import annotations
+
 import importlib
 import inspect
 import json
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from types import ModuleType
 
 
-def get_module_shortcuts(module, parent_classes=None):
-    """Traverse a module and its submodules, and if some class is present in both a module and its submodule, register a shortcut."""
+def get_module_shortcuts(module: ModuleType, parent_classes: list | None = None) -> dict:
+    """Traverse a module and its submodules to identify and register shortcuts for classes."""
     shortcuts = {}
 
     if parent_classes is None:
         parent_classes = []
+
     parent_module_name = '.'.join(module.__name__.split('.')[:-1])
     module_classes = []
+
     for classname, cls in inspect.getmembers(module, inspect.isclass):
         module_classes.append(cls)
         if cls in parent_classes:
@@ -25,16 +33,15 @@ def get_module_shortcuts(module, parent_classes=None):
     return shortcuts
 
 
-def resolve_shortcuts(shortcuts):
+def resolve_shortcuts(shortcuts: dict) -> None:
     """Resolve linked shortcuts.
 
-    For example, if there are shortcuts A -> B and B -> C,
-    resolve them to A -> C.
+    For example, if there are shortcuts A -> B and B -> C, resolve them to A -> C.
     """
     for source, target in shortcuts.items():
         while target in shortcuts:
             shortcuts[source] = shortcuts[target]
-            target = shortcuts[target]
+            target = shortcuts[target]  # noqa: PLW2901
 
 
 shortcuts = {}
@@ -43,7 +50,7 @@ def resolve_shortcuts(shortcuts):
         module = importlib.import_module(module_name)
         module_shortcuts = get_module_shortcuts(module)
         shortcuts.update(module_shortcuts)
-    except ModuleNotFoundError:  
+    except ModuleNotFoundError:  # noqa: PERF203
         pass
 
 resolve_shortcuts(shortcuts)
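
Note on `resolve_shortcuts`: the docstring describes collapsing chains such as A -> B and B -> C into A -> C. The following is a minimal standalone sketch of that idea, not the code from this commit. The `seen` guard against cyclic shortcuts is an added assumption, and the dotted names in the example are hypothetical.

```python
from __future__ import annotations


def resolve_shortcuts(shortcuts: dict[str, str]) -> None:
    """Collapse chained shortcuts in place, so A -> B and B -> C becomes A -> C."""
    for source in shortcuts:
        target = shortcuts[source]
        # Assumption: track visited names so a cyclic mapping cannot loop forever.
        seen = {source}
        while target in shortcuts and target not in seen:
            seen.add(target)
            target = shortcuts[target]
        # Only values of existing keys change, so iterating the dict stays safe.
        shortcuts[source] = target


# Hypothetical shortcut names, for illustration only.
example = {'pkg.a.Item': 'pkg.b.Item', 'pkg.b.Item': 'pkg.c.Item'}
resolve_shortcuts(example)
print(example)
```

Both keys should end up pointing at `pkg.c.Item`, the end of the chain.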

website/pydoc-markdown/__init__.py

Whitespace-only changes.

website/pydoc-markdown/generate_ast.py

Lines changed: 0 additions & 53 deletions
This file was deleted.

website/pydoc-markdown/google_docstring_processor.py

Lines changed: 0 additions & 185 deletions
This file was deleted.
