Status: Open
Labels: t-tooling (issues with this label are in the ownership of the tooling team)
Description
Currently, in many places we use these annotations for `data` / `user_data`:
```python
from typing import Any

data: list[dict[str, Any]] | dict[str, Any]
data: dict[str, Any]
```
This works, but it isn't precise: we only accept JSON-serializable types, while `Any` lets everything through.
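To make the imprecision concrete, here is a minimal sketch with a hypothetical `push_data` function (not the real method) annotated with `dict[str, Any]`. A type checker accepts a `datetime` value, and the problem only surfaces when `json.dumps` runs.

```python
import json
from datetime import datetime
from typing import Any


def push_data(data: dict[str, Any]) -> str:
    """Hypothetical stand-in for a method annotated with dict[str, Any]."""
    return json.dumps(data)


# A type checker accepts this value, because Any matches every value type...
record = {'created_at': datetime(2024, 1, 1)}

error_message = ''
try:
    push_data(record)
except TypeError as exc:
    # ...but the non-serializable value is only caught at runtime.
    error_message = str(exc)
```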
We've got this recursive alias:
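```python
from typing import TypeAlias, TypeVar, Union

J = TypeVar('J', bound='JsonSerializable')

# Note: when the alias is used bare (without `[...]`), J is replaced by Any,
# which is why the errors below show `list[Any]` and `dict[str, Any]`.
JsonSerializable: TypeAlias = Union[
    list[J],
    dict[str, J],
    str,
    bool,
    int,
    float,
    None,
]
```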
But if we use it for these variables:
```python
data: list[dict[str, JsonSerializable]] | dict[str, JsonSerializable]
data: dict[str, JsonSerializable]
```
We run into variance-related errors, like this:
```text
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: error: Argument 1 to "__call__" of "PushDataFunction" has incompatible type "dict[str, str]"; expected "Union[list[dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], dict[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]"  [arg-type]
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py:450: note: Consider using "Mapping" instead, which is covariant in the value type
```
If we follow the suggestion and use `Mapping` and `Sequence`:
```python
from collections.abc import Mapping, Sequence

data: Sequence[Mapping[str, JsonSerializable]] | Mapping[str, JsonSerializable]
```
we end up with even more errors on the usage side, for example:
```python
item = {'key': 'value', 'number': 42}
await dataset_client.push_data(item)
```
Error (`dict[str, object]` vs. `Mapping[str, JsonSerializable]`):

```text
Argument 1 to "push_data" of "MemoryDatasetClient" has incompatible type "dict[str, object]"; expected "Union[Sequence[Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]], Mapping[str, Union[list[Any], dict[str, Any], str, bool, int, float, None]]]"  [arg-type]
```
Is the `JsonSerializable` alias the right choice in this context? Should we adopt something different, and if so, how? The goal is precise JSON-serializable typing without variance errors or usage-side errors.