Open
Labels
A-plan (Area: logical plan and intermediate representation), P-medium (Priority: medium), accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
from datetime import datetime

print("polars", pl.__version__)

df = pl.DataFrame(
    {
        "id_store": [1],
        "id_product_group": [10],
        "target_date": [datetime(2025, 1, 1)],
        "id_product": [2],
        "prediction": [1.0],
        "price": [5.0],
    }
)

lf = (
    df.lazy()
    # Force empty with a filter that removes everything
    .filter(pl.lit(False))
    .sort(
        "id_store", "id_product_group", "target_date",
        (pl.col("prediction") * pl.col("price")),
        descending=[False, False, False, True],
    )
    .with_columns(
        id_products=pl.col("id_product")
        .cumulative_eval(pl.element().explode().sort().implode())
        .over("id_store", "id_product_group", "target_date"),
    )
    .with_columns(
        n_products=pl.col("id_products").list.len().cast(pl.Int16),
        id_products=pl.col("id_products").cast(pl.List(pl.Int64)),
    )
)

print("collect_schema:", lf.collect_schema()["id_products"])  # planner's view
out = lf.collect()
print("post_collect schema:", out.collect_schema()["id_products"])  # check if dtype flipped (e.g., List[List[Int64]])

if lf.collect_schema()["id_products"] != out.collect_schema()["id_products"]:
    raise ValueError("Schema changed after collect!")
Log output
Issue description
Hi guys, I'm not sure whether this is a bug, since it only occurs for me when the dataframe becomes empty. I'm currently migrating from polars 1.12.0; this code worked fine in 1.12.0 but fails with newer polars versions, so I thought it was worth an issue.
I wrote some tests to cover cases where my dataframe becomes empty, and these tests are failing due to a schema mismatch in later joins. Somehow collect_schema() returns the correct, intended schema (List[i64] for id_products), while the schema after collecting the empty dataframe becomes List[List[i64]].
Expected behavior
The schema after collecting should be the same as the schema reported by collect_schema() before collecting.
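For anyone who wants to guard against this in their own tests, here is a minimal sketch of a helper that compares a planned schema with a collected one column by column. The name `schema_mismatches` is hypothetical (it is not part of Polars); it only assumes both arguments behave like mappings from column name to dtype, which is the case for the schema objects used in the example above.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def schema_mismatches(
    planned: Mapping[str, Any],
    actual: Mapping[str, Any],
) -> Dict[str, Tuple[Optional[Any], Optional[Any]]]:
    """Return {column: (planned_dtype, actual_dtype)} for every column that differs.

    Hypothetical helper, not a Polars API. A column that is missing on one
    side is reported with None in that position.
    """
    # Walk the planned columns first, then any columns that only exist in `actual`.
    names = list(planned) + [name for name in actual if name not in planned]
    mismatches: Dict[str, Tuple[Optional[Any], Optional[Any]]] = {}
    for name in names:
        before = planned.get(name)
        after = actual.get(name)
        if before != after:
            mismatches[name] = (before, after)
    return mismatches
```

In a test this could be called as `schema_mismatches(lf.collect_schema(), lf.collect().schema)` and asserted to be empty; with the reproducer above I would expect it to report only `id_products`.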
Installed versions
--------Version info---------
Polars: 1.33.1
Index type: UInt32
Platform: macOS-15.6.1-arm64-arm-64bit
Python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
LTS CPU: False
----Optional dependencies----
Azure CLI <not installed>
adbc_driver_manager <not installed>
altair <not installed>
azure.identity <not installed>
boto3 1.40.18
cloudpickle <not installed>
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec 2025.9.0
gevent <not installed>
google.auth <not installed>
great_tables <not installed>
matplotlib 3.10.6
numpy 1.26.4
openpyxl <not installed>
pandas 2.1.4
polars_cloud <not installed>
pyarrow 15.0.0
pydantic 2.11.9
pyiceberg <not installed>
sqlalchemy 2.0.43
torch <not installed>
xlsx2csv <not installed>
xlsxwriter <not installed>