Regression in cumulative_eval(...explode().sort().implode()) on empty inputs: List[i64] becomes List[List[i64]] #24635

@nejox

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from datetime import datetime

print("polars", pl.__version__)

df = pl.DataFrame(
    {
        "id_store": [1],
        "id_product_group": [10],
        "target_date": [datetime(2025, 1, 1)],
        "id_product": [2],
        "prediction": [1.0],
        "price": [5.0],
    }
)

lf = (
    df.lazy()
    # Force empty with a filter that removes everything
    .filter(pl.lit(False))
    .sort(
        "id_store", "id_product_group", "target_date",
        (pl.col("prediction") * pl.col("price")),
        descending=[False, False, False, True],
    )
    .with_columns(
        id_products = pl.col("id_product")
            .cumulative_eval(pl.element().explode().sort().implode())
            .over("id_store", "id_product_group", "target_date"),
    )
    .with_columns(
        n_products = pl.col("id_products").list.len().cast(pl.Int16),
        id_products = pl.col("id_products").cast(pl.List(pl.Int64)),
    )
)

print("collect_schema:", lf.collect_schema()["id_products"])  # planner's view
out = lf.collect()
print("post_collect schema:", out.collect_schema()["id_products"])  # check if dtype flipped (e.g., List[List[Int64]])

if lf.collect_schema()["id_products"] != out.collect_schema()["id_products"]:
    raise ValueError("Schema changed after collect!")

Log output

Issue description

Hi guys, I'm not sure whether this is a bug, since it only occurs for me when the DataFrame becomes empty. I'm currently migrating from Polars 1.12.0; this code worked fine in 1.12.0 but fails with newer Polars versions, so I thought it was worth an issue.

I wrote some tests to cover cases where my DataFrame becomes empty, and these tests fail because of a schema mismatch in later joins. Somehow collect_schema() returns the correct, intended schema (List[i64]), while the schema after collecting the empty DataFrame becomes List[List[i64]].

Expected behavior

The schema after collecting is the same as the schema before collecting.

Installed versions

--------Version info---------
Polars:              1.33.1
Index type:          UInt32
Platform:            macOS-15.6.1-arm64-arm-64bit
Python:              3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.40.18
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2025.9.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.10.6
numpy                1.26.4
openpyxl             <not installed>
pandas               2.1.4
polars_cloud         <not installed>
pyarrow              15.0.0
pydantic             2.11.9
pyiceberg            <not installed>
sqlalchemy           2.0.43
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

Metadata

Labels

  • A-plan (Area: logical plan and intermediate representation)
  • P-medium (Priority: medium)
  • accepted (Ready for implementation)
  • bug (Something isn't working)
  • python (Related to Python Polars)
