Skip to content

Conversation

Eason0729
Copy link
Contributor

Changed

under #[cfg(feature="std")], stacker::maybe_grow was called in RecursionCounter::try_decrease to prevent stack overflow

I also tested on Windows(stack size 1MB), it seems fine.

reproducible code from original issue

use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    ctx.register_parquet(
        "parquet_table",
        "batches.parquet",
        ParquetReadOptions::default(),
    )
    .await?;

    let sql_query = "
    SELECT seq
    FROM (
      SELECT seq,
        LEAD(seq) OVER (ORDER BY seq) AS next_seq
        FROM parquet_table
    ) AS subquery
    WHERE next_seq - seq > 1";

    let df = ctx.sql(sql_query).await?;

    df.show().await?;
    Ok(())
}

Missing

  • Return error when growing stack fail:
    growing stack on unsupported target seems to be no-op on stacker side, so I am unable to detect it.
  • Test: which part should be tested?
  • Observe performance impact

@Eason0729
Copy link
Contributor Author

I evaluated stacker as part of sqlparser and I thought it was doing some too crazy stuff that made it hard to use in embedded / wasm environments. Maybe that is better now

Originally posted by @alamb in apache/datafusion#9375 (comment)

I think the use of stacker here is fine under #[cfg(feature="std")], and I totally forget to add stacker as optional dependency 😄 .

@Eason0729 Eason0729 marked this pull request as ready for review October 13, 2024 06:41
@Eason0729
Copy link
Contributor Author

I have add stacker as optional dependency, so it won't be used in no std environment.

*It will continue to stack overflow in this case.

What's your opinion on the use of stacker? @alamb

@alamb
Copy link
Contributor

alamb commented Oct 15, 2024

I have add stacker as optional dependency, so it won't be used in no std environment.

*It will continue to stack overflow in this case.

What's your opinion on the use of stacker? @alamb

I think stacker is a clever hack but we should fix the code for real to avoid deeply nested callstacks rather than hoping that a third-party library will bail us out.

I don't think it is a good idea to use stacker as I don't want to be responsible for trying to debug any issues caused by its use. That may sound selfish, but I am already at my cognitive complexity load for sqlparser (and various other crates)

@Eason0729
Copy link
Contributor Author

That's okay. (I think this PR as a draft to demonstrate what I would like to change

Then I would try to reduce stack usage with another PR.

@Eason0729
Copy link
Contributor Author

Can I mark this PR as draft again or just create a new one?

@iffyio
Copy link
Contributor

iffyio commented Oct 18, 2024

@Eason0729 thanks for looking to improve this behavior! I think should be good to create a new PR (we can link also this PR in the description since they're related)

@iffyio
Copy link
Contributor

iffyio commented Nov 6, 2024

Marking this as draft in the meantime

@iffyio iffyio mentioned this pull request Nov 16, 2024
1 task
@Eason0729 Eason0729 closed this Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants