Make data frame selection return row numbers, not pandas index value #677

jcheng5 · 2023-08-11T23:09:09Z

Fixes #676

The code in js/dataframe/index.tsx uses the row number ([0, 1, 2, ...]) as the row.id and tag's data-key attribute. That's fine until it's time to send the selection back to Shiny, at which time we need to convert from the row number to the entry in df.index.

Actually, I suppose it's a question worth asking: what would you expect input.grid_selected_rows() to return: row numbers (to be used with df.iloc) or Pandas index values (to be used with df.loc)? I guess either is defensible? (cc @nealrichardson @machow @chendaniely)

Update 2023-08-15 13:32-700: Before this PR, the code was just wrong: it used a row's row number to look up the keyToIndex map, which was actually not key -> index (where "index" refers to the pandas index) but str(index) -> index. Hence the None being returned from the repro case in #676 (there were row numbers that did not exist in the pandas index). In other words, the only way the reported selection would be correct is if the pandas index is the sequence of integers 0..n-1 (which, to be fair, is probably the case with most pandas data frames, as it's the default).
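
To make the failure mode above concrete, here is a minimal sketch in plain Python. It is not the actual Shiny code: `lookup_selection` and the sample index values are hypothetical, and the only thing taken from the description is the shape of the map (str(index) -> index) being queried with row numbers.

```python
def lookup_selection(index_values, selected_row_numbers):
    """Mimic the old behavior: look up row numbers in a str(index) -> index map."""
    # The map is keyed by the stringified index value, not by row number.
    key_to_index = {str(value): value for value in index_values}
    # Row numbers only match when the index happens to be 0, 1, 2, ...
    return [key_to_index.get(str(n)) for n in selected_row_numbers]


# Default-style index: row numbers and index values coincide, so it looks correct.
print(lookup_selection([0, 1, 2, 3], [1, 3]))      # [1, 3]

# Custom index: rows 1 and 3 exist, but "1" and "3" are not keys in the map.
print(lookup_selection([10, 20, 30, 40], [1, 3]))  # [None, None]
```

The second call shows why the repro in #676 saw None: the lookup succeeds only by coincidence when the index equals the row numbers.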

jcheng5 · 2023-08-11T23:19:54Z

@andrie would love your opinion on 👆, it looked like your reprex had df["col"][input.grid_selected_rows()] which implies row numbers I think.

jcheng5 · 2023-08-11T23:23:09Z

Actually maybe using Pandas index values is not that realistic. They can theoretically be made up of almost any kind of Python object, can't they? I'm not sure we can successfully round-trip e.g. Python datetime values through JSON, and if they come back wrong, they won't be usable with df.loc.
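
As a rough illustration of the round-trip concern, the sketch below uses only the standard library. The `default=str` fallback is an assumption chosen for the example, not what Shiny does; the point is just that a datetime does not survive JSON unchanged.

```python
import json
from datetime import datetime

index_value = datetime(2023, 8, 11, 23, 23, 9)

# The json module cannot serialize a datetime without help.
try:
    json.dumps([index_value])
    serialized_directly = True
except TypeError:
    serialized_directly = False

# With a fallback encoder (here: str), the value comes back as a plain string.
round_tripped = json.loads(json.dumps([index_value], default=str))[0]

print(serialized_directly)           # False
print(type(round_tripped).__name__)  # str
print(round_tripped == index_value)  # False
```

A value that comes back as a string no longer equals the original index entry, so any lookup that depends on the exact original object would need extra type-specific parsing.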

andrie · 2023-08-14T14:13:10Z

> @andrie would love your opinion on 👆, it looked like your reprex had df["col"][input.grid_selected_rows()] which implies row numbers I think.

I don't have a strong opinion on this. I adapted the documented example at https://shinylive.io/py/examples/#interactively-excluding-data and tried to make minimal modifications:

    @reactive.Calc
    def filtered_df():
        selected_idx = list(req(input.summary_data_selected_rows()))
        countries = summary_df["country"][selected_idx]
        # Filter data for selected countries
        return df[df["country"].isin(countries)]

jcheng5 · 2023-08-14T16:55:25Z

After sleeping on it, I think row numbers are the way to go. They're also less pandas-specific, if we want more native support of, I don't know, DuckDB or Ibis in the future.

wch

Looks good, but please see my comment about CoW operations when working with Pandas data frames.

shiny/render/_dataframe.py

Instead of returning Pandas index values for selected rows, use 0-indexed row numbers instead. This is more likely to work with non-Pandas data frame libraries in the future, and also saves us from needing to try to serialize/deserialize arbitrary index types.

Also, don't use df[col][rownum] in examples--that syntax is still using the pandas index. Instead, an explicit .iloc[...] is needed.
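
A small sketch of the difference, assuming pandas is installed. The data and variable names are made up for illustration; the index is deliberately shuffled so that labels and positions disagree.

```python
import pandas as pd

# Hypothetical data: the integer index is not in positional order.
df = pd.DataFrame({"col": ["a", "b", "c"]}, index=[2, 0, 1])

selected_rows = [0, 1]  # 0-indexed row numbers, as returned by the selection input

# Plain [] indexing on a Series with an integer index matches index *labels*.
by_label = df["col"][selected_rows]

# .iloc always selects by position, which is what row numbers mean.
by_position = df["col"].iloc[selected_rows]

print(list(by_label))     # ['b', 'c']
print(list(by_position))  # ['a', 'b']
```

With the default 0..n-1 index both expressions agree, which is why the label-based form can appear to work until the data frame is filtered, sorted, or given a custom index.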

Dropping the index saves bandwidth and prevents an error if the index has non-serializable values in it.
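
For a sense of what dropping the index changes in a serialized payload, here is a sketch using pandas' `DataFrame.to_json` with `orient="split"`. Whether Shiny serializes with exactly these arguments is an assumption; the data is invented for the example.

```python
import json

import pandas as pd

# Hypothetical data with a datetime index.
df = pd.DataFrame(
    {"value": [1.5, 2.5, 3.5]},
    index=pd.to_datetime(["2023-08-11", "2023-08-14", "2023-08-16"]),
)

with_index = df.to_json(orient="split")
without_index = df.to_json(orient="split", index=False)

# Without the index, the payload carries only column names and row data.
print(sorted(json.loads(with_index)))         # ['columns', 'data', 'index']
print(sorted(json.loads(without_index)))      # ['columns', 'data']
print(len(without_index) < len(with_index))   # True
```

Since the client identifies rows by position, nothing on the browser side needs the index values in this scheme.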

jcheng5 · 2023-08-16T22:12:52Z

Rebased against master--had to do this interactively because the e2e folder changed to tests/e2e on main since this PR started.

* main:
  * feat(Session): Make Session on_flush() and on_flushed() accept async functions (#693)
  * Make data frame selection return row numbers, not pandas index value (#677)
  * chore(api)!: Rename `ui.navset_pill_card` -> `ui.navset_card_pill` and `ui.navset_tab_card` -> `ui.navset_card_tab` (#681)
  * Consolidate all testing into `tests/` folder (#683)
  * Fix pyright error (#678)
  * Make model score app work on Connect/Shinyapps.io (#657)
  * Suppress type check for read_csv
  * Synchonize input_file examples
  * More realistic file import example (#582)
  * Make flaky dataframe test have larger timeout (#675)
  * Wrap bare value box value in `p` tag (#668)

jcheng5 force-pushed the row-selection-bug branch from c990c62 to 9f9f0ee Compare August 11, 2023 23:10

jcheng5 force-pushed the row-selection-bug branch 2 times, most recently from 8c8d6af to 34d3dd3 Compare August 15, 2023 19:07

jcheng5 changed the title ~~Make data frame output work correctly with Pandas index~~ Clarify that render.data_frame selection values are row numbers, not Pandas index Aug 15, 2023

jcheng5 changed the title ~~Clarify that render.data_frame selection values are row numbers, not Pandas index~~ Make data frame selection return row numbers, not pandas index value Aug 15, 2023

jcheng5 requested a review from wch August 16, 2023 00:12

wch approved these changes Aug 16, 2023

jcheng5 added 8 commits August 16, 2023 15:09

Make data frame output work correctly with Pandas index

f15148e

Update changelog

edfec27

Attempt to make test less flaky

dd7b646

Some output id parameters were documented as input id.

8ecb47b

Update changelog entry to be more accurate

4495366

Sort row selection keys

1d8bb33

Also, don't use df[col][rownum] in examples--that syntax is still using the pandas index. Instead, an explicit .iloc[...] is needed.

Don't bother serializing pandas index

2b24db3

Dropping the index saves bandwidth and prevents an error if the index has non-serializable values in it.

jcheng5 force-pushed the row-selection-bug branch from f93ffa1 to 2b24db3 Compare August 16, 2023 22:10

icarusz assigned jcheng5 Aug 21, 2023

Merge branch 'main' into row-selection-bug

64514f3

jcheng5 merged commit 915b4c8 into main Aug 21, 2023

jcheng5 deleted the row-selection-bug branch August 21, 2023 17:16
