-
Notifications
You must be signed in to change notification settings - Fork 114
Make data frame selection return row numbers, not pandas index value #677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c990c62
to
9f9f0ee
Compare
@andrie would love your opinion on 👆, it looked like your reprex had |
Actually maybe using Pandas index values is not that realistic. They can theoretically be made up of almost any kind of Python object, can't they? I'm not sure we can successfully round-trip e.g. Python datetime values through JSON, and if they come back wrong, they won't be usable with |
I don't have a strong opinion on this. I adapted the documented example at https://shinylive.io/py/examples/#interactively-excluding-data and tried to make minimal modifications: @reactive.Calc
def filtered_df():
selected_idx = list(req(input.summary_data_selected_rows()))
countries = summary_df["country"][selected_idx]
# Filter data for selected countries
return df[df["country"].isin(countries)] |
After sleeping on it, I think row numbers are the way to go. They're also less pandas-specific, if we want more native support of, I don't know, DuckDB or Ibis in the future. |
8c8d6af
to
34d3dd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but please see my comment about CoW operations when working with Pandas data frames.
Instead of returning Pandas index values for selected rows, use 0-indexed row numbers instead. This is more likely to work with non-Pandas data frame libraries in the future, and also saves us from needing to try to serialize/deserialize arbitrary index types.
Also, don't use df[col][rownum] in examples--that syntax is still using the pandas index. Instead, an explicit .iloc[...] is needed.
Dropping the index saves bandwidth and prevents an error if the index has non-serializable values in it.
f93ffa1
to
2b24db3
Compare
Rebased against master--had to do this interactively because the |
* main: feat(Session): Make Session on_flush() and on_flushed() accept async functions (#693) Make data frame selection return row numbers, not pandas index value (#677) chore(api)!: Rename `ui.navset_pill_card` -> `ui.navset_card_pill` and `ui.navset_tab_card` -> `ui.navset_card_tab` (#681) Consolidate all testing into `tests/` folder (#683) Fix pyright error (#678) Make model score app work on Connect/Shinyapps.io (#657) Suppress type check for read_csv Synchonize input_file examples More realistic file import example (#582) Make flaky dataframe test have larger timeout (#675) Wrap bare value box value in `p` tag (#668)
Fixes #676
The code in js/dataframe/index.tsx uses the row number ([0, 1, 2, ...]) as the row.id and tag's data-key attribute. That's fine until it's time to send the selection back to Shiny, at which time we need to convert from the row number to the entry in
df.index
.Actually, I suppose it's a question worth asking: what would you expect
input.grid_selected_rows()
to return: row numbers (to be used withdf.iloc
) or Pandas index values (to be used withdf.loc
)? I guess either is defensible? (cc @nealrichardson @machow @chendaniely)Update 2023-08-15 13:32-700: Before this PR, the code was just wrong--using a row's row number to look up the
keyToIndex
map, which was actually notkey -> index
(where "index" is referring to the pandas index) butstr(index) -> index
. Hence theNone
being returned from the repro case in #676 (there were row numbers that did not exist in the pandas index). In other words, the only way the reported selection would be correct is if the pandas index is the set of integers0..n
(which, to be fair, is probably the case with most pandas data frames, as it's the default).