Skip to content

Conversation

jcheng5
Copy link
Collaborator

@jcheng5 jcheng5 commented Aug 11, 2023

Fixes #676

The code in js/dataframe/index.tsx uses the row number ([0, 1, 2, ...]) as the row.id and tag's data-key attribute. That's fine until it's time to send the selection back to Shiny, at which time we need to convert from the row number to the entry in df.index.

Actually, I suppose it's a question worth asking: what would you expect input.grid_selected_rows() to return: row numbers (to be used with df.iloc) or Pandas index values (to be used with df.loc)? I guess either is defensible? (cc @nealrichardson @machow @chendaniely)


Update 2023-08-15 13:32-700: Before this PR, the code was just wrong--using a row's row number to look up the keyToIndex map, which was actually not key -> index (where "index" is referring to the pandas index) but str(index) -> index. Hence the None being returned from the repro case in #676 (there were row numbers that did not exist in the pandas index). In other words, the only way the reported selection would be correct is if the pandas index is the set of integers 0..n (which, to be fair, is probably the case with most pandas data frames, as it's the default).

@jcheng5 jcheng5 force-pushed the row-selection-bug branch from c990c62 to 9f9f0ee Compare August 11, 2023 23:10
@jcheng5
Copy link
Collaborator Author

jcheng5 commented Aug 11, 2023

@andrie would love your opinion on 👆, it looked like your reprex had df["col"][input.grid_selected_rows()] which implies row numbers I think.

@jcheng5
Copy link
Collaborator Author

jcheng5 commented Aug 11, 2023

Actually maybe using Pandas index values is not that realistic. They can theoretically be made up of almost any kind of Python object, can't they? I'm not sure we can successfully round-trip e.g. Python datetime values through JSON, and if they come back wrong, they won't be usable with df.loc.

@andrie
Copy link

andrie commented Aug 14, 2023

@andrie would love your opinion on 👆, it looked like your reprex had df["col"][input.grid_selected_rows()] which implies row numbers I think.

I don't have a strong opinion on this. I adapted the documented example at https://shinylive.io/py/examples/#interactively-excluding-data and tried to make minimal modifications:

    @reactive.Calc
    def filtered_df():
        selected_idx = list(req(input.summary_data_selected_rows()))
        countries = summary_df["country"][selected_idx]
        # Filter data for selected countries
        return df[df["country"].isin(countries)]

@jcheng5
Copy link
Collaborator Author

jcheng5 commented Aug 14, 2023

After sleeping on it, I think row numbers are the way to go. They're also less pandas-specific, if we want more native support of, I don't know, DuckDB or Ibis in the future.

@jcheng5 jcheng5 force-pushed the row-selection-bug branch 2 times, most recently from 8c8d6af to 34d3dd3 Compare August 15, 2023 19:07
@jcheng5 jcheng5 changed the title Make data frame output work correctly with Pandas index Clarify that render.data_frame selection values are row numbers, not Pandas index Aug 15, 2023
@jcheng5 jcheng5 changed the title Clarify that render.data_frame selection values are row numbers, not Pandas index Make data frame selection return row numbers, not pandas index value Aug 15, 2023
@jcheng5 jcheng5 requested a review from wch August 16, 2023 00:12
Copy link
Collaborator

@wch wch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but please see my comment about CoW operations when working with Pandas data frames.

Instead of returning Pandas index values for selected rows, use 0-indexed row
numbers instead. This is more likely to work with non-Pandas data frame libraries
in the future, and also saves us from needing to try to serialize/deserialize
arbitrary index types.
Also, don't use df[col][rownum] in examples--that syntax
is still using the pandas index. Instead, an explicit
.iloc[...] is needed.
Dropping the index saves bandwidth and prevents an
error if the index has non-serializable values in it.
@jcheng5 jcheng5 force-pushed the row-selection-bug branch from f93ffa1 to 2b24db3 Compare August 16, 2023 22:10
@jcheng5
Copy link
Collaborator Author

jcheng5 commented Aug 16, 2023

Rebased against master--had to do this interactively because the e2e folder changed to tests/e2e on main since this PR started.

@jcheng5 jcheng5 merged commit 915b4c8 into main Aug 21, 2023
@jcheng5 jcheng5 deleted the row-selection-bug branch August 21, 2023 17:16
schloerke added a commit that referenced this pull request Aug 22, 2023
* main:
  feat(Session): Make Session on_flush() and on_flushed() accept async functions (#693)
  Make data frame selection return row numbers, not pandas index value (#677)
  chore(api)!: Rename `ui.navset_pill_card` -> `ui.navset_card_pill` and `ui.navset_tab_card` -> `ui.navset_card_tab` (#681)
  Consolidate all testing into `tests/` folder (#683)
  Fix pyright error (#678)
  Make model score app work on Connect/Shinyapps.io (#657)
  Suppress type check for read_csv
  Synchonize input_file examples
  More realistic file import example (#582)
  Make flaky dataframe test have larger timeout (#675)
  Wrap bare value box value in `p` tag (#668)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data frame selected rows intermittently returning None Type
3 participants