@inkcherry commented Sep 3, 2025

Motivation

Make max_token_recv_per_rank adjustable so that memory overhead can be reduced in some balancing scenarios.
FYI @zhenhuang12

@TianDi101 requested a review from isytwu on September 3, 2025 08:49
@TianDi101
Collaborator

@inkcherry Thanks for this PR! @isytwu Could you please help review this? The idea is very similar to what we have discussed before to reduce memory usage.


inline __host__ __device__ int MaxNumTokensToRecv() const {
  if (numWorstToken != 0) {
    return numWorstToken;
Contributor

I wonder whether return worldSize * numWorstToken would be better?

Contributor

@isytwu commented Sep 4, 2025

Perhaps using min(numWorstToken, worldSize * MaxNumTokensToRecvPerRank()) would prevent users from passing excessively large values. Also, should MaxNumTokensToSend() be changed to take numWorstToken into account as well?
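
For comparison, here is a minimal host-only sketch of the three variants discussed in this thread. It is not the PR's actual code: the struct name RecvConfigSketch, the field maxTokensPerRank, the body of MaxNumTokensToRecvPerRank(), and the fallback used when numWorstToken is 0 are all assumptions made for illustration, and the __host__ __device__ qualifiers are omitted so the snippet compiles as plain C++.

```cpp
#include <algorithm>

// Hypothetical stand-in for the config struct under review. Only worldSize,
// numWorstToken and the MaxNumTokensToRecvPerRank() name come from the thread;
// everything else is an assumption.
struct RecvConfigSketch {
  int worldSize{8};
  int maxTokensPerRank{128};  // assumed per-rank upper bound
  int numWorstToken{0};       // 0 means "no override set"

  // Assumed to return the per-rank bound; the real body is not shown in the thread.
  int MaxNumTokensToRecvPerRank() const { return maxTokensPerRank; }

  // Assumed fallback when no override is set.
  int WorstCaseRecv() const { return worldSize * MaxNumTokensToRecvPerRank(); }

  // Variant A: the override is returned as-is, as in the reviewed excerpt.
  int RecvAsInPr() const {
    return numWorstToken != 0 ? numWorstToken : WorstCaseRecv();
  }

  // Variant B: the override is treated as a per-rank value and scaled by worldSize.
  int RecvScaled() const {
    return numWorstToken != 0 ? worldSize * numWorstToken : WorstCaseRecv();
  }

  // Variant C: the override is clamped to the worst case, so an oversized
  // user value cannot enlarge the receive bound.
  int RecvClamped() const {
    return numWorstToken != 0 ? std::min(numWorstToken, WorstCaseRecv())
                              : WorstCaseRecv();
  }
};
```

Under these assumptions, with worldSize = 8 and a per-rank bound of 128, setting numWorstToken = 4096 yields 4096 for variant A but 1024 for variant C, while numWorstToken = 256 yields 256 for both and 2048 for variant B.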

3 participants