
Conversation

kashifkhan0771
Contributor

@kashifkhan0771 kashifkhan0771 commented Aug 29, 2025

Description:

This pull request replaces the REST API requests for the GitHub source's PRs, issues, and their comments with GraphQL requests.

Scan Results with no verification:

Note: While results may vary slightly between runs, GraphQL consistently outperforms REST in scan duration.

[Screenshot: scan duration comparison, 2025-09-03]

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 marked this pull request as ready for review September 1, 2025 13:36
@kashifkhan0771 kashifkhan0771 requested review from a team as code owners September 1, 2025 13:36
@kashifkhan0771 kashifkhan0771 marked this pull request as draft September 1, 2025 13:36
@kashifkhan0771 kashifkhan0771 self-assigned this Sep 1, 2025
@kashifkhan0771 kashifkhan0771 marked this pull request as ready for review September 3, 2025 11:03
@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner September 3, 2025 11:03
Contributor

@rosecodym rosecodym left a comment


I was initially skeptical of putting the new code in graphql.go, but I think I see why you did it and I can't think of a better option.

However, I did have some trouble following the rate limiting code - in particular, how it interacts with the rest of the GitHub rate limiting code. Specifically, it's doing a lot of its own stuff before it begins to use the rest of the rate limiting code, and I don't understand why it needs to do that beforehand. (Also, mentioned inline, 5 minutes is a huge interval, and Prometheus gets skipped in many cases.)

Can you try another pass over the rate limiting code to see if things can be clarified a bit?

ctx.Logger().Info("GraphQL RATE_LIMITED error (fallback)",
"retry_after", retryAfter.String())
time.Sleep(retryAfter)
return true
Contributor


This early return (and all of the later ones) happens before we report rate limiting to Prometheus, which means we won't have much visibility when it happens. Relatedly, I find this new rate limiting code a bit hard to follow - I suspect the difficulty was with integrating it with the existing rate limiting code. (For example: why does this initial request have a 3x 5-minute retry step before getting into the global rate limiting code?) You might be able to resolve both issues at once with another refactoring pass at this function.

Contributor Author


I overlooked the reporting to Prometheus. Thanks for pointing that out!

The reason I initially set the retry interval to 5 minutes is that the GraphQL API sometimes returns a rate limit error before the remaining value reaches 0. When this happens, the response includes no rate limit info, so we can't rely on the usual rate limit object, and the resetAt value defaults to zero, which can cause incorrect handling.

To deal with this, I first handled the error case directly, then moved on to process the rate limit if available.

Your comment gave me a new perspective, so I adjusted the approach a bit and now:

  1. We first check if the global rate limit is already set.

    • If it's set, we use that value and delay accordingly.
  2. If it's not set, and we receive an error:

    • We check if it's a rate limit issue.
    • In that case, we set the global rate limit (so other requests know to wait) and continue reporting to Prometheus.
  3. If there's no error, we check the actual rate limit object:

    • If remaining < 3, we wait until the resetAt time.
    • Why 3? Because some GraphQL queries can cost more than one point, depending on complexity.

func (s *Source) handleGraphQLRateLimit(ctx context.Context, rl *rateLimit, errIn error, reporters ...errorReporter) bool {
// if rate limit exceeded error happened, wait for 5 minute before trying again
if errIn != nil && strings.Contains(errIn.Error(), "rate limit exceeded") {
retryAfter := 5 * time.Minute
Contributor


This is a huge retry interval. Does GitHub prescribe it?

Contributor Author


Answered in the above comment.

Contributor

@camgunz camgunz left a comment


Broadly looks good, just had a couple notes. We should add tests for this--probably the integration tests should cover this (maybe a unit test or two for chunkIDs), so if we just run the integration tests w/ UseGithubGraphqlAPI both false and true, that'd give us even more confidence we're not breaking anything.
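The suggested both-flag run could look like the following sketch; useGithubGraphQLAPI and scanComments here are hypothetical stand-ins for the real feature.UseGithubGraphqlAPI flag and the scan dispatch, not the project's actual test code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// useGithubGraphQLAPI stands in for the real feature.UseGithubGraphqlAPI flag.
var useGithubGraphQLAPI atomic.Bool

// scanComments picks the implementation based on the flag, mirroring the
// dispatch in the PR (both bodies are stubs here).
func scanComments() string {
	if useGithubGraphQLAPI.Load() {
		return "graphql"
	}
	return "rest"
}

func main() {
	// Run the same scan with the flag both off and on, confirming both
	// code paths are exercised by the same assertions.
	for _, enabled := range []bool{false, true} {
		useGithubGraphQLAPI.Store(enabled)
		fmt.Printf("UseGithubGraphqlAPI=%v -> %s path\n", enabled, scanComments())
	}
}
```

Running the identical integration suite under each flag value gives a direct behavioral comparison between the REST and GraphQL paths.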

main.go Outdated
feature.GitlabProjectsPerPage.Store(100)

// OSS Default using github graphql api for issues, pr's and comments
feature.UseGithubGraphqlAPI.Store(true)
Contributor


Let's start this off as false so it's not automatically turned on for EE customers (we can/should make it true for OSS after we get the flag in EE)

Contributor


Also, regretfully I think the initialism is GraphQL--if we're gonna do it for REST we should probably do it here too

Contributor Author


Sure, I'll temporarily turn it off. I didn't quite understand your second comment - just to clarify, we don’t use any initialism for REST.

Contributor


Oh I was just saying the "QL" is capitalized; Graphql -> GraphQL

Contributor Author


Done

)

func (s *Source) processRepoComments(ctx context.Context, repoInfo repoInfo, reporter sources.ChunkReporter, cutoffTime *time.Time) error {
func (s *Source) processIssueandPRsWithCommentsREST(ctx context.Context, repoInfo repoInfo, reporter sources.ChunkReporter, cutoffTime *time.Time) error {
Contributor


I'm an 80 columns person because I have a bunch of buffers open side-by-side, but I accept I'm a relic from an earlier time. This one (and at least one more below) are > 150 though--can we wrap somewhere between 90 and 120?

return s.processRepoComments(ctx, repoInfo, reporter, cutoffTime)
// if we need to use graphql api for repo issues, prs and comments
if feature.UseGithubGraphqlAPI.Load() {
return s.processRepoIssueandPRsWithCommentsGraphql(ctx, repoInfo, reporter, cutoffTime)
Contributor


(No need to do anything here, just musing)

Hrm, w/ the old code passing cutoffTime just the once was kind of 🤷🏻 , but now that we're carrying it around everywhere, it makes me think it'd be nice if we processed s.commentsTimeframeDays up top in Init, that way we don't need to drill it down everywhere.

Contributor Author


Sounds good, we can do it in a separate optimization PR.

return err
}

// if a pull request has more than 100 comments - some intern's pull request might end up here :)
Contributor


😂

@kashifkhan0771
Contributor Author

We should add tests for this--probably the integration tests should cover this (maybe a unit test or two for chunkIDs), so if we just run the integration tests w/ UseGithubGraphqlAPI both false and true, that'd give us even more confidence we're not breaking anything.

I added the test but even the existing integration tests are not working for me 😿
