Skip to content

Conversation

u-kai
Copy link
Contributor

@u-kai u-kai commented Sep 6, 2025

What does it do ?

This PR optimizes the ProviderSpecific field in the Endpoint struct by changing it from a slice-based implementation ([]ProviderSpecificProperty) to a map-based implementation (map[string]string).
This change improves property access performance from O(n) linear search to O(1) constant-time lookups while maintaining full backward compatibility for CRD users.

Motivation

The current slice-based ProviderSpecific implementation creates significant performance bottlenecks in large-scale environments.
This optimization was identified as a prerequisite for addressing the performance concerns raised in PR #5799.
Before implementing provider-specific property warnings, the underlying data structure needed to be optimized to prevent the warnings themselves from becoming a performance bottleneck.

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added the apis Issues or PRs related to API change label Sep 6, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. internal Issues or PRs related to internal code plan Issues or PRs related to external-dns plan provider Issues or PRs related to a provider registry Issues or PRs related to a registry source needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 6, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @u-kai. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 6, 2025
// DNSEndpointSpec defines the desired state of DNSEndpoint
type DNSEndpointSpec struct {
Endpoints []*endpoint.Endpoint `json:"endpoints,omitempty"`
Endpoints []*Endpoint `json:"endpoints"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, the path is apis/v1alpha1. Moving Endpoints under this folder, means we are doing versioning for Endpoint object. Not against, but not sure if this is the right approach. As now we most likely going to have v1alpha.Endpoint....

I have no solution, not an easy one though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this — you’re right that moving Endpoint under apis/v1alpha1 makes it part of the public API surface and therefore versioned.

A few clarifications on intent and impact:

Intentional separation:
We’re explicitly decoupling the CRD type (apis/v1alpha1.Endpoint) from the internal type (endpoint.Endpoint).
This lets us keep the CRD schema stable while giving us freedom to optimize the internal representation (e.g., map-based ProviderSpecific) without leaking those changes into the API.

No user-facing change:
The CRD schema shape for Endpoints remains the same from a user point of view; we only replaced the reference to the internal type with the versioned API type.

On future versions (v1, etc.):
If v1’s Endpoint is identical to v1alpha1, we can avoid extra helpers by using a type alias or a shared converter.
If it diverges, we’ll add a thin conversion layer and still keep the internals map cleanly.
Typically the controller consumes one served version; the apiserver handles storage-version normalization.

Maintenance cost:
We’re aware this assigns versioning responsibility to Endpoint, but it also prevents tight coupling to internals and reduces the risk for future internal refactors — or at least, we believe it can reduce such risks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you generated CRDs, are they still the same or there is diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't handle this properly initially.
I found that the Endpoints field should have the omitempty tag.
After adding omitempty to the endpoints field and regenerating the CRDs, there are no diffs.
Everything is now in the proper state.

@ivankatliarchuk
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 6, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2025
@u-kai u-kai force-pushed the refactor/provider-specific-map branch from a2c62d3 to a2c0d29 Compare September 8, 2025 00:10
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 8, 2025
@u-kai u-kai force-pushed the refactor/provider-specific-map branch from a2c0d29 to 921fb70 Compare September 9, 2025 12:08
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2025
Comment on lines 231 to 256
func (ps ProviderSpecific) String() string {
if len(ps) == 0 {
return "[]"
}
// Collect and sort keys for stable output.
keys := make([]string, 0, len(ps))
for k := range ps {
keys = append(keys, k)
}
sort.Strings(keys)
// Build entries like "{key value}" preserving stable order.
b := strings.Builder{}
b.WriteByte('[')
for i, k := range keys {
if i > 0 {
b.WriteByte(' ')
}
b.WriteByte('{')
b.WriteString(k)
b.WriteByte(' ')
b.WriteString(ps[k])
b.WriteByte('}')
}
b.WriteByte(']')
return b.String()
}
Copy link
Contributor

@vflaux vflaux Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to maintain code that mimic the default output from fmt?
I would prefer to keep the ProviderSpecificProperty struct (but package-private), build a slice of it from ProviderSpecific and use fmt.Sprintf().
Performance wise your code is faster though.
@ivankatliarchuk wdyt ?

edit:
Something like that:

func (ps ProviderSpecific) String() string {
	data := make([]providerSpecificProperty, 0, len(ps))
	for k, v := range ps {
		data = append(data, providerSpecificProperty{Name: k, Value: v})
	}
	slices.SortFunc(data, func(a, b providerSpecificProperty) int {
		return strings.Compare(a.Name, b.Name)
	})
	return fmt.Sprint(data)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too shure what this method is for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to keep the log output identical. But that's mostly for debug logs.

log.Debugf(`Skipping endpoint %v because of missing owner label (required: "%s")`, ep, ownerID)

But there is some info logs too:
log.Infof("Skipping endpoint %v because owner does not match", ep)

log.WithField("endpoint", ep).Debugf("Skipping endpoint because all targets were filtered out")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivankatliarchuk
As @vflaux mentioned, this logic is to maintain compatibility of the log output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a map, so the output should be consistent. I think I'm questioning the actual need for this method. Is formatting with %q or %v is not enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mloiseleur. Wdyt, do we want to keep same output in logging or Map : ProviderSpecific map[key1:value1 key2:value2 key3:value3] is just enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Changing logs output is not the goal of this PR.
Nonetheless, it changes the struct, so it can be somehow expected that it impacts log output.

The default output provided by Go on map is fine and well known in go software.
My recommendation is to use it as a start, for this PR.

We'll see with time if it's not enough and if we need to update it with specific code and/or struct.

@vflaux @ivankatliarchuk @u-kai Wdyt ? Does it make sense to you ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback.
My original motivation for adding the custom String() was to keep backward compatibility in log output — since the struct changed, I wanted to avoid surprising changes in log messages.

That said, I’m also fine with simplifying the code and just returning the default map output. It makes the code cleaner, though it will slightly change the log format. If we go this route, I think it’s worth mentioning in release notes so users are aware of the change.

If you prefer that approach, I can drop the custom String() method and just let it print the map directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I prefer the simpler approach and drop String() method.
No problem to add a line on this in the next release notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the String() method and related tests as suggested.
@mloiseleur @ivankatliarchuk
Please review when you have a chance 🙇

newEp.Labels[k] = v
}
newEp.ProviderSpecific = append(endpoint.ProviderSpecific(nil), ep.ProviderSpecific...)
if ep.ProviderSpecific != nil {
Copy link
Contributor

@vflaux vflaux Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a test case for this as this is not covered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll address the test coverage for the if ep.ProviderSpecific !=nil branch in a separate PR.
The current tests use makeZone helper function which doesn't set ProviderSpecific, and modifying it would require updating all existing tests that depend on it.
A dedicated PR focused on test infrastructure improvements would be more appropriate to maintain consistency across the codebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2025
@ivankatliarchuk
Copy link
Contributor

I dunno. When I merge with master I just do git merge master, but hard to say what exact configs applied in the background

@ivankatliarchuk
Copy link
Contributor

Was it tested on real environment? The potential risk here - it may keep try to recreate/update same endpoints over and over again, due some of the changes.

@u-kai
Copy link
Contributor Author

u-kai commented Sep 20, 2025

@ivankatliarchuk

I dunno. When I merge with master I just do git merge master, but hard to say what exact configs applied in the background

Alright, then I’ll resolve conflicts with --no-rebase for now.

Was it tested on real environment? The potential risk here - it may keep try to recreate/update same endpoints over and over again, due some of the changes.

I tested the following scenario and didn’t see any issues: Ran with v0.19.0 using DNSEndpoint and Ingress as sources.

Then switched to the version with this change applied and confirmed it kept working fine. After that, modified the record contents and verified the DNS records were updated accordingly.

Of course, it’s not realistic to cover every possible case. That said, since this change doesn’t alter any external
interfaces and all unit tests are still passing, I think the risk of problems is quite low.
If you have any specific scenarios in mind where this could be risky, I’d be happy to check them.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2025
Copy link
Collaborator

@mloiseleur mloiseleur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM.
🤔 The last thing we should double check / confirm: does it have impact on webhook providers ?

@u-kai
Copy link
Contributor Author

u-kai commented Sep 21, 2025

@mloiseleur
Thanks for flagging this — you’re absolutely right.
This does affect webhook providers: we need to align the types that webhooks receive and providers send with the CRD-defined types, and ensure backward compatibility (likely via an adapter/translation layer).
I’m working on this now and will follow up with an update. If you have a specific webhook provider in mind, I can test against it as well.

@u-kai
Copy link
Contributor Author

u-kai commented Sep 23, 2025

@mloiseleur
Verified webhook provider compatibility by adding an adapter layer between internal and API endpoint types.

  • Implemented conversion functions
  • Updated webhook to use the adapter
  • Added comprehensive tests

All tests pass and backward compatibility is maintained. Ready for review!

@u-kai u-kai requested a review from mloiseleur September 23, 2025 09:37
Copy link
Contributor

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm,. few questions left to resolve

v4EP := ep.DeepCopy()
v4EP.Targets = v4Targets
v4EP.RecordType = endpoint.RecordTypeA
v4EP := &endpoint.Endpoint{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is deepcopy no longer works here? Not sure if we should be using endpoint.Endpoint in refactorings, dedicated methods more preferred, so we could incapsulate things in the future.

Why this code is required, with null checks and map copy?

if ep.Labels != nil {
			v4EP.Labels = make(endpoint.Labels, len(ep.Labels))
			maps.Copy(v4EP.Labels, ep.Labels)
		}

		if ep.ProviderSpecific != nil {
			v4EP.ProviderSpecific = make(endpoint.ProviderSpecific, len(ep.ProviderSpecific))
			maps.Copy(v4EP.ProviderSpecific, ep.ProviderSpecific)
		}

My understanding v4EP created without labels or provider specific, is this not just?

v4ep.Labels = ep.Labels

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On DeepCopy:
This code now uses an internal Endpoint type instead of the CRD code-generated one, so we don’t have generated DeepCopy methods anymore.
I don’t think it’s worth re-implementing DeepCopy just for this use case—being explicit in the adapter keeps the conversion type-safe and makes future encapsulation easier.

On the nil checks and map copy:
A direct assignment would share the same map (Go maps are reference types), so changes to the converted value could leak back to the source.
Copying avoids aliasing and preserves nil vs empty.

if providerSpecific.Name == key {
return providerSpecific.Value, true
}
if e.ProviderSpecific == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-09-25 at 10 16 25

}

// DeleteProviderSpecificProperty deletes any ProviderSpecificProperty of the specified name.
func (e *Endpoint) DeleteProviderSpecificProperty(key string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-09-25 at 10 16 53

"sigs.k8s.io/external-dns/endpoint"
)

func ToInternalEndpoint(crdEp *apiv1alpha1.Endpoint) *endpoint.Endpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name crdEp to specific and method description is missing for exported method

@@ -0,0 +1,86 @@
/*
Copyright 2017 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Copyright 2017 The Kubernetes Authors.
Copyright 2025 The Kubernetes Authors.

return ep
}

func ToInternalEndpoints(crdEps []*apiv1alpha1.Endpoint) []*endpoint.Endpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other method

newEp.ProviderSpecific = append(endpoint.ProviderSpecific(nil), ep.ProviderSpecific...)
if ep.ProviderSpecific != nil {
newEp.ProviderSpecific = make(endpoint.ProviderSpecific)
maps.Copy(newEp.ProviderSpecific, ep.ProviderSpecific)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the map copy even required, why not a direct assignment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the map copy: direct assignment would just alias the underlying map, since maps in Go are reference types.
That would mean v4EP and ep share the same backing map, and any mutation on one would affect the other.
To avoid this aliasing we explicitly copy into a new map, so the converted object is fully independent.

This struct is used in a variety of ways in tests, and we wanted to minimize any chance of shared state leaking between instances.
Making explicit copies ensures data isolation and makes test behavior more predictable.

w.WriteHeader(http.StatusOK)
if err := json.NewEncoder(w).Encode(records); err != nil {
apiRecords := adapter.ToAPIEndpoints(records)
if err := json.NewEncoder(w).Encode(apiRecords); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-09-25 at 10 33 49

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are only encoding valid types into an in-memory buffer, this call cannot realistically fail.
Because the failure path is not reproducible in this context, we are intentionally not covering it with tests.

log.Errorf("Failed to call adjust endpoints: %v", err)
w.WriteHeader(http.StatusInternalServerError)
}
pve = adapter.ToAPIEndpoints(endpoints)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not covered before, which probably explains why tests were green when there was an issue with webhook

Screenshot 2025-09-25 at 10 34 41

"sigs.k8s.io/external-dns/endpoint"
)

func ToInternalEndpoint(crdEp *apiv1alpha1.Endpoint) *endpoint.Endpoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name ToInternalEndpoint not necessary is clear engough. external-dns Endpoint is not internal, as is exported. I do get the meaning probably slighly different here.

Maybe ToAPI and FromApi or ToVersionedApi and FromVersionedApi

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@u-kai
Copy link
Contributor Author

u-kai commented Sep 25, 2025

@ivankatliarchuk
Thank you very much for all the reviews!
I’ve made the necessary fixes and replied to the comments where a response was needed.
I’d appreciate it if you could take another look.

@ivankatliarchuk
Copy link
Contributor

Overall lgtm. Need to find some time to execute few smoke tests on my side.

@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apis Issues or PRs related to API change cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. internal Issues or PRs related to internal code needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. plan Issues or PRs related to external-dns plan provider Issues or PRs related to a provider registry Issues or PRs related to a registry size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants