
Use transformers to reduce memory footprint of external-dns #5595

@valerian-roche


What would you like to be added:

Currently, the pod and service sources use pod informers. By default, those informers keep an in-memory Go representation of the pods in all namespaces of the cluster.
In large clusters this ends up consuming large amounts of memory (10+ GB) and also makes external-dns unstable, as its memory usage is directly correlated to the size of the pod blueprints.

Why is this needed:

Transformers have been implemented in https://github.com/kubernetes/client-go as a way to reduce this memory footprint when only a small subset of fields is required. In the case of external-dns, this removes most of the spec and parts of the metadata and status. The resulting memory footprint in our clusters is ~1/10th of the previous usage (though your mileage may vary depending on the "shape" of your pod blueprints). A side effect is that the pod informer is no longer the only sizing constraint, and doubling the number of pods results in a less than 50% memory increase.
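To illustrate the idea, here is a minimal sketch of a transformer. It is not the code from our fork: the `Pod` types below are simplified, hypothetical stand-ins for the real `k8s.io/api/core/v1` types, and the set of fields kept is an assumption made for the example. The function only mirrors the shape of client-go's `cache.TransformFunc` (`func(interface{}) (interface{}, error)`), which is registered on an informer with `SetTransform` before the informer starts.

```go
package main

// Simplified, hypothetical stand-ins for the Kubernetes pod types.
// The real types carry many more fields.
type ObjectMeta struct {
	Name, Namespace string
	Labels          map[string]string
	Annotations     map[string]string
	ManagedFields   []string // often bulky; assumed unused by the sources
}

type PodSpec struct {
	NodeName    string
	HostNetwork bool
	Containers  []string // stands in for the large container definitions
}

type PodStatus struct {
	PodIP      string
	Conditions []string
}

type Pod struct {
	Meta   ObjectMeta
	Spec   PodSpec
	Status PodStatus
}

// TrimPod has the same signature as a client-go transform function.
// It returns a copy of the pod that keeps only the fields this example
// assumes are needed, so the full object can be garbage collected.
func TrimPod(obj interface{}) (interface{}, error) {
	pod, ok := obj.(*Pod)
	if !ok {
		// Informers can hand over other types (for example deletion
		// tombstones); pass them through unchanged.
		return obj, nil
	}
	return &Pod{
		Meta: ObjectMeta{
			Name:        pod.Meta.Name,
			Namespace:   pod.Meta.Namespace,
			Labels:      pod.Meta.Labels,
			Annotations: pod.Meta.Annotations,
		},
		Spec: PodSpec{
			NodeName:    pod.Spec.NodeName,
			HostNetwork: pod.Spec.HostNetwork,
		},
		Status: PodStatus{PodIP: pod.Status.PodIP},
	}, nil
}
```

Returning a fresh, smaller object rather than mutating the input keeps the function safe if the same object is seen more than once. Which fields can actually be dropped depends on what each source reads.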
We did not push for this earlier because a limitation of informers in client-go removed most of the value: on the initial sync, all pods would still end up in memory before being passed to the informer. As a result, while it greatly improved average memory usage, it did not improve maximum memory usage (except when resync is set, in which case it still nearly halved the memory peak). This issue is no longer present when using watchlist (not activated by default, but it can be activated through code or an environment variable) together with a fix recently merged in client-go: kubernetes/kubernetes#131799.
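The difference between average and peak usage can be made concrete with a toy model. This is only a sketch under stated assumptions, not a description of client-go internals: the sizes are arbitrary units, and both function names are hypothetical. It compares an initial sync that holds every full object before trimming with a streamed sync that trims each object as it arrives.

```go
package main

// Object sizes in arbitrary units (hypothetical values for illustration).
const (
	fullSize    = 10 // an untrimmed pod
	trimmedSize = 1  // the same pod after the transformer
)

// PeakListThenTransform models an initial sync where all full objects
// are held in memory before any of them is transformed.
func PeakListThenTransform(n int) int {
	return n * fullSize
}

// PeakStreamTransform models a streamed initial sync where each object
// is transformed on arrival: at any time there is at most one full
// object in flight plus the trimmed objects already stored.
func PeakStreamTransform(n int) int {
	peak, held := 0, 0
	for i := 0; i < n; i++ {
		if cur := held + fullSize; cur > peak {
			peak = cur
		}
		held += trimmedSize
	}
	return peak
}
```

In this model both variants settle at the same steady state (`n * trimmedSize`), but only the streamed variant keeps the peak close to it, which is why the transformer alone improved the average but not the maximum.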

We have been running all of this in a fork for more than a month and have been able to divide our external-dns costs by 4. We are hoping to merge this upstream for everyone to use.

Labels: kind/feature, lifecycle/stale
