feat: add --provider-cache-patch-on-apply flag to avoid full re-fetch on large zones#6447
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
|
Welcome @tanishcharaya! |
|
Hi @tanishcharaya. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
53fd999 to
5cfc0c3
Compare
5cfc0c3 to
a441085
Compare
|
The PR's stated problem: A full zone re-fetch after every write negates the cache on large zones. My view, only coredns + pihole + rfc2316 +awssd as provider, could currently handle 100k+ records reliably, rest will not. The current zone caching doesn't have this problems - missing a new zone for a few minutes is tolerable. Serving a stale record view to the planner can produce wrong creates, updates, or worse, skip correcting real drift. I'll compare here two Caches, Fundamentally Different Things (zones vs dns records) Zone cache (blueprint.ZoneCache[T]) - answers: "which zones do I manage?"
Record cache (CachedProvider) - answers: "what records currently exist?"
But the real problem is larger:
The motivation example is 100k+ Route53 records. Route53's ListResourceRecordSets has a default quota of 5 req/s, page size 300. That's ~333 API calls - around 67 seconds of sustained throttled API traffic just to build the initial cache. This PR does nothing for that window. Any scenario that forces a full re-fetch (pod restart, OOM kill, node eviction, rolling update) is just as expensive as before. In Kubernetes, pod restarts are routine. The optimization only pays off in the narrow window of a continuously running, stable process - exactly what you can't guarantee in a Kubernetes workload.
external-dns stores nothing durable. The cache lives only in the heap of the current process. The PR's patch-on-apply logic is best-effort warmth between reconcile loops, not a solution to the fetch cost. We are currently caching zones, it's not an expensive operation, but the zones not changing that often
For the optimization to matter, you need:
If you're on a 1-minute reconcile with a 10-minute cache and creates/deletes every loop, the math works in the happy path - but in any production Kubernetes environment the restart frequency likely closes that window regularly. So why not to just set reconcile to 10 minutes? PatchOnApply - is another sharp case. This solution introduces correctness risk. patch-on-apply makes the cache MORE persistent - even a write no longer triggers a re-fetch. So the cache can now drift indefinitely as long as writes Interesting solution, this is a follow-up for #3402 |
|
@ivankatliarchuk What would you suggest to solve this problem ? In particular, it has been found that with 200k records in an AWS Route53 zone, each refresh triggers around 70 API calls to retrieve all the records, making it more likely to hit the AWS Route53 API rate limits. If the page size is 300, shouldn't the number of calls be 200k/300 = 0.67k not 0.07k. Also since we have fresh sync of external provider at regular intervals, shouldn't the cache stay in sync ? It can drift maximium by provider-cache-time. |
|
Improving things always welcome. I'll play devils advocate, and will leave it for other reviewers to decide, as don't have strong opinion around this feature. I have not looked at the code, as CLA not signed, most likely there are some edge-cases that needs considering as well. The controller pattern has one job: Caching "actual DNS state" means we're not observing - we're trusting a snapshot. The cache breaks the core guarantee the controller exists to provide: that DNS reality matches Kubernetes intent. The record cache is compensating for one problem: reading a large zone is expensive. That's an API interaction problem. In such case, I'm not sure what the right solution is. For example I'm moved all large records to coredns provider as an example. A TTL-based cache of managed records is a pragmatic shortcut that trades DNS correctness for cost savings (depends how risky is this correctness). It works until it doesn't - and when it fails silently, DNS is wrong -> as a result it could lead to disaster. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What does this PR do?
Adds a
--provider-cache-patch-on-applyboolean flag (defaultfalse) toCachedProvider.When enabled, a successful
ApplyChangescall patches the in-memory record cache in-place rather than invalidating it. This avoids a full provider re-fetch (e.g.ListResourceRecordSets) on the next reconcile loop after every write.Why is this needed?
When
--provider-cache-timeis set, the previous behaviour unconditionally calledReset()before writing to the provider, invalidating the cache regardless of success or failure. On the nextRecords()call, external-dns would re-fetch the entire zone.For large zones (e.g. 100k+ Route53 records with creates/deletes every minute) this means a full zone list on every sync cycle — negating the cache entirely and generating significant API cost and latency.
How does it work?
ApplyChanges: removes deleted/updated-old entries and appends created/updated-new entries directly in the cache slice — no provider round-trip.ApplyChanges: still callsReset()to guarantee consistency with the provider state.false— existing behaviour is fully preserved for users who do not set the flag.Usage