Update existing lists #144
This is quite a large diff. Happy to iterate if there are things you want to change. |
I'm not sure list patching can work reliably. A domain can be in another list instead of the previous list after blocklist update and would mess up the patching process. For example: We have two lists and the first one is full. If a new domain is added to the first list after the blocklist update, the last domain would end up being in the second list. We wouldn't be able to check its existence without going through all the current lists. Before blocklist update
After blocklist update
Things become even more complicated if the next update comes from a different set of blocklists. Therefore, to make patching work reliably, we would have to go through all the current lists, but that would defeat the purpose of saving time and requests, because there isn't an API to get all items of all the lists at once. |
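The shifting scenario in this comment can be illustrated with a small sketch. It assumes lists are filled positionally from an ordered array of domains; the `chunk` helper, the tiny `LIST_SIZE`, and the example domains are hypothetical and not taken from the PR.

```javascript
// Split an ordered array of domains into fixed-size chunks (one chunk per list).
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const LIST_SIZE = 3; // hypothetical capacity, kept tiny so the example stays readable

// Before the blocklist update: the first list is full.
const before = chunk(["a.example", "b.example", "d.example", "e.example"], LIST_SIZE);

// After the update, "c.example" is added and lands in the first list.
const after = chunk(
  ["a.example", "b.example", "c.example", "d.example", "e.example"],
  LIST_SIZE
);

console.log(before[0].includes("d.example")); // true
console.log(after[0].includes("d.example")); // false: it moved to the second list
console.log(after[1]); // [ 'd.example', 'e.example' ]
```

A patch that only compares each list against its own previous contents would treat `d.example` as removed from the first list and added to the second, which is why the comparison has to look at all lists together.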
This change does check all existing lists. It is a reliable and idempotent operation: should an error occur, the same command can be run again to complete any missing changes. The benefits of this change are not only (slightly) faster operations, but also avoiding any period where the lists or rules are unapplied, which would leave the network and users able to reach otherwise blocked hosts. The maximum number of requests in this flow is 2x number_of_lists (GET + PATCH to every list) in the worst case. The number of requests in the current flow is 2x number_of_lists (DELETE + POST to recreate every list) in the normal case. |
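The request counts quoted in this comment can be written out as a rough calculation. This is only a sketch under simplifying assumptions: both function names are hypothetical, requests for listing the lists or creating brand-new lists are ignored, and only changed lists are assumed to be patched.

```javascript
// Compare-and-patch flow: one GET per list, plus one PATCH per list that changed.
function patchFlowRequests(numberOfLists, listsNeedingChanges = numberOfLists) {
  return numberOfLists + Math.min(listsNeedingChanges, numberOfLists);
}

// Delete-and-recreate flow: one DELETE and one POST per list, every time.
function recreateFlowRequests(numberOfLists) {
  return 2 * numberOfLists;
}

console.log(patchFlowRequests(100)); // 200 (worst case: every list changed)
console.log(patchFlowRequests(100, 5)); // 105 (only 5 lists changed)
console.log(recreateFlowRequests(100)); // 200
```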
I've been using this PR and the performance difference is pretty huge: using only OISD small, the average runtime drops from 2m45s to about 30s (comparing a full delete+create against this PR's compare+patch). I have come across a bug though: the script errors when no lists currently exist; it only works when at least 1 list already exists. Example workflow:
|
Thanks for the feedback and bug report. Will get that bug patched today.
…On Mon, Dec 30, 2024 at 7:07 AM Kieran Brown ***@***.***> wrote:
I've been using this PR and the performance difference is pretty huge,
using only OISD small it takes the average runtime from 2m45s to about 30s.
(Comparing against a full delete+create & this PR to compare+patch).
I have come across a bug though: the script errors when no lists currently
exist; it only works when at least 1 list already exists.
Example workflow:
https://github.com/kieranbrown/dns/actions/runs/12520357125/job/34925911614
file:///home/runner/work/dns/dns/lib/api.js:78
const cgpsLists = lists.filter(({ name }) => name.startsWith("CGPS List"));
^
TypeError: Cannot read properties of null (reading 'filter')
at synchronizeZeroTrustLists (file:///home/runner/work/dns/dns/lib/api.js:78:27)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async file:///home/runner/work/dns/dns/cf_list_create.js:143:3
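The trace shows `lists` being `null` when `.filter` is called. One way to guard against that is sketched below; `filterCgpsLists` is a hypothetical helper name, and this is not necessarily the fix that was applied in the PR.

```javascript
// Hypothetical helper: keep only the lists managed by the script, tolerating
// a null or undefined result (as seen when no lists exist yet).
function filterCgpsLists(lists) {
  return (lists ?? []).filter(({ name }) => name.startsWith("CGPS List"));
}

console.log(filterCgpsLists(null).length); // 0
console.log(
  filterCgpsLists([{ name: "CGPS List - Chunk 1" }, { name: "My own list" }]).length
); // 1
```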
|
Can confirm that issue is resolved 🙏 I've just encountered another issue when adding a new blocklist. Running a full delete+create fixed the issue so it seems to be related to adding a new Cloudflare list during synchronisation. Should be able to reproduce by starting off with a small blocklist, let it create at least 1 Cloudflare list, then add another blocklist causing it to create more Cloudflare lists. Example workflow:
|
Apologies for this oversight and thanks for the continued testing and feedback. |
Magnificent work; thank you for being patient and quickly fixing new problems as they come up. Because there are so many moving parts involved in this process, I'd continue testing the new code further before merging.
Aside from what I've mentioned, it works beautifully. Thanks again. |
Fixed your comment 3 since it's an easy change. For 1 and 2, the change would be more intrusive and, other than organization, wouldn't change the outcome much. |
I’m still coming across the Chunk NaN problem when the script wants to create a new list https://github.com/kieranbrown/dns/actions/runs/12702722844/job/35409402648 |
Do you have any manually named lists using the same prefix?
I’m not going to be able to look at this for a while. I’ve been affected by
the California wildfires.
|
I would like to test this out myself. ps: I'm very sorry to hear you are impacted by the CA fires. That is terrible and I wish you all the best. |
2865475 to bf94073
Sorry for the delay. I have addressed the NaN issue. It was a bug on my part, forgot to spread the array of numbers so it was trying to do Math.max on an array rather than a list of numbers. |
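The difference described here is easy to reproduce in isolation. The `chunkNumbers` array below is a hypothetical stand-in for whatever numbers the script collects.

```javascript
const chunkNumbers = [1, 2, 7]; // hypothetical chunk numbers

// Passing the array itself: it is coerced to the string "1,2,7", which is not a number.
console.log(Math.max(chunkNumbers)); // NaN

// Spreading passes each number as its own argument.
console.log(Math.max(...chunkNumbers)); // 7
```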
@umpire7777777 the easiest way to test out this branch is to edit your workflow to checkout this branch instead of the parent repo's v1 branch. e.g.
to
Or point to your own repo if you do pull this branch into one of your own. |
I made this change to test. It failed during the create new rules step. It failed on the first list, then repeated until the final failure. Number of blocked domains: 182391 |
Oh dear. I’ll look at a fix for that tomorrow!
…On Thu, Feb 13, 2025 at 8:23 PM umpire7777777 ***@***.***> wrote:
@umpire7777777 <https://github.com/umpire7777777> the easiest way to test
out this branch is to edit your workflow to checkout this branch instead of
the parent repo's v1 branch.
e.g. Change
name: Checkout
uses: ***@***.***
with:
repository: "mrrfv/cloudflare-gateway-pihole-scripts"
ref: "v1"
to
name: Checkout
uses: ***@***.***
with:
repository: "bsyk/cloudflare-gateway-pihole-scripts"
ref: "update-lists"
Or point to your own repo if you do pull this branch into one of your own.
I made this change to test. It failed during the create new rules step. It
failed on the first list, then repeated until the final failure.
Number of blocked domains: 182391
Number of lists to be created: 183
Creating 183 lists for 182391 domains...
Checking existing lists...
Found 0 existing lists. Calculating diffs...
0 removals, 182391 additions to make
Created "CGPS List - Chunk -Infinity" list - 182 left
An error occured while making a web request: "Error: HTTP error! Status: 429", retrying. Attempt 1 of 50.
THIS IS NORMAL IN MOST CIRCUMSTANCES. Refrain from reporting this as a bug unless the script doesn't automatically recover after several attempts.
Waiting for 2 minutes to avoid rate limiting.
An error occured while making a web request: "Error: HTTP error! Status: 409", retrying. Attempt 2 of 50.
---- repeated
An error occured while making a web request: "Error: HTTP error! Status: 409", retrying. Attempt 50 of 50.
THIS IS NORMAL IN MOST CIRCUMSTANCES. Refrain from reporting this as a bug unless the script doesn't automatically recover after several attempts.
Could not create "CGPS List - Chunk -Infinity" - Error: undefined - Error: HTTP error! Status: 409
file:///home/runner/work/cloudflare-gateway-pihole-scripts/cloudflare-gateway-pihole-scripts/lib/helpers.js:59
throw new Error(`${(data && 'errors' in data) ? data.errors[0].message : data} - ${error}`);
^
Error: undefined - Error: HTTP error! Status: 409
at request (file:///home/runner/work/cloudflare-gateway-pihole-scripts/cloudflare-gateway-pihole-scripts/lib/helpers.js:59:11)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async createZeroTrustListsOneByOne (file:///home/runner/work/cloudflare-gateway-pihole-scripts/cloudflare-gateway-pihole-scripts/lib/api.js:188:7)
at async synchronizeZeroTrustLists (file:///home/runner/work/cloudflare-gateway-pihole-scripts/cloudflare-gateway-pihole-scripts/lib/api.js:169:5)
at async file:///home/runner/work/cloudflare-gateway-pihole-scripts/cloudflare-gateway-pihole-scripts/cf_list_create.js:143:3
Node.js v22.13.1
Error: Process completed with exit code 1.
|
d8d9c76 to 293a3f5
@umpire7777777 I believe this is now fixed. Sorry for the issue. |
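For context, the failing log reported "Found 0 existing lists" right before creating "Chunk -Infinity", which is consistent with `Math.max` being called with no arguments. The sketch below shows that behaviour and one possible guard; `nextChunkNumber` and its name parsing are hypothetical and are not taken from the actual fix.

```javascript
// Hypothetical helper: pick the next "Chunk N" number from existing list names.
function nextChunkNumber(listNames) {
  const numbers = listNames
    .map((name) => Number.parseInt(name.replace("CGPS List - Chunk ", ""), 10))
    .filter(Number.isInteger); // drops NaN from names that do not end in a number
  // Math.max() with no arguments returns -Infinity, so seed it with 0.
  return Math.max(0, ...numbers) + 1;
}

console.log(Math.max(...[])); // -Infinity
console.log(nextChunkNumber([])); // 1
console.log(nextChunkNumber(["CGPS List - Chunk 1", "CGPS List - Chunk 4"])); // 5
```

Seeding `Math.max` with `0` keeps the first created list at "Chunk 1" when nothing exists yet, and filtering out non-integers avoids a `NaN` name if a list with an unexpected suffix shares the prefix.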
I'm not getting it, sorry. I made the change above, but I think there is another change needed in main.yml and I cannot figure it out. When I run the workflow Update Filter Lists, it still references "Delete Lists - Run npm run cloudflare-delete" and "Create Lists - npm run cloudflare-create". Can you share your main.yml so I can compare? |
Sorry. I forgot I'd removed some steps.
|
Thinking about the docker work I submitted recently, once this PR merges it won't end up applying for docker containers due to the entrypoint script using the "npm start" command which runs "download", "delete", then "create" in order. I see a couple options that make sense to me:
@bsyk @mrrfv any thoughts if this is still planned to be merged at some point? Would be nice to have the docker containers take advantage when it is. |
To avoid the need to delete all lists and recreate them, we can update existing lists only when their contents have changed. This processes the diffs between the desired list of domains and the existing lists, removing entries that are no longer in the desired lists and appending any new entries. This prefers to minimize the number of PATCH calls by appending entries to the lists we're already patching for the removals. The priority for additions is:
1. Add to lists we're already patching for removals.
2. Add to existing lists with fewer than LIST_ITEM_SIZE entries.
3. Create a new list.
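The priority order in this description could be sketched roughly as follows. This is an illustration only, not the code in the PR: `planSync`, the `{ id, items }` list shape, and the returned `{ patches, newLists }` structure are all assumptions made for the example, and no API requests are issued.

```javascript
// Hypothetical planner: decide which lists to patch and which new lists to create.
function planSync(existingLists, desiredDomains, listItemSize) {
  const desired = new Set(desiredDomains);
  const present = new Set(existingLists.flatMap((list) => list.items));
  const additions = [...desired].filter((domain) => !present.has(domain));

  // Work out removals per list and how many entries remain afterwards.
  const plans = existingLists.map((list) => {
    const remove = list.items.filter((domain) => !desired.has(domain));
    return { id: list.id, remove, append: [], size: list.items.length - remove.length };
  });

  // Priority 1: lists already being patched for removals.
  // Priority 2: untouched lists that still have room.
  const ordered = [
    ...plans.filter((plan) => plan.remove.length > 0),
    ...plans.filter((plan) => plan.remove.length === 0),
  ];
  for (const plan of ordered) {
    const room = Math.max(0, listItemSize - plan.size);
    plan.append = additions.splice(0, room);
  }

  // Priority 3: anything left over goes into new lists.
  const newLists = [];
  while (additions.length > 0) {
    newLists.push(additions.splice(0, listItemSize));
  }

  // Only lists with actual changes need a PATCH.
  const patches = plans
    .filter((plan) => plan.remove.length > 0 || plan.append.length > 0)
    .map(({ id, remove, append }) => ({ id, remove, append }));
  return { patches, newLists };
}

const result = planSync(
  [
    { id: "a", items: ["one.example", "two.example"] },
    { id: "b", items: ["three.example"] },
  ],
  ["one.example", "three.example", "four.example", "five.example", "six.example"],
  2 // tiny list size for readability
);
console.log(JSON.stringify(result, null, 2));
```

In this example, list `a` loses `two.example` and gains `four.example` in the same patch, list `b` takes `five.example` because it still has room, and `six.example` goes into a new list. Running the planner again on the resulting state yields no patches, which matches the idempotency described earlier in the thread.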
I've addressed some of the concerns about small/empty lists in this second PR (not sure how to add non-repo reviewers to that one) The |
@raetha Regarding the entrypoint, it should be sufficient to run Note: There may be a need to init the allowlist.txt first since the refresh:blocklist does not touch that. |
Hi @bsyk. Got everything working in my fork with this PR, the allowlist one, and pulled in your defrag branch. It all looks good, but I found one issue that I think stems back to this PR. My allowlist is no longer being excluded. It took me a little while to figure out what is happening, but somewhere between your PRs it seems to have stopped excluding domains from a supplied allowlist. Specifically I am using Hagezi's Multi Normal blocklist, which includes joinhoney.com. I've added parts of the PayPal Honey whitelist into my own allowlist.txt file which can be found at: I'm finding that the joinhoney.com domain is still being added to a CF. It's possible I'm doing something wrong, but I believe things look right in my configuration, and I don't see anything in the code that jumps out at me given I've never worked directly with JS before. Could you take a look and see if you notice anything that could be breaking allowlist processing? |
I don't have any issue with allowlist processing. One issue is in the
However, any with the trailing bits, e.g. PR here to fix the Here are some additional debugging steps.
|
Awesome, thanks for that fix it seems to be working as expected now. As for the allowlist I shared though, that is basically just a stripped down version of the list provided directly by PayPal Honey for use with ad blockers. The full version can be found at https://www.joinhoney.com/whitelist/honey-smart-shopping.txt. I just kept only the entries for their main domain as I didn't want overrides for the others that look more questionable. Ultimately I've just been in a loop lately where my main blocklist blocks these domains and was preventing downloading the official version of the list. So I looked at it for the first time and pulled out just what I needed into the one stored in my repo at the moment. I plan to find a better way to handle it eventually. Thanks for the quick fix. All the improvements you've made are greatly appreciated. |