If you make crawlers for a living, or just because a man needs hobbies, there are landmines you'll eventually step on.
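For our BrandCat project, our system crawls the web every day looking for phishing sites, copycat domains, and the usual crowd trying to steal your thunder by registering something close to your brand.
When you run crawlers at scale, you eventually bump into sinkholes: servers that security researchers point seized or hijacked malicious domains at, so they can watch infected machines check in. Hit them often enough and you can land on blocklists like the Spamhaus XBL (Exploits Block List).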
Why the XBL Matters
The XBL is a fantastic resource: it tracks IPs that show signs of infection, such as compromised devices and hosts sending malware beacons. Security companies and ISPs all over the world rely on it to stop bad traffic before it causes damage.
Where Things Get Tricky
The problem is that, from the sinkhole operators' perspective, repeated traffic to a sinkhole looks exactly like a botnet checking in with its command-and-control server. From ours, it was nothing more than our crawlers fetching domains that happened to resolve inside their sinkhole range.
The result? One of our "workhorse" server IPs ended up on the Spamhaus XBL for a few days, even though nothing was infected. That's the flip side of these blocklists: they're extremely useful in catching real threats, but they can also generate false positives when legitimate security research or crawlers happen to hit the same traps.
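If you suspect one of your own IPs has been listed, you can ask the XBL directly. Like most DNS blocklists, it is queried by reversing the IP's octets and looking up an A record under the list's zone (xbl.spamhaus.org). A minimal sketch, assuming dig is installed; dnsbl_name and xbl_lookup are names made up for this example. As far as I know, an answer in 127.0.0.0/8 (typically 127.0.0.4 for the XBL) means listed, no answer means not listed, and Spamhaus may refuse queries that come through large public resolvers, so double-check their documentation for the exact return codes.

```shell
#!/bin/bash
# Sketch: check an IPv4 address against the Spamhaus XBL using the usual
# DNSBL convention (reverse the octets, append the zone name).

# Build the query name, e.g. 203.0.113.7 -> 7.113.0.203.xbl.spamhaus.org
dnsbl_name() {
  local zone=${2:-xbl.spamhaus.org} IFS=.
  set -- $1
  printf '%s.%s.%s.%s.%s\n' "$4" "$3" "$2" "$1" "$zone"
}

# Print the DNS answer; empty output means "not listed".
# Assumption: dig is available and you query through your own resolver.
xbl_lookup() {
  dig +short A "$(dnsbl_name "$1")"
}
```

Usage would look like xbl_lookup 203.0.113.7, run from a box whose resolver Spamhaus accepts queries from.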
How We Dealt With It
The fix isn't rocket science: collect sinkhole IPs and ranges and block them in your firewall so your crawlers never try to talk to them. It's not hard, but it takes time, because it's not like Spamhaus publishes an official list of their IPs 😁
Btw, don't just take my word for it: double-check every IP or range yourself.
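The firewall is the safety net, but the crawler can also skip these hosts on its own: resolve the domain, check the resulting IP against the list, and don't fetch on a match. Here's a minimal pure-shell sketch of that range check, assuming IPv4 only; ip_to_int, in_cidr and is_sinkholed are hypothetical helper names, not part of any tool.

```shell
#!/bin/bash
# Sketch: check whether an IPv4 address falls inside a sinkhole IP or CIDR range,
# so the crawler can skip a host before connecting. Pure shell arithmetic.

# Dotted quad -> 32-bit integer (assumes a well-formed IPv4 address)
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP RANGE: exit status 0 if IP is inside RANGE (a bare IP counts as /32)
in_cidr() {
  local net=${2%/*} bits=32 mask
  case $2 in */*) bits=${2#*/} ;; esac
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# is_sinkholed IP ENTRY...: exit status 0 if IP matches any of the given entries
is_sinkholed() {
  local ip=$1 range
  shift
  for range in "$@"; do
    in_cidr "$ip" "$range" && return 0
  done
  return 1
}
```

For example, is_sinkholed 143.215.130.7 62.28.241.21 143.215.130.0/24 succeeds because the address sits inside the /24, so the crawler would skip that host.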
Here are some that we've stumbled upon and safely blocked:
62.28.241.21
89.185.44.100
195.22.4.21
195.22.26.192/26
195.22.28.192/27
195.38.137.100
195.157.15.100
212.61.180.100
23.253.126.58
104.239.157.210
5.2.189.251
54.208.168.113
150.101.125.42
148.81.111.64/26
62.0.58.94
221.8.69.25
45.56.77.175
104.244.12.0/22
142.0.36.234
193.166.255.171
143.215.130.0/24
198.61.227.6
50.57.148.87
74.200.48.169
91.186.66.36
192.169.69.25
93.159.228.22
95.211.172.143
139.146.167.25
199.2.137.0/24
204.95.99.59
207.46.90.0/24
109.74.196.143
50.116.56.144
50.116.32.177
178.79.190.156
87.106.24.200
87.106.26.9
74.208.64.145
74.208.64.191
74.208.164.166
212.227.55.84
74.208.15.160
74.208.15.97
87.106.250.34
87.106.86.28
176.58.104.168
212.227.20.19
86.124.164.25
216.218.185.0/24
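A single typo in a list like this can block the wrong network, or nothing at all, so it's worth linting entries before loading them. A minimal sketch, assuming IPv4 only: valid_entry (a hypothetical helper) accepts a bare address or address/prefix, and rejects out-of-range octets, prefixes above 32, leading zeros, and ranges whose host bits are set, which usually means the prefix or the base address is wrong.

```shell
#!/bin/bash
# Sketch: lint one blocklist entry (IPv4 or IPv4/prefix) before it reaches a firewall.
# Returns 0 if the entry looks sane, 1 otherwise. IPv4 only.
valid_entry() {
  local addr=${1%/*} bits=32 o n mask
  case $1 in */*) bits=${1#*/} ;; esac

  # Prefix: plain decimal 0-32, no leading zeros (shell math would read 026 as octal)
  case $bits in ''|*[!0-9]*|0?*) return 1 ;; esac
  [ "$bits" -le 32 ] || return 1

  # Address: only digits and dots, no empty octets
  case $addr in ''|*[!0-9.]*|*..*|.*|*.) return 1 ;; esac

  # Split on dots into exactly four octets, each 0-255 without leading zeros
  local IFS=.
  set -- $addr
  [ $# -eq 4 ] || return 1
  for o in "$@"; do
    case $o in 0?*) return 1 ;; esac
    [ "$o" -le 255 ] || return 1
  done

  # Host bits must be zero, e.g. 195.22.26.200/26 is rejected
  n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( n & ~mask & 0xFFFFFFFF )) -eq 0 ]
}
```

The entries above should all pass this check; something like 195.22.26.200/26 or 300.1.1.1 would not.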
Scripts for the Brave and Lazy
And, as always, if you're into curl | bash extreme sports, here's a quick way to block them with either iptables or ufw.
iptables version:
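#!/bin/bash
# Run as root. Safe to re-run: the chain is created once and flushed on later runs.
set -eu
iptables -N SINKHOLE_IPS 2>/dev/null || iptables -F SINKHOLE_IPS
# Jump to our chain at the top of OUTPUT so earlier ACCEPT rules can't bypass it;
# -C checks whether the jump already exists so re-runs don't add duplicates.
iptables -C OUTPUT -j SINKHOLE_IPS 2>/dev/null || iptables -I OUTPUT 1 -j SINKHOLE_IPS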
for ip in \
62.28.241.21 \
89.185.44.100 \
195.22.4.21 \
195.22.26.192/26 \
195.22.28.192/27 \
195.38.137.100 \
195.157.15.100 \
212.61.180.100 \
23.253.126.58 \
104.239.157.210 \
5.2.189.251 \
54.208.168.113 \
150.101.125.42 \
148.81.111.64/26 \
62.0.58.94 \
221.8.69.25 \
45.56.77.175 \
104.244.12.0/22 \
142.0.36.234 \
193.166.255.171 \
143.215.130.0/24 \
198.61.227.6 \
50.57.148.87 \
74.200.48.169 \
91.186.66.36 \
192.169.69.25 \
93.159.228.22 \
95.211.172.143 \
139.146.167.25 \
199.2.137.0/24 \
204.95.99.59 \
207.46.90.0/24 \
109.74.196.143 \
50.116.56.144 \
50.116.32.177 \
178.79.190.156 \
87.106.24.200 \
87.106.26.9 \
74.208.64.145 \
74.208.64.191 \
74.208.164.166 \
212.227.55.84 \
74.208.15.160 \
74.208.15.97 \
87.106.250.34 \
87.106.86.28 \
176.58.104.168 \
212.227.20.19 \
86.124.164.25 \
216.218.185.0/24
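do
  # Drop any outbound packet headed for a known sinkhole IP or range
  iptables -A SINKHOLE_IPS -d "$ip" -j DROP
done
# These rules live in memory only; save them (e.g. iptables-save, or
# netfilter-persistent on Debian/Ubuntu) if they should survive a reboot.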
ufw version:
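#!/bin/bash
# Run as root. ufw skips rules it already has, so re-running is harmless.
# The rules only take effect while ufw is active (check with: ufw status).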
for ip in \
62.28.241.21 \
89.185.44.100 \
195.22.4.21 \
195.22.26.192/26 \
195.22.28.192/27 \
195.38.137.100 \
195.157.15.100 \
212.61.180.100 \
23.253.126.58 \
104.239.157.210 \
5.2.189.251 \
54.208.168.113 \
150.101.125.42 \
148.81.111.64/26 \
62.0.58.94 \
221.8.69.25 \
45.56.77.175 \
104.244.12.0/22 \
142.0.36.234 \
193.166.255.171 \
143.215.130.0/24 \
198.61.227.6 \
50.57.148.87 \
74.200.48.169 \
91.186.66.36 \
192.169.69.25 \
93.159.228.22 \
95.211.172.143 \
139.146.167.25 \
199.2.137.0/24 \
204.95.99.59 \
207.46.90.0/24 \
109.74.196.143 \
50.116.56.144 \
50.116.32.177 \
178.79.190.156 \
87.106.24.200 \
87.106.26.9 \
74.208.64.145 \
74.208.64.191 \
74.208.164.166 \
212.227.55.84 \
74.208.15.160 \
74.208.15.97 \
87.106.250.34 \
87.106.86.28 \
176.58.104.168 \
212.227.20.19 \
86.124.164.25 \
216.218.185.0/24
do
  # Deny outbound traffic to each sinkhole IP or range
  ufw deny out to "$ip"
done
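Both scripts carry their own copy of the list, which means two places to update every time you find a new sinkhole. A minimal sketch of keeping it in one file instead; sinkholes.txt and read_list are hypothetical names, and the file is assumed to hold one IP or CIDR per line, with optional # comments.

```shell
#!/bin/bash
# Sketch: keep the sinkhole list in one text file (hypothetical name: sinkholes.txt),
# one IP or CIDR per line, optional # comments, and feed it to either script.

# Print each entry once, dropping comments, blank lines and surrounding whitespace.
# sort -u reorders entries; that's fine here because every rule is a plain DROP/deny.
read_list() {
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
    grep -v '^$' |
    sort -u
}

# Example wiring (needs root):
#   read_list sinkholes.txt | while read -r ip; do
#     iptables -A SINKHOLE_IPS -d "$ip" -j DROP   # or: ufw deny out to "$ip"
#   done
```

This way the iptables and ufw variants only differ in the one line inside the loop.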
⚠️ And again, don't take my word for it: verify each IP or range yourself before dropping them.