Problem http User-Agents
Stop the Bandwidth Hogs
An Apache rewrite rule to block the pesky HTTP User-Agents that slow down my server because of the bandwidth they use up.
I have built up this list of problem user-agents over months of log analysis. Most of these agents ignore robots.txt entirely or, at best, ignore the Crawl-delay directive in robots.txt:
User-agent: *
Crawl-delay: 4
Add the following lines to your httpd config file: /etc/httpd/conf.d/domainname.conf
multi-line version
RewriteCond %{HTTP_USER_AGENT} \
Custom-AsyncHttpClient|EzoicBot|Turnitin|sqlmap|wikiteam3dumpgenerator|\
Seekport|HawaiiBot|SenutoBot|Go-http-client|AwarioBot|Wget|Bytespider|\
BLEXBot|webmeup-crawler|AhrefsBot|Amazonbot|ImagesiftBot|DataForSeoBot|\
Barkrowler|MJ12Bot|Semrush|GPTBot [NC]
RewriteRule . - [R=429,L]
single line version
RewriteCond %{HTTP_USER_AGENT} Custom-AsyncHttpClient|EzoicBot|Turnitin|sqlmap|wikiteam3dumpgenerator|Seekport|HawaiiBot|SenutoBot|Go-http-client|AwarioBot|Wget|Bytespider|BLEXBot|webmeup-crawler|AhrefsBot|Amazonbot|ImagesiftBot|DataForSeoBot|Barkrowler|MJ12Bot|Semrush|GPTBot [NC]
RewriteRule . - [R=429,L]
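Note that these directives only take effect in a context where mod_rewrite is enabled. As a rough sketch (not the exact contents of my conf file), the block might sit inside the VirtualHost like this; the server name, document root and the shortened agent list are placeholders for illustration only:
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/html
    # mod_rewrite directives are ignored unless the engine is switched on
    RewriteEngine on
    # send 429 Too Many Requests to the listed user-agents (abbreviated list for illustration)
    RewriteCond %{HTTP_USER_AGENT} Bytespider|GPTBot|MJ12Bot [NC]
    RewriteRule . - [R=429,L]
</VirtualHost>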
I prefer to use error code 429 [Too Many Requests], but if you would rather refuse these requests with error code 403 [Forbidden], change the RewriteRule to the example below:
RewriteRule ^.*$ - [F,L]
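The F flag returns 403 Forbidden. If you would rather answer with 410 [Gone], mod_rewrite's G flag returns that status instead, as a one-line variant:
RewriteRule ^.*$ - [G,L]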
Test your config first!
Run the following command to test your httpd config file before restarting!
sudo apachectl configtest
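If the configuration parses cleanly, the command normally just reports:
Syntax OK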
If there are no errors, restart your Apache daemon to begin blocking the pests with your new rules!
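A minimal sketch of that restart plus a quick check that the block works, assuming a systemd host where the service is named httpd (on Debian/Ubuntu it is usually apache2) and using example.com as a placeholder domain:
sudo systemctl restart httpd
# pretend to be one of the blocked bots; the server should answer with 429
curl -I -A "GPTBot" https://example.com/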
#tools #problemhttpuseragents #badbots #apache #httpdconf