Problem HTTP User-Agents

Stop the Bandwidth Hogs

An Apache rewrite rule to block the pesky HTTP User-Agents that slow my server down because of the bandwidth they use up.

I have built up this list of problem user-agents over months of log analysis. Most of these agents ignore robots.txt entirely or, at best, ignore the Crawl-delay directive in robots.txt:

User-agent: *
Crawl-delay: 4

Add the following lines to your httpd config file: /etc/httpd/conf.d/domainname.conf

multi-line version

RewriteCond %{HTTP_USER_AGENT} \
    Custom-AsyncHttpClient|EzoicBot|Turnitin|sqlmap|wikiteam3dumpgenerator|\
    Seekport|HawaiiBot|SenutoBot|Go-http-client|AwarioBot|Wget|Bytespider|\
    BLEXBot|webmeup-crawler|AhrefsBot|Amazonbot|ImagesiftBot|DataForSeoBot|\
    Barkrowler|MJ12Bot|Semrush|GPTBot|Qwantbot [NC]

RewriteRule . - [R=429,L]

single-line version

RewriteCond %{HTTP_USER_AGENT} Custom-AsyncHttpClient|EzoicBot|Turnitin|sqlmap|wikiteam3dumpgenerator|Seekport|HawaiiBot|SenutoBot|Go-http-client|AwarioBot|Wget|Bytespider|BLEXBot|webmeup-crawler|AhrefsBot|Amazonbot|ImagesiftBot|DataForSeoBot|Barkrowler|MJ12Bot|Semrush|GPTBot|Qwantbot [NC]

RewriteRule . - [R=429,L]
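These rules only take effect where mod_rewrite is enabled, so your config file also needs a RewriteEngine On directive if it does not have one already. A minimal sketch of the directives in context (the domain name, DocumentRoot and virtual host are placeholders, and it assumes mod_rewrite is already loaded):

<VirtualHost *:80>
    # Placeholder names - substitute your own domain and paths
    ServerName www.example.com
    DocumentRoot /var/www/example

    # mod_rewrite directives are ignored unless the engine is switched on
    RewriteEngine On

    # Send 429 Too Many Requests to the listed User-Agents (case-insensitive match)
    RewriteCond %{HTTP_USER_AGENT} Custom-AsyncHttpClient|EzoicBot|Turnitin|sqlmap|wikiteam3dumpgenerator|Seekport|HawaiiBot|SenutoBot|Go-http-client|AwarioBot|Wget|Bytespider|BLEXBot|webmeup-crawler|AhrefsBot|Amazonbot|ImagesiftBot|DataForSeoBot|Barkrowler|MJ12Bot|Semrush|GPTBot|Qwantbot [NC]
    RewriteRule . - [R=429,L]
</VirtualHost>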

I prefer to use status code 429 [Too Many Requests], but if you would rather block with HTTP status code 410 [Gone], change the RewriteRule to the example below:

RewriteRule ^.*$ - [G,L]

Test your config first!

Run the following command to test your httpd config file before restarting!

sudo apachectl configtest

If there are no errors, restart your Apache daemon to begin blocking the pests with your new rules!
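On a systemd-based system the restart and a quick check might look like the commands below. The service is assumed to be called httpd, as the /etc/httpd path above suggests (on Debian-style systems it is apache2), and the test URL is a placeholder:

# Restart the daemon to load the new rules (service name assumed to be httpd)
sudo systemctl restart httpd

# Send a request pretending to be one of the blocked crawlers;
# the status line in the response should now be 429 Too Many Requests
curl -I -A "GPTBot" https://www.example.com/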