Limiting ChatGPT's GPTBot Crawl Rate

Recently, ChatGPT's crawler GPTBot was crawling this site so heavily that it drove the processor load high enough to render the site inoperable for normal users.
It was hitting the server from multiple IP addresses more than 30 times per second.
For the benefit of others, here are various ways to prevent this crawler from slowing down your Linux server.
Full User-agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot
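If you suspect the same crawler is hammering your own server, your access log will confirm it. A quick check (the log path assumes Apache; adjust for NGINX):

# Count GPTBot requests, then rank the source IPs by hit count.
grep -c "GPTBot" /var/log/apache2/access.log
grep "GPTBot" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head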
To limit the crawl rate and ensure your server remains usable, you can control how frequently GPTBot (and other bots) request pages by configuring your robots.txt file or using rate-limiting techniques at the server level.
Step 1: Adjust Crawl Rate with robots.txt
You can limit the crawl rate for GPTBot by adding the following to your robots.txt file:
Update, December 2025: the following crawl delay seems to be working. The extra instructions after the crawl delay are specific to MediaWiki servers; they allow the site to be crawled but prevent processor-expensive queries, which can create infinite loops or extremely large URL trees.
# Crawl-delay: sets a delay (in seconds) between each request from the bot.
# Adjust the value (e.g., 10 seconds) to a rate that suits your server load.
User-agent: GPTBot
Crawl-delay: 10
Disallow: /*action=
Disallow: /*oldid=
Disallow: /*curid=
Disallow: /*diff=
Disallow: /*printable=
Disallow: /*redlink=
Disallow: /*mobileaction=
Disallow: /index.php?title=Special:

User-agent: ChatGPT-User
Crawl-delay: 10
Disallow: /*action=
Disallow: /*oldid=
Disallow: /*curid=
Disallow: /*diff=
Disallow: /*printable=
Disallow: /*redlink=
Disallow: /*mobileaction=
Disallow: /index.php?title=Special:
Step 2: Use Server-Side Rate Limiting
You can also set rate limits using server-side tools like mod_qos (for Apache) or ngx_http_limit_req_module (for NGINX). These modules help manage how many requests are allowed per second per IP address.
NGINX Configuration (if you are using NGINX):
http {
    limit_req_zone $binary_remote_addr zone=bot_zone:10m rate=1r/s;

    server {
        location / {
            limit_req zone=bot_zone burst=5 nodelay;
        }
    }
}
This limits each client IP to 1 request per second, with a burst capacity of 5. Note that, keyed on $binary_remote_addr, it throttles every visitor, not just bots.
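If you would rather throttle only GPTBot, NGINX's map module can produce an empty key for everyone else, and limit_req skips requests whose zone key is empty. A minimal sketch (the zone name gptbot_zone and the 1r/s rate are placeholders to adjust), placed inside the http block:

# Non-matching user agents get an empty key, so they are never limited.
map $http_user_agent $gptbot_key {
    default     "";
    ~*gptbot    $binary_remote_addr;
}

limit_req_zone $gptbot_key zone=gptbot_zone:10m rate=1r/s;

server {
    location / {
        limit_req zone=gptbot_zone burst=5 nodelay;
    }
}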
Apache Configuration (if you are using Apache):
You can use mod_qos to throttle requests. Its QS_LocRequestPerSecLimit directive limits the number of requests per second to a location, delaying excess requests rather than rejecting them:
QS_LocRequestPerSecLimit / 1
This throttles requests to the site root to roughly 1 per second.
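The mod_qos limit applies to all clients. If you want to push back on GPTBot alone, a small mod_rewrite rule can answer its requests with 429 Too Many Requests instead; a sketch, assuming mod_rewrite is enabled:

# Return 429 to GPTBot only; all other user agents are unaffected.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule ^ - [R=429,L]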
Step 3: Use Fail2Ban for Rate-Limiting Bots (Advanced)
If you are using Fail2Ban with iptables or firewalld, you can also set up a Fail2Ban rule to detect excessive bot traffic and throttle it:
Create a custom jail for GPTBot in /etc/fail2ban/jail.local:
[gptbot]
enabled  = true
port     = http,https
filter   = gptbot
logpath  = /var/log/apache2/access.log   # or /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime  = 600
Create a filter in /etc/fail2ban/filter.d/gptbot.conf:
[Definition]
failregex = <HOST> - - .*"GET .* HTTP.*" .* "GPTBot"
This will ban IPs that send more than 10 requests in 60 seconds for 10 minutes.
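Before enabling the jail, you can confirm the filter matches your log format with fail2ban-regex (the log path assumes Apache; use your NGINX access log if applicable):

fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/gptbot.conf

Then restart Fail2Ban (systemctl restart fail2ban) so the new jail takes effect.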
By combining robots.txt settings and server-side rate limiting, you can control bot activity and prevent server overload.
Completely Block ChatGPT Bots from Your Server
As an absolute last resort, you can use the following iptables rules to block all ChatGPT IP addresses from your server. The list is in CIDR format, and the rules can easily be adapted for firewalld or other firewalls. IP list valid as of December 2025.
/sbin/iptables -I INPUT -s 132.196.86.0/24 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 172.182.202.0/25 -j REJECT   # ChatGPT
/sbin/iptables -I INPUT -s 172.182.204.0/24 -j REJECT   # ChatGPT
/sbin/iptables -I INPUT -s 172.182.207.0/25 -j REJECT   # ChatGPT
/sbin/iptables -I INPUT -s 172.182.214.0/24 -j REJECT   # ChatGPT
/sbin/iptables -I INPUT -s 172.182.215.0/24 -j REJECT   # ChatGPT
/sbin/iptables -I INPUT -s 20.125.66.80/28 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 20.171.206.0/24 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 20.171.207.0/24 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 4.227.36.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 52.230.152.0/24 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 74.7.175.128/25 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 74.7.227.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 74.7.227.128/25 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 74.7.228.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 74.7.230.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 74.7.241.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 74.7.241.128/25 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 74.7.242.0/25 -j REJECT      # ChatGPT
/sbin/iptables -I INPUT -s 74.7.243.128/25 -j REJECT    # ChatGPT
/sbin/iptables -I INPUT -s 74.7.244.0/25 -j REJECT      # ChatGPT
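If you run firewalld instead of raw iptables, a rich rule per network achieves the same thing. A sketch for one of the ranges above (repeat the first command for each CIDR in the list):

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="20.171.206.0/24" reject'
firewall-cmd --reload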
How to check what bots see when they crawl your site [linux]
If you want to see what a specific bot or crawler sees when it spiders your website, use the command:
curl -A "BOTNAME" -I http://website_to_test
See what Google sees when it spiders cookipedia.co.uk
curl -A "Googlebot" -I https://www.cookipedia.co.uk/
See what a certain backlinks pest sees when it spiders cookipedia.co.uk
curl -A "SERankingBacklinksBot" -I https://www.cookipedia.co.uk/
See what GPTBot sees when it spiders cookipedia.co.uk
curl -A "GPTBot" -I https://www.cookipedia.co.uk/
#tools #chatGPT #Robotstxt #fail2ban #iptables #firewalld #serverload #apache #webcrawler #pest