WAF & Bot Blocking: When Your Firewall Stops AI Crawlers
How Web Application Firewalls can accidentally block AI bots from reading your content.
What it measures
This check detects whether a Web Application Firewall (WAF), such as Cloudflare, Sucuri, AWS WAF, or Akamai, is blocking AI crawlers from accessing your content. WAFs protect against attacks, but their bot-detection rules can also block legitimate AI crawlers.
Why it matters for AI
When a WAF blocks an AI crawler, it typically serves a CAPTCHA challenge or a 403 Forbidden response. AI bots can't solve CAPTCHAs, so the crawler never sees the page and your content is invisible to that AI system. This is a common reason otherwise well-optimised content never appears in AI-generated answers.
| User agent token | Used by | Action |
|---|---|---|
| GPTBot | ChatGPT / OpenAI | Allow in WAF rules |
| ChatGPT-User | ChatGPT browsing | Allow in WAF rules |
| ClaudeBot | Claude / Anthropic | Allow in WAF rules |
| PerplexityBot | Perplexity AI | Allow in WAF rules |
| Google-Extended | Google AI / Gemini | robots.txt token only; it has no separate HTTP user agent, so manage it in robots.txt, not the WAF |
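One way to check whether this is happening on your site is to look for these user agents in your server or WAF logs. The sketch below is a minimal example, assuming logs in the common "combined" access-log format and treating 403, 429, and 503 as signs of a block; both are assumptions to adjust for your setup, and the function name `count_blocked_ai_requests` and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Substrings that identify the AI crawlers in the table above.
# Google-Extended is left out: it is a robots.txt token and does not
# appear as an HTTP User-Agent.
AI_CRAWLER_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# Status codes treated as "blocked" (an assumption; adjust for your WAF).
BLOCKED_STATUSES = {403, 429, 503}

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_blocked_ai_requests(lines):
    """Count blocked requests per AI crawler token in an iterable of log lines."""
    blocked = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a combined-format line; skip it
        if int(match.group("status")) not in BLOCKED_STATUSES:
            continue
        agent = match.group("agent").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                blocked[token] += 1
                break
    return blocked


# Illustrative log lines, not real traffic.
sample = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /pricing HTTP/1.1" '
    '403 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [10/Oct/2024:13:55:40 +0000] "GET /pricing HTTP/1.1" '
    '200 9001 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/130.0"',
]
print(dict(count_blocked_ai_requests(sample)))  # {'GPTBot': 1}
```

A non-zero count for a crawler you want to admit suggests the WAF is turning it away. A 503 can also come from a genuine outage, so treat the numbers as a prompt to look closer rather than as proof.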
How to improve
- Whitelist AI crawler user agents — Add an allow rule in your WAF for the bots listed above
- Check your CDN settings — Cloudflare's "Bot Fight Mode" can block legitimate AI bots
- Test with AI user agents — Use curl with a GPTBot user-agent string to test access
- Monitor WAF logs — Look for blocked requests from legitimate AI crawlers
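The user-agent test can also be scripted. The sketch below is a minimal version, assuming Python's standard library only. The user-agent strings are illustrative (check each vendor's documentation for current values), and the challenge-page markers are a rough heuristic rather than an official list. `probe` and `classify_response` are names chosen for this example.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings; real crawlers may send different values.
AI_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}

# Text that often appears on challenge pages (heuristic, an assumption).
CHALLENGE_MARKERS = ("captcha", "just a moment", "attention required")


def classify_response(status, body):
    """Return 'blocked', 'challenged', or 'ok' for a status code and page body."""
    if status in (401, 403, 429, 503):
        return "blocked"
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenged"
    return "ok"


def probe(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and classify the result."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read(20000).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions but still carry a body.
        status = error.code
        body = error.read(20000).decode("utf-8", errors="replace")
    except urllib.error.URLError:
        return "unreachable"
    return classify_response(status, body)
```

For example, `probe("https://example.com/", AI_USER_AGENTS["GPTBot"])` returns one of the labels above for that URL. A WAF that also verifies crawler IP addresses may treat a request from your own machine differently from the real crawler, so the result is an indication, not a guarantee.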
💡 Quick win
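In Cloudflare: go to Security → WAF → Custom Rules. Create a rule that matches user agents containing "GPTBot", "ClaudeBot", or "PerplexityBot" and set its action to "Skip", so that the remaining WAF rules are not applied to those requests. Current custom rules offer "Skip" rather than "Allow", which was the name used in the older Firewall Rules. Bot Fight Mode may not honour Skip rules, so if crawlers are still challenged, check that setting separately.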
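The rule described above can be pasted into Cloudflare's expression editor instead of being built field by field. The sketch below generates such an expression, assuming the `http.user_agent` field and the `contains` operator from the Cloudflare Rules language. The helper name `build_user_agent_expression` is made up for this example.

```python
def build_user_agent_expression(tokens):
    """Build a rule expression that matches any of the given user-agent tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    for token in tokens:
        # Quotes or backslashes would break the quoted string in the expression.
        if '"' in token or "\\" in token:
            raise ValueError(f"unsupported character in token: {token!r}")
    clauses = [f'(http.user_agent contains "{token}")' for token in tokens]
    return " or ".join(clauses)


print(build_user_agent_expression(["GPTBot", "ClaudeBot", "PerplexityBot"]))
```

This prints three `http.user_agent contains "..."` clauses joined with `or`. Matching with `contains` is case-sensitive, so keep the tokens in the casing the crawlers use. User agents are easy to spoof, so a rule like this admits anything that claims to be one of these bots.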
