When building a WordPress or WooCommerce site, enabling Nginx FastCGI cache can significantly improve page load speed. However, there’s one issue: the cache is only generated upon the first visit. This means that when a search engine crawler or a user opens a page for the first time—or when the cache has expired—the request will still trigger a slower dynamic response.
To avoid this, we need a cache warming script that periodically crawls your sitemap to ensure all pages are pre-cached. This way, visitors always enjoy an instant “first-click load” experience.
In this article, we’ll share an optimized Python script (cache_warmer.py) and Bash script (cache_warmer.sh) designed specifically for WordPress sites. The solution supports multi-site management, runs efficiently even on low-resource servers, and intelligently skips dynamic pages (like login pages) as well as static files (like images). We’ll also provide the relevant Nginx configuration and deployment steps to help you easily boost your site’s performance.
Script Feature Overview
Our auto-warming script includes the following core functions:
1. Sitemap Parsing
- Recursively parses sitemap_index.xml
- Collects all page URLs
2. Intelligent Filtering
- Skips static resources (.jpg, .css, .js, etc.)
- Skips manually specified non-cacheable URLs
- Skips URLs that already have an existing cache file
3. Concurrent Warming
- Uses ThreadPoolExecutor to send multithreaded requests to uncached pages
- Thread count is configurable (the script below uses 3)
4. Multi-Site Support
- Configure multiple WordPress sites via sites.json
5. Logging
- Generates clear log files recording new caches, failed URLs, and runtime duration
- Newly cached URLs → written to new_urls.log
- Failed URLs → written to failed_urls.log
- Process logs → written to cache_warmer.log
6. Result Statistics
- Prints counts of new, skipped, and failed URLs
- Reports total runtime
7. Low Resource Usage
- Optimized thread management and log writing, suitable for low-spec servers (e.g., single-core CPU, 512 MB RAM)
Code Implementation
1. Python Script: cache_warmer.py
The following is the core Python script, responsible for parsing the sitemap, warming up the cache, and generating logs:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import json
import time
import hashlib
import logging
import requests
from urllib.parse import urlparse
from concurrent.futures import ThreadPoolExecutor, as_completed
from lxml import etree
from tenacity import retry, stop_after_attempt, wait_fixed
# ================= Configuration =================
CACHE_METHOD = "GET"
THREADS = 3
TIMEOUT = 15
HEADERS = {"User-Agent": "Mozilla/5.0 (CacheWarmer/1.0)"}
SITES_FILE = "sites.json"
LOG_DIR = "./logs"
FAILED_LOG_FILE = os.path.join(LOG_DIR, "failed_urls.log")
NEW_LOG_FILE = os.path.join(LOG_DIR, "new_urls.log")
# Create the log directory before configuring the FileHandler, otherwise logging setup fails on a fresh install
os.makedirs(LOG_DIR, exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler(os.path.join(LOG_DIR, "cache_warmer.log"))
    ]
)
# ================= Utility Functions =================
def ensure_log_dir():
"""Ensure the log directory exists and clear log files"""
if not os.path.exists(LOG_DIR):
try:
os.makedirs(LOG_DIR)
logging.info(f"Successfully created log directory: {LOG_DIR}")
except OSError as e:
logging.error(f"Failed to create log directory: {LOG_DIR}, error: {e}")
raise
for log_file in [FAILED_LOG_FILE, NEW_LOG_FILE]:
with open(log_file, 'w', encoding='utf-8') as f:
f.write("")
def fastcgi_cache_path(site: dict, url: str, method: str = CACHE_METHOD) -> str:
"""Generate FastCGI cache file path from Nginx fastcgi_cache_key using sitemap URL"""
cache_dir = site.get("cache_dir")
if not cache_dir:
raise ValueError(f"Site {site.get('name', 'unknown')} has no cache_dir defined.")
if not site.get("name"):
raise ValueError("Site has no name defined.")
parsed = urlparse(url)
scheme = parsed.scheme
host = parsed.netloc
request_uri = parsed.path
if parsed.query:
request_uri += f"?{parsed.query}"
key_str = f"{scheme}{method}{host}{request_uri}"
md5_name = hashlib.md5(key_str.encode('latin-1')).hexdigest()
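    # key_str must mirror the fastcgi_cache_key defined in Nginx (this script assumes
    # "$scheme$request_method$host$request_uri"); the two lines below reproduce the
    # levels=1:2 directory layout of fastcgi_cache_path.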
subdir1 = md5_name[-1]
subdir2 = md5_name[-3:-1]
return os.path.join(cache_dir, subdir1, subdir2, md5_name)
def parse_sitemap(url: str) -> list:
"""Recursively parse sitemap_index and sitemap, return list of URLs"""
urls = []
try:
resp = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
if resp.status_code != 200:
logging.warning(f"⚠️ Unable to access {url}: {resp.status_code}")
return urls
tree = etree.fromstring(resp.content)
locs = tree.xpath("//*[local-name()='loc']")
for loc in locs:
if loc.text:
url_text = loc.text.strip()
if url_text.endswith('.xml') or url_text.endswith('.xml/'):
urls.extend(parse_sitemap(url_text))
else:
urls.append(url_text)
except Exception as e:
logging.warning(f"⚠️ Failed to parse sitemap {url}: {e}")
return urls
@retry(stop=stop_after_attempt(3), wait=wait_fixed(2), reraise=True)
def warm_url(site_name: str, url: str) -> tuple[bool, str]:
    """Request URL and return success status and reason"""
    try:
        resp = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
        return resp.status_code == 200, f"Status: {resp.status_code}"
    except requests.exceptions.RequestException as e:
        logging.error(f"Request failed {url}: {str(e)}", exc_info=True)
        # Re-raise so tenacity actually retries; the final failure is caught and logged in warm_site()
        raise
def warm_site(site: dict) -> dict:
"""Warm up cache for a single site"""
site_name = site.get("name", "unknown")
cache_dir = site.get("cache_dir")
if not cache_dir or not os.path.isdir(cache_dir):
logging.error(f"Cache directory {cache_dir} does not exist or is not accessible")
return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}
if not os.access(cache_dir, os.R_OK):
logging.error(f"Cache directory {cache_dir} is not readable")
return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}
logging.info(f"--- 🚀 Warming cache for {site_name} ---")
start_time = time.time()
urls = []
for sitemap_url in site.get("sitemaps", []):
urls.extend(parse_sitemap(sitemap_url))
urls = list(set(urls))
total_urls = len(urls)
if total_urls == 0:
logging.warning(f"⚠️ No URLs found for {site_name}. Please check your sitemap configuration or network connection.")
return {"site": site_name, "total": 0, "new": 0, "skipped": 0, "failed": 0, "skipped_static": 0, "skipped_nocache": 0, "time": 0}
count_new, count_skipped, count_failed = 0, 0, 0
count_skipped_static, count_skipped_nocache = 0, 0
new_urls = []
static_exts = ('.jpg', '.jpeg', '.png', '.gif', '.css', '.js', '.ico', '.svg', '.woff', '.woff2', '.ttf', '.webp')
no_cache_urls = site.get("no_cache_urls", [])
urls_to_warm = []
for url in urls:
if url.lower().endswith(static_exts):
count_skipped_static += 1
continue
if url in no_cache_urls:
count_skipped_nocache += 1
continue
if os.path.exists(fastcgi_cache_path(site, url)):
count_skipped += 1
continue
urls_to_warm.append(url)
threads_to_use = min(len(urls_to_warm), THREADS) if urls_to_warm else 1
logging.info(f"Starting {threads_to_use} threads for warming...")
with ThreadPoolExecutor(max_workers=threads_to_use) as executor:
future_to_url = {executor.submit(warm_url, site_name, url): url for url in urls_to_warm}
for future in as_completed(future_to_url):
url = future_to_url[future]
try:
success, reason = future.result()
if success:
count_new += 1
new_urls.append(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url}")
else:
count_failed += 1
with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} ({reason})\n")
except Exception as e:
count_failed += 1
with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} (Exception: {str(e)})\n")
if new_urls:
with open(NEW_LOG_FILE, 'a', encoding='utf-8') as f:
f.write("\n".join(new_urls) + "\n")
logging.info(f"Newly cached URLs: {len(new_urls)}")
if count_failed:
logging.info(f"Failed URLs saved to {FAILED_LOG_FILE}")
elapsed = time.time() - start_time
logging.info(f"--- ✅ {site_name} done: total {total_urls}, new {count_new}, skipped {count_skipped}, failed {count_failed}, skipped static {count_skipped_static}, skipped no-cache {count_skipped_nocache}, time {elapsed:.2f}s ---")
return {"site": site_name, "total": total_urls, "new": count_new, "skipped": count_skipped, "failed": count_failed, "skipped_static": count_skipped_static, "skipped_nocache": count_skipped_nocache, "time": elapsed}
# ================= Main =================
def main():
start_all = time.time()
if not os.path.exists(SITES_FILE):
logging.error(f"{SITES_FILE} file does not exist")
return
try:
ensure_log_dir()
except Exception as e:
logging.error(f"Script cannot run: {e}")
return
with open(SITES_FILE, "r") as f:
sites = json.load(f)
for site in sites:
warm_site(site)
elapsed_all = time.time() - start_all
logging.info(f"\n=== 🎯 All sites warmed, total time {elapsed_all:.2f}s ===")
print("\n") # Print empty line to console
with open(os.path.join(LOG_DIR, "cache_warmer.log"), 'a', encoding='utf-8') as f:
f.write("\n") # Append empty line to log file
if __name__ == "__main__":
main()
Key Points
- Threads: THREADS = 3, suitable for low-spec servers, balancing performance and resource usage.
- Filtering Rules: Skips static files and non-cacheable pages (e.g., oddbbo.com/wishlist).
- Logging: Outputs to logs/cache_warmer.log, with a blank line appended after each run for separation.
- Retry Mechanism: Uses the tenacity library to automatically retry failed URLs up to 3 times.
2. Configuration File: sites.json
Configure multiple WordPress sites with their sitemap URLs and cache directories:
[
{
"name": "soezworld",
"sitemaps": ["https://soez.world/sitemap_index.xml"],
"cache_dir": "/cache/fastcgi_cache/soezworld",
"no_cache_urls": []
},
{
"name": "oddbboworld",
"sitemaps": ["https://oddbbo.world/sitemap_index.xml"],
"cache_dir": "/cache/fastcgi_cache/oddbboworld",
"no_cache_urls": []
},
{
"name": "websitesoez",
"sitemaps": ["https://websitesoez.com/sitemap_index.xml"],
"cache_dir": "/cache/fastcgi_cache/websitesoez",
"no_cache_urls": []
},
{
"name": "oddbbo",
"sitemaps": ["https://oddbbo.com/sitemap_index.xml"],
"cache_dir": "/cache/fastcgi_cache/oddbbo",
"no_cache_urls": [
"https://oddbbo.com/wishlist",
"https://oddbbo.com/random",
"https://oddbbo.com/my-account"
]
}
]
Note: Make sure the URLs in no_cache_urls match the format used in the sitemap (with or without a trailing slash).
3. Bash Script: cache_warmer.sh
Automate the execution of the Python script:
#!/bin/bash
# ------------------------------------------
# Cache Warmer Automation Script
# ------------------------------------------
# Script directory
SCRIPT_DIR="/cache/fastcgi_cache_warmer"
# Python executable path
PYTHON_BIN="/usr/bin/python3"
# Log directory
LOG_DIR="$SCRIPT_DIR/logs"
# Ensure the log directory exists
mkdir -p "$LOG_DIR"
# Change to the script directory
cd "$SCRIPT_DIR" || exit 1
# Run the Python script and output to console
$PYTHON_BIN cache_warmer.py
Note: The script does not redirect output. Logs are written by the Python script to the logs/ directory, producing three log files: cache_warmer.log, new_urls.log, and failed_urls.log.
4. Nginx Configuration
Make sure your Nginx FastCGI cache configuration works with the script. If your sitemap URLs include a trailing slash, redirect requests without one so the warmed cache key matches real visits:
rewrite ^/(.+[^/])$ /$1/ permanent;
For the full Nginx FastCGI cache configuration, please refer to: Configure Nginx FastCGI Cache.
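For reference, below is a minimal sketch of the directives the warming script relies on; the zone name, cache path, and PHP-FPM socket are placeholders you should adapt to your own setup. The essential parts are that fastcgi_cache_key matches the key rebuilt by fastcgi_cache_path() in cache_warmer.py, and that the X-Cache header is exposed for the verification step later in this article:
# Minimal sketch only; placeholder names and paths, not a complete configuration
fastcgi_cache_path /cache/fastcgi_cache/oddbbo levels=1:2 keys_zone=oddbbo:100m inactive=1d max_size=1g;
server {
    # ... server_name, root, SSL, etc. ...
    set $skip_cache 0;
    # Example conditions: skip caching for logged-in users and dynamic WooCommerce pages
    if ($request_uri ~* "/my-account|/cart|/checkout|/wp-admin") { set $skip_cache 1; }
    if ($http_cookie ~* "wordpress_logged_in") { set $skip_cache 1; }
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/run/php/php-fpm.sock;    # adjust to your PHP-FPM socket
        fastcgi_cache oddbbo;
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_cache_valid 200 301 302 1d;
        fastcgi_cache_bypass $skip_cache;
        fastcgi_no_cache $skip_cache;
        add_header X-Cache $upstream_cache_status;  # lets curl -I show HIT/MISS
    }
}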
Deployment and Verification
Deployment Steps
1. Install Dependencies:
pip install requests lxml tenacity
2. Save the Scripts:
- Place cache_warmer.py and sites.json into /cache/fastcgi_cache_warmer/, or any directory of your choice. Make sure to update the paths in the script accordingly.
- Save cache_warmer.sh and give it execute permissions:
chmod +x /cache/fastcgi_cache_warmer/cache_warmer.sh
3. Set Up a Scheduled Task:
In aaPanel → Scheduled Tasks, add a Shell script task, set the execution interval, and enter the following script content (a plain-cron alternative is shown after these steps):
/cache/fastcgi_cache_warmer/cache_warmer.sh
4. Run a Test:
./cache_warmer.sh
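If you manage cron directly instead of aaPanel, an ordinary crontab entry does the same job as the scheduled task in step 3. The 30-minute interval below is only an example; choose a schedule that suits your cache validity period:
*/30 * * * * /cache/fastcgi_cache_warmer/cache_warmer.sh >/dev/null 2>&1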
Verify Cache Effectiveness
1. Check the Logs:
cat logs/cache_warmer.log
2. Sample Output:
2025-08-22 15:00:00 [INFO] --- 🚀 Warming cache for websitesoez ---
2025-08-22 15:00:02 [INFO] Starting 3 threads for warming...
2025-08-22 15:00:02 [INFO] --- ✅ websitesoez done: total 192, new 0, skipped 100, failed 0, skipped static 92, skipped no-cache 0, time 1.61s ---
2025-08-22 15:00:02 [INFO] --- 🚀 Warming cache for oddbbo ---
2025-08-22 15:00:10 [INFO] Starting 3 threads for warming...
2025-08-22 15:00:10 [INFO] --- ✅ oddbbo done: total 137, new 0, skipped 87, failed 0, skipped static 47, skipped no-cache 3, time 7.35s ---
2025-08-22 15:00:10 [INFO] === 🎯 All sites warmed, total time 18.60s ===
3. Check the Cache Files:
find /cache/fastcgi_cache/oddbbo/ -type f
4. Verify the Response Headers:
curl -I https://oddbbo.com/some-page | grep X-Cache
X-Cache: HIT indicates a cache hit.
Optimization and Considerations
1. Low-Spec Server Optimization:
- Set THREADS to 1 or 2 in cache_warmer.py.
- Batch-write failed URLs to reduce I/O, for example:
# Inside warm_site(), before the ThreadPoolExecutor block: collect failures in memory
failed_urls = []
# In the result loop, append to the list instead of opening the file for every URL
if not success:
    count_failed += 1
    failed_urls.append(f"{time.strftime('%Y-%m-%d %H:%M:%S')} - {site_name}: {url} ({reason})")
# After all futures have completed, write the failures in a single operation
if failed_urls:
    with open(FAILED_LOG_FILE, 'a', encoding='utf-8') as f:
        f.write("\n".join(failed_urls) + "\n")
2. URL Format Consistency:
Check the sitemap:
curl https://oddbbo.com/sitemap_index.xml | grep -E "wishlist|random|my-account"
If a URL has a trailing slash, update the corresponding entry in no_cache_urls in sites.json.
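If you would rather not keep the two formats in sync by hand, one option is to normalize trailing slashes before comparing. The snippet below is a hypothetical tweak to the filtering loop in warm_site(), not part of the script above:
# Hypothetical variant of the no_cache_urls check in warm_site():
# strip trailing slashes from both sides so either format matches
no_cache_normalized = {u.rstrip('/') for u in no_cache_urls}
for url in urls:
    if url.rstrip('/') in no_cache_normalized:
        count_skipped_nocache += 1
        continue
    # ... the remaining filtering rules stay unchanged ...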
Important Note
In the Nginx configuration, cache files are typically stored in:
fastcgi_cache_path /cache/fastcgi_cache levels=1:2 keys_zone=MYCACHE:100m inactive=1d max_size=1g;
The levels=1:2 setting means:
- levels: This parameter determines the directory hierarchy of the cache files and the length of the directory names at each level.
- 1:2: A colon-separated list specifying the number of characters for each directory level.
- 1 (first level): The first-level subdirectory name is the last character of the MD5 hash of the cache key.
- 2 (second level): The second-level subdirectory name is the next 2 characters of the hash, i.e., the third-to-last and second-to-last characters.
How It Works
- Generate Cache Key: Nginx builds a unique cache key according to the rules defined by fastcgi_cache_key.
- Compute Directory Path: Nginx takes the MD5 hash of the cache key (a 32-character hexadecimal string, e.g., d41d8cd98f00b204e9800998ecf8427e).
- Apply the levels Rule:
  - Start from the end of the hash string.
  - Use the last 1 character as the first-level directory name.
  - Use the second-to-last and third-to-last characters as the second-level directory name.
Example
Suppose the MD5 hash of a cache key is d41d8cd98f00b204e9800998ecf8427e:
- The last character is e → first-level directory name: e
- The second-to-last and third-to-last characters are 27 → second-level directory name: 27
Therefore, the cache file will be stored at:
/cache/fastcgi_cache/e/27/d41d8cd98f00b204e9800998ecf8427e
Make sure your cache path and levels setting match this structure exactly; the script's fastcgi_cache_path() function reproduces it to decide whether a page is already cached.
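If you want to double-check the mapping yourself, you can recompute the expected path for a single page from the shell. The URL below is only an example, and the key format assumes the fastcgi_cache_key "$scheme$request_method$host$request_uri" used by the script:
# Recompute the expected cache file location for one page (example URL)
KEY="httpsGEToddbbo.com/some-page/"              # $scheme$request_method$host$request_uri
HASH=$(echo -n "$KEY" | md5sum | cut -d' ' -f1)  # 32-character MD5 hex digest
L1=${HASH: -1}                                   # last character -> first-level directory
L2=${HASH: -3:2}                                 # two characters before it -> second-level directory
ls -l "/cache/fastcgi_cache/oddbbo/$L1/$L2/$HASH"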