Serve Static Drupal Content Faster With Boost And Nginx
By Stephen Jayna, 23rd December 2009
If your Drupal site receives a significant amount of anonymous traffic, Boost is for you. Following on from yesterday's article on XCache, where we went from 49 to 132 requests per second, we'll show you how Boost has taken us to an eye-popping 2516 requests per second for static Drupal content.
While it won't benefit everyone, there are a staggering number of Drupal-based sites out there that serve predominantly anonymous content. If you fall into this category, you could do worse than consider adding Boost to your architecture.
Static vs. Dynamic Content
First things first. We need to define exactly what we mean by static Drupal content or anonymous traffic. Essentially it's content that remains the same no matter who is looking at it. Facebook, for example, is a site where each and every page is tailored for you: it's very dynamic. Conversely, The Times is (as far as I can tell) almost entirely static, adverts notwithstanding, and would be a prime candidate for Boost.
How Does Boost Work?
Boost is a module that replaces Drupal's in-built anonymous page caching. When a page is generated by Drupal it is written by Boost to the file system. This allows your web server to serve a static file (if it's available) instead of invoking PHP. Take a look below at Everita's cache:
root@everita:/var/www/drupal/6/drupal/cache# pwd
/var/www/drupal/6/drupal/cache
root@everita:/var/www/drupal/6/drupal/cache# find .
./perm
./perm/www.everita.com
./perm/www.everita.com/.boost
./perm/www.everita.com/sites
./perm/www.everita.com/sites/everita.com
./perm/www.everita.com/sites/everita.com/files
./perm/www.everita.com/sites/everita.com/files/css
./perm/www.everita.com/sites/everita.com/files/js
./perm/www.everita.com/javascript
./perm/boost-gzip-cookie-test.html.gz
./.boost
./normal
./normal/www.everita.com
./normal/www.everita.com/.boost
./normal/www.everita.com/test_.html
./normal/www.everita.com/_.html
./normal/www.everita.com/contact-everita_.html
./normal/www.everita.com/access-denied_.html
./normal/www.everita.com/bookshelf_.html
./normal/www.everita.com/about-everita_.html
./normal/www.everita.com/page-not-found_.html
./normal/www.everita.com/search-results_.html
./normal/www.everita.com/pixel-portraits-facial-recognition-opencv_.html
./normal/www.everita.com/how-the-newton-virus-was-made_.html
./normal/www.everita.com/subversive-sightseeing-interactive-video-telescopes-bu0836_.html
./normal/www.everita.com/thank-you-for-contacting-us_.html
./normal/www.everita.com/unodb-documentation_.html
./normal/www.everita.com/thank-you-for-your-request_.html
./normal/www.everita.com/comment
./normal/www.everita.com/comment/reply
./normal/www.everita.com/iphone-app-and-mobile-phone-development_.html
./normal/www.everita.com/mysql-lamp-and-drupal-services-from-everita_.html
./normal/www.everita.com/lightwave-collada-and-opengles-on-the-iphone_.html
./normal/www.everita.com/software-design-and-development-in-oxford-and-reading_.html
What you can see above is a static version — ready to serve — of almost every page in the Everita website. Be warned that you must use Clean URLs for Boost to work.
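As a rough sketch of the naming scheme visible in the listing, the path of a cached page can be derived from the host name and the request path. The function name boost_cache_path is hypothetical (it is not part of Boost), and the sketch assumes the 'cache/normal' directory and the '_' suffix seen in the listing:

```shell
#!/bin/sh
# Hypothetical helper, not part of Boost: print the cache file path for a
# host name and a request path (the path is expected to start with "/").
# Assumes the 'cache/normal' directory and '_' suffix seen in the listing.
boost_cache_path() {
    host=$1
    uri=$2
    printf '%s\n' "cache/normal/${host}${uri}_.html"
}

boost_cache_path www.everita.com /about-everita
boost_cache_path www.everita.com /
```

The two calls print cache/normal/www.everita.com/about-everita_.html and cache/normal/www.everita.com/_.html, both of which appear in the listing.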
What sets Boost apart is how well it is integrated into Drupal compared to something like Varnish. One rather excellent feature is that it knows what pages exist in the site and will crawl them, thus warming the cache for you. This gets around the problem of one user having to endure a tedious delay while the page is made for the first time.
Time Is Money
This is very important for sites with a substantial amount of content. It's usually the case that the vast majority of pages are only visited once or twice a day (the so-called long tail), so chances are they won't already be in the cache. You could argue this doesn't matter. After all, if they are rarely in demand, why worry about caching them?
The point is this: according to research by Amazon and Google, even a 500ms delay could result in 20% less traffic. While 500ms may seem insignificant, 20% certainly isn't. Warming your cache is important: don't waste your users' time by having them do it.
Installing Boost
Boost is no different from any other Drupal module. Download and extract it to your modules folder:
cd /var/www/drupal/6/drupal/sites/all/modules
wget http://ftp.drupal.org/files/projects/boost-6.x-1.17.tar.gz
tar -xzvf boost-6.x-1.17.tar.gz
rm boost-6.x-1.17.tar.gz
Enable the module in Drupal by checking Boost, under the Caching heading at:
http://www.yourwebsite.com/admin/build/modules
Now configure the Boost module at:
http://www.yourwebsite.com/admin/settings/performance/boost
I had to create a directory called 'cache' under my document-root with permission for my webserver to write to it. The Drupal status report will tell you if anything is awry:
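A minimal sketch of that step, assuming a hypothetical helper named prepare_cache_dir. The web server account name is an assumption too: www-data is the usual default on Debian and Ubuntu, but yours may differ, and chown normally needs root:

```shell
#!/bin/sh
# Hypothetical helper: create DOCROOT/cache and, if a user name is given,
# hand ownership to that user so the web server can write to it.
# Usage (as root): prepare_cache_dir /var/www/drupal/6/drupal www-data
prepare_cache_dir() {
    docroot=$1
    web_user=$2
    mkdir -p "$docroot/cache" || return 1
    if [ -n "$web_user" ]; then
        # Changing ownership normally requires root privileges.
        chown "$web_user" "$docroot/cache" || return 1
    fi
    # Succeed only if the directory now exists.
    [ -d "$docroot/cache" ]
}
```

Whatever you use to create the directory, the Drupal status report remains the authoritative check.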
http://www.yourwebsite.com/admin/reports/status
Once that's done you can start configuring Boost, which has a myriad of options. I'll explain what I changed in order to get the best for my specific setup.
Configuring Boost For Nginx
Firstly I turned off Gzip page compression as Nginx does this for me. Obviously there's another performance gain to be had by serving up pre-zipped content rather than have Nginx do it on-the-fly. However, for the sake of simplicity, we'll leave this off for now.
Next I disabled caching of XML, CSS and JavaScript. Drupal continues to do this more than adequately, leaving static files under /sites/everita.com/files/ (assuming you've enabled bandwidth optimizations). Boost has only taken over page caching, nothing else.
Finally I enabled the cron crawler as discussed above. The rest I've left for the time being; you can tailor the other options as you see fit.
So, Is It Working? Where Are My Cache Files?
Assuming your files are being cached under 'cache' (the default) you should begin to see .html files appearing. Note that if you're logged in — presumably as an administrator — you won't cause files to be cached as you meander through the site: you need to log out, browse the site, and check again.
cd /var/www/drupal/6/drupal/cache
find .
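If the listing is long, a count can be easier to watch than the full output. The sketch below assumes cached pages end in '_.html', as they do in my cache; count_cached_pages is a hypothetical helper name, not something Boost provides:

```shell
#!/bin/sh
# Hypothetical helper: count the cached pages (files ending in '_.html')
# beneath a Boost cache directory.
# Usage: count_cached_pages /var/www/drupal/6/drupal/cache
count_cached_pages() {
    find "$1" -type f -name '*_.html' | wc -l | tr -d ' '
}
```

Run it before and after browsing the site as an anonymous user; the number should go up as pages are cached.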
Configuring Nginx
As it stands you're now producing beautifully static .html files but as yet no one is reaping the benefits. We need to tell Nginx to serve cache files if they exist, reverting to PHP and Drupal if they don't. Without any further hesitation, here is that all-important snippet from my configuration file:
/etc/nginx/sites-available/mysqlperformancetuning.conf
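server {
    . . .
    # Build up a string of flags; the cache is only used if all four are set.
    set $boost "";
    set $boost_query "_";
    # G: the request is a GET
    if ( $request_method = GET ) {
        set $boost G;
    }
    # D: no Drupal login cookie, so the visitor is anonymous
    if ($http_cookie !~ "DRUPAL_UID") {
        set $boost "${boost}D";
    }
    # Q: no query string
    if ($query_string = "") {
        set $boost "${boost}Q";
    }
    # F: a cached copy of the page exists on disk
    if ( -f $document_root/cache/normal/$host$request_uri$boost_query.html ) {
        set $boost "${boost}F";
    }
    # All four flags are set: serve the static file
    if ($boost = GDQF){
        rewrite ^.*$ /cache/normal/$host/$request_uri$boost_query.html break;
    }
    # Otherwise hand the request to Drupal
    if (!-e $request_filename) {
        rewrite ^/(.*)$ /index.php?q=$1 last;
        rewrite /(.*)/$ /index.php?q=$1 last;
        break;
    }
}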
Credit and thanks go to Mechanix for a healthy amount of direction.
Essentially the above states that a cache file may be served under the following circumstances:
- The request is a GET
- You're an anonymous user and not logged in
- There aren't any URL parameters
- The file requested exists in the cache
- Otherwise refer it on to Drupal as before
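The same decision can be sketched in shell, which is handy for reasoning about a single request by hand. This is only an illustration of the flag logic, not something Nginx runs; boost_flags and serves_from_cache are hypothetical names, and the cookie test is a plain substring match on DRUPAL_UID:

```shell
#!/bin/sh
# Hypothetical illustration of the flag logic in the Nginx snippet.
# boost_flags METHOD COOKIE QUERY_STRING CACHE_FILE
boost_flags() {
    method=$1
    cookie=$2
    query=$3
    file=$4
    flags=""
    # G: the request is a GET
    if [ "$method" = "GET" ]; then flags="G"; fi
    # D: the cookie header does not mention DRUPAL_UID (anonymous visitor)
    case "$cookie" in
        *DRUPAL_UID*) ;;
        *) flags="${flags}D" ;;
    esac
    # Q: there is no query string
    if [ -z "$query" ]; then flags="${flags}Q"; fi
    # F: the cache file exists
    if [ -f "$file" ]; then flags="${flags}F"; fi
    printf '%s\n' "$flags"
}

# Succeeds only when all four flags are present.
serves_from_cache() {
    [ "$(boost_flags "$@")" = "GDQF" ]
}
```

Any missing flag leaves a string other than GDQF, which is why a POST, a logged-in user, a query string or a missing cache file all fall through to Drupal.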
The $boost_query variable refers to 'Character used to replace "?"' under 'Generated output storage (HTML, XML, AJAX)' in Boost Settings for what it's worth.
That's it! I've a fairly basic site with equally simple URLs so your rules might become more complex but the principle is the same. Make sure you restart Nginx once you've made these modifications:
/etc/init.d/nginx restart
Clearing The Cache
The strategy you use for clearing your cache is very dependent on the type of site you have. By default Boost will ignore calls from Drupal to clear the entire cache, preferring to refresh it according to its own settings.
I've turned this off by setting 'Ignore cache flushing' to disabled. This lets me continue to use 'Clear cache data' to clear the entire cache when I tinker with the site's CSS, for example. Mine is a small site, so it's less of an issue: my cache can be regenerated quickly. You might need to consider this more carefully. Rest assured Boost affords you plenty of control over when and how this happens.
Conclusion
You can see the difference this has made compared to yesterday's efforts with XCache below. Don't be fooled: you still need XCache or similar, especially if you deliver dynamic content, because Boost can't help you there. If your content is predominantly static, however:
root@everita:~# ab -n 10000 -c 2 http://www.everita.com/

Server Software:        nginx/0.6.32
Server Hostname:        www.everita.com
Server Port:            80

Document Path:          /
Document Length:        25793 bytes

Concurrency Level:      2
Time taken for tests:   3.974 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      260060000 bytes
HTML transferred:       257930000 bytes
Requests per second:    2516.26 [#/sec] (mean)
Time per request:       0.795 [ms] (mean)
Time per request:       0.397 [ms] (mean, across all concurrent requests)
Transfer rate:          63904.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       2
Processing:     0    1   0.2      1       6
Waiting:        0    0   0.1      0       5
Total:          1    1   0.1      1       6
I could get a further superficial increase if I used the keep-alive option in ab (-k) but it's hardly worth it. As with any benchmark, these figures should be taken with a pinch of salt. The point is, comparing like for like with yesterday's test, Boost is certainly worth considering.
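For anyone who wants to sanity-check the headline numbers, here is a small sketch that recomputes them from the rounded figures in the ab report. The variable names are mine; the arithmetic is simply requests divided by elapsed time, and concurrency multiplied by elapsed time divided by requests:

```shell
#!/bin/sh
# Figures copied from the ab report (the elapsed time is already rounded).
requests=10000
seconds=3.974
concurrency=2

# Throughput: completed requests divided by elapsed time.
rps=$(awk -v n="$requests" -v t="$seconds" 'BEGIN { printf "%.2f", n / t }')

# Mean time per request in ms: concurrency * elapsed time / requests.
ms_per_request=$(awk -v n="$requests" -v t="$seconds" -v c="$concurrency" \
    'BEGIN { printf "%.3f", c * t * 1000 / n }')

echo "requests per second: $rps"
echo "time per request (ms): $ms_per_request"
```

This gives 2516.36 requests per second and 0.795 ms per request. The throughput differs from ab's 2516.26 in the second decimal place, presumably because ab divides by the unrounded elapsed time rather than the 3.974 seconds it prints.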
Your Comments
typo
Hi there,
I used your config to tinker around with nginx + apache + boost + drupal. The speed increase is massive to say the least (350req./sec before, 12k req./sec after. yes, that's 12 _k_ ;) ). Thanks for putting the info together.
However, there are typos in your snippet which made nginx complain on init:
if ( $request_method = GET )
lacks a "{"
rewrite ^.*$ /cache/normal/$host/$request_uri$boost_query.html break;
"break;" -> newline?
thanks and best regards,
oliver
Can I use this config with try_files?
Can I use this config with the try_files directive instead of if (!-e $request_filename) { ... }?
location / {
root /var/www/virtual/magazon.lg.ua/htdocs;
index index.php;
try_files $uri $uri/ @drupal;
}
location @drupal {
rewrite ^/(.*)$ /index.php?q=$1 last;
}
re: nginx boost config location?
Hello,
Thanks for your comments. I've updated the article accordingly.
You should be able to use the proxy_pass functionality to proxy requests that reach a certain location through to Apache.
http://wiki.nginx.org/NginxHttpProxyModule
Regards,
Steve
nginx boost config location?
Hi, any chance you could clarify where you put the boost nginx config in your example?
It's not clear if it's in nginx.conf or a sites-enabled/example.conf
It's also not clear what the context is, http, server, location etc...
I've got nginx setup proxying through to apache so need to know if nginx can intercept requests and serve static cache else pass on to apache.
Clarification there would be appreciated, thanks for the nice tutorial!