This Small Corner

Eric Koyanagi's tech blog. At least it's free!

How to run Wordpress at Scale

By Eric Koyanagi

Wordpress is known for its amazing flexibility and expandability. It isn't famous for good performance, and that's to be expected with how the platform is designed. For example, wp_postmeta is a critical concept and is a table most plugins use to expand post objects with extra data. Using this table is considered "good practice" by many Wordpress developers, but it's an objectively bad way to store metadata at scale.

Understand Wordpress Conventions

The existence of wp_postmeta might imply that using it is the "best practice", but WP's coding styles and plugin best practices do not mention it at all. Don't assume that just because Wordpress provides something, using it is automatically a "best practice". Consider your DB layer very carefully!

Review the actual guidelines as suggested by Wordpress, but otherwise, it's up to you to make good choices. Don't trust unwritten "conventions", such as using a given plugin because it's wildly popular or stashing data in postmeta because "that's what it is for". As for the litany of code styles, that's up to you. If you're supporting enterprise-level WP where dev is internal, it's probably fine to stick to a style like PSR-12 instead of making your engineers memorize WP's guide. I'm personally not a fan of frameworks insisting on their own style guides instead of adopting more universal ones, though in WP's case that's partly because it is very, very old.

Let's Talk Caching

There's a lot of myth and misunderstanding when it comes to properly caching a Wordpress site, and this is even more complex when you're dealing with WooCommerce. First, you have to understand a bit about the PHP request lifecycle. This will help you understand why some caching solutions are far better than others. It will also help you unravel the ocean of articles that talk about things like opcache without much context.

Application-level caches (anything that happens via a WP plugin) do work, but we're talking about Wordpress at scale. So why don't plugin-based options work as well as some alternatives? After all, the idea is that a cached page skips the database entirely -- and each WP page requires a whole mess of queries to generate, so this ought to make a great solution!

But we're awesome people that have read my linked article about the PHP request lifecycle, so we know the steps involved to get to this point...right? Oh, okay, I'll summarize.

First, Apache (or whatever server you use) needs a spare PHP process. Duh, that's because PHP has process-based concurrency. If your server has more cores, it'll be able to handle more concurrent requests. That's how Apache works! If there's no process available...your request will wait until one is. This is the first weakness of an application-level cache. Great that you've skipped the massive cost in database operations, but you're still dependent on the request being dispatched to Apache and interpreted by PHP.
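A back-of-the-envelope way to see that ceiling, assuming a fixed pool of PHP workers and an average time each request holds one (the numbers below are made up for illustration, not measurements):

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when every request occupies one worker."""
    if workers <= 0 or seconds_per_request <= 0:
        raise ValueError("workers and seconds_per_request must be positive")
    return workers / seconds_per_request


if __name__ == "__main__":
    # Hypothetical: 20 workers, full WP render takes 0.25s per request.
    print(max_requests_per_second(20, 0.25))  # 80.0
    # Hypothetical: same pool, but a plugin cache cuts the request to 0.0625s.
    print(max_requests_per_second(20, 0.0625))  # 320.0
```

The plugin cache raises the ceiling, but the ceiling is still there: once every worker is busy, the next request waits.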

Second, PHP needs to compile your code. With every request, PHP tokenizes your file and does a two-pass compilation step. This generates the opcodes that PHP's virtual machine (the Zend Engine) executes, rather than anything sent straight to the CPU. This is where the opcache caching technique comes in, because in theory opcache keeps the compiled opcodes in shared memory so PHP can skip tokenization and compilation. In practice, it isn't that simple. Opcache works better with PHP-FPM than the classic handler and it depends on having the right configuration and resources. Also, opcache won't speed up I/O-bound operations. It's expected that it won't get us a massive bump in performance because this step isn't the biggest bottleneck for most systems.

Third, the caching plugin needs to do its job, which has a tiny bit of overhead, itself. Yes, you've managed to skip a glut of database calls...but your Wordpress can still crash and burn with traffic because it's still reliant on Apache having enough processes available!

This is why having a cache like Varnish (or a CDN like CloudFront) is absolutely critical. With this, you're not even touching Apache if the static content already exists. With the above lifecycle in mind, it should be immediately obvious why this is better. This technique allows for massively more throughput, which is really logical since we skip basically everything and are able to just spit out some static content back to the browser.
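To put rough numbers on that, here is a small sketch of how the edge cache's hit ratio shrinks the load that ever reaches PHP. The traffic figures and the 0.25s render time are assumptions for illustration only.

```python
import math


def origin_requests_per_second(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the edge cache and reach Apache/PHP."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


def workers_needed(total_rps: float, hit_ratio: float, seconds_per_request: float) -> int:
    """PHP workers required to keep up with the cache misses alone."""
    return math.ceil(origin_requests_per_second(total_rps, hit_ratio) * seconds_per_request)


if __name__ == "__main__":
    # Hypothetical: 1000 req/s of traffic, 0.25s per uncached render.
    print(workers_needed(1000, 0.0, 0.25))   # no edge cache
    print(workers_needed(1000, 0.95, 0.25))  # 95% of requests served by Varnish
```

Under these assumed numbers, a 95% hit ratio takes the worker requirement from hundreds down to about a dozen.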

I'm not saying that plugin-based caches are useless with Wordpress...and there's no reason you can't use both, especially for plugins that have some built-in pre-caching systems. Since they live on the application level, they can use hooks to cache pages even before a visitor hits them. This yields benefit when a user hits a page that isn't in the Varnish cache, as it might still be in the plugin-based cache and therefore saves you (only one) full re-render from the DB.

Wait, there's more. Let's talk Cache Busting!

Varnish or CloudFront can easily fail with production traffic. I've seen this happen rather often, where there's a robust caching setup that seems fine...but performance is still bad because real-life traffic patterns aren't being looked at closely enough. This is where there's a world of difference between a technology like Varnish out-of-the-box and one that's properly configured.

For example (and probably the main example), consider paid traffic.

For some sites, paid traffic might be massive, and each user comes into a page with a unique query string attached so that analytics can attribute the impression and track the campaign's performance. This is very much industry standard, but if you don't know how the Varnish cache gets busted, it will wreck your caching strategy.

Unless configured to ignore certain query string parameters, Varnish will treat each URL as a separate object. When users come in with distinct click IDs or similar, that means none of these users are hitting a cache and the query string is effectively "busting" it. That's a useful thing to know! Sometimes you need to tack a query string onto a page for testing, as this forces it out of the Varnish cache. In this case, though, paid traffic will destroy your carefully-planned caching strategy and probably cripple your site.

You need to be sure that all query string params that might be dynamic are ignored in the Varnish configuration. It can be hard to communicate this to stakeholders, but it's important that they let you know if traffic will start using new query string parameters...else they risk accidentally crushing their own site or degrading the performance globally.
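In Varnish itself this normalization lives in your VCL, but the logic is easier to see in a few lines of Python. This is only a sketch of the idea: the parameter list is an assumption (use whatever your ad platforms actually append), and `cache_key_url` is a made-up helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the real list depends on your campaigns.
TRACKING_PARAMS = {"gclid", "fbclid", "msclkid"}
TRACKING_PREFIXES = ("utm_",)


def cache_key_url(url: str) -> str:
    """Return the URL with tracking parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Sorting means ?a=1&b=2 and ?b=2&a=1 share one cached object.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    # Two paid clicks with different click IDs collapse to one cache key.
    print(cache_key_url("https://example.com/shoes?utm_source=ads&gclid=abc123&color=red"))
    print(cache_key_url("https://example.com/shoes?color=red&gclid=zzz999"))
```

Parameters that really do change the page (like `color` here) stay in the key, so only the analytics noise is dropped.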

What about WooCommerce?

Honestly...scaling a static WP site is rather easy with the right understanding of caching. With Varnish working as it should, you don't need to be too concerned about some inefficiencies with the database layer or the like because virtually all traffic will be shunted through perhaps multiple layers of caching. That works and it works very well for static content.

What about WooCommerce, though? Since e-commerce involves distinct user sessions and complex dynamic content, it isn't as simple as throwing everything behind CloudFront or Varnish...that said, you should still use these techniques because a huge portion of the site will still be cacheable. With thousands of products, you'll need to think about pre-warming your caches, too -- otherwise you might have a ton of long-tail pages that have an outsized impact on performance because they're constantly out-of-cache due to low traffic. This also applies to product page pagination -- deep pages might not be cached, and that adds up to a real problem when many people are browsing the site.

Be careful when you think about pre-warming your cache, though, because that process can knock you offline, too! It will crawl a potentially massive number of pages, forcing a ton of fresh-from-the-DB re-renders that can drag down performance until it's done. With enough pages, that can be a big issue.
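One way to keep a warm-up from flattening the origin is to throttle it. Below is a minimal sketch, assuming you already have a list of URLs and some `fetch` callable that requests a page; `warm_cache` and its parameters are hypothetical names, and the right delay depends entirely on your hardware.

```python
import time
from typing import Callable, List, Sequence


def warm_cache(
    urls: Sequence[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Request each URL in turn, pausing between requests; return the failures."""
    failed: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause so the warm-up never sends more than one request per delay window.
            sleep(delay_seconds)
        try:
            fetch(url)
        except Exception:
            # Keep going; one broken page shouldn't stop the crawl.
            failed.append(url)
    return failed


if __name__ == "__main__":
    def pretend_fetch(url: str) -> None:
        print("warming", url)

    warm_cache(["/product/a", "/product/b"], pretend_fetch, delay_seconds=0.1)
```

Running it one request at a time is deliberately slow; the point is that the warm-up competes with real visitors for the same PHP workers.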

Services like ElastiCache and Elasticsearch are more critical when working with a scaled-out Woo platform. They allow you to avoid very costly DB hits for things like product searches with filters by pushing that data to a specialized system like Elasticsearch. Similarly, ElastiCache can stuff expensive objects into memory so they don't need to be fetched from the database. Product searches and filtering will be some of the biggest bottlenecks, so it is worth designing this carefully.
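The object-cache half of that is the classic "cache-aside" pattern. Here's a sketch using a plain dictionary as a stand-in for a Redis- or Memcached-style store; `ObjectCache` and `get_or_compute` are illustrative names, not any real client library's API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ObjectCache:
    """In-memory stand-in for an external key/value cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._items: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get_or_compute(self, key: str, compute: Callable[[], Any], ttl_seconds: float) -> Any:
        """Return the cached value, or run the expensive function and store its result."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database hit
        value = compute()  # expired or missing: pay the cost once
        self._items[key] = (now + ttl_seconds, value)
        return value


if __name__ == "__main__":
    cache = ObjectCache()

    def expensive_filter_query() -> list:
        print("hitting the database...")
        return [101, 102, 103]

    # Only the first call runs the query; the second is served from memory.
    cache.get_or_compute("products:color=red", expensive_filter_query, ttl_seconds=60)
    cache.get_or_compute("products:color=red", expensive_filter_query, ttl_seconds=60)
```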

Most people that browse an e-commerce site won't be adding items to the cart, so the anonymous-level cache is still the most important thing to get right. Once the session actually starts, though, Varnish will no longer work by default. If you have a ton of logged-in users browsing the site, that can be an issue -- and that's where the plugin-based cache can kick in to help avoid expensive DB calls. Optimizing logged-in sessions will rely on more conventional techniques like auto-scaling, Elasticsearch, and ElastiCache.
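The anonymous-versus-session split usually comes down to cookies and a few per-user paths. The sketch below shows that decision in Python; the cookie prefixes are the ones WordPress and WooCommerce set, while the path list assumes WooCommerce's default page slugs and the function name is made up. A real setup would express this in the cache's own configuration.

```python
from typing import Iterable

# Cookies that signal a logged-in user or an active cart/session.
SESSION_COOKIE_PREFIXES = (
    "wordpress_logged_in_",
    "wp_woocommerce_session_",
    "woocommerce_items_in_cart",
    "woocommerce_cart_hash",
)

# Assumed default slugs; adjust if your store renames these pages.
UNCACHEABLE_PATH_PREFIXES = ("/cart", "/checkout", "/my-account", "/wp-admin")


def can_serve_from_shared_cache(method: str, path: str, cookie_names: Iterable[str]) -> bool:
    """True when a request is anonymous enough to share one cached copy."""
    if method not in ("GET", "HEAD"):
        return False
    if path.startswith(UNCACHEABLE_PATH_PREFIXES):
        return False
    return not any(name.startswith(SESSION_COOKIE_PREFIXES) for name in cookie_names)


if __name__ == "__main__":
    # Anonymous browser with only an analytics cookie.
    print(can_serve_from_shared_cache("GET", "/product/blue-shirt", ["_ga"]))
    # Same page, but the visitor has something in the cart.
    print(can_serve_from_shared_cache("GET", "/product/blue-shirt", ["woocommerce_items_in_cart"]))
```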

One thing to be careful about is the wp_postmeta table I mentioned earlier. Many Woo-based plugins abuse this table, and I'm still baffled that plugin authors believe using postmeta is a good option when it relies heavily on the string-based 'key' field (meta_key). This snowballs because it then requires multiple joins against postmeta just to obtain product details.

It would be very wise to think about the product metadata you need and use...you know, columns that can have real numeric-based indexes and don't require five or six inner joins on the same table to extract a few key-based fields. Ugh! Postmeta is fine for some use cases, but it doesn't work for enterprise-level scale. As a plugin author, I'd feel the same way -- better to create and clean up the tables you need and have them properly indexed than to use postmeta just because it's already there. The objective isn't to use as few tables as possible, it's to do things efficiently.
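To make the join problem concrete, here's a small SQLite sketch comparing the two shapes. The `wp_postmeta` columns and the `_price`, `_stock`, and `_sku` keys mirror what WordPress and WooCommerce use; the `product_details` table is a hypothetical design of my own, and SQLite stands in for MySQL purely so the example runs anywhere.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wp_postmeta (
            meta_id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL,
            meta_key TEXT,
            meta_value TEXT
        );
        -- Hypothetical dedicated table with typed, indexable columns.
        CREATE TABLE product_details (
            post_id INTEGER PRIMARY KEY,
            price_cents INTEGER NOT NULL,
            stock INTEGER NOT NULL,
            sku TEXT NOT NULL
        );
        CREATE INDEX idx_product_price ON product_details (price_cents);
    """)
    products = [(1, "9.99", 5, "A1"), (2, "49.00", 3, "B2"),
                (3, "15.00", 0, "C3"), (4, "19.50", 2, "D4")]
    for post_id, price, stock, sku in products:
        # Same data stored twice: once as key/value rows, once as real columns.
        conn.executemany(
            "INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES (?, ?, ?)",
            [(post_id, "_price", price), (post_id, "_stock", str(stock)), (post_id, "_sku", sku)],
        )
        conn.execute(
            "INSERT INTO product_details VALUES (?, ?, ?, ?)",
            (post_id, round(float(price) * 100), stock, sku),
        )
    return conn


def cheap_in_stock_via_postmeta(conn: sqlite3.Connection) -> list:
    # One self-join per attribute, plus casts because every value is a string.
    sql = """
        SELECT price.post_id
        FROM wp_postmeta AS price
        JOIN wp_postmeta AS stock
          ON stock.post_id = price.post_id AND stock.meta_key = '_stock'
        JOIN wp_postmeta AS sku
          ON sku.post_id = price.post_id AND sku.meta_key = '_sku'
        WHERE price.meta_key = '_price'
          AND CAST(price.meta_value AS REAL) < 20
          AND CAST(stock.meta_value AS INTEGER) > 0
        ORDER BY price.post_id
    """
    return [row[0] for row in conn.execute(sql)]


def cheap_in_stock_via_columns(conn: sqlite3.Connection) -> list:
    # Same question against typed columns: no joins, no casts.
    sql = """
        SELECT post_id FROM product_details
        WHERE price_cents < 2000 AND stock > 0
        ORDER BY post_id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    db = build_demo_db()
    print(cheap_in_stock_via_postmeta(db))
    print(cheap_in_stock_via_columns(db))
```

Both queries return the same products; the difference is that the second one can use a numeric index, while the first grows another join for every field you need.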

What about Headless?

Headless Wordpress is a popular idea that makes some intuitive sense: Wordpress is great as a CMS and tool...but not that great to run and host. Chopping off its head means that Wordpress will serve as the backend CMS, delivering content to some other front end system via API, which presents it to users. This eliminates some security issues and allows you to have a lot more control over performance. Potentially you can even have content exist only in a static form, serving your blog via a CDN and making it very, very fast and very, very cheap.
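The "static form" version of this can be surprisingly small. As a rough sketch, assume you've already pulled posts from the WP REST API (the `slug`, `title.rendered`, and `content.rendered` fields follow that API's post objects); everything else here, including the function name and output layout, is just one possible shape.

```python
from typing import Dict, List


def render_static_pages(posts: List[dict]) -> Dict[str, str]:
    """Map each post to an output path and a complete HTML document."""
    pages: Dict[str, str] = {}
    for post in posts:
        # The "rendered" fields already contain HTML produced by Wordpress.
        title = post["title"]["rendered"]
        body = post["content"]["rendered"]
        pages[f"{post['slug']}/index.html"] = (
            "<!doctype html>\n"
            f"<html><head><title>{title}</title></head>\n"
            f"<body><h1>{title}</h1>\n{body}\n</body></html>\n"
        )
    return pages


if __name__ == "__main__":
    # Sample data shaped like a REST API response; not fetched from a real site.
    sample = [{
        "slug": "hello-world",
        "title": {"rendered": "Hello World"},
        "content": {"rendered": "<p>First post.</p>"},
    }]
    for path, document in render_static_pages(sample).items():
        print(path)
        print(document)
```

From there, the files get uploaded to whatever bucket or CDN you like, and Wordpress itself never sees public traffic.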

This is a good option, but not an effortless one. In the end, it'll be about as performant as serving content via Varnish...then again, it isn't like running and configuring Varnish is effortless, either. It might end up being a simpler implementation overall, at least once you have a good foundation for your headless systems. I would suggest a headless approach for those that want to maximize cost-efficiency and don't want to worry as much about security or maintaining hardware.

Headless WooCommerce is a different beast, though. The benefit might be even greater at a high level, but the complexity is also much greater. Honestly, I'm not sure I understand the idea of Headless WooCommerce...I am skeptical that it has a lot of value.

If you are large enough as a firm to need headless Woo, you should really consider Headless Shopify, instead. This will be far more performant and robust -- otherwise you'll need to ensure the WooCommerce portion of your app is able to keep up with the detached front-end, and that can easily become less efficient than a classic WooCommerce site if done incorrectly. It is often easier to scale the Woo stack you have than try to slice off its head and get fancy. If you aren't careful, you can easily end up with worse performance than a Varnish-backed classic Woo app.

By comparison, Shopify's Headless API is already built and they will handle scaling it. You have to be careful about staying under API limits, but the overall architecture is far, far more reliable and simple.
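Staying under an API limit generally means throttling on your side before the provider does it for you. Here's a generic token-bucket sketch; the capacity and refill numbers are placeholders, not Shopify's actual limits, so check the limits for whichever API and plan you're on.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when the call should wait."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Top the bucket back up for the time that has passed, never past capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Placeholder numbers: burst of 3 calls, then 1 call per second.
    bucket = TokenBucket(capacity=3, refill_per_second=1.0)
    for attempt in range(5):
        print(attempt, "send" if bucket.try_acquire() else "hold back")
```

When `try_acquire` returns False, the front end should queue or delay the call instead of firing it and eating an error response.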

If you are at the scale where you need Headless WooCommerce...maybe now is a good time to step back and think about what you're really trying to accomplish and consider a Shopify migration. Shopify migrations aren't always as easy as they sound...but hey, if you're stuck, you can hire me!


Written By
Eric Koyanagi

I've been a software engineer for over 15 years, working in both startups and established companies in a range of industries from manufacturing to adtech to e-commerce. Although I love making software, I also enjoy playing video games (especially with my husband) and writing articles.

© All Rights Reserved