Migrating complex WooCommerce product data to Shopify
Why migrating to Shopify can be difficult
Older sites running on WooCommerce can become rather complex, especially because Woo is so open-ended. Simply "migrating to Shopify" isn't so simple. Yes, there's a wealth of paid "bulk importers" that will bring products over, but for sites running at scale? You'll be left with a field of skeletons. Most of your carefully curated product data will be gone. Even some standard fields (like product weight and country origin) might not be set.
Welcome to the wonderful world of Shopify. Instead of complaining about plugins, you can complain about apps. Just like Woo, many Shopify apps are absolutely great...but apps are especially good at doing something general. Imports and migrations tend to be very specific.
For sites at scale, there's also hundreds of thousands (or millions) of distinct SKUs, with different bits of product data scattered across various metadata fields, custom tables, or join tables. Some data maps 1-to-1 in Shopify, some doesn't. As an added bonus, there's also partner integrations. For example, dropship systems. Many of these systems (surprisingly) have very basic integrations and have a specific way in which they want product data "bound" to them.
All these factors combined means that migrating a complex site can be difficult. It often requires ad-hoc choices and compromises that no out-of-the-box import app can handle.
For stakeholders that want immediate results, it may be worth hiring a specialist (like me!). For developers tasked with "getting it done", maybe something in this mess of words will help you.
Creating flat product data from WooCommerce
Most engineers that work with WooCommerce at scale eventually roll their eyes at the post_meta table. It balloons, quickly, and proves how WooCommerce is designed more for flexibility than scale...which is true of Wordpress in general. I'm not trying to throw shade at Wordpress/WooCommerce, but it's important to be aware of a platform's weaknesses. The nature of the Wordpress/WooCommerce schema can make it frustrating to work with at times.
My approach was to create a series of staging tables using temporary queries. This proved to be far more performant because I could avoid repeat queries against big tables like wp_postmeta.
CREATE TEMPORARY TABLE IF NOT EXISTS categories (INDEX tindex (id)) SELECT p.id, GROUP_CONCAT(DISTINCT cat_term_data.name) as "categories" FROM wp_posts p LEFT JOIN wp_term_relationships tr ON tr.object_id = p.ID LEFT JOIN wp_terms cat_term_data ON cat_term_data.term_id = tr.term_taxonomy_id LEFT JOIN wp_term_taxonomy td_category ON ( td_category.term_taxonomy_id = tr.term_taxonomy_id ) WHERE p.post_status = 'publish' AND td_category.taxonomy = 'product_cat' GROUP BY p.ID;
This SQL above extracts product categories and implodes them into a comma-separated string, indexed per post ID. As you can imagine, there are many similar steps required to create simple, fast associations of complex data. Another example below shows extracting some variation-level data -- this is data that Shopify specifically needs at a variation level, such as thumbnails and SKUs.
Why are we doing this? Because what I need is a simple export that contains all relevant data for each SKU. In Woo, a lot of this data isn't stored 1-to-1 per each SKU, it's stashed in post_meta or other tables. Collapsing such data into a comma-separated list is more ideal for being able to update or create each SKU in Shopify. This would likely be true with an export to any other system, too.
CREATE TEMPORARY TABLE IF NOT EXISTS variations (INDEX tindex (post_parent)) SELECT p.post_parent, p.ID as "variation_id", pm_sku.meta_value as "sku", pi.guid as "variation_image_url" FROM wp_posts p INNER JOIN wp_postmeta pm_sku ON (pm_sku.post_id = p.ID AND pm_sku.meta_key='_sku') LEFT JOIN wp_postmeta pm_thumb_id ON (pm_thumb_id.post_id = p.ID AND pm_thumb_id.meta_key='_thumbnail_id') LEFT JOIN wp_posts pi ON pi.ID = pm_thumb_id.meta_value WHERE p.post_status = 'publish' AND p.post_parent != 0;
With these tables created, you can create a main product table that uses a single, simple join. This is querying against the entirety of wp_posts, so eliminating as many expensive joins as possible is a very important optimization. Try doing this query with joins against post meta and watch as your database burns to the ground. Actually, don't do that.
CREATE TEMPORARY TABLE IF NOT EXISTS products (INDEX tindex (product_id)) SELECT p.ID as 'product_id', DATE_FORMAT(p.post_date, '%Y-%m-%dT%TZ') as "publishDate", p.post_title as 'title', attributes.sku, p.post_type, attributes.country_of_origin, attributes.tax_class, attributes.categories, attributes.brands, attributes.tags, attributes.weight, attributes.final_sale, attributes.dropshipper, attributes.age_stage as "age", attributes.gender, attributes.criteria, attributes.style, p.post_name as handle FROM wp_posts p INNER JOIN attributes ON p.ID = attributes.id WHERE p.post_status = 'publish' ;
This works well for moderate scale, and should perform even with millions of products without too much sweating depending on your DB's resources. That's because joins are done via post IDs and the progressing building of intermediate temporary tables helps cherry-pick only relevant data from tables that are otherwise not very performant (like repeat, repeat calls to post meta).
I can't stress this enough. Post_meta has string-based key. Using temporary tables to reshuffle data such that it can be easily extracted with a primary key index is vastly better -- plus, these temporary tables are orders of magnitude smaller than post_meta. It's a lot of work, and this is just one strategy for dealing with Woo's notorious schema at scale.
For example, many sites at this scale could interact with something like ElasticCache to fetch bulk product data, avoiding the DB entirely. That would have been far more ideal, but for my specific use case, it was not reliable.
Wait, this is just the first step...?
Yes. For a moderately complex site, just the SQL (or other process) to extract all product data can become fairly massive. Now for the difficult part.
The next step is mapping the data into Shopify, which can feel like shuffling two decks together when the cards are different shapes and also obey different physics. Granted, I'm not the best card shuffler even in ideal circumstances.
Not every bit of data is straightforward. Shopify's APIs and developer ecosystem is great, but they do have two APIs. Many developers will immediately reach for the REST API because (let's be honest), it's more familiar and more widely supported by third party tools. Unfortunately, you can't update every piece of product data with only REST in Shopify.
For example, if you or your client need to migrate the tax code as was set in WooCommerce, that field can only be set using the GraphQL API in Shopify. I'm sure that Shopify's API's will have parity eventually (or more likely, Shopify will phase out the REST API entirely)...but for now, you're best bet is to use GraphQL.
For mature sites, taxes are sort of a big deal...and most importers that I've seen don't even try to touch this field. This field is "product category"...which does not align with Woo's idea of a category at all. Woo's product categories map more cleanly to tags or collections in Shopify.
It's a similar story with the product's country of origin -- important for anyone that does business internationally, but few out-of-the-box importers will touch it because it requires setting data on an InventoryItem, not even a Variant. It might not seem like a huge deal, but that one added field can be critical.
To round out the mapper's requirements, it also has to respect Shopify's strict rate limiting.
Example implementation of a custom Woo to Shopify importer
It isn't perfect, and it's tailored specifically to my use case....but this elixir project can be a starting point. I picked Elixir for two reasons. First, I wanted to say "oh, yeah, I've done a project in Elixir" to sound more competent than I really am. Was that being too honest?
Actually, I was genuinely curious about working in a functional language and knew that it would be well suited to parallelism. In fact, it was very easy to wire up a supervisor to structure requests such that they respected Shopify's API limits.
This might not work for especially large sets of products (although should run fine with a few hundred thousand SKUs) and could use a few overhauls and improvements -- but it does a good job of exploring how complex a seemingly simple task ("migrate us to Shopify") can sometimes be! There's also some functionality to import other resources like orders.
I would definitely add more functionality if revisiting this project for even greater scales of project data. I would definitely store Shopify resources in a persistent store instead of needing to use their API to paginate resources each time I want to run an update. Similarly, I might store the raw Woo export in a persistent database instead of creating a CSV. That said, reading a CSV into memory can scale very well -- a few GB of memory can hold a ton of product data.
Is there a plugin to make this easier?
I don't know, maybe? Probably not. The whole point of migrating a complex site is that it's complex! You have to deal with all the complexities and caveats of the legacy system and somehow fold them into the caveats and complexities of the new system. There's no simple way to do this, really. Every organization and application is different. For many, it will be effortless. For others, it seems so daunting that it might make stakeholders think migration is impossible or too expensive.
In a perfect world, it would be as simple as clicking a few buttons...and for sites with little custom data, it often is.
Hopefully this article helps someone as they go about migrating, but if you did find something confusing or have questions, feel free to reach out!