Part V: Data Migration in WordPress


Anyone can migrate a dump of wp_posts, but what about when things get more complicated?

Complicated WordPress Data Migrations

Forget any plugin that promises to do a “comprehensive” migration with zero effort; that isn’t how migrations work. When you have a big, old WordPress site with bespoke data, you need to put some serious elbow grease into lubricating that data into something new and fresh. In retrospect, I regret using a grease-based metaphor to describe this.

Let’s move on.

Denormalizing and Normalizing Data

Often, people think it’s better to completely ditch an old CMS and start fresh rather than trying to import content piecemeal. Start fresh, they say. Okay, but…then they want years and years of data to still be there, obviously, because that’s their content.

And the data they need can be bespoke or absurdly messy. If you just do a straight import for every post and post meta, all you’ve done is copy a mess from one pile to another.

wp_postmeta is itself evil. I’ve seen databases with millions of post meta records, all to make fewer than a hundred actual pages!

Get cozy with SQL and denormalize the data.

Denormalizing the data means gathering all the data for a relevant post, page, or product (a high-level piece of content) and representing it on one single row.

We do this in SQL with the GROUP_CONCAT aggregate function:

GROUP_CONCAT(CASE WHEN tt.taxonomy='category' THEN t.name END SEPARATOR ', ') as post_categories

If data is stashed in some weird custom table? Meh, whatever. I can join or group or make a temporary table (or CTE) and bend that disparate data into a single record. For complex sites, this can be a doozy of a process.
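
For a taste of what that looks like end to end, here’s a rough sketch run through $wpdb, pulling one row per published post with its categories flattened alongside it. The filters and joins are illustrative; bend them to your own schema:

// Sketch: one row per published post, with categories flattened onto
// the row via GROUP_CONCAT. Adapt the joins and filters to your own mess.
global $wpdb;

$rows = $wpdb->get_results( "
    SELECT p.ID,
           p.post_title,
           GROUP_CONCAT( CASE WHEN tt.taxonomy = 'category' THEN t.name END SEPARATOR ', ' ) AS post_categories
    FROM {$wpdb->posts} p
    LEFT JOIN {$wpdb->term_relationships} tr ON tr.object_id = p.ID
    LEFT JOIN {$wpdb->term_taxonomy} tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
    LEFT JOIN {$wpdb->terms} t ON t.term_id = tt.term_id
    WHERE p.post_type = 'post' AND p.post_status = 'publish'
    GROUP BY p.ID
", ARRAY_A );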

When the Database is Entirely FUBAR

You can accomplish much the same thing by skipping the database entirely and extracting meaningful structured data from the legacy pages themselves.

Yes, you’ll need to write a literal web scraper.

I’d suggest Python and beautifulsoup4. It’s not as brutal as it sounds.
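
If you’d rather stay in PHP, the built-in DOM extension can do the same job. A minimal sketch, with a made-up URL and made-up XPath expressions (yours will come from inspecting the legacy markup):

// Sketch: extract a title and body copy from a legacy page. The URL
// and XPath queries are placeholders, and we assume the nodes exist.
$html = file_get_contents( 'https://legacy.example.com/products/widget' );

$doc = new DOMDocument();
@$doc->loadHTML( $html ); // silence warnings from sloppy legacy markup

$xpath = new DOMXPath( $doc );

$row = [
    'title'       => trim( $xpath->query( '//h1' )->item( 0 )->textContent ),
    'description' => trim( $xpath->query( '//div[@class="product-copy"]' )->item( 0 )->textContent ),
];

Either way, the goal is the same: one clean, flat record per page.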

This works anywhere

The nice thing about the above strategies is that they’re universal. It doesn’t matter what platform you’re migrating from, so long as you can produce a flat file (with SQL, with scraping, with a mix of the two, by hiring artisans to etch the data onto a hard drive… it doesn’t matter).

Normalizing Data (again)

Here’s where we finally get to the point! No matter how you got there, the payoff for crunching data onto single rows comes in the second half of our import process.

As we iterate over each row of that flattened data, we can normalize it however the hell we want.

In other words, we’ve crunched disparate (often bespoke) data into a row, so now it’s time to break it apart again. Why?

  1. We can migrate away from bad choices instead of just carrying them forward
  2. We can transform template-based content into block-based content! (serialize_block_attributes makes this painless; more on that below)
  3. We can make this process idempotent, creating hundreds of thousands of pages again and again until everything is design-complete (the import sketch below shows one way)

We’re looking for the best of both worlds: the ability to bring in the content and copy that enterprises spent years and years making, but also the ability to move to a modern system and pivot/recreate the data any way you’d like.

For example, people often abuse pages in WordPress. There might be thousands of pages that really represent concrete ideas like products, and this import strategy lets us work on both the new design/structure and the content migration at the same time.

Conceptually, this part of the import is as simple as iterating all your exported flat data row by row and calling wp_insert_post with the parameters you want.
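
Here’s a bare-bones sketch of that loop, including the idempotency trick from point 3 above: stash the legacy ID in post meta so re-running the import updates instead of duplicating. The CSV columns, post type, and meta key are all placeholders for whatever your flat file actually contains.

// Sketch: a re-runnable import loop. Each flat row becomes (or
// updates) one post; the stored legacy ID makes it idempotent.
$handle = fopen( 'export.csv', 'r' );
$header = fgetcsv( $handle );

while ( ( $fields = fgetcsv( $handle ) ) !== false ) {
    $row = array_combine( $header, $fields );

    // Has this legacy record already been imported?
    $existing = get_posts( [
        'post_type'  => 'product',
        'meta_key'   => 'legacy_id',
        'meta_value' => $row['legacy_id'],
        'fields'     => 'ids',
    ] );

    wp_insert_post( [
        'ID'           => $existing ? $existing[0] : 0, // 0 creates a new post
        'post_type'    => 'product',
        'post_status'  => 'publish',
        'post_title'   => $row['title'],
        'post_content' => $row['content'],
        'meta_input'   => [ 'legacy_id' => $row['legacy_id'] ],
    ] );
}

fclose( $handle );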

Creating a block programmatically in PHP

I was going to skip over this, but maybe it’s helpful to have a more complete example rather than just a mention of the function, eh?

// Serialize the block attributes into JSON that's safe to embed in a
// block comment delimiter.
$marqueeArgs = serialize_block_attributes( [
    'title'       => $importedTitle,
    'description' => 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
] );

// Build the serialized block markup; this block carries its attributes
// under a "data" key and renders in preview mode.
$postContent = '<!-- wp:custom/marquee {"name":"custom/marquee","data":' . $marqueeArgs . ',"mode":"preview"} /-->';
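
From there, $postContent is just a string: hand it to wp_insert_post as the post_content in your import loop, and the imported page comes out the other side as real block content instead of a blob of legacy HTML.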

Why is there no copy pasta?

Data migration is almost always bespoke. Content gets messy, so messy you sometimes need a Python web scraper just to get “canonical” content. It’s why plugins don’t work (well) to cleanly migrate content. They either bring way too much or not enough.

There’s so much to the first steps (and all of it is generic SQL or Python), so I’ll provide some links for further research.

If you’ve gotten this far, you’ll read anything. 😉