Object-Oriented Content Migration

Matt Johnson / @xmatt / alleyinteractive.com

http://xmattus.github.io/object-oriented-content-migration

  • We are a full-service digital agency
  • WordPress.com VIP partner
  • Hiring!

What is migration?

  • Your client has an old site (maybe WordPress, maybe something else).
  • You're making them a new site.
  • Their old site has content in it. They worked hard on that content!

Content migration can be one of the most fun parts of a project...

  • Reverse-engineering weird legacy systems.
  • Building code to clean up and format your content real nice.
  • The satisfaction of processing thousands (or hundreds of thousands) of posts with a single CLI command.

...or one of the least fun

  • Oops, the legacy content is encoded as Windows-1252, but WordPress speaks UTF-8.
  • Half the authors are mysteriously missing.
  • "Oh hey, Matt, we have this microsite we forgot to tell you about until right now, a week before the launch. Can we just merge it into the main site's migration?"
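
The encoding problem, at least, has a short fix. A minimal sketch, assuming the legacy text really is Windows-1252 and that PHP's mbstring extension is loaded; legacy_to_utf8() is a hypothetical helper name, not part of the migration classes shown later:

```php
<?php
// Hypothetical helper: returns $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function legacy_to_utf8( $text ) {
    if ( mb_check_encoding( $text, 'UTF-8' ) ) {
        // Already valid UTF-8 (plain ASCII included), so leave it alone.
        return $text;
    }
    return mb_convert_encoding( $text, 'UTF-8', 'Windows-1252' );
}

// Curly quotes (0x93, 0x94) and e-acute (0xE9) as Windows-1252 bytes.
$converted = legacy_to_utf8( "\x93Quoted\x94 caf\xE9" );
```

Checking for valid UTF-8 first makes the helper safe to run twice on the same string.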

WXR: Usually Not an Option

  • Most Alley projects fully transform sites.
  • Rare (but not unheard of) for old site to be WP at all.
  • Major information architecture changes are typical, e.g.
    • Switching from users-as-authors to Co-Authors Plus
    • Loading custom metadata into Fieldmanager
    • Remapping all the taxonomies
    • Adding several custom post types

Make Your Own WXR?

  • Can be unwieldy; need code to hook into the old data and generate XML.
  • Still limited by the format of WXR.
  • No way to preview the content on a real site before sending it to VIP.

Object-Oriented Approach

  • ETL means extract, transform, load.
  • Use classes to encapsulate extracting and loading.
  • Processor: Extracts data.
  • Migrateable: Loads data.
  • Transform as needed when instantiating migrateables.
  • Deploy via WP-CLI.

Processor Class


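abstract class Processor {
    // Name of the Migrateable (sub)class to instantiate for each item.
    public $migrateable_class;

    // The Migrateable most recently loaded from the legacy source.
    public $migrateable;

    public function __construct() {
        $this->migrateable_class = 'Migrateable';
    }

    // Move the cursor to the next legacy item.
    abstract public function advance_cursor();

    // Return the legacy item at the current cursor position.
    abstract public function get_next_item();

    public function load_migrateable() {
        $item = $this->get_next_item();
        $this->migrateable = new $this->migrateable_class( $item );
        $this->advance_cursor();
    }

    public function save_migrateable() {
        $this->migrateable->save();
    }

}
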

Processor Class

  • Abstract methods for iterating over legacy data objects.
  • Each specific processor subclass sets the migrateable class it pairs with.
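
A concrete processor might look something like this. It is a sketch rather than code from a real project: JSON_Processor and its has_next() method are hypothetical additions, and the base classes are compact stand-ins so the example runs without WordPress.

```php
<?php
// Compact copy of the abstract Processor, so this example runs on its own.
abstract class Processor {
    public $migrateable_class = 'Migrateable';
    public $migrateable;

    abstract public function advance_cursor();
    abstract public function get_next_item();

    public function load_migrateable() {
        $item = $this->get_next_item();
        $this->migrateable = new $this->migrateable_class( $item );
        $this->advance_cursor();
    }
}

// Stand-in Migrateable that only stores the item (no WordPress calls).
class Migrateable {
    public $item;

    public function __construct( $item ) {
        $this->item = $item;
    }
}

// Hypothetical processor for a JSON document holding an array of items.
class JSON_Processor extends Processor {
    protected $items;
    protected $cursor = 0;

    public function __construct( $json ) {
        // Decode to associative arrays; fall back to an empty list on bad input.
        $decoded     = json_decode( $json, true );
        $this->items = is_array( $decoded ) ? array_values( $decoded ) : array();
    }

    // Hypothetical helper: true while the cursor still points at an item.
    public function has_next() {
        return $this->cursor < count( $this->items );
    }

    public function get_next_item() {
        return $this->items[ $this->cursor ];
    }

    public function advance_cursor() {
        $this->cursor++;
    }
}

$processor = new JSON_Processor( '[{"title":"First"},{"title":"Second"}]' );
$titles    = array();
while ( $processor->has_next() ) {
    $processor->load_migrateable();
    $titles[] = $processor->migrateable->item['title'];
}
```

The cursor here is just an array index. For a database source it could be a row offset or the last-seen primary key instead.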

Migrateable Class


// Example where $item is a simple assoc array, e.g. parsed JSON.
class Migrateable {
    public $item;

    public function __construct( $item ) {
        $this->item = $item;
    }

    // Each getter returns the legacy value when present, or '' when it is missing.
    public function get_title() {
        return ! empty( $this->item['title'] ) ? $this->item['title'] : '';
    }

    public function get_content() {
        return ! empty( $this->item['content'] ) ? $this->item['content'] : '';
    }

    public function get_legacy_url() {
        return ! empty( $this->item['url'] ) ? $this->item['url'] : '';
    }

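    public function save() {
        $existing = $this->get_post_by_legacy_url();
        // Start from the already-migrated post (as an array) if there is one,
        // so that re-running the migration updates it instead of duplicating it.
        $post = $existing ? get_post( $existing, ARRAY_A ) : array();
        $post['post_title'] = $this->get_title();
        $post['post_content'] = $this->get_content();
        // More fields and other fun transformations here
        // wp_insert_post() updates when 'ID' is set and creates a new post otherwise.
        $post_id = wp_insert_post( $post );
        if ( $post_id && ! is_wp_error( $post_id ) ) {
            update_post_meta( $post_id, 'legacy_url', $this->get_legacy_url() );
        }
    }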

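    public function get_post_by_legacy_url() {
        // big_ugly_meta_query() is a placeholder for whatever lookup finds
        // the post whose 'legacy_url' meta matches this item.
        $post = big_ugly_meta_query( $this->get_legacy_url() );
        if ( $post && ! is_wp_error( $post ) ) {
            return $post;
        }
        return false;
    }
}
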

Migrateable Class

  • Specific ways to pull data from legacy format.
  • Can transform pieces of data in these methods.
  • Built-in idempotent behavior.
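
Transformations can live in the getters of a subclass. A sketch, assuming a made-up legacy format: the 'headline' and 'created' keys, the " | Old Site" suffix, and the Legacy_Article class are all hypothetical, and the base class is trimmed down so the example runs without WordPress.

```php
<?php
// Trimmed-down Migrateable, so this example runs on its own.
class Migrateable {
    public $item;

    public function __construct( $item ) {
        $this->item = $item;
    }

    public function get_title() {
        return ! empty( $this->item['title'] ) ? $this->item['title'] : '';
    }
}

// Hypothetical legacy CMS that keeps the title under 'headline' with a
// site-name suffix, and the publish date as a Unix timestamp in 'created'.
class Legacy_Article extends Migrateable {
    public function get_title() {
        $title = ! empty( $this->item['headline'] ) ? $this->item['headline'] : '';
        // Strip the trailing " | Old Site" suffix and surrounding whitespace.
        return trim( preg_replace( '/\s*\|\s*Old Site$/', '', $title ) );
    }

    public function get_post_date() {
        // Format the timestamp as 'Y-m-d H:i:s' in UTC.
        return ! empty( $this->item['created'] )
            ? gmdate( 'Y-m-d H:i:s', (int) $this->item['created'] )
            : '';
    }
}

$article = new Legacy_Article(
    array(
        'headline' => 'Hello World | Old Site',
        'created'  => 86400,
    )
);
$title = $article->get_title();
$date  = $article->get_post_date();
```

Because save() only talks to the getters, it does not need to change when the legacy format does.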

Implementation Plan

  • Subclass Processor and Migrateable to tailor to your situation.
  • Control process from WP-CLI using cursor methods in Processor.
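
One way the WP-CLI side could be wired up. This is a sketch under assumptions: run_migration(), the migrate-legacy command name, the --limit flag, My_Processor, and the has_next() method are all hypothetical and not part of the classes shown earlier.

```php
<?php
// Hypothetical driver: pulls up to $limit items through a processor.
// Assumes the processor has load_migrateable() and save_migrateable() as in
// the Processor class, plus a has_next() method added for this sketch.
// A $limit of 0 means "no limit".
function run_migration( $processor, $limit = 0 ) {
    $count = 0;
    while ( $processor->has_next() && ( 0 === $limit || $count < $limit ) ) {
        $processor->load_migrateable();
        $processor->save_migrateable();
        $count++;
    }
    return $count;
}

// Register the command only when running under WP-CLI.
if ( defined( 'WP_CLI' ) && WP_CLI ) {
    WP_CLI::add_command(
        'migrate-legacy',
        function ( $args, $assoc_args ) {
            // My_Processor is a hypothetical Processor subclass for your legacy source.
            $limit = isset( $assoc_args['limit'] ) ? (int) $assoc_args['limit'] : 0;
            $count = run_migration( new My_Processor(), $limit );
            WP_CLI::success( "Migrated {$count} items." );
        }
    );
}
```

With something like this in place, `wp migrate-legacy --limit=100` would process the first hundred items, which is handy for a quick test run before the full migration.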

The Importance of Idempotence

  • Convenient when migrating thousands (or millions) of posts.
  • Potentially life-saving when on a deadline!
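
What idempotence buys you, as a toy sketch: an in-memory array stands in for the posts table, and save_item() is a hypothetical stand-in for Migrateable::save(). Running the whole migration twice leaves the same number of posts.

```php
<?php
// In-memory stand-in for the posts table, keyed by post ID.
$posts = array();

// Upsert keyed on the legacy URL: update the matching post, or create one.
function save_item( array &$posts, array $item ) {
    foreach ( $posts as $id => $post ) {
        if ( $post['legacy_url'] === $item['url'] ) {
            $posts[ $id ]['post_title'] = $item['title'];
            return $id;
        }
    }
    $id           = count( $posts ) + 1;
    $posts[ $id ] = array(
        'post_title' => $item['title'],
        'legacy_url' => $item['url'],
    );
    return $id;
}

$items = array(
    array( 'url' => '/a', 'title' => 'A' ),
    array( 'url' => '/b', 'title' => 'B' ),
);

// Run the "migration" twice, as you might after fixing a bug mid-project.
foreach ( array( 1, 2 ) as $run ) {
    foreach ( $items as $item ) {
        save_item( $posts, $item );
    }
}
$post_count = count( $posts );
```

The legacy URL works as the key here because it is stable and unique per item; any field with those two properties would do.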

Some Migrateable Source Examples

  • A MySQL database (any weird schema).
  • A pile of XML or JSON files.
  • An RSS feed.
  • A REST API.

Ask me later (or now) about...

  • Dealing with media
  • Dealing with Co-Authors Plus
  • Packaging code for VIP

The End


Want to do cool stuff like this?
We're hiring.

info@alleyinteractive.com