Migrating from MediaWiki to Confluence

One of the most valuable assets a software company holds is its documentation. Although not as important as the code itself, documentation enables handovers, assists with support enquiries, and generally makes life simpler for everyone associated with the software.

Whilst modern version control systems have popularised the use of markdown-based README files, and in some cases proprietary Wikis, most software companies make use of separate documentation systems. For some, this might be a collection of Word documents, or a labyrinth of interconnected Google Docs, and for others a more sophisticated system like Confluence.

In our case, we used MediaWiki, the technology behind Wikipedia and a very popular way of holding large amounts of interconnected documentation. Advanced, on the other hand, used Confluence. Hence, a migration was required.

Surely there’s a tool for this?

But what we found, as many companies that use MediaWiki do, was that over time our documentation had become a little outdated, and in some cases outright incorrect. There were also some orphaned pages (pages not linked to from anywhere else), and some pages that belonged in a separate Confluence space entirely.

On top of that, as you might expect, the process of converting from Wikitext to Confluence is… not perfect. Even after performing the initial migration automatically, you would still need to go in and proofread everything, fixing links and re-uploading images as you went, along with the various other tasks required to refine the finished product.

In the end, despite utilising a tool, you’re still going to have to check everything manually.

So we decided to just do it manually.

Sometimes there’s no replacement for manual (digital) labour

Manually migrating an entire documentation system? Really!?

Now of course, I’m not here to suggest that a manual migration is the right answer for everyone. Instead, I am simply reporting some of my findings from the process, in the hope that someone out there might find it helpful.

Lessons learned from manually migrating 100+ articles from MediaWiki to Confluence

Lesson #1: Using categories (and a lack thereof) to track our migration

  • Migrated to Confluence space 1
  • Migrated to Confluence space 2
  • Not being migrated to Confluence

In this example, we were splitting our MediaWiki between two Confluence spaces, but of course you could use more (or fewer). We also found that quite a few articles had become so outdated or irrelevant that they weren’t worth the trouble of migrating, but we still needed a way of marking them as “done”, so a category was created for that.

Each time a page was migrated, we would select one of the above categories, insert the below code into the top of the article, and then un-comment the relevant line:

https://gist.githubusercontent.com/DuncanMcArdle/48774b88c152d6323afdc2a0bc85d3ff/raw/f7054f7cfb1729e7f4fab6f3eeb1b9830fc0ffde/blog_migrating_from_mediawiki_to_confluence.wikitext

What this did was twofold:

First, it categorised the page appropriately, which meant that the page no longer appeared in the “uncategorised pages” section found in MediaWiki’s infamous “Special Pages” area. The “uncategorised pages” feature therefore effectively became a migration tracker, ticking down page by page as each was migrated, until eventually there were none left, and we knew we were done.

The moment this message finally appeared was a truly wonderful one
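
If you would rather not keep refreshing the special page, the same countdown can be read from the MediaWiki Action API. The following is a minimal sketch rather than anything we actually ran: it assumes your wiki exposes api.php, that anonymous reads are allowed, and that fewer than 500 pages remain (larger result sets need API continuation, which is not handled here). The function names and the example.com address in the usage note are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(api_url: str, limit: int = 500) -> str:
    """Build an Action API URL that lists Special:UncategorizedPages."""
    params = {
        "action": "query",
        "list": "querypage",
        "qppage": "Uncategorizedpages",
        "qplimit": str(limit),
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def remaining_titles(response: dict) -> list[str]:
    """Extract page titles from a decoded querypage response."""
    results = response.get("query", {}).get("querypage", {}).get("results", [])
    return [row["title"] for row in results]


def fetch_remaining(api_url: str) -> list[str]:
    """Fetch the titles still waiting to be migrated (performs a network call)."""
    with urllib.request.urlopen(build_query_url(api_url)) as resp:
        return remaining_titles(json.load(resp))
```

Calling fetch_remaining("https://wiki.example.com/api.php") would return the titles still to be processed, so the length of that list is the number of pages left. Bear in mind that some wikis cache this special page, in which case the list can lag behind your edits.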

Second, it marked each page as migrated in unmissable red text, notifying viewers (and would-be editors) not to waste their time there on the Wiki.

How the top of each migrated page looks
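
For anyone who prefers to generate the header rather than paste it by hand, the tagging step described above can be sketched in Python. This is an illustration only and does not reproduce the gist linked earlier: the notice wording, the red styling markup, and the function names are all assumptions, and only the three category names come from this article.

```python
# Category names taken from the list earlier in this article.
CATEGORIES = [
    "Migrated to Confluence space 1",
    "Migrated to Confluence space 2",
    "Not being migrated to Confluence",
]

# Hypothetical wording and styling for the red notice.
NOTICE = (
    '<span style="color:red">'
    "'''This page is no longer maintained here. Please use Confluence instead.'''"
    "</span>"
)


def migration_header(chosen: str) -> str:
    """Return the notice plus all category tags, with only `chosen` un-commented."""
    if chosen not in CATEGORIES:
        raise ValueError(f"Unknown category: {chosen}")
    lines = [NOTICE]
    for name in CATEGORIES:
        tag = f"[[Category:{name}]]"
        # Wikitext uses HTML-style comments, so unused categories stay inert.
        lines.append(tag if name == chosen else f"<!-- {tag} -->")
    return "\n".join(lines)


def tag_page(wikitext: str, chosen: str) -> str:
    """Prepend the migration header to a page's existing wikitext."""
    return migration_header(chosen) + "\n\n" + wikitext
```

Keeping the unused categories in the output as comments mirrors the manual workflow, where the whole snippet is pasted in and one line is un-commented.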

A quick note: We did also consider linking directly to the new post-migration Confluence URL for each page; however, as the plan was always to decommission the MediaWiki instance soon after migration, we decided it probably wasn’t worth the effort.

Lesson #2: Copy-pasting is incredibly effective (thanks Confluence!)

Confluence have done a fantastic job of making their system capable of converting incoming content to their format. As a result, even some of our most complicated MediaWiki pages were more than 90% correct after simply highlighting all the content, hitting copy, switching to Confluence, and hitting paste (note: you should do this from viewing mode, not edit mode, or you’ll get raw Wikitext).

You will have to do some checks afterward to ensure everything went over as expected, and you may encounter issues around things like image alignment and captions, but overall this method was surprisingly accurate.

Lesson #3: Image quality is massively reduced

The good news, however, is that if you click into the image on your MediaWiki, you’ll be able to see the original quality version. You can then copy that, and paste it over the top of the newly transferred one in Confluence.

Copy-paste to the rescue (again)!

See the “Original file” link at the bottom-left
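
If a page has a lot of images, clicking into each one gets tedious. As a rough sketch (not something we used during our migration), the MediaWiki Action API can report the original file URL through its imageinfo property. The code below assumes your wiki exposes api.php and supports formatversion=2; the function names are hypothetical.

```python
import urllib.parse


def imageinfo_url(api_url: str, file_title: str) -> str:
    """Build an Action API URL asking for the original URL of one file page."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def original_urls(response: dict) -> dict[str, str]:
    """Map each file title in a decoded response to its original file URL."""
    urls = {}
    for page in response.get("query", {}).get("pages", []):
        info = page.get("imageinfo")
        if info:  # Missing files have no imageinfo entry.
            urls[page["title"]] = info[0]["url"]
    return urls
```

The URL returned for each file points at the full-size upload, which is the same file you reach via the “Original file” link.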

Lesson #4: Articles in Confluence will be owned by whoever transfers them

Every. Single. Time.

In an ideal world, you would ask the original creators of the MediaWiki articles to perform the migration, thus retaining them as the owner. But in reality this may not be possible, and you should instead simply unsubscribe from the articles as and when you create them in Confluence.

Lesson #5: URLs may need to be updated (but are relatively easy to find)

Luckily, Confluence make it very easy to find instances where the old link has been used. Simply search for the old URL on Confluence, and it will highlight every page where that URL appears. You can then go about tracking down the specific references, and updating them.

This is also a great time to update any references to legacy systems. In our case, we also migrated from GitLab to GitHub, so this same method presented a great way of updating those links in our MediaWiki / Confluence content, too.
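
If you have page content available as plain text (for example, copied out of the editor), the find-and-update step can be sketched in a few lines of Python. This is an illustration under assumptions rather than our actual process: the hostnames in the usage note are placeholders, the function names are hypothetical, and a plain prefix swap only works where the rest of the path is identical on both systems, which is not always true between GitLab and GitHub.

```python
import re


def find_old_links(text: str, old_base: str) -> list[str]:
    """Return every URL in `text` that starts with `old_base`."""
    # Stop at whitespace, quotes, angle brackets, or a closing square bracket.
    pattern = re.compile(re.escape(old_base) + r"[^\s\"'<>\]]*")
    return pattern.findall(text)


def rewrite_links(text: str, old_base: str, new_base: str) -> str:
    """Swap the old URL prefix for the new one, leaving the rest of each path alone."""
    return text.replace(old_base, new_base)
```

For example, rewrite_links(page_text, "https://gitlab.example.com/team/", "https://github.com/example-org/") would repoint every matching link in page_text, and running find_old_links first gives you a list to review before changing anything. A link immediately followed by punctuation, such as a full stop, will include that character in the match, so it is worth eyeballing the results.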

In closing

If, however, you do decide to do it manually, then I hope the above is of some help to you.

Full-stack software developer from the UK, author of the Aftermath book series, full time tech-nerd.

Duncan McArdle
