Migrating from MediaWiki to Confluence
One of the most valuable assets a software company holds is its documentation. Although not as important as the code itself, documentation can enable handovers, assist with support enquiries, and generally make life simpler for everyone associated with the software.
Whilst modern version control systems have popularised the use of Markdown-based README files, and in some cases proprietary wikis, most software companies make use of separate documentation systems. For some, this might be a collection of Word documents or a labyrinth of interconnected Google Docs, and for others a more sophisticated system like Confluence.
In our case, we made use of MediaWiki, the underlying technology behind Wikipedia, and a very popular way of holding large amounts of documentation in an interconnected manner. Advanced, on the other hand, used Confluence. Hence, a migration was required.
Surely there’s a tool for this?
By far the most common question at this point will be whether a tool exists for performing such a migration, and the answer is of course yes. There are a variety of options that will convert your Wikitext formatted MediaWiki content into a Confluence style, and then port it over.
But what we found, as many companies that make use of MediaWiki do, was that over time our documentation had become a little outdated, and in some cases outright incorrect. There were also some orphaned pages (pages not linked to from anywhere else), and some pages that belonged in a separate Confluence space entirely.
On top of that, and as you might expect, the process of converting from Wikitext to Confluence is… not perfect. Even after performing the initial migration automatically, you would still need to go in and proofread everything, fixing links and re-uploading images as you went, and doing the various other tasks required to refine the finished product.
In the end, despite utilising a tool, you’re still going to have to check everything manually.
So we decided to just do it manually.
Manually migrating an entire documentation system? Really!?
I can definitely understand if this seems unpalatable, especially if you’re staring at a MediaWiki instance with articles numbering in the thousands (or maybe more!). Luckily for us we were only looking at just over a hundred, and the exercise actually proved to be a useful activity for refining, culling and just generally updating our documentation. In fact, it represented the single biggest refinement of our documentation in its history, which can only be a good thing, especially during the acquisition and integration phase.
Now of course, I’m not here to suggest that a manual migration is the right answer for everyone. Instead, I am simply reporting some of my findings from the process, in the hope that someone out there might find it helpful.
Lessons learned from manually migrating 100+ articles from MediaWiki to Confluence
Lesson #1: Using categories (and a lack thereof) to track our migration
Luckily for us, our MediaWiki did not make use of categories. This meant I was able to create 3 categories to track how our migration was progressing:
- Migrated to Confluence space 1
- Migrated to Confluence space 2
- Not being migrated to Confluence
In this example, we were splitting our MediaWiki between 2 Confluence spaces, but of course you could use more (or fewer). We also found that quite a few articles had become so outdated or irrelevant that they weren’t worth the trouble of migrating, but we still needed a way of marking them as “done”, so a category was created for that.
Each time a page was migrated, we would select one of the above categories, insert the code below at the top of the article, and then un-comment the relevant line:
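As a minimal sketch of how a snippet like this could be generated, the Python below builds a block of wikitext with a red notice and one category line per tracking category, leaving only the chosen one un-commented. The notice wording, the category names and the `build_banner` helper are all assumptions made for illustration; the exact wikitext we used may have differed.

```python
# Hypothetical category names, mirroring the three tracking categories above.
CATEGORIES = [
    "Migrated to Confluence space 1",
    "Migrated to Confluence space 2",
    "Not being migrated to Confluence",
]

# Assumed wording for the notice; wikitext allows inline-styled <span> tags.
NOTICE = (
    "<span style=\"color:red\">'''This page has been processed as part of "
    "the Confluence migration. Please do not edit it here.'''</span>"
)


def build_banner(chosen: str) -> str:
    """Return wikitext to paste at the top of a migrated article."""
    if chosen not in CATEGORIES:
        raise ValueError(f"unknown category: {chosen}")
    lines = [NOTICE]
    for name in CATEGORIES:
        link = f"[[Category:{name}]]"
        # Only the chosen category is left active; the others stay inside
        # HTML comments, which MediaWiki ignores when rendering.
        lines.append(link if name == chosen else f"<!-- {link} -->")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_banner("Migrated to Confluence space 1"))
```

Keeping all three category lines in the snippet, with two commented out, means the same block can be pasted into every article and only one line needs touching each time.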
This achieved two things:
First, it categorised the page appropriately, and meant that page no longer appeared in the “uncategorised pages” section found in MediaWiki’s infamous “Special Pages” area. This meant that the “uncategorised pages” feature effectively became a migration tracker, ticking down page by page as each was migrated, until eventually there were none left, and we knew we were done.
Second, it marked each page as migrated in unmissable red text, notifying viewers (and would-be editors) not to waste their time there on the Wiki.
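The “uncategorised pages” countdown described above can also be read programmatically rather than by eye. The sketch below is not something we used during the migration; it assumes the wiki exposes the standard MediaWiki Action API at a hypothetical address (`wiki.example.com`) and queries the `Uncategorizedpages` special page through `list=querypage`. Bear in mind that some wikis cache this special page, so the list may lag behind recent edits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; replace with your own wiki's api.php endpoint.
API_URL = "https://wiki.example.com/api.php"


def build_query_url(api_url: str, limit: int = 500) -> str:
    """Build an Action API URL listing pages that have no category."""
    params = {
        "action": "query",
        "list": "querypage",
        "qppage": "Uncategorizedpages",
        "qplimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def remaining_titles(payload: dict) -> list[str]:
    """Extract page titles from a querypage JSON response."""
    results = payload.get("query", {}).get("querypage", {}).get("results", [])
    return [row["title"] for row in results]


def fetch_remaining(api_url: str = API_URL) -> list[str]:
    """Fetch the titles still waiting to be migrated (needs network access)."""
    with urlopen(build_query_url(api_url)) as response:
        return remaining_titles(json.load(response))
```

Calling `len(fetch_remaining())` at the end of each day would give a simple count of articles left to migrate.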
A quick note: we did also consider linking directly to the new post-migration Confluence URL for each page. However, as the plan was always to decommission the MediaWiki instance soon after migration, we decided it probably wasn’t worth the effort.
Lesson #2: Copy-pasting is incredibly effective (thanks Confluence!)
It may sound a little cliché to suggest that the best way of moving content from one system to another is to copy-paste it, but surprisingly, it really is.
Confluence have done a fantastic job of making their system capable of converting incoming content to their format. As a result, even some of our most complicated MediaWiki pages were more than 90% correct after simply highlighting all the content, hitting copy, switching to Confluence, and hitting paste (note: you should do this from viewing mode, not edit mode, or you’ll get raw Wikitext).
You will have to do some checks afterward to ensure everything went over as expected, and you may encounter issues around things like image alignment and captions, but overall this method was surprisingly accurate.
Lesson #3: Image quality is massively reduced
If you do use the above method of copy-pasting content, bear in mind that the image carried over will be the version currently displayed on the wiki page (typically a scaled-down thumbnail), not the original, and the two may differ considerably in resolution.
The good news, however, is that if you click into the image on your MediaWiki, you’ll be able to see the original quality version. You can then copy that, and paste it over the top of the newly transferred one in Confluence.
Copy-paste to the rescue (again)!
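If clicking into every image becomes tedious, the original file’s address can often be worked out from the thumbnail’s address. The sketch below assumes MediaWiki’s default upload layout, in which a thumbnail lives at `images/thumb/<a>/<ab>/<File>/<width>px-<File>` and the original at `images/<a>/<ab>/<File>`; wikis with custom upload paths or external file storage may not follow this pattern, and the `original_image_url` helper and example host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def original_image_url(thumb_url: str) -> str:
    """Map a default-layout MediaWiki thumbnail URL to its original file URL."""
    parts = urlsplit(thumb_url)
    segments = parts.path.split("/")
    if "thumb" not in segments:
        # No thumbnail directory in the path: assume this is already the original.
        return thumb_url
    idx = segments.index("thumb")
    # Drop the "thumb" directory and the trailing "<width>px-<File>" segment.
    kept = segments[:idx] + segments[idx + 1:-1]
    return urlunsplit(parts._replace(path="/".join(kept)))


if __name__ == "__main__":
    print(original_image_url(
        "https://wiki.example.com/images/thumb/a/ab/Diagram.png/300px-Diagram.png"
    ))
```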
Lesson #4: Articles in Confluence will be owned by whoever transfers them
One thing I only discovered after migrating 100+ articles is that the person who migrates them then owns them in the eyes of Confluence, and is therefore subscribed to email alerts every time someone makes a change.
Every. Single. Time.
In an ideal world, you would ask the original creators of the MediaWiki articles to perform the migration, thus retaining them as the owners. In reality this may not be possible, and you should instead unsubscribe from the articles as and when you create them in Confluence.
Lesson #5: URLs may need to be updated (but are relatively easy to find)
Given that MediaWiki is a system of interconnected documents, it’s a fair bet that your articles feature many internal links. This of course is a problem when you move to Confluence, as you don’t want users being sent back to the legacy MediaWiki instance when they click a link (especially after that system is decommissioned).
Luckily, Confluence make it very easy to find instances where the old link has been used. Simply search for the old URL in Confluence and it will list every page where that URL appears; you can then track down the specific references and update them.
This is also a great time to update any references to legacy systems. In our case, we also migrated from GitLab to GitHub, so this same method presented a great way of updating those links in our MediaWiki / Confluence content, too.
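For anyone who would rather check for leftover links in bulk, here is a rough offline alternative to the Confluence search. It is not what we did: it assumes you already have each page’s body as text (for example from an export), held in a dictionary keyed by page title, and the host names and the `find_legacy_links` helper are hypothetical.

```python
import re

# Hypothetical legacy hosts: the old wiki and the old Git server.
LEGACY_HOSTS = ["wiki.example.com", "gitlab.example.com"]


def find_legacy_links(
    pages: dict[str, str], hosts: list[str] = LEGACY_HOSTS
) -> dict[str, list[str]]:
    """Return, per page title, every URL that still points at a legacy host."""
    host_pattern = "|".join(re.escape(host) for host in hosts)
    # Match http(s) URLs on a legacy host, stopping at whitespace, quotes,
    # angle brackets or closing brackets.
    pattern = re.compile(r"https?://(?:" + host_pattern + r")[^\s\"'<>)\]]*")
    hits: dict[str, list[str]] = {}
    for title, body in pages.items():
        found = pattern.findall(body)
        if found:
            hits[title] = found
    return hits


if __name__ == "__main__":
    sample = {
        "Deploy guide": "See https://wiki.example.com/index.php/Build first.",
        "Onboarding": "Nothing to fix here.",
    }
    for title, links in find_legacy_links(sample).items():
        print(title, links)
```

Pages with no legacy links are left out of the result, so an empty dictionary means there is nothing left to update.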
All in all, the migration took around 5–15 minutes per article. For our hundred or so articles, that adds up to somewhere between roughly 8 and 25 hours in total. That’s a small price to pay for ensuring your content is up to date, but I can see why the prospect may be a little more daunting if your articles number in the thousands.
If, however, you do decide to do it manually, then I hope the above is of some help to you.