Having recently undergone an acquisition, I was handed the fairly standard post-merger job of migrating our existing codebase over to the new parent company. Luckily for me, both sides used Git. Unluckily for me, one side used GitLab (us), and the other used GitHub (them).
Now, given that both use Git, you’d be right to assume that this isn’t a particularly hard thing to accomplish. That said, below are a few of the tips and tricks I learned along the way that improved the quality of the migration.
Note: This article is by no means a representation of the best way to perform this migration, but rather my findings from the way I did it.
The Method of Code Migration
Unsurprisingly, there are a variety of ways to migrate projects in this Git-centric environment.
GitHub themselves, keen to pull in as many new (ideally paying) users as possible, provide a simple import tool.
You simply specify the URL of the existing repository, as well as a few key details, and then let it do the rest for you.
It’s worth mentioning however that for this to work, you’ll need your existing repository to be open to the internet (which, if you’re using an on-prem instance of GitLab, it probably isn’t). You’ll also need to have 2FA turned off (which, unless you’re a monster, you will have on), or to use a 2FA code generated by GitLab.
All in all this method works very well. It’s highly compatible with different version control systems, pulls through all the branches, and is reasonably quick. However, it’s a manual process, which, if you’re importing a large number of repositories, will get tedious very quickly. In addition, the need to open your GitLab environment up to the open internet may mean this is a non-starter for you.
3rd Party Migration Tools
There are of course other tools for this process available on the internet. Typically they work by checking out your projects and pushing them to GitHub — much like the next section — but what I personally found was that I had a hard time trusting these tools. Those that were open-source tended to be semi-abandoned, and those that were closed-source were… closed-source.
When it comes to migrating what is of course a software company’s most valuable asset, I didn’t really want to take the risk.
I’m a software engineer, so of course the idea of writing my own script came to mind. In addition, the GitHub API is truly wonderful. It covers just about every conceivable thing you could do on GitHub, and covers it well.
Accordingly, I wrote a script that did a variety of things:
- It checked out all existing repositories from GitLab, re-pointed them to a GitHub of my choosing, and pushed them all (the actual migration)
- It checked out all GitLab and GitHub repositories side-by-side (for ensuring the migration had worked successfully)
- It deleted all repositories on the target GitHub server (for clean-up after test runs; note that this script only ever ran against non-production GitHub organisations, or else I’d risk deleting some valuable corporate property!)
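As mentioned, I won’t be posting the full script, but the first task above boils down to a mirror clone and a mirror push. A minimal sketch (the URLs are obviously placeholders):

```python
import subprocess
import tempfile

def migrate_repo(gitlab_url: str, github_url: str) -> None:
    """Mirror-clone a repository from GitLab and push every ref to GitHub."""
    with tempfile.TemporaryDirectory() as work_dir:
        # --mirror copies every branch and tag, not just the default branch
        subprocess.run(["git", "clone", "--mirror", gitlab_url, work_dir], check=True)
        # Re-point the clone at its new home, then push everything across
        subprocess.run(
            ["git", "-C", work_dir, "remote", "set-url", "origin", github_url],
            check=True,
        )
        subprocess.run(["git", "-C", work_dir, "push", "--mirror"], check=True)
```

Looping this over a list of repositories is all the “actual migration” really is; everything else in this article is the surrounding housekeeping.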
In the end, this was the approach I went with. It was straightforward enough to do in minimal time, but versatile enough to allow me to account for the various additional requirements of our specific migration. In addition, it was scalable enough to allow me to migrate anywhere from 1 to 1,000 repositories in a few seconds per repo (depending on size).
P.S. As this script was written on company time, I won’t be posting it here for public consumption, though I will include some snippets below.
A Note on Merge Requests
Unfortunately, GitLab Merge Requests (MRs) and GitHub Pull Requests (PRs) are not the same. They’re very similar, but they don’t follow a standard implementation like Git itself does. What that means is that MRs cannot be drag-and-dropped onto GitHub. Conversion must take place.
In the end, I decided this was more hassle than it was worth, and opted to drop the MRs themselves (much like the GitHub importer does). Obviously, the code commits that came from those MRs still exist, so you could argue little was lost, but if you are adamant about keeping MRs, you may wish to consider another option.
In addition to running the actual migration, there are a few pre-migration activities I’d definitely recommend carrying out.
A Spreadsheet (who doesn’t love a spreadsheet?)
As a lover of spreadsheets, this one was a no-brainer. Before I started anything, I created a spreadsheet and entered all my repositories into it. This gave me a central place from which to manage not only the migration progress of each repository, but also the automated creation of certain artefacts, such as JSON data.
My migration scripts all ran on JSON data (no surprise there), but rather than enter it all myself, I had the spreadsheet create it for me. To give you an example, I wanted to apply topics for each repository (to be covered in more detail later), but some of these topics were project-specific. By flagging this in the spreadsheet, I was able to generate an array of topics to be applied to each project, and insert it right into the JSON data for that repository.
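To make that concrete, here’s a hypothetical example of the kind of per-repository record the spreadsheet generated. The field names and values are made up for illustration, not my exact schema:

```python
# One repository's worth of spreadsheet-generated migration data.
# All field names and values here are illustrative placeholders.
example_record = {
    "gitlab_url": "https://gitlab.example.com/our-group/billing-service.git",
    "github_url": "https://github.com/gitHubOrganisation/billing-service.git",
    "new_name": "billing-service",  # convention-compliant name (see below)
    "team": "platform",  # target GitHub team, covered later
    "topics": ["example-division", "billing"],  # per-project topics from the sheet
}
```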
As another example, when the migration was over, the spreadsheet was also used to generate the Git switching commands (covered later) that were given to each developer.
This isn’t to say this sort of thing couldn’t be done in the migration script itself (it definitely could have been). But by using a spreadsheet I was able to present a non-technical, high-level overview of the overall migration, and distribute it to the relevant people, including those who were only interested in the number of projects remaining.
One thing I needed to address early on was the fact that repositories on the target GitHub followed a naming convention that ours didn’t currently adhere to. This was trivial to address with a formula in the spreadsheet, which output the new name into the JSON data, which in turn was fed to the migration script.
What I actually found, however, was that it made more sense to rename the repositories on the existing source control (GitLab) before migrating. This meant there were no confusing mid-migration renames to worry about, and that the development team only had to redirect their projects post-migration, not rename them as well.
Adding Secondary Emails
One issue with migrating from an on-prem GitLab instance to a cloud-based GitHub one was that user accounts were not consistent across the two. If a developer committed to GitLab with email@example.com, they probably didn’t also have a GitHub account under the same email. So, when the migration was done, you had an existing user of firstname.lastname@example.org, and a new user of email@example.com going forward.
Now, this is obviously only an issue if your users switch to personal GitHub accounts post-migration; however, that is what I would personally recommend they do. That way, all of their GitHub activity lives in one place, and they have easy access to other projects they may have published or contributed to, which may in turn aid their development.
Some users will prefer to keep their professional and personal programming separate, and of course that’s absolutely fine. However, in cases such as ours (the company being acquired), they may still be changing emails to one belonging to the new company, and thus will still face the same issue.
What I realised early on is that GitHub lets you add as many additional email addresses to your account as you like, and it will then link you to any commit made under any of those addresses. This means you can add both your current work email and your new work email (if going through an acquisition) to your personal GitHub account. Then, once the migration is over, all commits made under any of these addresses will link back to you.
Of course, it’s important you ask your developers to do this prior to the migration. But it’s a fairly trivial task and will ensure they have GitHub access early on, which is definitely something worth doing.
I’d like to think this point is too obvious to say, but I’ll say it anyway: You should always test your migration to a non-production environment first. Ideally, you should also test migrating from a non-production environment, but I appreciate that’s a little more cumbersome to set up.
It’s quick and easy to set up a GitHub account (if you haven’t already got one), and you can use your own as a target for testing the migration. Just make sure you adjust the target URLs to your account, set the projects to private, and delete everything once you’re done.
Getting your code across may be all you need to achieve. But below are some additional things I had to take care of.
One feature that the target GitHub made use of which didn’t exist in GitLab was topics. Most were a fairly generic company-division topic, but others were project-specific. The way I managed the creation of the list of topics was to (mostly) automate it on the master spreadsheet, which in turn generated an array of topics as part of the JSON used to feed the migration. Once again, spreadsheets to the rescue!
The actual application of the topics was a fairly simple API request:
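A sketch of what that request looks like, using the `PUT /repos/{owner}/{repo}/topics` endpoint with a personal access token (the owner, repository, and token here are all placeholders):

```python
import json
import urllib.request

API = "https://api.github.com"

def topics_request(owner: str, repo: str, topics: list) -> tuple:
    """Build the URL and JSON body for the topics endpoint (kept pure for testing)."""
    return f"{API}/repos/{owner}/{repo}/topics", {"names": topics}

def set_topics(owner: str, repo: str, topics: list, token: str) -> None:
    """Replace a repository's full topic list via the GitHub REST API."""
    url, body = topics_request(owner, repo, topics)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    # urlopen raises an HTTPError for any non-2xx response
    with urllib.request.urlopen(req):
        pass
```

Note that this endpoint replaces the whole topic list in one go, which suits the spreadsheet-driven approach nicely: the sheet owns the full list per repository.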
Much like with topics, there was also a requirement to assign each repository to a target team. This was preferable to assigning individual users, as it meant access could be granted per-repository at the team level, with the team’s membership managed in one place. Once again this was auto-populated by the spreadsheet, and so could be manipulated to separate projects into different teams.
GitHub API to the rescue once again:
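A sketch of that call, using the `PUT /orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}` endpoint (again, all the names and the token are placeholders, and the `push` permission is just one sensible default):

```python
import json
import urllib.request

API = "https://api.github.com"

def team_repo_request(org: str, team_slug: str, repo: str,
                      permission: str = "push") -> tuple:
    """Build the URL and body for adding a repo to a team (kept pure for testing)."""
    url = f"{API}/orgs/{org}/teams/{team_slug}/repos/{org}/{repo}"
    return url, {"permission": permission}

def assign_team(org: str, team_slug: str, repo: str, token: str) -> None:
    """Give a team access to a repository via the GitHub REST API."""
    url, body = team_repo_request(org, team_slug, repo)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req):
        pass
```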
Checking / Testing
So you’ve just migrated the core of your company’s assets across the open internet, and are ready to decommission its old home. But before you do, you obviously want to test that everything has moved as expected.
The obvious answer here would be to run your automated tests, build new versions and make sure they work, and so on. However, we all know there will be at least one repository in your arsenal that doesn’t have these facilities, so it’s time to go old-school.
What I did was write a script to check out every project from its old home on GitLab and every project from its new home on GitHub, and then manually run a diff using WinMerge. This (after some time) eventually confirmed that, aside from the .git directories (which we can safely ignore), all of the code matched. The transfer had been a success.
Note: If your repositories are named the same before and after migration, and you check out all GitLab projects to a folder called GitLab, and all GitHub projects to a folder called GitHub, you can just highlight the two folders and run WinMerge across everything. Running it manually per-project will take a lot longer!
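If you’d rather script the comparison than eyeball it in WinMerge, Python’s standard library can do the same check. A sketch that ignores the .git directories, just as the manual diff did:

```python
import filecmp
import os

def trees_match(dir_a: str, dir_b: str) -> bool:
    """Recursively compare two checkouts, ignoring the .git metadata directory."""
    cmp = filecmp.dircmp(dir_a, dir_b, ignore=[".git"])
    if cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files:
        return False  # a file is missing on one side, or its contents differ
    # Recurse into the directories that exist on both sides
    return all(
        trees_match(os.path.join(dir_a, sub), os.path.join(dir_b, sub))
        for sub in cmp.common_dirs
    )
```

Run over the GitLab and GitHub checkout folders pair by pair, this gives you a yes/no answer per repository that you can feed straight back into the spreadsheet.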
Redirecting the Development Team
Once the code had moved across, it was time to notify the development team. But of course, they now had to begin working from the new GitHub environment.
The easy way to do this was just to have them delete their local copies and check them out fresh from GitHub, and that is exactly what I encouraged everyone to do. Nice clean checkouts; who doesn’t love a bit of spring cleaning?
But of course, some will be midway through developing a feature, or will have local stashes they don’t want to lose. Time to shine, Git:
git remote set-url origin https://github.com/gitHubOrganisation/target-repository.git
When executed from within a repository checked out from GitLab, this simple command completely re-points it to its new home on GitHub, and the developer can carry on working as if nothing happened.
And of course, thanks to that spreadsheet we made earlier, these commands can be generated automatically for each and every repository.
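If you’d rather skip the spreadsheet formulas, generating those commands in the migration script itself is a one-liner per repository. A sketch, with the organisation and repository names as placeholders:

```python
def switch_command(organisation: str, repository: str) -> str:
    """Build the remote re-pointing command for one migrated repository."""
    return (
        f"git remote set-url origin "
        f"https://github.com/{organisation}/{repository}.git"
    )

# e.g. one line per repository, ready to paste into the team announcement
commands = [
    switch_command("gitHubOrganisation", name)
    for name in ["target-repository"]
]
```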
Finally, it’s time to lock and / or close down GitLab. Don’t be scared, you’ve done your tests, confirmed the code is safely in its new home, and of course, you’ve backed up your GitLab instance just in case, right?