Skip to main content

Migrating (and Open-Sourcing) an Historical Codebase: SVN-to-Git

I have a SVN repo on my local machine that I have been shoving stuff into since before I knew how to use revision control systems properly (I still don't). Each of the directories in my repo represents a separate project. Some of my directories have a proper trunk/branches/tags structure, but most don't. I want to migrate each of my SVN repo's directories into it's own new respective git repo, retaining all historical commits.

[My dodgy SVN repo]

The best tool I have found to do this so-far is svn2git:

This tool is a ruby gem that essentially wraps the native git feature for managing svn-to-git migration - git-svn. If you use svn2git's "--verbose" flag, you can see what commands it is issuing to the wrapped tool.

Migrating a dodgy SVN repo to multiple git repos

The reason I decided to blog about my experience was because I had a bit of trouble with the documentation and a bug with the svn2git tool. Mostly the tool is great, but for my particular purpose, I needed the "--rootistrunk" flag, which as it turns out, doesn't work. Instead (as the following Stack Overflow issue reports -, you can work around this by using the "--trunk", "--nobranches" and "--notags" flags.

[Excerpt from StackOverflow]

So, with reference to the images I have provided (above) of my haphazard SVN repo/directory structure, the following command (which I used Bash on Ubuntu on Windows to run) worked fine in the end (although it certainly took me a bit of messing about to get to this point):

$ svn2git file:///mnt/d/Work/SVN/SimpleList.V2 --verbose --trunk / --nobranches --notags --no-minimize-url

[Git log following successful application of svn2git]

Managing sensitive data

The following article provides some information on how to remove checkins of historical sensitive data (such as passwords) from your (new or old) git repo -

This is especially useful if you want to open-source an historical codebase. I used the "BFG Repo-Cleaner" tool, which is written in Scala and run using Java. Exceptionally useful and highly effective. After I had downloaded the tool, I used the following command on my new repo:

$ java -jar ../../tools/BFG/bfg.jar --replace-text sensitive.txt

The following image shows what your file containing sensitive data's historical commits will look like on GitHub once you have run across it with BFG. 

[After using BFG tool]

Git's native functionality for this purpose only allows you to get down to the file level (delete files), whereas BFG enables you to search/replace specific text in files across the entire history of your git repo, which is fantastic.

Mapping SVN authors to git authors

I think you can avoid having the Author come up ad "bernard@87bffff6-b7f0-bf49-a188-06524d5e88c0" (as shown in the example above) by specifying a mapping from your SVN authors to your git authors. You can extract that information using an approach described in the following link; a(nother) helpful blog post - - by using this relatively gnarly command:

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt

 Once you have this file, using can use the svn2git "--authors" option to specify the file that you've specified mappings in (would be "authors-transform.txt" if you used the above command to get the information). Clearly I forgot to apply this step before kicking off the migration for my SimpleList.V2 codebase. Bugger. If like me you forget to do this, nevermind, you can use the following approach to modifying the author of historical commits (if you can be bothered) -

Get the new git repo on to GitHub

Anyway, next step is to get my new git repo hosted on GitHub. Best advise I can offer here is to pick up section (4) of Troy Hunt's useful and succinct blog post on the same subject (his case is different from mine until this point in that he has a properly formed SVN repo):

Essentially once you have your git repo, complete with history brought across from SVN, you do the following to get your code onto GitHub:

  1. Create a new GitHub repo to push your code into.
  2. Add your remote repo as your new (migrated) repo's remote (GitHub provides good documentation for managing this process - - like this:
  3. Push to GitHub.
Note that if you have elected to add a ReadMe and/or a .gitgnore file on GitHub, that will need to be pulled and merged with your local repo before you can push it all  to GitHub. 

You're done!

I're retrospectively changed this blog post to make it into what are hopefully independently applicable sections. I've built this procedure up over a little while; it's essentially a piecemeal template for open-sourcing historical codebases on to GitHub (or any public git repo) using a few well-established tools. Have fun!


Popular posts from this blog

HOW-TO: Apply a “baseless merge” in Team Foundation Server 2010 (and 2012)

Another purely technical post on TFS...
The scenario We wish to migrate code between branches that do not have a branch/merge relationship, in order to expedite urgent changes being made by a project team, without disrupting on-going BAU development work. Sample branch hierachy/strategy Imagine the following branching strategy in TFS (visible by connecting to TFS via Visual Studio 2010 or 2012):

Essentially you have a "DEV" branch, which has a "QA" branch, which in turn has a "PROD" branch. DEV is the branch that you would be using for BAU development. As a piece of development matures, you move it into QA, where it is tested by your internal QA team. There may be further changes made in DEV that are moved into the QA branch as the QA team pick up issues. Once the QA team are happy with a packaged of changes, they will move them into PROD, which is essentially the hand-over to the customer. The PROD branch represents the software that the customer has.


HOW-TO: Add/edit a field in Team Foundation Server 2012 using Visual Studio 2012

It's been a while since I made a purely technical post...

So, today I wanted to make a change to a Microsoft Team Foundation Server 2012 (TFS2012) instance that I am working with to reflect "Actual" time spent on a task - mainly for reporting purposes, and because I have found in the past that making this minor process adjustment yields a relatively useful metric over the long-term.

I am using the Microsoft Scrum 2.1 Process Template ( for a project that I am working with. So that I don't forget how to do this (again!) I will blog-post the procedure I've used to add this field to the template as a screen-shot-based tutorial, as follows...
Before Assuming you are familiar with the Scrum Process Template (2.1-ish) - open a task and take a look at the "Details" section, as follows:

 This is where I want my "Actual" field to show up.
Get the Power Tools Download and install the latest v…

Eclipse/Android error: "Multiple dex files define [...]"

Wow, I am really going nuts blogging this-evening - 2nd post in less than an hour. 

Anyway this is a particularly nasty error that I keep running into with Eclipse/Android when starting the emulator after I have not run it for a little while. Since I run the risk of permanently forgetting the solution to the problem every time I walk away from my Android project (and thus having to spend a painful hour-or-so digging up the procedure again), I will blog it here, for my benefit, and for the benefit of anyone who may also suffer the same problem.

The gist is that when you start the emulator in debug mode (that is, you hit the button in the following image), you get the following error message come out on the console and a nasty popup telling you nothing more than there is an error with your program and you need to fix it:

[2012-04-06 23:20:57 - Dex Loader] Unable to execute dex: Multiple dex files define Lcom/google/gson/ExclusionStrategy;
[2012-04-06 23:20:57 - SimpleList] Conversion to Dal…