Wed, 22 Jun 2005

Replicating a Subversion Repository Using Git

I recently had to replicate a Subversion repository into an existing svn repository. I wanted to import a vendor's svn repo into an existing repo, recording the full history from the source in the destination. Development was continuing in the source repo, so I also needed a solution that would allow repeated syncs from the source to the destination.

First, I looked at using svnsync. It meets both requirements, preserving history and supporting repeated syncs, but it needs a brand-new repository to write to, and only svnsync should ever write to that destination.

I also briefly looked at using svnsync to replicate to a new local repo, then using svndumpfilter (or svndumpfilter3?) to generate a dump that I could load into the existing destination repo, but doing this repeatedly didn't sound like fun.

So I figured out how to replicate changes from one svn repository to another using git-svn. If you want to do something similar, here's how I did it.

# create a git repository to use for the migration
git init migration
cd migration

# git-svn doesn't manipulate revision properties to store the original author,
# so the commits in the destination repository will be attributed to the user
# committing to the repo.  Setting svn.addAuthorFrom adds a From: line to the
# end of the log messages in the destination repo, and setting
# svn.useLogAuthor makes git-svn read that line back out of svn commits so git
# reports the correct author.
git config svn.addAuthorFrom true
git config svn.useLogAuthor true

# Initialize the source and destination repositories.  Set a prefix for each since we're using two.
# Both source and destination repositories can be any valid svn URL.
# We're not doing development in git so we don't tell git to treat svn branches
# and tags like git ones.  If we have a standard svn layout with trunk,
# branches, and tags, we just treat them like normal directories as far as git
# is concerned.
git svn init -R source --prefix source/ https://example.com/path/to/source
git svn init -R dest --prefix dest/ file:///path/to/dest

# Map svn usernames in the source repository to names and email addresses.
# These values will end up in the From: line in commits.  You may want to edit
# .git/svn-authors before running `git svn fetch` if you want more descriptive
# names.  `svn log -q` prints only the revision header lines, so nothing in a
# log message can be mistaken for one.
(for i in `svn log -q https://example.com/path/to/source | grep -E '^r[0-9]+ ' | cut -f2 -d'|' | sort -u`; do echo "$i = $i <$i@example.com>"; done) > .git/svn-authors

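# Also add the svn username that will perform the commits to the destination
# (substitute your own).  git svn aborts when it meets an svn author that isn't
# in the authors file, and dcommit fetches back the revisions this user creates.
echo "cwarden = cwarden <cwarden@xerus.org>" >> .git/svn-authors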

# Tell git to use the list of authors you just created.
git config svn.authorsfile .git/svn-authors

# Pull in both svn repositories to git.  This might take a while.
git svn -R source fetch
git svn -R dest fetch

# Create local branches from the remote branches.  `git cherry-pick` requires a
# local branch, and I like the symmetry of using one for the source as well.
git branch source remotes/source/git-svn
git branch dest remotes/dest/git-svn
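
If you hand-edit .git/svn-authors, a quick format check can catch a typo before a long fetch dies partway through. The git-svn documentation describes entries of the form `loginname = Joe User <user@example.com>`. The following is a minimal sketch; `check_authors` is a hypothetical helper, not part of git-svn, and it is deliberately stricter than git-svn itself (it also rejects entries with an empty name).

```sh
#!/bin/sh
# check_authors FILE: hypothetical helper, not part of git-svn.
# Prints (to stderr) every non-blank line of FILE that doesn't look like
#   svnuser = Full Name <email>
# and returns non-zero if it found any, or if FILE can't be read.
check_authors() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  # grep -v keeps lines matching neither pattern; "|| true" because grep
  # exits 1 when nothing is selected, which here means every line is fine.
  bad=$(grep -nvE \
    -e '^[^=]+=[[:space:]]*[^<[:space:]][^<]*<[^>]+>[[:space:]]*$' \
    -e '^[[:space:]]*$' "$1" || true)
  if [ -n "$bad" ]; then
    printf 'suspicious authors entries:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

For example, `check_authors .git/svn-authors && git svn -R source fetch` only starts the fetch once every entry has a name and an email.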

The rest of the steps can be wrapped up in a script to be run each time you want to migrate changes from source to dest.
#!/bin/sh
# Update the source from svn.
git checkout source || exit 1
git svn rebase || exit 1

# And update the destination repo in case anyone else is writing to it.
git checkout dest || exit 1
git svn rebase || exit 1

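# Take all of the changes that are in source but not in dest, and apply them
# one at a time to the local dest branch.  If a cherry-pick leaves the working
# tree dirty (a conflict), the inner shell exits 255, which makes xargs stop
# immediately, and the script aborts.
git cherry -v dest source | grep '^+' | awk '{print $2}' | xargs -I{} sh -c "git cherry-pick {}; git diff --quiet || exit 255" || exit 1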

# Commit the changes that we just cherry-picked to svn.
git svn dcommit || exit 1

# Now the somewhat ugly part.  Because the two histories remain separate, we
# use the git grafts feature to tell git that the commits from source have
# been applied to dest, by adding the source tip as an extra parent of the
# dest tip.  This eliminates the previously applied commits from the list
# that git needs to look through the next time we run `git cherry`.
# A graft line lists a commit followed by all of its parents, so dest's real
# parent (dest^) is repeated; otherwise the older dest history drops out.
grafts="$(git rev-parse --git-dir)/info/grafts"
graft="$(git rev-parse dest) $(git rev-parse dest^) $(git rev-parse source)"
grep -qxF "$graft" "$grafts" 2>/dev/null || echo "$graft" >> "$grafts"
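
Newer Git releases deprecate .git/info/grafts in favor of replace refs, and offer `git replace --convert-graft-file` to migrate an existing grafts file. The following is a minimal sketch of the same bookkeeping with `git replace --graft`, assuming the local dest and source branches from the setup. `record_sync_point` is a hypothetical name, and I haven't confirmed that every git-svn operation is happy with replace refs. As with a graft line, `--graft` takes the complete parent list, so dest^ is passed along with source.

```sh
#!/bin/sh
# record_sync_point: hypothetical helper. Records that dest now contains
# everything from source by replacing dest's tip commit with a copy whose
# parents are dest's real parent plus the current source tip.
record_sync_point() {
  # --graft takes the full parent list; -f overwrites a replacement left
  # by an earlier run for the same dest commit.
  git replace -f --graft dest dest^ source
}
```

Replace refs live under refs/replace/, so unlike the grafts file they are ordinary refs: `git replace -l` lists them and `git replace -d <commit>` removes one.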

As an alternative to cherry-picking commits, you could use git merge source instead. This frees you from having to store the graft points, but at the expense of having each merge appear as a single commit in the destination svn repo, so you don't get the full history from the source.
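
On current Git the merge route has one wrinkle. Since Git 2.9, `git merge` refuses to join two histories with no common ancestor unless given `--allow-unrelated-histories`, and dest and source here come from different svn repositories. The following is a minimal sketch of the local merge step, assuming the dest and source branches from the setup. `merge_source_into_dest` is a hypothetical helper, and `git svn dcommit` would follow it just as in the cherry-pick script.

```sh
#!/bin/sh
# merge_source_into_dest: hypothetical helper for the merge-based variant.
# Assumes the local dest and source branches created during setup.
merge_source_into_dest() {
  git checkout -q dest || return 1
  # --no-ff guarantees a merge commit whose first parent is dest's own svn
  # history, which is where dcommit looks for the repository to commit to.
  # The first merge joins two unrelated histories; the flag is harmless on
  # later merges, once the earlier merge already connects them.
  git merge --no-ff --no-edit --allow-unrelated-histories source
}
```

A sync run would then be `merge_source_into_dest && git svn dcommit`.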

Update: The script that's run for each synchronization now aborts if an error occurs, most likely a cherry-pick that results in a conflict. Thanks to fr0sty on #git for coming up with the git diff --quiet check to detect failed cherry-picks, since the exit status of git cherry-pick is ambiguous.
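
The same stop-on-conflict check can also be written as a plain loop instead of xargs, which makes it easy to report which commit stopped the sync. The following is a minimal sketch, assuming (as in the sync script) that dest is checked out and a local source branch exists. `pick_pending` is a hypothetical name. It relies on the same observation as the git diff --quiet test: a conflicted cherry-pick leaves unmerged changes in the working tree.

```sh
#!/bin/sh
# pick_pending: hypothetical helper. Cherry-picks each commit that
# `git cherry` marks with "+" (no patch-equivalent commit in dest yet)
# onto the current branch, stopping at the first one that leaves conflicts.
pick_pending() {
  for sha in $(git cherry dest source | awk '$1 == "+" { print $2 }'); do
    git cherry-pick "$sha"
    # Unmerged paths make `git diff --quiet` exit non-zero.
    if ! git diff --quiet; then
      echo "stopping: cherry-pick of $sha left conflicts" >&2
      return 1
    fi
  done
  return 0
}
```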

