Monday, October 20, 2008

Converting from CVS or SVN to Git

This post is a collection of notes about moving from CVS or Subversion (SVN) to Git.

Over the last 9 months all my projects have moved to Git. Previously I used SVN, but I found creating and merging branches (which in theory is the Right Way to Work) pretty painful.

Why Change?

"If it ain't broke, don't fix it" is the most common reason not to change, closely followed by the productivity lost during the changeover. Even so, if you closely examine the workflow that older version control systems force businesses into, the benefits of change become obvious.

Imagine that you have 10 developers working on a project in CVS/SVN. A typical workflow is as follows:

1. A new set of features is assigned to 5 of the programmers. The other 5 are working on bug fixes.

2. Each programmer checks out the current head of the trunk and starts work on their bug or feature.

3. Everyone works on code locally until their work is complete, but NO ONE commits anything as they work, because that would put the head in an unstable state. Work is only committed when it is complete.

4. The first bug fix is complete and is committed.

5. The first feature is committed.

6. Unit tests are run, and all seems well.

7. The second bug fix is committed.

8. The second feature is committed.

9. The unit tests fail horribly and the application appears to be broken in 10 places.

10. The four programmers who made the last four commits have a meeting to see what went wrong. As a result, the head is frozen while one of them works out how to fix all the problems.

11. Everyone else continues to code against their existing checkout (now four commits old) while the problems at the head are sorted out.

12. The code at the head is fixed, and the next feature is committed.

13. This feature causes more unit tests to fail, and re-breaks 3 of the bugs that have just been fixed.

14. The head is frozen again while the current crop of bugs is fixed.

The above is an amalgam of stories I have heard, and it appears to be quite typical in many shops using CVS/SVN.

There are many negative things about this workflow:

1. When features and bug fixes are committed, they can create new bugs if they clash with other features that have already landed on the trunk.

2. Commits can cause complex bugs because the amount of code in each commit is large.

3. All the work that went into a bug fix or feature is contained in one commit. If a feature took 5 days of work, that is a lot of change for a single commit, and it makes it harder to identify the point where something went wrong.

4. The inability to commit (i.e. save) work in progress tends to reduce experimentation.

There is also the problem of merging. Nearly everyone I know says that it takes a lot of time to plan a merge, and in practice many people avoid branching as a result.

Source code management systems should not create extra work.

The Git Workflow

In contrast, Git makes branching and merging simple, so it is trivial to work on each new feature or bug fix in a temporary branch. It also makes it simpler to manage existing stable, development and maintenance branches.

While on a branch, the programmer can commit as they work. They can branch from the branch to try an experiment. They can roll back to any previous state. They can even go back to the trunk (master) and make a quick bug fix before returning to their current work.
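The sketch below walks through those moves in a throwaway repository. It assumes a POSIX shell with Git on the PATH; the file names, the experiment branch and the commit messages are invented for illustration, and git stash is just one way (not necessarily the author's) to park unfinished work before hopping over to master.

```shell
#!/bin/sh
# Throwaway demo repository; names and messages are made up.
set -e
WORK=$(mktemp -d)
export HOME="$WORK"    # keep personal ~/.gitconfig settings out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$WORK"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # call the first branch "master" on any Git version
echo v1 > app.txt
git add app.txt
git commit -qm "Initial version"

# Work on a feature branch, committing as you go.
git checkout -q -b new-feature
echo step1 >> app.txt
git commit -qam "Feature: step 1"

# Branch from the branch to try an experiment...
git checkout -q -b experiment
echo risky-idea >> app.txt
git commit -qam "Experiment"
# ...then roll it back: the experiment branch returns to the previous commit.
git reset -q --hard HEAD~1

# Back on the feature, with some uncommitted work in progress.
git checkout -q new-feature
echo wip >> app.txt
git stash                                 # put the unfinished work aside
git checkout -q master
echo fixed > bugfix.txt
git add bugfix.txt
git commit -qm "Quick bug fix on master"
git checkout -q new-feature
git stash pop                             # resume exactly where you left off
```

The bug fix lives only on master, and the work in progress comes back intact on new-feature.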

The whole point of source code management is to capture work progressively, at a granularity that is useful for understanding how a feature evolved and for tracking down bugs. (Git's bisect command is great for finding the commit where the code broke.)
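A minimal sketch of git bisect, assuming a throwaway repository and a made-up test: the hypothetical run_tests.sh simply checks a file's contents. git bisect run checks out commits between a known-good and a known-bad revision and treats the script's exit status as the verdict (0 means good, most non-zero values mean bad), so small commits let it pin the breakage down precisely. Here it should report "Commit 6".

```shell
#!/bin/sh
# Eight commits; commit 6 breaks the (made-up) test.
set -e
WORK=$(mktemp -d)
export HOME="$WORK"    # keep personal ~/.gitconfig settings out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$WORK"
git init -q repo
cd repo
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 6 ]; then echo fail > status.txt; else echo pass > status.txt; fi
  echo "$i" > version.txt                 # make every commit non-empty
  git add status.txt version.txt
  git commit -qm "Commit $i"
done

# Hypothetical stand-in for a real test suite: exit 0 = good, non-zero = bad.
# It lives outside the repository so checking out old commits never changes it.
cat > "$WORK/run_tests.sh" <<'EOF'
#!/bin/sh
grep -q pass status.txt
EOF
chmod +x "$WORK/run_tests.sh"

git bisect start HEAD HEAD~7              # bad: current commit; good: commit 1
git bisect run "$WORK/run_tests.sh"       # Git checks out commits and runs the test
FIRST_BAD=$(git rev-parse refs/bisect/bad)
git log -1 --format='First bad commit: %s' "$FIRST_BAD"
git bisect reset                          # return to the branch you started on
```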

When the new feature is complete, the programmer can rebase their work onto the head of the master branch (the trunk). A rebase takes the current (feature) branch and replays all of its commits on top of another commit. If you rebase a branch onto the main trunk of code, the result is the same as if you had made the branch from the current head, rather than from 20 commits back, where you actually started.

This allows the feature to be tested locally against the latest trunk just before it is merged back in.
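To see what "the same as if you had made the branch from the current head" means in practice, the sketch below (a throwaway repository with invented file names) compares git merge-base, the newest commit a branch shares with master, before and after git rebase.

```shell
#!/bin/sh
# A feature branch starts at "Base"; master then gains three commits.
set -e
WORK=$(mktemp -d)
export HOME="$WORK"    # keep personal ~/.gitconfig settings out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$WORK"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -qm "Base"

git checkout -q -b new-feature
echo feature > feature.txt
git add feature.txt
git commit -qm "Feature work"

git checkout -q master
for n in 1 2 3; do
  echo "$n" > "master$n.txt"
  git add "master$n.txt"
  git commit -qm "Master commit $n"
done

BEFORE=$(git merge-base master new-feature)   # where the branch was made: "Base"
git rebase -q master new-feature               # replay "Feature work" on top of master
AFTER=$(git merge-base master new-feature)    # now the tip of master
git log --graph --oneline --all               # one straight line of history
```

After the rebase the feature commit sits directly on top of the newest master commit, which is why testing it at that point tests it against the current trunk.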

Practically speaking, a programmer working on a new feature would do the following. I'll include the Git commands required, and I'll assume that the programmer already has a local clone of the shared (remote) repository.

1. Create a new branch

git checkout -b new-feature

(-b creates the named branch and switches to it)

2. Start work. Create a new function as part of the feature. Commit the function.

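git commit -a -m "This function is to add some stuff to blah"

(-a commits every modified file Git already tracks; a brand-new file must first be added with git add)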

3. Rebase (other programmers have pushed 3 commits to master since work started).

To do this, they switch back to master:

git checkout master

Then they pull the changes from the remote repository, merging them into the local master:

git pull

Then they check out the feature branch again:

git checkout new-feature

Then they rebase it onto master:

git rebase master

(There are faster ways using fewer commands to do this, but I have broken it out so you can follow the logic.)

4. The programmer then runs the unit tests for the module they are working on, locally.

5. A unit test fails. Because the change made in the commit is quite small, it takes only a few minutes to find the problem. The bug is fixed and committed.

git commit -a -m "Fixed bug caused by changes in module Y"

6. The cycle above continues until the feature is done.

7. The programmer then switches back to the master branch:

git checkout master

8. And merges in the changes:

git merge new-feature

9. All unit tests are then run locally. The code passes, so the changes can be pushed to the main repository.

git push

The main repository now has the new feature AND all the changes committed by other programmers. Assuming they used the same process, the head is now in working condition.
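The steps above can be run end to end as a script. This is a sketch under several assumptions: a local bare repository stands in for the real server, "Bob" is an invented second developer who pushes three commits in the meantime, and run_unit_tests is a hypothetical placeholder for the project's real test suite.

```shell
#!/bin/sh
# Steps 1-9 with two developers sharing one bare repository.
set -e
WORK=$(mktemp -d)
export HOME="$WORK"    # keep personal ~/.gitconfig settings out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$WORK"

# Shared repository with one commit on master, plus two developer clones.
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt && git commit -qm "Base")
git clone -q --bare seed shared.git
git clone -q shared.git alice
git clone -q shared.git bob

# Hypothetical stand-in for "run the unit tests": both files must exist.
run_unit_tests() { [ -f base.txt ] && [ -f feature.txt ]; }

# 1-2. Alice branches and commits as she works.
cd "$WORK/alice"
git checkout -q -b new-feature
echo feature > feature.txt
git add feature.txt
git commit -qm "This function is to add some stuff to blah"

# Meanwhile Bob pushes three commits to master.
cd "$WORK/bob"
for n in 1 2 3; do
  echo "$n" > "bob$n.txt"
  git add "bob$n.txt"
  git commit -qm "Bob's change $n"
done
git push -q origin master

# 3. Alice updates her master, then rebases her branch onto it.
cd "$WORK/alice"
git checkout -q master
git pull -q --ff-only          # local master has no commits of its own, so fast-forward
git checkout -q new-feature
git rebase -q master

# 4-6. Test on the branch (fix and commit as needed).
run_unit_tests

# 7-9. Merge into master, test again, and push.
git checkout -q master
git merge -q new-feature       # a fast-forward, because the branch was just rebased
run_unit_tests
git push -q origin master
```

Because the feature branch was rebased onto the latest master, the merge in step 8 is a fast-forward and the pushed history contains no merge commits. One of the faster routes hinted at in step 3 is to stay on the feature branch and run git fetch origin followed by git rebase origin/master (or git pull --rebase origin master); the difference is that the local master branch is not updated.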

The Git workflow avoids the problems that arise from working in isolation and from single large commits. It allows much greater freedom to experiment, and to test changes against the current trunk at any stage.

It also means that the evolution of all features is available in the repository.

How to Change to Git

There are two ways: cold turkey or slowly. If you go cold turkey, it has to be on a new project (easy), or you have to convert your old repository (harder).

The slow way is to use an adaptor like cvsimport or git-svn.
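For reference, the adaptors are invoked roughly as follows. The repository URL, CVSROOT and module name are hypothetical placeholders, and the commands are wrapped in functions so nothing touches the network until you call one. git svn clone can serve either as a one-off conversion (cold turkey) or as the start of a bridge that keeps SVN as the central repository; git cvsimport relies on the external cvsps tool being installed.

```shell
#!/bin/sh
# Placeholders only: replace with your own repository details.
SVN_URL="https://svn.example.com/repos/project"
CVS_ROOT=":pserver:anonymous@cvs.example.com:/cvsroot/project"
CVS_MODULE="project"

# Subversion -> Git, history included.
# --stdlayout assumes the usual trunk/, branches/ and tags/ directories.
svn_to_git() {
  git svn clone --stdlayout "$SVN_URL" project-git
}

# The slow way: work in Git locally while SVN stays the central repository.
svn_bridge_sync() {
  git svn rebase    # fetch new SVN revisions and replay local commits on top
  git svn dcommit   # send each local Git commit back to SVN as a revision
}

# CVS -> Git, history included (needs cvsps).
cvs_to_git() {
  git cvsimport -v -d "$CVS_ROOT" -C project-git "$CVS_MODULE"
}
```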

Personally, I would recommend cold turkey.

SVN Cold Turkey Links

Basic Migration

Migration to a remote server

Project and server migration

CVS Cold Turkey Links

CVS to Git transition guide

Understanding Git

I'd recommend watching the following videos before you start using Git.

Linus Torvalds on Git

In this video Linus Torvalds explains the rationale behind Git, and why the distributed model works better than other models.

Randal Schwartz on Git

Randal Schwartz explains the inner workings of Git.

Git with Rails

Ryan Bates shows how to use Git with a simple Rails project. It is a useful demonstration of how easy Git is to use.

Installing Git

On GNU/Linux (from source)


On Windows

Learning Git

The best videos around for learning Git are on gitcasts. To get started, watch the first four videos in the basic usage section.

There is also a guide for SVN deserters.

Tools to help using Git

The best tools are built right in; you just have to know where to find them. My personal favourites are command-line auto-completion and showing the current branch in the prompt. I have blogged about this previously.

Those of you who are used to CVS/SVN may not see the point of the second of these, but once you start using Git you will be making and merging branches all over the place, and the prompt is a great reminder of where you are. The prompt that comes with Git also shows when you are partway through a merge or rebase (for example, when it has stopped on unresolved conflicts).
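A sketch of a ~/.bashrc fragment that turns on both. The script directory is an assumption: it varies between distributions and Git versions (Git of this era shipped __git_ps1 inside git-completion.bash, while newer releases split it out into git-prompt.sh), so each file is only sourced if it exists, and a plain prompt is kept otherwise.

```shell
# ~/.bashrc fragment. The directory below is an assumption; adjust it to
# wherever your Git package installs its contrib completion scripts.
GIT_CONTRIB=/usr/share/git-core/contrib/completion

PS1='\u@\h:\w\$ '                            # plain prompt, kept if the scripts are missing

if [ -n "$BASH_VERSION" ]; then              # the scripts below are written for bash
  if [ -f "$GIT_CONTRIB/git-completion.bash" ]; then
    . "$GIT_CONTRIB/git-completion.bash"     # tab-completion for commands and branch names
  fi
  if [ -f "$GIT_CONTRIB/git-prompt.sh" ]; then
    . "$GIT_CONTRIB/git-prompt.sh"           # defines __git_ps1 in newer Git releases
  fi
  if command -v __git_ps1 >/dev/null 2>&1; then
    GIT_PS1_SHOWDIRTYSTATE=1                 # append * for unstaged and + for staged changes
    PS1='\u@\h:\w$(__git_ps1 " (%s)")\$ '    # e.g. me@host:~/project (new-feature)$
  fi
fi
```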

You can use gitk to view the repository (or GitX on OS X).

If there are other resources that readers find useful put them in the comments and I'll add them to this post.
