Your SCM may be decentralized, but your project isn’t
Or, why your project doesn’t need Git
Geeks love new things. If geeks were frogs, gadgets would be lily pads. Here’s a diagram:
Geeks love to jump from lily pad to lily pad as soon as they see a new one. This is why I’m not all that surprised that people (at least in the Ruby community) are quickly ditching the Subversion ship that’s gotten so much attention over the last few years. While surprised I am not, I am slightly disappointed in people for leapfrogging to a new technology so quickly, before it has proven to really be as cool as it sounds.
Important note: I’ve only been using Git for about a week now, so many of the technical details I provide about Git will be wrong. Please correct me. However, none of what I’m about to say has anything to do with the technical aspects of Git. Having seen the way Git is used in certain projects makes me wonder what people think they’re really getting from distributed source control management. Also note that I’m not discussing Linux development here; I know nothing about it, nor do I care. The fact that Git works great for Linus doesn’t mean it works for every other open source project.
Git is a technical masterpiece; but not all technical masterpieces are useful to you.
Git has some really nice features. Branching is effortless and hidden from the user. Merging is nice, conceptually, though resolving conflicts seems like a pain compared to Subversion. The speed/size optimizations are a must, and I sure hope the svn guys get their act together.
Note however that none of what I just credited Git for has anything to do with its decentralizededness (dictionary, please). All of this could be implemented in Subversion without changing your workflow.
Programming is not quite like editing Wikipedia
The supposed advantage of decentralized SCMs is that anyone can contribute code just by running a git clone and then making and sharing changes. Everybody has "their own branch" that they can develop on. But the truth is that anyone who tells you this is giving you a false sense of reality.
In real life, you don’t just download the repository, make a change and get a guarantee that your work will be merged back into the main branch for everyone to use (you made the change using SCM so you could share it, after all). In real life, projects are well guarded from the outside world and have a few gatekeepers known as maintainers. These people are usually the project owners/creators. In real life, you deal with having to convince these maintainers that your code deserves to be in the main branch.
The only advantage of Git is that once you deal with all the politics, you can theoretically have the maintainers merge your Git branch with theirs really easily, though in reality most projects would still rather take patches through a ticketing system. In reality, workflow trumps technology.
Your workflow is the ultimate bottleneck.
To really understand why decentralized SCM is a complete waste for 90% of your projects, you must first step back and look at how you work. Let’s describe your average open source project:
- Most open source projects are small. Not everything is Gnome/wine/Linux.
- Most projects I see using git have about 10-12 active developers, with about 3-5 active committers. It can easily be fewer.
- More specifically, these projects have far fewer developers than users. Most of the people who download the source only do so to compile it– never to edit.
- Sometimes there is only one main committer with one or two backups. Watch project timelines and you’ll see only a handful of names; the rest will be your odd patch.
- 99.99% of all open source projects inevitably make one official release for each set of changesets.
- Such a release is usually hosted in one centralized location with maybe a few mirrors strictly for distribution’s sake.
Is anyone coming to a scary realization here? As decentralized as you attempt to make your project, you will always run into a single point of failure: your workflow.
I’m currently watching merb-core development as an example of one of these projects and I’ve noticed that the workflow is essentially equivalent to one with a centralized repository. Someone will submit a patch and have it committed by one of three main committers. If your patch doesn’t make it to the main branch, you’re simply out of luck. Sure, you could use your patch locally, but you could do this with any body of source code whatsoever, .git, .svn or .tar.gz. This is really no different from Subversion to anyone outside the core development team.
If your project has only one or two active committers or falls under any of the above categories, do yourself a favour and don’t waste your time installing git on your server. You won’t be benefiting from its features because your project and workflow will not have changed.
So who really benefits from distributed source control?
Core developers do. The truth is, Git isn’t so much a great DVCS as it is a private whiteboard for "pre-commits" to the main repository. Git can make it a lot easier to pass around changes before finalizing them, which would mean fewer broken builds on the main repository. That’s a good thing, and almost worth a two-tier setup (see diagrams below). But really, this is nothing Subversion cannot do with almost as little effort.
Why Subversion can do what Git does
This is what a git development workflow normally looks like:
The outer repository in this diagram is bundled with a release server (web server, most likely) and ticketing system for patches. The "git" blocks are machines with individual branches for each core developer (abstracted from their physical machines in case they use GitHub or something). I didn’t draw all the connecting lines, but I drew enough to show where the bottleneck lies. More importantly, it shows that you’re not really using Git as a decentralized development platform (sorry).
Now let’s try this setup with Subversion:
Notice that the Tier 1 workflow does not change. Instead, imagine a Subversion repository (it doesn’t have to be the same physical repository) where each developer has their own branch and "publishes" their changes by committing to that branch. This development workflow is 100% equivalent to using, say, GitHub, to share changes. Literally– it’s exactly the same. No, really, it is. In this scenario, Joe can merge Larry’s changes by simply– merging them into his branch. When a final release is made, the few maintainers will merge the code that they got from other branches back into trunk, potentially tagging the changeset.
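For the curious, here is a rough sketch of what that branch-per-developer setup could look like on the command line. It only builds the svn commands as argument lists and does not run anything. The repository URL and the trunk/branches layout are my own assumptions for illustration, and the plain one-URL form of svn merge assumes a Subversion client with merge tracking (1.5 or later); the exact merge options you need may vary with your version.

```python
from typing import List

# Hypothetical repository URL and layout (trunk/ and branches/<developer>);
# neither comes from a real project.
REPO = "https://svn.example.org/project"


def branch_url(dev: str) -> str:
    """URL of one developer's personal branch."""
    return f"{REPO}/branches/{dev}"


def create_branch(dev: str) -> List[str]:
    """Server-side copy of trunk into a per-developer branch."""
    return ["svn", "copy", f"{REPO}/trunk", branch_url(dev),
            "-m", f"Create personal branch for {dev}"]


def publish(message: str) -> List[str]:
    """Run inside a working copy of your own branch: committing is publishing."""
    return ["svn", "commit", "-m", message]


def merge_from(other_dev: str) -> List[str]:
    """Run inside a working copy of your own branch (or trunk, for a
    maintainer): pull another developer's published changes into it."""
    return ["svn", "merge", branch_url(other_dev)]


def joe_takes_larrys_changes() -> List[List[str]]:
    """Joe merges Larry's branch into his own, then publishes the result."""
    return [merge_from("larry"), publish("Merge Larry's changes")]


if __name__ == "__main__":
    for cmd in [create_branch("joe"), *joe_takes_larrys_changes()]:
        print(" ".join(cmd))
```

The point of the sketch is only that "sharing" is a commit to your own branch and "pulling" is a merge from somebody else’s, which is the same shape as the Git workflow in the first diagram.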
In fact, not only is this workflow exactly the same as Git’s, but it has a side effect that nearly makes it more powerful than using Git: in an optimistic development environment, the maintainers could give out write access for branches to people from the outside world. I could get "Loren’s branch", and start developing my changes in my own little sandbox. This would be similar to git, but the visibility of my code would be much higher in that the core developers would be able to keep tabs on changes that non-core developers are making without having me ping them about it *. I would no longer be a second-class citizen with my own git repository far off at some URI in the public world (see git diagram), but instead I would be developing in the same location where the core team is. I have no clue why people think this is a bad thing.
* To be fair, git could do what I just described, but the developers would need to manage links to all the outsider repositories / track them all with relatively complex, currently-non-free-or-very-private software (github being one). The infrastructure for doing this in Subversion is built-in and implicit.
In summary, group me with all the other people who are skeptical of distributed source control management, please.