Jason (jcreed) wrote,

Intriguing paper from a couple years ago by Löh, Swierstra, and Leijen: A Principled Approach to Version Control. While yak shaving today, I found myself playing around with git (and, before that, mucking with configure scripts just trying to install it), and I had a vague memory of reading about a distributed version control system that came with a whole bunch of very nearly reasonable theoretical talk. Turns out I was thinking of darcs. The related-work links on their wiki pages pointed me to the above paper.

I think I'd read it before*, but I definitely didn't appreciate it. The cute thing about it, which I don't think is trumpeted enough, is how it reduces complex data structures (in the simple case, directory trees and files consisting of lines) to relational facts, much the way relational databases do. This representation has two key advantages. One: what it considers a "local" change is actually quite reasonable. Moves and copies of files, or of lines, are nicely local, because they can be represented as basically a shuffling of pointers. Two (and I didn't see this mentioned at all in the paper): when you're considering two changes that conflict, the only way they can conflict is by their composition violating some structural invariant, not by failing to commute with one another. If all you have is a (signed?) (multi?)set of database tuples, then any pair of changes that consist of adding and removing tuples necessarily commute with one another, as long as their composition (in both orders) is also a valid change.
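
Here's a rough sketch of that second point in Python. It's my own toy encoding, not the paper's formalism: the fact shapes ("dir", path) and ("file", name, parent), the invariant that every file's parent directory exists, and all the helper names are made up for illustration. A repository state is a set of facts, a change is a signed multiset of facts (+1 to assert, -1 to retract), and applying changes just sums counts, so the two orders can't disagree; the only thing that can go wrong is that some result isn't a valid state.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

# Hypothetical fact shapes: ("dir", path) and ("file", name, parent_dir).
Fact = Tuple[str, ...]
State = FrozenSet[Fact]
Patch = Dict[Fact, int]  # signed multiset of facts: +1 asserts a fact, -1 retracts it


def well_formed(state: State) -> bool:
    """Hypothetical structural invariant: every file's parent directory exists."""
    dirs = {f[1] for f in state if f[0] == "dir"}
    return all(f[2] in dirs for f in state if f[0] == "file")


def to_state(counts: Dict[Fact, int]) -> Optional[State]:
    """Read counts back as a plain set of facts, or None if that isn't a valid state."""
    if any(n not in (0, 1) for n in counts.values()):
        return None  # a fact retracted that wasn't there, or asserted twice
    state = frozenset(f for f, n in counts.items() if n == 1)
    return state if well_formed(state) else None


def apply(state: State, patches: Sequence[Patch]) -> Optional[State]:
    """Apply patches in order by summing counts; None if any step is invalid."""
    counts = {f: 1 for f in state}
    result: Optional[State] = state
    for patch in patches:
        for fact, delta in patch.items():
            counts[fact] = counts.get(fact, 0) + delta
        result = to_state(counts)
        if result is None:
            return None
    return result


def merge(state: State, p: Patch, q: Patch) -> Optional[State]:
    """Combine two concurrent patches; None means they conflict."""
    pq = apply(state, [p, q])
    qp = apply(state, [q, p])
    # Summing counts is commutative, so if both orders are valid they agree.
    return pq if pq is not None and qp is not None else None


base = frozenset({("dir", "src"), ("dir", "doc"), ("file", "a.txt", "src")})
# A move is local: retract the old location fact, assert the new one.
move_a = {("file", "a.txt", "src"): -1, ("file", "a.txt", "doc"): +1}
add_b = {("file", "b.txt", "src"): +1}
rm_doc = {("dir", "doc"): -1}  # fine on its own, since doc/ starts out empty

print(merge(base, move_a, add_b))   # both changes survive
print(merge(base, move_a, rm_doc))  # None: a.txt would end up in a missing dir
```

The move-into-a-directory versus delete-that-directory pair is the kind of conflict I mean: each change is valid by itself, and summing them commutes perfectly well, but the composition leaves a file with no parent.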

That would justify a bunch of significant optimizations, which are exactly what you want as opposed to implementing the system naively. I think a naive implementation would have bad performance, especially if you took their suggestion of abstractly labelling every line of every file distinctly to capture the meaning of simultaneous edits to a line-based file.
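
And here's a sketch of the line-labelling idea, again in Python and again my own guess at an encoding rather than the paper's: each line gets an abstract label (the labels "a", "b", "n" and the "START"/"END" sentinels are made up), a file is a set of ("line", label, text) and ("next", label, label) facts, and an edit is a pair of fact sets to remove and add. Because an edit names the line it touches by label rather than by position, inserting a line above it and editing it commute, while two different edits to the same line leave that line with two contents, which shows up as a structural violation when you try to read the file back.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: ("line", label, text) gives a line's content and
# ("next", label, label) orders lines, with "START" and "END" as sentinels.
Fact = Tuple[str, ...]
Edit = Tuple[Set[Fact], Set[Fact]]  # (facts to remove, facts to add)


def encode(labelled: List[Tuple[str, str]]) -> FrozenSet[Fact]:
    """Turn [(label, text), ...] into line facts plus successor facts."""
    labels = ["START"] + [lab for lab, _ in labelled] + ["END"]
    facts: Set[Fact] = {("line", lab, text) for lab, text in labelled}
    facts |= {("next", a, b) for a, b in zip(labels, labels[1:])}
    return frozenset(facts)


def apply(facts: FrozenSet[Fact], edit: Edit) -> FrozenSet[Fact]:
    remove, add = edit
    return (facts - remove) | add


def decode(facts: FrozenSet[Fact]) -> Optional[List[str]]:
    """Walk successor facts from START to END; None if the structure is broken."""
    succ: Dict[str, List[str]] = {}
    text: Dict[str, List[str]] = {}
    for f in facts:
        if f[0] == "next":
            succ.setdefault(f[1], []).append(f[2])
        elif f[0] == "line":
            text.setdefault(f[1], []).append(f[2])
    out: List[str] = []
    cur = "START"
    for _ in range(len(facts) + 1):  # bound the walk so a cycle can't loop forever
        nxt = succ.get(cur, [])
        if len(nxt) != 1:
            return None  # missing or ambiguous successor
        cur = nxt[0]
        if cur == "END":
            return out
        contents = text.get(cur, [])
        if len(contents) != 1:
            return None  # no content, or two concurrent edits gave it two
        out.append(contents[0])
    return None  # never reached END: the successor facts form a cycle


base = encode([("a", "hello"), ("b", "world")])
# Edits name the line they touch by its label, not by its position.
fix_b: Edit = ({("line", "b", "world")}, {("line", "b", "World")})
insert_n: Edit = ({("next", "a", "b")},
                  {("next", "a", "n"), ("next", "n", "b"), ("line", "n", "there")})
other_b: Edit = ({("line", "b", "world")}, {("line", "b", "Earth")})

print(decode(apply(apply(base, fix_b), insert_n)))  # ['hello', 'there', 'World']
print(decode(apply(apply(base, fix_b), other_b)))   # None: line b has two contents
```

A real system would need globally unique labels, so that two people inserting lines can't pick the same one, and carrying a label plus a successor fact for every line is presumably a big part of why the naive version would be slow.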

But the point of the paper, I think, and it's a good one, is to give a proposed semantics of how version control should (or at least could) behave, not to prescribe any particular implementation.

*Oh yeah! Thanks, livejournal tags. Also: the talk very explicitly mentions the commutativity property I described, so I suspect the paper must have mentioned it too and I just missed it.
Tags: version control