Good (text) data formats merge well while in source control. By this, I don’t mean they get along with their peers and share valuable life insights. I mean that when two people have changed a file and one person needs to reconcile his changes with the other person’s, the resolving (or merging) is relatively painless - if possible, automatic.
I spent an entire day researching ways to effectively make XML data merge-friendly. I learned a good bit about 3-way resolves, various XML data formats, a few people’s love lives, various source control systems, and one or two of Perforce’s dirty little tricks. I didn’t - and an observant person is already finishing this sentence in his head - find anything on merge-friendly data. Nothing. I find it hard to believe that in the history of XML, nobody has ever approached this problem.
So I came up with a couple of ideas of my own, asked around the office, and got a little bit of input. Vince dropped by and offered some really good suggestions. He asked if I had researched this, and when I said I found nothing, he said that normally in that kind of situation he would make a blog post since there really ought to be something online about the topic.
Well, I have a blog too, and I’m beating him to it. Here we go:
(1) When you open a file and resave it, it should come out exactly the same.
This one should probably be the most obvious, but I’ve had the unpleasantness of working with files that do not comply. Files should not juggle themselves when you’re not looking. If every single line changes when you make a single edit, your file might as well be binary.
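A cheap way to catch a file format that juggles itself is a round-trip check: load the file, save it again, and compare against what you started with. Here is a minimal sketch in Python; `load` and `save` are hypothetical stand-ins for whatever reader and writer your tool actually uses, and ElementTree is only there to make the example runnable.

```python
import xml.etree.ElementTree as ET


def load(text: str) -> ET.Element:
    """Stand-in for your tool's reader (hypothetical)."""
    return ET.fromstring(text)


def save(root: ET.Element) -> str:
    """Stand-in for your tool's writer (hypothetical)."""
    return ET.tostring(root, encoding="unicode")


def is_round_trip_stable(text: str) -> bool:
    """True if loading and re-saving reproduces the input exactly."""
    return save(load(text)) == text
```

Run something like this over every data file in the repository. Any file that fails will produce diff noise the first time somebody opens and saves it.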
(2) Preserve list order.
This goes hand-in-hand with the above, and you’d probably have to actually work to violate this. If you write out a list, read it in in the same order. And then write it out in the same order. If it’s not in alphabetical order, don’t read it in in alphabetical order, or else be sure to preserve the original ordering somewhere so you can write it out exactly the same. The moment you start juggling things is the moment you introduce unnecessary variation, which introduces unnecessary merge headaches.
(3) Beware hash tables.
I don’t want to lecture you on how a hash table works, just know this: there’s a good chance that, over the course of running your program, items in your hash table will end up in completely different positions when serialized via a naive iterate-and-write approach. This complicates things, because, as in #1, items that were never even touched will end up swapped around and juggled, increasing the probability of conflicts. If you can, a better approach may be to sort your data by the key of your table and write the sorted version out instead. This ensures that items won’t be rearranging themselves every time you save.
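The sort-before-write idea might look roughly like this in Python. The names `write_settings`, `settings`, `item` and `key` are made up for illustration. One caveat: Python dicts iterate in insertion order rather than hash order, so there the instability shows up as "order depends on edit history", but the cure is the same.

```python
import xml.etree.ElementTree as ET


def write_settings(settings: dict[str, str]) -> str:
    """Serialize a key/value table with its entries sorted by key."""
    root = ET.Element("settings")
    for key in sorted(settings):  # stable order, regardless of table internals
        item = ET.SubElement(root, "item", {"key": key})
        item.text = settings[key]
    ET.indent(root)  # one element per line; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Two tables with the same contents now produce the same file, no matter what order the entries were added in.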
(4) Beware C#’s XmlSerializer.
This doesn’t have to do with data formats explicitly, but it comes into play with the above point and some of the following. You don’t really have a lot of control with C#’s auto-serialization feature, meaning that it’s very hard to change your XML format around for maximum mergeability.
(5) One attribute per line.
Lines like the following look great (I’ll be using [ instead of < to avoid HTML-ifying conflicts):
[img src="your image.awesome" width="5" height="14million"]
And when two people edit that line - even if they edit different attributes - instant conflict! The whole problem can be avoided:
[img
    src="your image.awesome"
    width="5"
    height="14million"]
Now when two people edit different attributes, it’s perfectly safe. This has the added benefit of (potentially) improving your readability; I’m sure you’ve seen a ridiculously long tag stuffed with 15 attributes on a single line and had to scroll along trying to parse out what is where.
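Most stock XML writers won’t lay attributes out this way for you, so you may end up writing the opening tag yourself. A small sketch of what that could look like in Python; `open_tag` is a hypothetical helper, not part of any library, and it uses real angle brackets since it lives in a code block.

```python
from xml.sax.saxutils import quoteattr


def open_tag(name: str, attrs: dict[str, str], indent: str = "    ") -> str:
    """Render an opening tag with each attribute on its own line."""
    lines = [f"<{name}"]
    for key, value in attrs.items():  # keeps insertion order, per point (2)
        lines.append(f"{indent}{key}={quoteattr(value)}")
    return "\n".join(lines) + ">"


print(open_tag("img", {"src": "your image.awesome", "width": "5", "height": "14million"}))
```

`quoteattr` handles the escaping and quoting of each value, so the helper only has to decide where the line breaks go.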
(6) Put the close of a beginning tag on the next line.
This one’s borderline, because it really breaks human readability by uglifying everything, but it does have a legitimate use:
[img
    src="your image.awesome"
    width="5"
    height="14million"
]
Now if new attributes get added by multiple people, no real conflict. Whereas before, that ending ] might’ve been a problem. The reason I’m on the fence is that it’s open to abuse, and it’s hard to make a machine know when to abuse and when not to abuse. Observe:
] [/li] …
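Going back to the legitimate use for a second: a rough way to see what the extra line buys you is to count how many existing lines a line-based diff touches when an attribute is added. The sketch below uses Python’s difflib as a stand-in for a real merge tool. The tag contents and the name `touched_lines` are made up for the example.

```python
import difflib


def touched_lines(old: str, new: str) -> int:
    """Count lines of `old` that a line-based diff reports as replaced or deleted."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        i2 - i1
        for tag, i1, i2, _, _ in matcher.get_opcodes()
        if tag in ("replace", "delete")
    )


# Closing bracket shares a line with the last attribute.
tight_old = '<img\n    src="a.png"\n    width="5">'
tight_new = '<img\n    src="a.png"\n    width="5"\n    height="14">'

# Closing bracket sits on its own line.
loose_old = '<img\n    src="a.png"\n    width="5"\n>'
loose_new = '<img\n    src="a.png"\n    width="5"\n    height="14"\n>'

print(touched_lines(tight_old, tight_new))  # the width line gets rewritten
print(touched_lines(loose_old, loose_new))  # pure insertion
```

In the second layout the new attribute is a pure insertion, so nobody else’s edit to the width line gets dragged into it. Whether two insertions at the same spot merge automatically still depends on your merge tool.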
(7) Don’t put multiple tags on the same line.
Instead of: [tb][td][td] (or whatever, my HTML is rusty and this is just an example), give each tag its own line:
[tb]
[td]
[td]
That way, if people start nesting things or manipulating the tables in some way, they aren’t all thrashing the same line.
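If your writer crams everything onto one line, a pretty-printing pass before saving is often enough to fix it. A minimal sketch in Python using ElementTree’s `indent` (Python 3.9+); `one_tag_per_line` is a hypothetical wrapper name.

```python
import xml.etree.ElementTree as ET


def one_tag_per_line(xml_text: str) -> str:
    """Re-serialize XML so that every element starts on its own line."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # rewrites only whitespace between elements
    return ET.tostring(root, encoding="unicode")


print(one_tag_per_line("<tb><td/><td/></tb>"))
```

Just make sure the reader and writer agree on this layout, or you’ll be violating (1) every time the file is saved.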
That’s all (for now).
I’m sure there are more, and after a couple more brainstorming sessions maybe we’ll return here, but I think at least 6 out of 7 of those are universally good ideas that can be implemented with zero detriment (and possibly very little legwork). And when your colleagues or your users or you go to do a merge, it will be a smoother process.
I don’t want to set the world on fire.