I work with databases – I was a Database Administrator, now I’m a Database Architect. And I’ve always told the developers I work with that I’m responsible for the integrity of the database, not the integrity of the data.
Incorrect data really annoys me. Especially the sort which has a canonical source, is wrong only because some moron mis-entered it, and has then proliferated across the Internet. If you look down to the right, you’ll see a selection of books from my collection on LibraryThing. Books are an excellent example of the kind of screwed-up data I mean. LibraryThing pulls its book data from several sources. And some of it is just plain wrong – misspelt, inaccurate, incorrect… And yet it would be easy enough to check. Just look at the book itself.
Frank Herbert did not write Threshold The Blue Angles Experience. He wrote Threshold: The Blue Angels Experience. The author of Tom Strong Book 6 is not “various” but Alan Moore and Chris Sprouse (well, they’re the two who get top billing on the cover, although others did contribute).
It’s not just books. It’s CDs too. Whenever I buy a CD, I rip it to MP3s so I can listen to it at work and on my Yeep. And yet half the time I have to go and correct all the misspelt song titles. The Black League did not record a song called ‘Better Angles (Of Our Nature)’ but ‘Better Angels (Of Our Nature)’.
In fact, I don’t see why there can’t be a single canonical source of such data – which would be the publishers, of course. It’s in their interest to ensure it’s correct. After all, how can you order a book or album if they’ve entered the title incorrectly? So why can’t the publishers – the content providers themselves – publish correct data about their products, and give the likes of LibraryThing, GraceNote or last.fm free access to it? It’s not that difficult…