Thursday, March 06, 2008

Against "Link Rot": Using citations in blog posts helps combat decaying usefulness of web links

This post is really for other blog writers, but the rest of y'all mere readers can still check it out if you like:

Perhaps it's hubris to propose a uniform blog citation standard. Perhaps the point of blogs is their freedom from conforming to the constraints of other media. But the blogosphere is still early in its development as an information medium, and I'm beginning to wonder if it isn't time for those of us who write seriously online to begin to professionalize what we're doing just a bit.

Bruce Schneier in Wired Magazine gives a name to a concept to which I've given a lot of thought: "link rot," or what happens when the links in your blog or website get old and no longer point to the original, relevant material. His point is that when third parties control your information, it might disappear at their whim. It's an excellent essay that I recommend, but here's a snippet of the problem he describes ("Third parties controlling information," Feb. 27):

Bits and pieces of the web disappear all the time. It's called "link rot," and we're all used to it. A friend saved 65 links in 1999 when he planned a trip to Tuscany; only half of them still work today. In my own blog, essays and news articles and websites that I link to regularly disappear -- sometimes within a few days of my linking to them.

It may be because of a site's policies -- some newspapers only have a couple of weeks on their website -- or it may be more random: Position papers disappear off a politician's website after he changes his mind on an issue, corporate literature disappears from the company's website after an embarrassment, etc. The ultimate link rot is "site death," where entire websites disappear: Olympic and World Cup events after the games are over, political candidates' websites after the elections are over, corporate websites after the funding runs out and so on.

I find this to be a particular problem with political, legal, and other nonfiction weblogs. If your blog is about your family life, your links and sources don't matter much. But if you're writing about the issues of the day, as this blog attempts to do, part of the credibility of any argument is inevitably one's sourcing. That's why you put the link there in the first place, so anyone who wanted to read the backup to your argument could go do so for themselves.

In the 3-1/2 years I've been writing on Grits I've developed my own protocols to combat this troublesome "link rot," using an abbreviated citation whenever I pull quotations I want to preserve for future use (in future arguments, public testimony, policy reports, or other information products). It's the same citation formula used with Schneier's quote above. You might have noticed it before if you're a regular Grits reader (see here, here, and here). Essentially, I put the headline of the article and the date in parentheses, and make sure the name of the publication from which I pulled the data appears somewhere in the post.
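For the technically inclined, that formula is simple enough to sketch in a few lines of Python. This is only an illustration: the function names are made up for the example, and nothing like this actually runs on Grits.

```python
def format_citation(headline: str, date: str) -> str:
    """Build a parenthetical citation: ("Headline," Date)."""
    return f'("{headline}," {date})'


def names_publication(post_text: str, publication: str) -> bool:
    """Check that the publication's name appears somewhere in the post."""
    return publication.lower() in post_text.lower()


# Hypothetical usage with the Schneier citation from this post.
citation = format_citation("Third parties controlling information", "Feb. 27")
print(citation)
```

The second function is just a reminder check: the parenthetical carries the headline and date, so the publication name has to show up somewhere else in the post.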

Most bloggers, by contrast, simply link within a sentence to the point they're referencing, or even just say "Go here." Kuff is a good example of that blogging style, as is Schneier himself.

Putting enough data for a footnotable citation for the "money quotes" used in blog posts saves me a lot of time and legwork, and is one of the main reasons I blog in the first place. You use your judgment, of course, and I don't put a citation for every quote used on Grits - just those bits of data from transitory media websites and other spots that I fear might be subject to "link rot," to use Schneier's term.

For example, I don't use citations when linking to other blogs (though some of them ultimately go offline), or in a passing reference when I don't pull any specific "money quote" or statistic. But when I use a significant quote from a source, I try to record at least enough information so that later I could footnote the argument without finding the original item, something I've done MANY times. That way, blog arguments and sourcing posted here remain valid even if, over time, the post suffers from "link rot," and I (or others) don't have to go back through old newspaper archives looking to pull one quote that wasn't sourced properly.

Schneier's main point is that if a third party controls your web access (i.e., if you don't own your own server), you aren't fundamentally in control of your own data. He cites a wine lovers user group that lost years' worth of hosted discussions. My own inadequate solution to that is to back up my blog posts once a month in a text file on my hard drive. That wouldn't preserve the discussions in the comments, though, if Google's blog architecture ever went down.
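A monthly text-file backup like that could be scripted. Here's a minimal Python sketch, assuming you already have your posts in hand as (title, body) pairs; the function name and the file naming scheme are invented for illustration, and how you get the posts out of your blog host is left open.

```python
from datetime import date
from pathlib import Path


def backup_posts(posts, directory, today=None):
    """Write (title, body) pairs to one dated text file and return its path."""
    today = today or date.today()
    # One file per month, e.g. blog-backup-2008-03.txt (hypothetical naming).
    target = Path(directory) / f"blog-backup-{today:%Y-%m}.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        for title, body in posts:
            handle.write(f"{title}\n{body}\n\n")
    return target
```

Like the manual version, this only saves the posts themselves; comments would need to be captured separately.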

That said, third party hosting can also work positively against link rot. For me, continuing to use a Blogspot blog after all this time is partially a long-term homage to the reality of link rot. If I quit writing on Grits tomorrow, there's a lot of useful information here from years past that people access all the time via search engine, etc. If I were paying for a blog host, when the contract ran out that information would go off the web. On Blogger, presumably old blogs stay online ad infinitum, at least until Google changes its policy (which seems unlikely in the near term).

Link rot is an annoyance, not a big problem, but adding footnotable citation data to the "money quotes" on blogs adds value and staying power to the medium. As I can tell you from first-hand experience, it also adds tremendous long-term use-value for the writer.


Anonymous said...

> Perhaps it's hubris to propose a uniform blog citation standard.

Well, it isn't as if blogging standards haven't been suggested before. And such efforts haven't gotten any traction the times I've heard of them. In fact, they get ridiculed, for the most part, though not primarily for the standards themselves, but for the notion that anyone can tell another blogger how to run his/her site.

That said, there are things which are annoying, and poor citation is one of them. Though I don't always follow them, I do have some standards I like to use when linking and quoting, such as putting the link immediately before the quoted portion, or inside the blockquote tags. Commentary that is not one's own should always be clearly separated. But I see blogs where the post title can sometimes be a link to the post itself, or a link to the referenced URL. Of course, the worst offenders in that category are what I call "harvester" blogs, that do nothing but repost using some automated means. If I knew of a good way to either reliably block those latter types, or nuke them from orbit, I would.

Though it's rarely used, HTML has a CITE tag, for citations. I hadn't thought about this tag in this way, but your post made me think that it ought to be pretty easy to make a habit of using it, and then putting together a tool of some sort that would index your blog's CITE tags, and perhaps allow you to specify additional meta-data about them. It's possible there's some other sort of Web 2.0 gunk that'll already do that.
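To give a rough idea of what such an indexing tool might look like, here is a small sketch using Python's standard html.parser module. It's an illustration under assumptions, not a finished tool: the class and function names are invented, and any extra attributes on the tag (say, a data-date) are hypothetical meta-data rather than anything the CITE tag defines itself.

```python
from html.parser import HTMLParser


class CiteIndexer(HTMLParser):
    """Collect the text and attributes of every <cite> element."""

    def __init__(self):
        super().__init__()
        self.entries = []     # finished citations
        self._current = None  # citation currently being read, if any

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <CITE> and <cite> both match.
        if tag == "cite":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "cite" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.entries.append(self._current)
            self._current = None


def index_cites(html: str) -> list:
    """Return one dict per <cite> element found in the given HTML."""
    parser = CiteIndexer()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Feeding it the HTML of each post would give a list of citations that could then be sorted, searched, or dumped into a spreadsheet.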

Regarding link rot, there isn't much to do about it other than quote as much as you want to retain, and hope you don't get whacked for copyright. I haven't checked recently, but at times when I've revisited old posts, I've found that links to AP stories via Yahoo are dead. I think Yahoo has a limited retention period.

Actually, from a technical point of view, it wouldn't be difficult, I don't think, to extend the CITE tag with attributes to make it more useful.

However, getting bloggers to change their ways is going to be like herding cats.

Gritsforbreakfast said...

Excellent idea on the CITE tag, Jed. I'm not the technical wizard to figure it out, but it's a good lead. As for your observation that

"getting bloggers to change their ways is going to be like herding cats."

You're 100% right, no doubt, which is why I'm really just proposing for discussion that it'd be a good idea, leading by example where possible, and engaging in ZERO cat herding. But so many bloggers use pull quotes anyway (which is fine within limits of fair use), and it's so little extra effort to add citable info (30-60 seconds extra, perhaps, per blog post), I'd think some folks would find it useful for their own purposes, as I have. best,

Ed Veronda said...

Thanks for posting this. As a new blogger, I certainly learned a lot from this posting.

John D. McLauchlan said...

As far as preserving comments, you can save those pages, too. On my computer, I can save a page with comments on it by using "Save page as". When the file name comes up, it says "comments.g"; I replace the "g" with "html". Obviously, that can be a rather tedious process, and on a blog where the comments are numerous and lengthy, saving each individual page might eat up a lot of drive space. It's a thought. BTW, I am by no means a wiz at blogging or html, so this is offered with a grain of salt.

John D. McLauchlan said...

Actually, I just discovered that comment.g will open in Word. That's how much I know.

Anonymous said...

Does that mean that certain facts and commentaries are less relevant if they do not share the opinion of an authority figure? Talk about third party power to keep discussion "within the box"! I rarely use third party links in my blog - which might explain why nobody goes there. Can't get too much authority figure input.

Gritsforbreakfast said...

I don't think that's the implication, JT. But IF you source to third parties, it's nice for the sourcing to remain valid over time. If you use a pull quote from "here", but here doesn't exist in two weeks, or two years, I don't usually feel like I can use the information in a public setting without additional citation.

I sometimes write posts without external links that are simply thought pieces - there's nothing wrong with that at all - the suggestion was for those bloggers (and they are many) who are linking to outside sources, anyway. best,

Stephen Gustitis said...

I have endured "link rot" like the rest of us, but I sure didn't think about its effect on how we source our blog material. Thanks again for this insight and great work all-around.


Lauren said...

I'm a researcher, not a blogger, but I've had this problem with government websites especially. Of course, I am required to cite my sources more systematically, but it's the same idea. It might be onerous given the volume of writing you all do, but I save news articles and important webpages as pdfs and keep them on my hard drive. When I need to read, quote, or cite an article from three years ago, I have it. This doesn't solve the link rot problem, but might help your problem of losing original sources.

Gritsforbreakfast said...

lmart, I do that too with government reports, in particular, though I've not done a great job of indexing it all. That's actually becoming a real problem on my computer, FWIW.

Lately I've been putting some such documents and also spreadsheets into Google documents format and creating a public link that, to my knowledge, stays up permanently (until Google changes its policies).

On newsclips, though, as a researcher part of the way I use the blog is to keep the part of the article I want as a pull quote, and put the title, publication and date there for a possible future footnote (I'm not sure how much more "systematic" you'd want to be for news articles). I see putting that information in a blog post as basically an open sourced, public version of what you're talking about, a service blogs can provide to future researchers, historians, etc. as they discuss the issues of the day.

Thanks for the good comments, folks!

Charles Kuffner said...

This is why I try to quote the relevant material when I link to something. I generally have no trouble later looking for whatever quote I need. FWIW.

Anonymous said...

Cool post as for me. I'd like to read more about this theme. Thanx for sharing that material.
Joan Stepsen
Computer geeks