Friday 3 June 2011

Shortened URL Expansion

Twitter provides me with a lot of value through the many interesting and useful links tweeted by the people I follow.

Usually, when I notice an interesting-looking URL, I mark the tweet as a favourite so I can easily find it again later. The favourites interface on twitter.com isn't nearly easy enough for my liking, so I decided to create a webpage that shows me all the links from my favourites, so I can easily bookmark them on delicious or send them to instapaper.

However, I don't want to bookmark the shortened URL. To prevent link rot, I'm only interested in the actual link hidden behind one or more layers of URL shortening services. In fact, I don't even want to see shortened URLs; I would much rather know where I'm going.

My initial attempt to solve this problem was to use the bit.ly API for expanding bit.ly URLs, which works for a lot of shortened URLs, as many of them go to bit.ly under a different guise.

However, it doesn't work for all of them. Most importantly, it doesn't work for Twitter's own t.co shortener, which also doesn't appear to have an easily discoverable API for expanding the URL.

If I can't find a nice way to expand the URL, why not simply keep following the redirects until I hit the actual page I want to bookmark? The following few lines of code should expand any shortened URL, and indeed any redirected URL, whether it comes from a shortener or from some other redirect.

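   protected string expandurl(string url) {
       // Follow redirects with a HEAD request so only headers are transferred.
       System.Net.HttpWebRequest hwr = (System.Net.HttpWebRequest) System.Net.WebRequest.Create(url);
       hwr.Method = "HEAD";
       hwr.AllowAutoRedirect = true;
       hwr.MaximumAutomaticRedirections = 10;

       // Dispose the response so the underlying connection is released.
       using (System.Net.HttpWebResponse response = (System.Net.HttpWebResponse) hwr.GetResponse()) {
           // ResponseUri is the URI that finally answered, after all redirects.
           return response.ResponseUri.ToString();
       }
   }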


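The MaximumAutomaticRedirections setting will stop an endless redirection loop, but it also means that a URL hidden behind more layers of redirection than the limit allows will not be fully expanded. As far as I can tell, HttpWebRequest throws a WebException in that case rather than returning the last URL it reached, so the call may need a try/catch.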
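If you would rather get back the last URL that was reached when the limit is hit, one option is to switch off automatic redirection and follow the Location headers by hand. The following is only a rough sketch of that idea: RedirectFollower, NextHop and Expand are names I made up for illustration, and treating any failed request as "no further redirect" is an assumption you may want to change.

```csharp
using System;
using System.Net;

public static class RedirectFollower
{
    // Hypothetical helper: sends one HEAD request without auto-redirect and
    // returns the Location header of a 3xx response, or null otherwise.
    public static string NextHop(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        request.AllowAutoRedirect = false;
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                int code = (int)response.StatusCode;
                return (code >= 300 && code < 400) ? response.Headers["Location"] : null;
            }
        }
        catch (WebException)
        {
            // Assumption: a failed request is treated as the end of the chain.
            return null;
        }
    }

    // Follows at most maxHops redirects and returns the last URL reached.
    // nextHop is passed in so the loop can be exercised without a network.
    public static string Expand(string url, int maxHops, Func<string, string> nextHop)
    {
        string current = url;
        for (int i = 0; i < maxHops; i++)
        {
            string location = nextHop(current);
            if (string.IsNullOrEmpty(location)) break;
            // Location may be relative, so resolve it against the current URL.
            current = new Uri(new Uri(current), location).AbsoluteUri;
        }
        return current;
    }
}
```

Calling RedirectFollower.Expand(url, 10, RedirectFollower.NextHop) then gives either the final URL or, if the chain is longer than ten hops, whichever URL was reached last.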

The HEAD HTTP method should only return the HTTP headers, not the actual content of the page, so using that should reduce network traffic.

Because I'm figuring out which URL is at the end of the redirect chain by following the chain, there is a rather high chance that every time this gets executed, the intermediate URL shorteners will register our passing by as a "click" and update the stats for that particular shortened URL (if the service keeps such stats).
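One way to keep the number of these extra "clicks" down would be to remember each expansion, so that a given shortened URL is only followed once. A minimal sketch, assuming a made-up ExpansionCache class that wraps whatever expansion function you use (for example the expandurl method shown earlier); it only lives in memory and is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public class ExpansionCache
{
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    private readonly Func<string, string> expand;

    // expand is the function that actually follows the redirects.
    public ExpansionCache(Func<string, string> expand)
    {
        this.expand = expand;
    }

    // Returns the expanded URL, following the redirect chain only on first use.
    public string Get(string shortUrl)
    {
        string expanded;
        if (!cache.TryGetValue(shortUrl, out expanded))
        {
            expanded = expand(shortUrl);
            cache[shortUrl] = expanded;
        }
        return expanded;
    }
}
```

Each shortener in the chain still sees one request per distinct shortened URL, but reloading the page no longer adds to its stats.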
