Tracking Duplicate Posts in Google Reader
1:49 pm April 3rd, 2007 by Sal Cangeloso
While Google’s Reader is arguably the best RSS reader currently available, there are still some issues that need to be addressed. One of these is not particularly pressing, but I would have expected Google to fix it right away: duplicate posts and feeds.
Though it seems like it would be relatively simple to do, Reader does not remove duplicate feeds when they are added, nor does it warn you that a duplicate already exists. The same has always been true of posts: if you read something in one place and it is posted somewhere else, you will inevitably end up reading it again (if you subscribe to both feeds). This second problem mainly comes up with sites that repost other sites’ content. I run into it most often with SeekingAlpha, which works with a number of bloggers in this capacity, but it also happens with people who have a shared feed (which is easy to set up through Google).
Today I noticed one such duplicate. Robert Scoble’s shared feed had linked to a post from Marshall K’s blog, to which I also subscribe. I noticed this and got to playing around with it. When the post was marked as read in the Scoble shared feed, it was also considered read in the Marshall K feed, but the converse did not seem to be true; that is, when I read it through Marshall K’s feed, the Scoble unread count did not immediately decrease. On further investigation, it had in fact been marked as read, but I had to hit the refresh button for Reader to update itself, something that is generally not necessary (though it definitely happens from time to time). So I did not have to read “10 Things You Can Do With Mixed Media RSS” twice, thanks to Google Reader removing the duplicate.
I will have to keep experimenting to find out to what extent Reader removes duplicate posts once they are marked as read. The situation I just described does not seem like a true test, because it relies on Google Reader’s own shared-feed system, so the procedure is probably less complex for them than it would otherwise be. Reader does recognize the two posts as redundant, but because of the way Google’s shared-item system works, it seems to be not so much recognizing a duplicate as treating the two as the same post (with the shared post acting as a kind of shortcut). If the system could actually recognize duplicate content coming from different feeds (not that this should happen very often), that would be impressive.
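To make the distinction above concrete, here is a minimal sketch of the two approaches, written in Python purely as an illustration. It is not how Google Reader actually works; the `ReadTracker` class, the `content_key` helper, and item IDs such as `"marshallk:42"` are all hypothetical. Tracking read state by item ID covers the shared-feed case, where the shared post points back to the same item. Catching a repost on another site needs something like a content fingerprint, and this one (lowercase, collapse whitespace, then hash) only matches near-verbatim copies.

```python
import hashlib
import re


def content_key(title: str, body: str) -> str:
    """Hypothetical fingerprint: lowercase, collapse whitespace, then SHA-256."""
    text = f"{title}\n{body}".lower()
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ReadTracker:
    """Hypothetical read-state store that remembers items two ways."""

    def __init__(self) -> None:
        self.read_ids: set[str] = set()           # "shortcut" model: same item ID
        self.read_fingerprints: set[str] = set()  # duplicate model: same content

    def mark_read(self, item_id: str, title: str, body: str) -> None:
        self.read_ids.add(item_id)
        self.read_fingerprints.add(content_key(title, body))

    def is_read_by_id(self, item_id: str) -> bool:
        # A shared item that references the original ID is caught here.
        return item_id in self.read_ids

    def is_read_by_content(self, title: str, body: str) -> bool:
        # A repost under a different ID is only caught by comparing content.
        return content_key(title, body) in self.read_fingerprints


if __name__ == "__main__":
    tracker = ReadTracker()
    tracker.mark_read("marshallk:42", "10 Things You Can Do With Mixed Media RSS", "Post body.")
    print(tracker.is_read_by_id("repost:7"))  # False: a different ID is unknown
    print(tracker.is_read_by_content("10 Things You Can Do With Mixed Media RSS", "Post  body."))  # True
```

Exact hashing is the simplest possible choice here; a real reader would likely need fuzzier matching, since reposting sites often add their own headers, footers, or edits.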

