cache corruption because of faulty data.version check

Mark Wielaard mark at
Sun May 23 06:41:41 EST 2010


I saw the cache getting corrupted for RSS feeds. After a spider update
with no changes to the feed, the cache would be overwritten with a
version that had no self and html links. I tracked it down to the
following change:

commit 33d3ad2a1af226a950838f76b97109febdce36b5
Author: Sam Ruby <rubys at>
Date:   Fri May 14 15:38:44 2010 -0400

    Be more resilient on HTTP errors

diff --git a/planet/ b/planet/
index 59afcb6..9034add 100644
--- a/planet/
+++ b/planet/
@@ -125,7 +125,7 @@ def writeCache(feed_uri, feed_info, data):"Updating feed %s", feed_uri)
     # if read failed, retain cached information
-    if not data.version and feed_info.version:
+    if not data.has_key('version') and feed_info.has_key('version'):
         data.feed = feed_info.feed
         data.bozo = feed_info.feed.get('planet_bozo','true') == 'true'
         data.version = feed_info.feed.get('planet_format')
@@ -147,7 +147,7 @@ def writeCache(feed_uri, feed_info, data):
             data.feed['planet_content_hash'] = data.headers['-content-hash']
     # capture feed and data from the planet configuration file
-    if data.version:
+    if data.has_key('version') and data.version:
         if not data.feed.has_key('links'): data.feed['links'] = list()
         feedtype = 'application/atom+xml'
         if data.version.startswith('rss'): feedtype = 'application/rss+xml'

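Since an empty string evaluates to False, the second hunk correctly tests
for both has_key and the version field itself. But the first hunk omits
the original check for an empty version string, so when data has a
version key holding an empty string, the cached information is no
longer retained. The attached patch fixes that.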
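
To illustrate the difference between the two checks, here is a minimal
sketch. It uses plain dicts in place of the real data and feed_info
objects, and the function names retain_cached_buggy and
retain_cached_fixed are made up for this example. It is not the attached
patch, only one way the condition could be written.

```python
def retain_cached_buggy(data, feed_info):
    # Mirrors the first hunk after the change: only key presence is tested,
    # so an empty version string is treated as a successful read.
    return 'version' not in data and 'version' in feed_info


def retain_cached_fixed(data, feed_info):
    # Hypothetical fix: a missing key or an empty version string both
    # count as "no version", so the cached information is retained.
    return not data.get('version') and 'version' in feed_info


# Stand-ins for the parsed result and the cached feed (assumed values).
empty_version = {'version': ''}
cached = {'version': 'rss20'}

print(retain_cached_buggy(empty_version, cached))  # False
print(retain_cached_fixed(empty_version, cached))  # True
```

Both functions agree when the version key is missing entirely or when
a real version string is present. They only differ for the empty string
case described above.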


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Check-data-has-no-version-key-or-version-string-is-e.patch
Type: text/x-patch
Size: 936 bytes
Desc: not available
URL: </archives/devel/attachments/20100522/c6d4cb56/attachment.bin>
