Venus: parse errors on many feeds

Sam Ruby rubys at intertwingly.net
Thu Jun 25 21:30:18 EST 2009


Mary Gardiner wrote:
> On Wed, Jun 24, 2009, Sam Ruby wrote:
>> Mary Gardiner wrote:
>>> I am getting parse errors on many feeds in a way that suggests Feed Parser is
>>> failing. For an example, see the .ini file at
>>> https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
>>> feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
>>> originally at http://blog.gingertech.net/feed/ )
>> Is this still an issue?
> 
> Yes.
>> Can you try the following:
>>
>> python tests/reconstitute.py \
>>    https://users.puzzling.org/users/mary/venus-test/rss.xml
> 
> Output follows:
> 
> $ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml 
> /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
>   import re, time, md5, sgmllib

The above clearly should be fixed, but doesn't appear to be the problem.
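
For reference, a minimal sketch of how that warning is usually silenced. 
This is not the actual change to planet/reconstitute.py; the helper name 
hex_digest is hypothetical, and it assumes the code only needs a hex 
digest of a string.  hashlib exists from Python 2.5 onward, so the old 
md5 module is kept as a fallback for older interpreters:

```python
# Prefer hashlib (Python 2.5+); fall back to the deprecated md5 module
# only on interpreters that predate hashlib.
try:
    from hashlib import md5
except ImportError:
    from md5 import new as md5


def hex_digest(text):
    """Return the hexadecimal MD5 digest of a text string.

    Hypothetical helper for illustration; not part of Venus.
    """
    # MD5 operates on bytes, so encode the text first.
    return md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(hex_digest("http://blog.gingertech.net/feed/"))
```

With the import written this way, the rest of the module can keep calling 
md5(...) without caring which library supplied it.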

> Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml
> HTMLParseError: malformed start tag, at line 4, column 55

On a fresh install of Ubuntu 9.04, after running *only* aptitude install 
bzr, I am not seeing this.  Nor do I see this on an Ubuntu 8.04.2 machine 
on which I've installed various things over an extended period of time.

I'll also note that the following produces no matches:

   grep -i -r htmlparseerror *

HTMLParseError is the name of the exception raised by the HTMLParser 
that is included in Python, but the feed parser does not make use of 
that particular library:

   grep HTMLParser planet/vendor/feedparser.py

Is there anything unusual about your installation?  Are others seeing 
this problem?
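
One way to narrow this down is to ask Python where the exception class 
actually lives.  The following is only a sketch: exception_origin is a 
hypothetical helper, not something in Venus, and it would have to be 
called from an except block wrapped around whatever code raises the 
error on the affected machine:

```python
import sys


def exception_origin(exc):
    """Return (class name, module name, module file) for an exception.

    Hypothetical diagnostic helper; not part of Venus or the feed parser.
    """
    cls = exc.__class__
    # Look up the module object that defines the exception class.
    module = sys.modules.get(cls.__module__)
    # Built-in modules have no __file__, so fall back to None.
    return cls.__name__, cls.__module__, getattr(module, "__file__", None)


if __name__ == "__main__":
    print(sys.version)
    try:
        int("not a number")  # stand-in for the code that fails
    except ValueError:
        print(exception_origin(sys.exc_info()[1]))
```

The module file it reports, together with the interpreter version, should 
show which installed library the error is coming from.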

- Sam Ruby

