Venus: parse errors on many feeds
Sam Ruby
rubys at intertwingly.net
Thu Jun 25 21:30:18 EST 2009
Mary Gardiner wrote:
> On Wed, Jun 24, 2009, Sam Ruby wrote:
>> Mary Gardiner wrote:
>>> I am getting parse errors on many feeds in a way that suggests Feed Parser is
>>> failing. For an example, see the .ini file at
>>> https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
>>> feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
>>> originally at http://blog.gingertech.net/feed/ )
>> Is this still an issue?
>
> Yes.
>> Can you try the following:
>>
>> python tests/reconstitute.py \
>> https://users.puzzling.org/users/mary/venus-test/rss.xml
>
> Output follows:
>
> $ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml
> /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
> import re, time, md5, sgmllib
The above clearly should be fixed, but doesn't appear to be the problem.
> Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml
> HTMLParseError: malformed start tag, at line 4, column 55
On a fresh install of Ubuntu 9.04, adding *only* bzr (aptitude install
bzr), I am not seeing this. Nor do I see it on an Ubuntu 8.04.2 machine
on which I've installed various things over an extended period of time.
I'll also note that the following produces no matches:
grep -i -r htmlparseerror *
HTMLParseError is the name of the exception raised by the HTMLParser
that is included in Python, but the feed parser does not make use of
that particular library:
grep HTMLParser planet/vendor/feedparser.py
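
If the grep results alone don't settle which library is raising the
exception, one option is to catch it and look at the module that defines
the exception class and the file of the innermost frame. The following is
only a minimal sketch: describe_current_exception is a hypothetical helper,
not part of Venus or the feed parser, and the int() call merely stands in
for whatever call is actually failing.

```python
import sys
import traceback


def describe_current_exception():
    """Summarise the exception currently being handled.

    Hypothetical helper: reports the exception class name, the module
    that defines that class, and the file of the innermost frame.
    """
    exc_type, exc_value, tb = sys.exc_info()
    frames = traceback.extract_tb(tb)
    # Each extracted frame begins with its file name; the last entry is
    # the innermost Python frame, i.e. where the exception surfaced.
    raised_in = frames[-1][0] if frames else None
    return {
        "class": exc_type.__name__,
        "module": exc_type.__module__,
        "raised_in": raised_in,
    }


if __name__ == "__main__":
    try:
        int("not a number")  # stand-in for the failing parse call
    except Exception:
        print(describe_current_exception())
```

Wrapped around the real failing call, the "module" entry would show
whether the exception class comes from Python's own HTMLParser or from
somewhere else.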
Is there anything unusual about your installation? Are others seeing
this problem?
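
To compare installations, it may help to print the interpreter version and
where a few modules would be loaded from. This is a rough sketch that
assumes a Python 3 interpreter (importlib.util is not available on the
Python 2 setups discussed in this thread); module_origins is a hypothetical
helper and the module names passed to it are only examples.

```python
import importlib.util
import sys


def module_origins(names):
    """Map each top-level module name to the location it would be loaded
    from, or None if it cannot be found on the current sys.path.

    Hypothetical helper for comparing two installations.
    """
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    print(sys.version)
    # Example names only; substitute whatever modules are in question.
    for name, origin in module_origins(["html", "xml", "feedparser"]).items():
        print(name, "->", origin)
```

A module resolving to an unexpected path, or an unexpected copy of a
library appearing ahead of the vendored one, would be the kind of
"unusual" detail worth reporting back.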
- Sam Ruby