Venus: parse errors on many feeds
Mary Gardiner
mary at puzzling.org
Thu Jun 25 11:26:23 EST 2009
On Wed, Jun 24, 2009, Sam Ruby wrote:
> Mary Gardiner wrote:
> > I am getting parse errors on many feeds in a way that suggests Feed Parser is
> > failing. For an example, see the .ini file at
> > https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
> > feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
> > originally at http://blog.gingertech.net/feed/ )
>
> Is this still an issue?
Yes.
>
> Can you try the following:
>
> python tests/reconstitute.py \
> https://users.puzzling.org/users/mary/venus-test/rss.xml
Output follows:
$ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml
/home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import re, time, md5, sgmllib
Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml
HTMLParseError: malformed start tag, at line 4, column 55
  File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet
    data = feedparser.parse(feed, **options)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse
    feedparser.feed(data)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.6/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag
    self.unknown_endtag(tag)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag
    method()
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content
    value = self.popContent('content')
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent
    value = self.pop(tag)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop
    mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats
    p = _MicroformatsParser(htmlSource, baseURI, encoding)
  File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__
    self.document = BeautifulSoup.BeautifulSoup(data)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/" xmlns:indexing="urn:atom-extension:indexing" indexing:index="no"><access:restriction xmlns:access="http://www.bloglines.com/about/specs/fac-1.0" relationship="deny"/>
<title>Unconfigured Planet</title>
<updated>2009-06-25T01:25:29Z</updated>
<generator uri="http://intertwingly.net/code/venus/">Venus</generator>
<author>
<name>Anonymous Coward</name>
</author>
</feed>
$
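
Reading the traceback, the exception is raised while feedparser hands entry
content to BeautifulSoup for microformats parsing, and it then propagates all
the way up to spiderPlanet, so the feed comes out empty. Purely as an
illustration of that failure mode, and not as the actual feedparser or Venus
code, here is a small sketch of an optional extraction step that is guarded so
one malformed entry does not lose the rest. All names in it (MalformedMarkup,
strict_microformats, parse_entry) are hypothetical stand-ins.

```python
# Hypothetical stand-ins: none of these names come from feedparser or Venus.
class MalformedMarkup(Exception):
    """Stand-in for the HTMLParseError seen in the traceback."""


def strict_microformats(html):
    # Toy strict parser: rejects markup whose '<' and '>' counts differ,
    # which loosely imitates a start tag that is never closed.
    if html.count("<") != html.count(">"):
        raise MalformedMarkup("malformed start tag")
    return {"tags": html.count("<")}


def parse_entry(html, extract=strict_microformats):
    """Return (content, microformats); microformats is None if extraction fails."""
    try:
        return html, extract(html)
    except MalformedMarkup:
        # Keep the entry content and drop only the optional metadata.
        return html, None
```

With a guard like this, a well-formed entry still gets its metadata, and a
malformed one keeps its content with the metadata set to None. Whether that is
the right fix for feedparser itself is a separate question.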