Planet-Venus 0~bzr116-1 on Debian Squid error

Sam Ruby rubys at intertwingly.net
Sat Jun 19 03:32:12 EST 2010


On 06/18/2010 05:22 AM, Matteo Calorio wrote:
> Hello,
>
>
> I get the following list of errors when I try to get feeds from
> http://www.mymovies.it/cinema/xml/rss/:

This looks like a bug in BeautifulSoup (which is designed to handle 
non-well-formed input), which was at one time used by the FeedParser to 
parse microformats.  I believe the FeedParser has since disabled 
microformat parsing, as that feature was never completely implemented.  
In any case, I can verify that the feed you mention is parsed 
successfully by the latest Venus (which, incidentally, is in git and 
includes a later version of the feed parser).
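
To illustrate the failure mode in the quoted traceback, where an optional 
extraction step raises and takes the whole feed down with it, here is a 
minimal sketch of guarding such a step.  This is not the Venus or 
FeedParser code; parse_optional and strict_extractor are hypothetical 
names made up for the example, and strict_extractor only stands in for 
a strict HTML parser.

```python
import logging

log = logging.getLogger("example")


def parse_optional(extractor, markup):
    """Run an optional extraction step; return None instead of raising."""
    try:
        return extractor(markup)
    except Exception as exc:  # broad on purpose: the step is optional
        log.warning("optional extraction skipped: %s", exc)
        return None


def strict_extractor(markup):
    # Hypothetical stand-in for a strict parser that rejects bad markup.
    if "<" in markup and ">" not in markup:
        raise ValueError("malformed start tag")
    return {"length": len(markup)}


if __name__ == "__main__":
    # Malformed input: the error is logged and swallowed.
    print(parse_optional(strict_extractor, "<a href="))
    # Acceptable input: the extractor's result is passed through.
    print(parse_optional(strict_extractor, "<p>ok</p>"))
```

With a guard like this, a parse error in the optional step is logged 
and the rest of the entry can still be processed.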

> ERROR:planet.runner:Error processing http://www.mymovies.it/cinema/xml/rss/
> ERROR:planet.runner:HTMLParseError: malformed start tag, at line 1, column 91
> ERROR:planet.runner:  File "/usr/lib/pymodules/python2.5/planet/spider.py",
> line 441, in spiderPlanet
>      data = feedparser.parse(feed, **options)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 3525, in
> parse
>      feedparser.feed(data)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 1662, in feed
>      sgmllib.SGMLParser.feed(self, data)
> ERROR:planet.runner:  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
>      self.goahead(0)
> ERROR:planet.runner:  File "/usr/lib/python2.5/sgmllib.py", line 138, in
> goahead
>      k = self.parse_endtag(i)
> ERROR:planet.runner:  File "/usr/lib/python2.5/sgmllib.py", line 315, in
> parse_endtag
>      self.finish_endtag(tag)
> ERROR:planet.runner:  File "/usr/lib/python2.5/sgmllib.py", line 355, in
> finish_endtag
>      self.unknown_endtag(tag)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 569, in
> unknown_endtag
>      method()
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 1414, in
> _end_description
>      value = self.popContent('description')
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 849, in
> popContent
>      value = self.pop(tag)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 764, in pop
>      mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 2218, in
> _parseMicroformats
>      p = _MicroformatsParser(htmlSource, baseURI, encoding)
> ERROR:planet.runner:  File
> "/usr/lib/pymodules/python2.5/planet/vendor/feedparser.py", line 1823, in
> __init__
>      self.document = BeautifulSoup.BeautifulSoup(data)
> ERROR:planet.runner:  File "/usr/lib/pymodules/python2.5/BeautifulSoup.py",
> line 1499, in __init__
>      BeautifulStoneSoup.__init__(self, *args, **kwargs)
> ERROR:planet.runner:  File "/usr/lib/pymodules/python2.5/BeautifulSoup.py",
> line 1230, in __init__
>      self._feed(isHTML=isHTML)
> ERROR:planet.runner:  File "/usr/lib/pymodules/python2.5/BeautifulSoup.py",
> line 1263, in _feed
>      self.builder.feed(markup)
> ERROR:planet.runner:  File "/usr/lib/python2.5/HTMLParser.py", line 108, in
> feed
>      self.goahead(0)
> ERROR:planet.runner:  File "/usr/lib/python2.5/HTMLParser.py", line 148, in
> goahead
>      k = self.parse_starttag(i)
> ERROR:planet.runner:  File "/usr/lib/python2.5/HTMLParser.py", line 226, in
> parse_starttag
>      endpos = self.check_for_whole_start_tag(i)
> ERROR:planet.runner:  File "/usr/lib/python2.5/HTMLParser.py", line 301, in
> check_for_whole_start_tag
>      self.error("malformed start tag")
> ERROR:planet.runner:  File "/usr/lib/python2.5/HTMLParser.py", line 115, in
> error
>      raise HTMLParseError(message, self.getpos())
>
> Regards,
>    Matteo

- Sam Ruby

