From sulamita at gmail.com Mon Jun 1 21:59:59 2009 From: sulamita at gmail.com (Sulamita Garcia) Date: Mon, 1 Jun 2009 12:59:59 +0100 Subject: Wordpress.com based blogs feeds Message-ID: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> Hi I'm sorry if it's an old question, but I couldn't find any definitive answer... It seems that blogs hosted in Wordpress.com have only the last blog entry shown in planets powered by Planet, and the summary instead full text, no matter which is the configuration. For example, in this planet http://nerds.valeta.org, running Planet 2.0 and python 2.4.4, my blog (sulamita.net) and others hosted in wordpress.com(like http://aurelio.wordpress.com/) have this problem. My blog is also syndicated in live.linuxchix.org, not running planet, and seems fine. This started several months ago, but only now I have some time to try help the maintainer to find where is the problem. Also, the titles are linked to the planet itself rather the blog being syndicated. for references, you can see my configuration settings here: http://sulamita.wordpress.com/files/2009/06/settings.png Any help is welcome. regards, -- Brain: Prepare yourself for your 15 minutes of fame Pinky: after that, can I have 15 minutes of Macarena? °v° Sulamita Garcia /(_)\ http://sulamita.net/ ^ ^ From manolopm at gmail.com Wed Jun 3 03:29:41 2009 From: manolopm at gmail.com (Manolo Padron Martinez) Date: Tue, 2 Jun 2009 18:29:41 +0100 Subject: Problem using planet Message-ID: <55fad2460906021029r413ac995v4b0b76c4e2806c4d@mail.gmail.com> Hi: I'm using planet 2.0 and I found a problem with the parser (maybe the problem is the code generated by the web that I'm trying to put in the planet but I'm not sure). 
The problematic feed is this http://orangemachine.wordpress.com/category/planet/feed/ and the problem is with this part of the xml pquesada It's seems that the parser is trying to use the tag as a post and when it tries to parse it fails. Any idea of where is the problem (I mean, a bug in the parser or a bug in the code generated by the web) and a solution? Regards from Canary Islands Manuel Padrón Martínez From jhughes at openxtra.co.uk Fri Jun 5 21:34:23 2009 From: jhughes at openxtra.co.uk (Jack Hughes) Date: Fri, 05 Jun 2009 12:34:23 +0100 Subject: New site Message-ID: <4A2902BF.9070704@openxtra.co.uk> Hello, Wow! I thought it was going to be really difficult to set up the planet software and configure it but it was really easy. I set up my new site in no more than an hour or so with the minimum of fuss. The finished article is here: http://www.planetnetworkmanagement.com/ The site collects a lot of the network management related blogs I read together into a single place. Hopefully I'll find out about a whole lot of other great blogs through the site. Thanks for such great software. Regards, Jack Hughes From sulamita at gmail.com Wed Jun 10 19:46:19 2009 From: sulamita at gmail.com (Sulamita Garcia) Date: Wed, 10 Jun 2009 10:46:19 +0100 Subject: Wordpress.com based blogs feeds In-Reply-To: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> Message-ID: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> I've seen several messages on the last weeks but no actual developer answers... is this mailing list still alive? On Mon, Jun 1, 2009 at 12:59 PM, Sulamita Garcia wrote: > Hi > > I'm sorry if it's an old question, but I couldn't find any definitive > answer... > > It seems that blogs hosted in Wordpress.com have only the last blog entry > shown in planets powered by Planet, and the summary instead full text, no > matter which is the configuration. 
-- Brain: Prepare yourself for your 15 minutes of fame Pinky: after that, can I have 15 minutes of Macarena? °v° Sulamita Garcia /(_)\ http://sulamita.net/ ^ ^ Josh Billings - "Every man has his follies - and often they are the most interesting thing he has got." From mary at puzzling.org Wed Jun 10 21:25:54 2009 From: mary at puzzling.org (Mary Gardiner) Date: Wed, 10 Jun 2009 21:25:54 +1000 Subject: Wordpress.com based blogs feeds In-Reply-To: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> Message-ID: <20090610112554.GE11389@gertrude.home.puzzling.org> On Wed, Jun 10, 2009, Sulamita Garcia wrote: > I've seen several messages on the last weeks but no actual developer > answers... is this mailing list still alive? Semi-active. 
There isn't a lot of support for Planet any more, only for the Venus fork at http://intertwingly.net/code/venus/ which is the only version that has been substantially developed in the last three years. It would be really nice if we could finally get the webpage etc to reflect this state of affairs. -Mary From planet at philwilson.org Thu Jun 11 03:47:42 2009 From: planet at philwilson.org (Phil Wilson) Date: Wed, 10 Jun 2009 18:47:42 +0100 Subject: Wordpress.com based blogs feeds In-Reply-To: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> Message-ID: <9159c3dc0906101047p7688364as89a25163093dede9@mail.gmail.com> 2009/6/10 Sulamita Garcia : > I've seen several messages on the last weeks but no actual developer > answers... is this mailing list still alive? When people know the answer, a response is normally forthcoming. In this case it sounds like the Planet is not configured correctly. If it were a plain Venus install then provided that it was looking at http://sulamita.net/feed/ then there is no reason why more than one item should not be displayed. Cheers, Phil From jmorris at namei.org Thu Jun 11 08:30:22 2009 From: jmorris at namei.org (James Morris) Date: Thu, 11 Jun 2009 08:30:22 +1000 (EST) Subject: Wordpress.com based blogs feeds In-Reply-To: <20090610112554.GE11389@gertrude.home.puzzling.org> References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> <20090610112554.GE11389@gertrude.home.puzzling.org> Message-ID: On Wed, 10 Jun 2009, Mary Gardiner wrote: > On Wed, Jun 10, 2009, Sulamita Garcia wrote: > > I've seen several messages on the last weeks but no actual developer > > answers... is this mailing list still alive? > > Semi-active. 
There isn't a lot of support for Planet any more, only for > the Venus fork at http://intertwingly.net/code/venus/ which is the only > version that has been substantially developed in the last three years. I didn't know any of this. > It would be really nice if we could finally get the webpage etc to > reflect this state of affairs. Indeed. The migration page looks useful. Note that Fedora and EPEL appear to still be supporting planet, so for some users (possibly including myself), it may be better to remain with that as there is a security response team and a package maintainer attached. (I'll file a bz on the Fedora package to ask about upgrading to venus). - James -- James Morris From jmorris at namei.org Thu Jun 11 09:52:12 2009 From: jmorris at namei.org (James Morris) Date: Thu, 11 Jun 2009 09:52:12 +1000 (EST) Subject: Wordpress.com based blogs feeds In-Reply-To: References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com> <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com> <20090610112554.GE11389@gertrude.home.puzzling.org> Message-ID: On Thu, 11 Jun 2009, James Morris wrote: > (I'll file a bz on the Fedora package to ask about upgrading to venus). https://bugzilla.redhat.com/show_bug.cgi?id=505191 -- James Morris From mary at puzzling.org Fri Jun 19 12:33:55 2009 From: mary at puzzling.org (Mary Gardiner) Date: Fri, 19 Jun 2009 12:33:55 +1000 Subject: Venus: parse errors on many feeds Message-ID: <20090619023355.GD16552@gertrude.home.puzzling.org> I am getting parse errors on many feeds in a way that suggests Feed Parser is failing. 
For an example, see the .ini file at https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is originally at http://blog.gingertech.net/feed/ ) Note that if I use feedparser directly, it has no problem with the file: $ pwd /home/mary/src/venus/trunk/planet/vendor $ python Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import feedparser >>> feedparser.__file__ 'feedparser.pyc' >>> feedparser.parse('https://users.puzzling.org/users/mary/venus-test/rss.xml') {'feed': {'lastbuilddate': u'Sun, 14 Jun 2009 14:15:54 +0000', 'subtitle': u"Silvia's blog" ... However, if I run /home/mary/src/venus/trunk/planet.py test.ini, I get HTMLParseError emerging from within feedparser: $ /home/mary/src/venus/trunk/planet.py test.ini /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead import re, time, md5, sgmllib ERROR:planet.runner:Error processing https://users.puzzling.org/users/mary/venus-test/rss.xml ERROR:planet.runner:HTMLParseError: malformed start tag, at line 4, column 55 ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet data = feedparser.parse(feed, **options) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse feedparser.feed(data) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed sgmllib.SGMLParser.feed(self, data) ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 104, in feed self.goahead(0) ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead k = self.parse_endtag(i) ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag self.finish_endtag(tag) ERROR:planet.runner: File 
"/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag self.unknown_endtag(tag) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag method() ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content value = self.popContent('content') ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent value = self.pop(tag) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop mfresults = _parseMicroformats(output, self.baseuri, self.encoding) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats p = _MicroformatsParser(htmlSource, baseURI, encoding) ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__ self.document = BeautifulSoup.BeautifulSoup(data) ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__ BeautifulStoneSoup.__init__(self, *args, **kwargs) ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__ self._feed(isHTML=isHTML) ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed self.builder.feed(markup) ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed self.goahead(0) ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead k = self.parse_starttag(i) ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag endpos = self.check_for_whole_start_tag(i) ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag self.error("malformed start tag") ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 115, in error raise HTMLParseError(message, self.getpos()) System 
details: - Ubuntu 9.04 - Python 2.6.2 (Ubuntu package 2.6.2-0ubuntu1) - Venus trunk revno 113, which seems to be the latest -Mary From rory.nugent at nyu.edu Wed Jun 24 16:19:00 2009 From: rory.nugent at nyu.edu (Rory Nugent) Date: Wed, 24 Jun 2009 02:19:00 -0400 Subject: [Issue] RSS Feed with a future post Message-ID: Hey everyone, I have a small issue that I was hoping someone may be able to shed some light on. I'm using Planet Venus to pull in many feeds (obviously) and one of the feeds has a post date in the future. September 2010. So, the issue is that this post is ALWAYS displayed first on the page. I don't want this. Is there anyway around this short of having to contact the blog owner? Thanks everyone. -Rory From mary at puzzling.org Wed Jun 24 19:43:17 2009 From: mary at puzzling.org (Mary Gardiner) Date: Wed, 24 Jun 2009 19:43:17 +1000 Subject: [Issue] RSS Feed with a future post In-Reply-To: References: Message-ID: <20090624094317.GB17932@gertrude.home.puzzling.org> On Wed, Jun 24, 2009, Rory Nugent wrote: > So, the issue is that this post is ALWAYS displayed first on the page. > I don't want this. Is there anyway around this short of having to > contact the blog owner? I think the following setting does what you want "ignore_in_feed = updated". So: [http://example.com/atom.xml] name = Example feed ignore_in_feed = updated -Mary From brian.ewins at gmail.com Wed Jun 24 19:48:28 2009 From: brian.ewins at gmail.com (Baz) Date: Wed, 24 Jun 2009 10:48:28 +0100 Subject: [Issue] RSS Feed with a future post In-Reply-To: <20090624094317.GB17932@gertrude.home.puzzling.org> References: <20090624094317.GB17932@gertrude.home.puzzling.org> Message-ID: <2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com> 2009/6/24 Mary Gardiner : > On Wed, Jun 24, 2009, Rory Nugent wrote: >> So, the issue is that this post is ALWAYS displayed first on the page. >> I don't want this. Is there anyway around this short of having to >> contact the blog owner? 
> > I think the following setting does what you want "ignore_in_feed = > updated". So: > > [http://example.com/atom.xml] > name = Example feed > ignore_in_feed = updated Isn't there this: future_dates = ignore_entry That's supposed to ignore entries in the future until the future date has passed. You can also do future_dates = ignore_date which will just correct the date to today's. See http://www.intertwingly.net/code/venus/docs/normalization.html > > -Mary > -- > devel mailing list > devel at lists.planetplanet.org > http://lists.planetplanet.org/mailman/listinfo/devel > From rubys at intertwingly.net Thu Jun 25 09:09:48 2009 From: rubys at intertwingly.net (Sam Ruby) Date: Wed, 24 Jun 2009 19:09:48 -0400 Subject: Venus: parse errors on many feeds In-Reply-To: <20090619023355.GD16552@gertrude.home.puzzling.org> References: <20090619023355.GD16552@gertrude.home.puzzling.org> Message-ID: <4A42B23C.4060308@intertwingly.net> Mary Gardiner wrote: > I am getting parse errors on many feeds in a way that suggests Feed Parser is > failing. For an example, see the .ini file at > https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the > feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is > originally at http://blog.gingertech.net/feed/ ) Is this still an issue? Can you try the following: python tests/reconstitute.py \ https://users.puzzling.org/users/mary/venus-test/rss.xml - Sam Ruby > Note that if I use feedparser directly, it has no problem with the file: > > $ pwd > /home/mary/src/venus/trunk/planet/vendor > $ python > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import feedparser >>>> feedparser.__file__ > 'feedparser.pyc' >>>> feedparser.parse('https://users.puzzling.org/users/mary/venus-test/rss.xml') > {'feed': {'lastbuilddate': u'Sun, 14 Jun 2009 14:15:54 +0000', 'subtitle': u"Silvia's blog" ... 
From rory.nugent at nyu.edu Thu Jun 25 11:05:10 2009 From: rory.nugent at nyu.edu (Rory Nugent) Date: Wed, 24 Jun 2009 21:05:10 -0400 Subject: [Issue] RSS Feed with a future post In-Reply-To: <2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com> References: <20090624094317.GB17932@gertrude.home.puzzling.org> <2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com> Message-ID: Thanks Baz. That seemed to do it! 
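For anyone comparing the two `future_dates` values Baz mentions, here is a rough sketch of the behaviour as he describes it in this thread. It is not Venus code: `apply_future_dates` and the `(title, updated)` tuples are hypothetical names chosen for illustration, and the real normalization step may differ in detail (see the normalization docs linked in his message).

```python
from datetime import datetime


def apply_future_dates(entries, policy, now):
    """Apply a future_dates-style policy to (title, updated) pairs.

    Illustrative only: mirrors the description given on the list,
    not the actual Venus implementation.
    """
    result = []
    for title, updated in entries:
        if updated > now:
            if policy == "ignore_entry":
                # Leave the entry out until its date has passed.
                continue
            if policy == "ignore_date":
                # Keep the entry but replace the bogus date with "now".
                updated = now
        result.append((title, updated))
    # Newest first, the way a planet page is ordered.
    return sorted(result, key=lambda entry: entry[1], reverse=True)


now = datetime(2009, 6, 24, 12, 0)
entries = [
    ("post dated September 2010", datetime(2010, 9, 1)),
    ("ordinary post", datetime(2009, 6, 23)),
]

hidden = apply_future_dates(entries, "ignore_entry", now)
redated = apply_future_dates(entries, "ignore_date", now)
```

With `ignore_entry` the future-dated post disappears from the list; with `ignore_date` it stays but no longer carries a date later than the run time, so newer posts can displace it.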
From mary at puzzling.org Thu Jun 25 11:26:23 2009 From: mary at puzzling.org (Mary Gardiner) Date: Thu, 25 Jun 2009 11:26:23 +1000 Subject: Venus: parse errors on many feeds In-Reply-To: <4A42B23C.4060308@intertwingly.net> References: <20090619023355.GD16552@gertrude.home.puzzling.org> <4A42B23C.4060308@intertwingly.net> Message-ID: <20090625012623.GA3407@comp-pc019.ics.mq.edu.au> On Wed, Jun 24, 2009, Sam Ruby wrote: > Mary Gardiner wrote: > > I am getting parse errors on many feeds in a way that suggests Feed Parser is > > failing. For an example, see the .ini file at > > https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the > > feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is > > originally at http://blog.gingertech.net/feed/ ) > > Is this still an issue? Yes. 
> > Can you try the following: > > python tests/reconstitute.py \ > https://users.puzzling.org/users/mary/venus-test/rss.xml Output follows: $ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead import re, time, md5, sgmllib Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml HTMLParseError: malformed start tag, at line 4, column 55 File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet data = feedparser.parse(feed, **options) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse feedparser.feed(data) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed sgmllib.SGMLParser.feed(self, data) File "/usr/lib/python2.6/sgmllib.py", line 104, in feed self.goahead(0) File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead k = self.parse_endtag(i) File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag self.finish_endtag(tag) File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag self.unknown_endtag(tag) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag method() File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content value = self.popContent('content') File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent value = self.pop(tag) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop mfresults = _parseMicroformats(output, self.baseuri, self.encoding) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats p = _MicroformatsParser(htmlSource, baseURI, encoding) File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__ self.document = BeautifulSoup.BeautifulSoup(data) File 
"/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__ BeautifulStoneSoup.__init__(self, *args, **kwargs) File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__ self._feed(isHTML=isHTML) File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed self.builder.feed(markup) File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed self.goahead(0) File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead k = self.parse_starttag(i) File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag endpos = self.check_for_whole_start_tag(i) File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag self.error("malformed start tag") File "/usr/lib/python2.6/HTMLParser.py", line 115, in error raise HTMLParseError(message, self.getpos()) Unconfigured Planet 2009-06-25T01:25:29Z Venus Anonymous Coward $ From rubys at intertwingly.net Thu Jun 25 21:30:18 2009 From: rubys at intertwingly.net (Sam Ruby) Date: Thu, 25 Jun 2009 07:30:18 -0400 Subject: Venus: parse errors on many feeds In-Reply-To: <20090625012623.GA3407@comp-pc019.ics.mq.edu.au> References: <20090619023355.GD16552@gertrude.home.puzzling.org> <4A42B23C.4060308@intertwingly.net> <20090625012623.GA3407@comp-pc019.ics.mq.edu.au> Message-ID: <4A435FCA.3030906@intertwingly.net> Mary Gardiner wrote: > On Wed, Jun 24, 2009, Sam Ruby wrote: >> Mary Gardiner wrote: >>> I am getting parse errors on many feeds in a way that suggests Feed Parser is >>> failing. For an example, see the .ini file at >>> https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the >>> feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is >>> originally at http://blog.gingertech.net/feed/ ) >> Is this still an issue? > > Yes. 
>> Can you try the following: >> >> python tests/reconstitute.py \ >> https://users.puzzling.org/users/mary/venus-test/rss.xml > > Output follows: > > $ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml > /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead > import re, time, md5, sgmllib The above clearly should be fixed, but doesn't appear to be the problem. > Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml > HTMLParseError: malformed start tag, at line 4, column 55 On a fresh install of Ubuntu 9.04, adding *only* aptitude install bzr, I am not seeing this. Nor do I see this on an Ubuntu 8.04.2 machine that I've installed various things over an extended period of time. I'll also note that the following produces no matches: grep -i -r htmlparseerror * HTMLParseError is the name of the exception raised by the HTMLParser that is included in Python, but the feed parser does not make use of that particular library: grep HTMLParser planet/vendor/feedparser.py Is there anything unusual about your installation? Are others seeing this problem? - Sam Ruby From mary at puzzling.org Thu Jun 25 22:06:01 2009 From: mary at puzzling.org (Mary Gardiner) Date: Thu, 25 Jun 2009 22:06:01 +1000 Subject: Venus: parse errors on many feeds In-Reply-To: <4A435FCA.3030906@intertwingly.net> References: <20090619023355.GD16552@gertrude.home.puzzling.org> <4A42B23C.4060308@intertwingly.net> <20090625012623.GA3407@comp-pc019.ics.mq.edu.au> <4A435FCA.3030906@intertwingly.net> Message-ID: <20090625120601.GF30426@gertrude.home.puzzling.org> On Thu, Jun 25, 2009, Sam Ruby wrote: > Is there anything unusual about your installation? Are others seeing > this problem? It's a little hard to answer such a question (there are 'usual' installs of Linux distros?)... however... 
Having done some further investigation (ie, uninstalling every Python package that has to do with HTML parsing), it appears to happen only when the python-beautifulsoup package is installed. Interesting that this upsets Venus, but not Feed Parser when invoked directly (from the Venus source). Systems: Ubuntu 9.04, python 2.6, both systems upgrades rather than fresh installs. The "install python-beautifulsoup, break Venus" pattern seems pretty reliable though. -Mary From rubys at intertwingly.net Thu Jun 25 23:00:39 2009 From: rubys at intertwingly.net (Sam Ruby) Date: Thu, 25 Jun 2009 09:00:39 -0400 Subject: Venus: parse errors on many feeds In-Reply-To: <20090625120601.GF30426@gertrude.home.puzzling.org> References: <20090619023355.GD16552@gertrude.home.puzzling.org> <4A42B23C.4060308@intertwingly.net> <20090625012623.GA3407@comp-pc019.ics.mq.edu.au> <4A435FCA.3030906@intertwingly.net> <20090625120601.GF30426@gertrude.home.puzzling.org> Message-ID: <3d4032300906250600m74db1225o5c16a65db548634a@mail.gmail.com> On Thu, Jun 25, 2009 at 8:06 AM, Mary Gardiner wrote: > On Thu, Jun 25, 2009, Sam Ruby wrote: >> Is there anything unusual about your installation? Are others seeing >> this problem? > > It's a little hard to answer such a question (there are 'usual' installs > of Linux distros?)... however... :-) > Having done some further investigation (ie, uninstalling every Python > package that has to do with HTML parsing), it appears to happen only > when the python-beautifulsoup package is installed. Interesting that > this upsets Venus, but not Feed Parser when invoked directly (from the > Venus source). > > Systems: Ubuntu 9.04, python 2.6, both systems upgrades rather than > fresh installs. The "install python-beautifulsoup, break Venus" pattern > seems pretty reliable though. Confirmed. 
Temporary workaround:

    try:
        raise Exception # import BeautifulSoup
    except:
        BeautifulSoup = None

> -Mary
> --
> devel mailing list
> devel at lists.planetplanet.org
> http://lists.planetplanet.org/mailman/listinfo/devel

- Sam Ruby
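
A note on why the workaround should help: the traceback earlier in the thread shows the vendored feedparser.py reaching `BeautifulSoup.BeautifulSoup(data)` from its microformats step, so forcing the module-level name to `None` presumably turns that step into a no-op. Below is a minimal sketch of the same optional-import pattern. The names `optional_import`, `enabled` and `parse_microformats` are hypothetical and are not part of feedparser or Venus, and `importlib` needs Python 2.7 or later, which is newer than the 2.6 used in this thread.

```python
import importlib


def optional_import(name, enabled=True):
    """Return the named module, or None if it is disabled or not importable."""
    if not enabled:
        # Same effect as the "raise Exception" trick: never attempt the import.
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical switch; the workaround above hard-codes the "off" case.
BeautifulSoup = optional_import("BeautifulSoup", enabled=False)


def parse_microformats(html):
    """Stand-in for an optional processing step guarded by the import."""
    if BeautifulSoup is None:
        # Library unavailable or disabled: skip the step instead of failing.
        return None
    return BeautifulSoup.BeautifulSoup(html)
```

The point of the design is that callers check the guard once, so an installed but incompatible library can be switched off without touching the rest of the parsing code.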