From sulamita at gmail.com Mon Jun 1 21:59:59 2009
From: sulamita at gmail.com (Sulamita Garcia)
Date: Mon, 1 Jun 2009 12:59:59 +0100
Subject: Wordpress.com based blogs feeds
Message-ID: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
Hi
I'm sorry if it's an old question, but I couldn't find any definitive
answer...
It seems that blogs hosted on Wordpress.com have only the last blog entry
shown in planets powered by Planet, and the summary instead of the full
text, no matter what the configuration is.
For example, in this planet http://nerds.valeta.org, running Planet 2.0 and
python 2.4.4, my blog (sulamita.net) and others hosted on wordpress.com (like
http://aurelio.wordpress.com/) have this problem. My blog is also syndicated
on live.linuxchix.org, which does not run Planet, and seems fine there. This
started several months ago, but only now do I have some time to try to help
the maintainer find where the problem is.
Also, the titles are linked to the planet itself rather than to the blog
being syndicated.
For reference, you can see my configuration settings here:
http://sulamita.wordpress.com/files/2009/06/settings.png
Any help is welcome.
regards,
--
Brain: Prepare yourself for your 15 minutes of fame
Pinky: after that, can I have 15 minutes of Macarena?
?v? Sulamita Garcia
/(_)\ http://sulamita.net/
^ ^
From manolopm at gmail.com Wed Jun 3 03:29:41 2009
From: manolopm at gmail.com (Manolo Padron Martinez)
Date: Tue, 2 Jun 2009 18:29:41 +0100
Subject: Problem using planet
Message-ID: <55fad2460906021029r413ac995v4b0b76c4e2806c4d@mail.gmail.com>
Hi:
I'm using planet 2.0 and I found a problem with the parser (maybe the
problem is the code generated by the web that I'm trying to put in the
planet but I'm not sure).
The problematic feed is this
http://orangemachine.wordpress.com/category/planet/feed/ and the
problem is with this part of the xml
pquesada
It seems that the parser is trying to use the tag
as a post, and when it tries to parse it, it fails.
Any idea where the problem is (I mean, a bug in the parser or a bug
in the code generated by the web), and is there a solution?
Regards from Canary Islands
Manuel Padrón Martínez
From jhughes at openxtra.co.uk Fri Jun 5 21:34:23 2009
From: jhughes at openxtra.co.uk (Jack Hughes)
Date: Fri, 05 Jun 2009 12:34:23 +0100
Subject: New site
Message-ID: <4A2902BF.9070704@openxtra.co.uk>
Hello,
Wow! I thought it was going to be really difficult to set up the planet
software and configure it but it was really easy. I set up my new site
in no more than an hour or so with the minimum of fuss. The finished
article is here: http://www.planetnetworkmanagement.com/
The site collects a lot of the network management related blogs I read
together into a single place. Hopefully I'll find out about a whole lot
of other great blogs through the site.
Thanks for such great software.
Regards,
Jack Hughes
From sulamita at gmail.com Wed Jun 10 19:46:19 2009
From: sulamita at gmail.com (Sulamita Garcia)
Date: Wed, 10 Jun 2009 10:46:19 +0100
Subject: Wordpress.com based blogs feeds
In-Reply-To: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
Message-ID: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
I've seen several messages in the last few weeks but no actual developer
answers... is this mailing list still alive?
On Mon, Jun 1, 2009 at 12:59 PM, Sulamita Garcia wrote:
> Hi
>
> I'm sorry if it's an old question, but I couldn't find any definitive
> answer...
>
> It seems that blogs hosted on Wordpress.com have only the last blog entry
> shown in planets powered by Planet, and the summary instead of the full
> text, no matter what the configuration is.
>
> For example, in this planet http://nerds.valeta.org, running Planet 2.0
> and python 2.4.4, my blog (sulamita.net) and others hosted on
> wordpress.com (like http://aurelio.wordpress.com/) have this problem. My
> blog is also syndicated on live.linuxchix.org, which does not run Planet,
> and seems fine there. This started several months ago, but only now do I
> have some time to try to help the maintainer find where the problem is.
>
> Also, the titles are linked to the planet itself rather than to the blog
> being syndicated.
>
> For reference, you can see my configuration settings here:
> http://sulamita.wordpress.com/files/2009/06/settings.png
>
> Any help is welcome.
>
> regards,
>
> --
> Brain: Prepare yourself for your 15 minutes of fame
> Pinky: after that, can I have 15 minutes of Macarena?
>
> ?v? Sulamita Garcia
> /(_)\ http://sulamita.net/
> ^ ^
>
>
--
Brain: Prepare yourself for your 15 minutes of fame
Pinky: after that, can I have 15 minutes of Macarena?
?v? Sulamita Garcia
/(_)\ http://sulamita.net/
^ ^
Josh Billings
- "Every man has his follies - and often they are the most interesting
thing he has got."
From mary at puzzling.org Wed Jun 10 21:25:54 2009
From: mary at puzzling.org (Mary Gardiner)
Date: Wed, 10 Jun 2009 21:25:54 +1000
Subject: Wordpress.com based blogs feeds
In-Reply-To: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
<61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
Message-ID: <20090610112554.GE11389@gertrude.home.puzzling.org>
On Wed, Jun 10, 2009, Sulamita Garcia wrote:
> I've seen several messages in the last few weeks but no actual developer
> answers... is this mailing list still alive?
Semi-active. There isn't a lot of support for Planet any more, only for
the Venus fork at http://intertwingly.net/code/venus/ which is the only
version that has been substantially developed in the last three years.
It would be really nice if we could finally get the webpage etc to
reflect this state of affairs.
-Mary
From planet at philwilson.org Thu Jun 11 03:47:42 2009
From: planet at philwilson.org (Phil Wilson)
Date: Wed, 10 Jun 2009 18:47:42 +0100
Subject: Wordpress.com based blogs feeds
In-Reply-To: <61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
<61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
Message-ID: <9159c3dc0906101047p7688364as89a25163093dede9@mail.gmail.com>
2009/6/10 Sulamita Garcia :
> I've seen several messages in the last few weeks but no actual developer
> answers... is this mailing list still alive?
When people know the answer, a response is normally forthcoming.
In this case it sounds like the Planet is not configured correctly. If
it were a plain Venus install then provided that it was looking at
http://sulamita.net/feed/ then there is no reason why more than one
item should not be displayed.
Cheers,
Phil
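
One way to confirm that a feed really carries more than one item, independent of any Planet or Venus configuration, is to count the items directly. The following is only a minimal sketch using the Python standard library on an inline sample document; the sample feed and the helper name item_titles are assumptions made for illustration, not part of Planet or Venus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real check would use the downloaded feed text.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><description>Summary one</description></item>
    <item><title>Second post</title><description>Summary two</description></item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return the title of every <item> element found in an RSS document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]


titles = item_titles(SAMPLE_RSS)
print(len(titles), titles)
```

If the feed itself lists several items but the planet shows only one, the cause is more likely on the aggregator side than in the feed.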
From jmorris at namei.org Thu Jun 11 08:30:22 2009
From: jmorris at namei.org (James Morris)
Date: Thu, 11 Jun 2009 08:30:22 +1000 (EST)
Subject: Wordpress.com based blogs feeds
In-Reply-To: <20090610112554.GE11389@gertrude.home.puzzling.org>
References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
<61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
<20090610112554.GE11389@gertrude.home.puzzling.org>
Message-ID:
On Wed, 10 Jun 2009, Mary Gardiner wrote:
> On Wed, Jun 10, 2009, Sulamita Garcia wrote:
> > I've seen several messages in the last few weeks but no actual developer
> > answers... is this mailing list still alive?
>
> Semi-active. There isn't a lot of support for Planet any more, only for
> the Venus fork at http://intertwingly.net/code/venus/ which is the only
> version that has been substantially developed in the last three years.
I didn't know any of this.
> It would be really nice if we could finally get the webpage etc to
> reflect this state of affairs.
Indeed. The migration page looks useful.
Note that Fedora and EPEL appear to still be supporting planet, so for
some users (possibly including myself), it may be better to remain with
that as there is a security response team and a package maintainer
attached.
(I'll file a bz on the Fedora package to ask about upgrading to venus).
- James
--
James Morris
From jmorris at namei.org Thu Jun 11 09:52:12 2009
From: jmorris at namei.org (James Morris)
Date: Thu, 11 Jun 2009 09:52:12 +1000 (EST)
Subject: Wordpress.com based blogs feeds
In-Reply-To:
References: <61c2d10906010459x74ae816dw664e0405dacad737@mail.gmail.com>
<61c2d10906100246r65686854r1a9271de3b7cc5bb@mail.gmail.com>
<20090610112554.GE11389@gertrude.home.puzzling.org>
Message-ID:
On Thu, 11 Jun 2009, James Morris wrote:
> (I'll file a bz on the Fedora package to ask about upgrading to venus).
https://bugzilla.redhat.com/show_bug.cgi?id=505191
--
James Morris
From mary at puzzling.org Fri Jun 19 12:33:55 2009
From: mary at puzzling.org (Mary Gardiner)
Date: Fri, 19 Jun 2009 12:33:55 +1000
Subject: Venus: parse errors on many feeds
Message-ID: <20090619023355.GD16552@gertrude.home.puzzling.org>
I am getting parse errors on many feeds in a way that suggests Feed Parser is
failing. For an example, see the .ini file at
https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
originally at http://blog.gingertech.net/feed/ )
Note that if I use feedparser directly, it has no problem with the file:
$ pwd
/home/mary/src/venus/trunk/planet/vendor
$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.__file__
'feedparser.pyc'
>>> feedparser.parse('https://users.puzzling.org/users/mary/venus-test/rss.xml')
{'feed': {'lastbuilddate': u'Sun, 14 Jun 2009 14:15:54 +0000', 'subtitle': u"Silvia's blog" ...
However, if I run /home/mary/src/venus/trunk/planet.py test.ini, I get
HTMLParseError emerging from within feedparser:
$ /home/mary/src/venus/trunk/planet.py test.ini
/home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import re, time, md5, sgmllib
ERROR:planet.runner:Error processing https://users.puzzling.org/users/mary/venus-test/rss.xml
ERROR:planet.runner:HTMLParseError: malformed start tag, at line 4, column 55
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet
data = feedparser.parse(feed, **options)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse
feedparser.feed(data)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed
sgmllib.SGMLParser.feed(self, data)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 104, in feed
self.goahead(0)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead
k = self.parse_endtag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag
self.finish_endtag(tag)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag
self.unknown_endtag(tag)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag
method()
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content
value = self.popContent('content')
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent
value = self.pop(tag)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop
mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats
p = _MicroformatsParser(htmlSource, baseURI, encoding)
ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__
self.document = BeautifulSoup.BeautifulSoup(data)
ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
System details:
- Ubuntu 9.04
- Python 2.6.2 (Ubuntu package 2.6.2-0ubuntu1)
- Venus trunk revno 113, which seems to be the latest
-Mary
From rory.nugent at nyu.edu Wed Jun 24 16:19:00 2009
From: rory.nugent at nyu.edu (Rory Nugent)
Date: Wed, 24 Jun 2009 02:19:00 -0400
Subject: [Issue] RSS Feed with a future post
Message-ID:
Hey everyone,
I have a small issue that I was hoping someone may be able to shed
some light on. I'm using Planet Venus to pull in many feeds
(obviously) and one of the feeds has a post date in the future.
September 2010.
So, the issue is that this post is ALWAYS displayed first on the page.
I don't want this. Is there any way around this short of having to
contact the blog owner?
Thanks everyone.
-Rory
From mary at puzzling.org Wed Jun 24 19:43:17 2009
From: mary at puzzling.org (Mary Gardiner)
Date: Wed, 24 Jun 2009 19:43:17 +1000
Subject: [Issue] RSS Feed with a future post
In-Reply-To:
References:
Message-ID: <20090624094317.GB17932@gertrude.home.puzzling.org>
On Wed, Jun 24, 2009, Rory Nugent wrote:
> So, the issue is that this post is ALWAYS displayed first on the page.
> I don't want this. Is there any way around this short of having to
> contact the blog owner?
I think the following setting does what you want "ignore_in_feed =
updated". So:
[http://example.com/atom.xml]
name = Example feed
ignore_in_feed = updated
-Mary
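
The effect of a setting like the one above can be pictured as dropping the named field from each entry before any sorting happens, so an untrustworthy date no longer decides the order. This is a rough sketch only, assuming entries are plain dictionaries; strip_ignored is a hypothetical helper and does not reflect how Venus implements ignore_in_feed.

```python
def strip_ignored(entry, ignored):
    """Return a copy of `entry` without the keys named in `ignored`."""
    return {key: value for key, value in entry.items() if key not in ignored}


# Hypothetical entry with a date far in the future.
entry = {
    "title": "Example post",
    "updated": "2010-09-01T00:00:00Z",
    "id": "tag:example.com,2009:1",
}

cleaned = strip_ignored(entry, {"updated"})
print(sorted(cleaned))
```

The original entry is left untouched, so the ignored value is still available if it is needed for debugging.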
From brian.ewins at gmail.com Wed Jun 24 19:48:28 2009
From: brian.ewins at gmail.com (Baz)
Date: Wed, 24 Jun 2009 10:48:28 +0100
Subject: [Issue] RSS Feed with a future post
In-Reply-To: <20090624094317.GB17932@gertrude.home.puzzling.org>
References:
<20090624094317.GB17932@gertrude.home.puzzling.org>
Message-ID: <2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com>
2009/6/24 Mary Gardiner :
> On Wed, Jun 24, 2009, Rory Nugent wrote:
>> So, the issue is that this post is ALWAYS displayed first on the page.
>> I don't want this. Is there any way around this short of having to
>> contact the blog owner?
>
> I think the following setting does what you want "ignore_in_feed =
> updated". So:
>
> [http://example.com/atom.xml]
> name = Example feed
> ignore_in_feed = updated
Isn't there this:
future_dates = ignore_entry
That's supposed to ignore entries in the future until the future date
has passed. You can also do
future_dates = ignore_date
which will just correct the date to today's.
See http://www.intertwingly.net/code/venus/docs/normalization.html
>
> -Mary
> --
> devel mailing list
> devel at lists.planetplanet.org
> http://lists.planetplanet.org/mailman/listinfo/devel
>
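
The two future_dates policies, as they are described in this message, could be sketched roughly as follows. This is an illustration under stated assumptions, not Venus code: apply_future_dates is a hypothetical helper, entries are assumed to be dictionaries with an "updated" datetime, and "correcting the date" is modelled as replacing it with the current time.

```python
from datetime import datetime, timedelta, timezone


def apply_future_dates(entries, policy, now):
    """Handle entries dated after `now` according to `policy`.

    "ignore_entry" drops the entry until its date has passed;
    "ignore_date" keeps the entry but replaces its date with `now`.
    Any other policy leaves future-dated entries unchanged.
    """
    result = []
    for entry in entries:
        if entry["updated"] <= now:
            result.append(entry)
        elif policy == "ignore_entry":
            continue
        elif policy == "ignore_date":
            fixed = dict(entry)  # copy so the caller's entry is not modified
            fixed["updated"] = now
            result.append(fixed)
        else:
            result.append(entry)
    return result


now = datetime(2009, 6, 24, tzinfo=timezone.utc)
entries = [
    {"title": "normal post", "updated": now - timedelta(days=1)},
    {"title": "future post", "updated": datetime(2010, 9, 1, tzinfo=timezone.utc)},
]

kept = apply_future_dates(entries, "ignore_entry", now)
redated = apply_future_dates(entries, "ignore_date", now)
print([e["title"] for e in kept])
print([e["updated"].isoformat() for e in redated])
```

With either policy the September 2010 post can no longer sort ahead of everything else on the page.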
From rubys at intertwingly.net Thu Jun 25 09:09:48 2009
From: rubys at intertwingly.net (Sam Ruby)
Date: Wed, 24 Jun 2009 19:09:48 -0400
Subject: Venus: parse errors on many feeds
In-Reply-To: <20090619023355.GD16552@gertrude.home.puzzling.org>
References: <20090619023355.GD16552@gertrude.home.puzzling.org>
Message-ID: <4A42B23C.4060308@intertwingly.net>
Mary Gardiner wrote:
> I am getting parse errors on many feeds in a way that suggests Feed Parser is
> failing. For an example, see the .ini file at
> https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
> feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
> originally at http://blog.gingertech.net/feed/ )
Is this still an issue?
Can you try the following:
python tests/reconstitute.py \
https://users.puzzling.org/users/mary/venus-test/rss.xml
- Sam Ruby
> Note that if I use feedparser directly, it has no problem with the file:
>
> $ pwd
> /home/mary/src/venus/trunk/planet/vendor
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import feedparser
> >>> feedparser.__file__
> 'feedparser.pyc'
> >>> feedparser.parse('https://users.puzzling.org/users/mary/venus-test/rss.xml')
> {'feed': {'lastbuilddate': u'Sun, 14 Jun 2009 14:15:54 +0000', 'subtitle': u"Silvia's blog" ...
>
> However, if I run /home/mary/src/venus/trunk/planet.py test.ini, I get
> HTMLParseError emerging from within feedparser:
>
> $ /home/mary/src/venus/trunk/planet.py test.ini
> /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
> import re, time, md5, sgmllib
> ERROR:planet.runner:Error processing https://users.puzzling.org/users/mary/venus-test/rss.xml
> ERROR:planet.runner:HTMLParseError: malformed start tag, at line 4, column 55
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet
> data = feedparser.parse(feed, **options)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse
> feedparser.feed(data)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed
> sgmllib.SGMLParser.feed(self, data)
> ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 104, in feed
> self.goahead(0)
> ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead
> k = self.parse_endtag(i)
> ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag
> self.finish_endtag(tag)
> ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag
> self.unknown_endtag(tag)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag
> method()
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content
> value = self.popContent('content')
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent
> value = self.pop(tag)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop
> mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats
> p = _MicroformatsParser(htmlSource, baseURI, encoding)
> ERROR:planet.runner: File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__
> self.document = BeautifulSoup.BeautifulSoup(data)
> ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
> BeautifulStoneSoup.__init__(self, *args, **kwargs)
> ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
> self._feed(isHTML=isHTML)
> ERROR:planet.runner: File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed
> self.builder.feed(markup)
> ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
> self.goahead(0)
> ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
> k = self.parse_starttag(i)
> ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
> endpos = self.check_for_whole_start_tag(i)
> ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
> self.error("malformed start tag")
> ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
> raise HTMLParseError(message, self.getpos())
>
> System details:
> - Ubuntu 9.04
> - Python 2.6.2 (Ubuntu package 2.6.2-0ubuntu1)
> - Venus trunk revno 113, which seems to be the latest
>
> -Mary
From rory.nugent at nyu.edu Thu Jun 25 11:05:10 2009
From: rory.nugent at nyu.edu (Rory Nugent)
Date: Wed, 24 Jun 2009 21:05:10 -0400
Subject: [Issue] RSS Feed with a future post
In-Reply-To: <2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com>
References:
<20090624094317.GB17932@gertrude.home.puzzling.org>
<2faad3050906240248v7cd9bd23hf57a947288269a8d@mail.gmail.com>
Message-ID:
Thanks Baz. That seemed to do it!
On Jun 24, 2009, at 5:48 AM, Baz wrote:
> 2009/6/24 Mary Gardiner :
>> On Wed, Jun 24, 2009, Rory Nugent wrote:
>>> So, the issue is that this post is ALWAYS displayed first on the
>>> page.
>>> I don't want this. Is there any way around this short of having to
>>> contact the blog owner?
>>
>> I think the following setting does what you want "ignore_in_feed =
>> updated". So:
>>
>> [http://example.com/atom.xml]
>> name = Example feed
>> ignore_in_feed = updated
>
> Isn't there this:
> future_dates = ignore_entry
>
> That's supposed to ignore entries in the future until the future date
> has passed. You can also do
> future_dates = ignore_date
>
> which will just correct the date to today's.
>
> See http://www.intertwingly.net/code/venus/docs/normalization.html
>
>>
>> -Mary
From mary at puzzling.org Thu Jun 25 11:26:23 2009
From: mary at puzzling.org (Mary Gardiner)
Date: Thu, 25 Jun 2009 11:26:23 +1000
Subject: Venus: parse errors on many feeds
In-Reply-To: <4A42B23C.4060308@intertwingly.net>
References: <20090619023355.GD16552@gertrude.home.puzzling.org>
<4A42B23C.4060308@intertwingly.net>
Message-ID: <20090625012623.GA3407@comp-pc019.ics.mq.edu.au>
On Wed, Jun 24, 2009, Sam Ruby wrote:
> Mary Gardiner wrote:
> > I am getting parse errors on many feeds in a way that suggests Feed Parser is
> > failing. For an example, see the .ini file at
> > https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
> > feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
> > originally at http://blog.gingertech.net/feed/ )
>
> Is this still an issue?
Yes.
>
> Can you try the following:
>
> python tests/reconstitute.py \
> https://users.puzzling.org/users/mary/venus-test/rss.xml
Output follows:
$ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml
/home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import re, time, md5, sgmllib
Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml
HTMLParseError: malformed start tag, at line 4, column 55
File "/home/mary/src/venus/trunk/planet/spider.py", line 437, in spiderPlanet
data = feedparser.parse(feed, **options)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 3525, in parse
feedparser.feed(data)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1662, in feed
sgmllib.SGMLParser.feed(self, data)
File "/usr/lib/python2.6/sgmllib.py", line 104, in feed
self.goahead(0)
File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead
k = self.parse_endtag(i)
File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag
self.finish_endtag(tag)
File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag
self.unknown_endtag(tag)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 569, in unknown_endtag
method()
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1512, in _end_content
value = self.popContent('content')
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 849, in popContent
value = self.pop(tag)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 764, in pop
mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 2218, in _parseMicroformats
p = _MicroformatsParser(htmlSource, baseURI, encoding)
File "/home/mary/src/venus/trunk/planet/vendor/feedparser.py", line 1823, in __init__
self.document = BeautifulSoup.BeautifulSoup(data)
File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
Unconfigured Planet
2009-06-25T01:25:29Z
Venus
Anonymous Coward
$
From rubys at intertwingly.net Thu Jun 25 21:30:18 2009
From: rubys at intertwingly.net (Sam Ruby)
Date: Thu, 25 Jun 2009 07:30:18 -0400
Subject: Venus: parse errors on many feeds
In-Reply-To: <20090625012623.GA3407@comp-pc019.ics.mq.edu.au>
References: <20090619023355.GD16552@gertrude.home.puzzling.org> <4A42B23C.4060308@intertwingly.net>
<20090625012623.GA3407@comp-pc019.ics.mq.edu.au>
Message-ID: <4A435FCA.3030906@intertwingly.net>
Mary Gardiner wrote:
> On Wed, Jun 24, 2009, Sam Ruby wrote:
>> Mary Gardiner wrote:
>>> I am getting parse errors on many feeds in a way that suggests Feed Parser is
>>> failing. For an example, see the .ini file at
>>> https://users.puzzling.org/users/mary/venus-test/test.ini which pulls in the
>>> feed at https://users.puzzling.org/users/mary/venus-test/rss.xml (this feed is
>>> originally at http://blog.gingertech.net/feed/ )
>> Is this still an issue?
>
> Yes.
>> Can you try the following:
>>
>> python tests/reconstitute.py \
>> https://users.puzzling.org/users/mary/venus-test/rss.xml
>
> Output follows:
>
> $ python tests/reconstitute.py http://users.puzzling.org/users/mary/venus-test/rss.xml
> /home/mary/src/venus/trunk/planet/reconstitute.py:16: DeprecationWarning: the md5 module is deprecated; use hashlib instead
> import re, time, md5, sgmllib
The above clearly should be fixed, but doesn't appear to be the problem.
> Error processing http://users.puzzling.org/users/mary/venus-test/rss.xml
> HTMLParseError: malformed start tag, at line 4, column 55
On a fresh install of Ubuntu 9.04, adding *only* aptitude install bzr, I
am not seeing this. Nor do I see this on an Ubuntu 8.04.2 machine that
I've installed various things over an extended period of time.
I'll also note that the following produces no matches:
grep -i -r htmlparseerror *
HTMLParseError is the name of the exception raised by the HTMLParser
that is included in Python, but the feed parser does not make use of
that particular library:
grep HTMLParser planet/vendor/feedparser.py
Is there anything unusual about your installation? Are others seeing
this problem?
- Sam Ruby
From mary at puzzling.org Thu Jun 25 22:06:01 2009
From: mary at puzzling.org (Mary Gardiner)
Date: Thu, 25 Jun 2009 22:06:01 +1000
Subject: Venus: parse errors on many feeds
In-Reply-To: <4A435FCA.3030906@intertwingly.net>
References: <20090619023355.GD16552@gertrude.home.puzzling.org>
<4A42B23C.4060308@intertwingly.net>
<20090625012623.GA3407@comp-pc019.ics.mq.edu.au>
<4A435FCA.3030906@intertwingly.net>
Message-ID: <20090625120601.GF30426@gertrude.home.puzzling.org>
On Thu, Jun 25, 2009, Sam Ruby wrote:
> Is there anything unusual about your installation? Are others seeing
> this problem?
It's a little hard to answer such a question (there are 'usual' installs
of Linux distros?)... however...
Having done some further investigation (ie, uninstalling every Python
package that has to do with HTML parsing), it appears to happen only
when the python-beautifulsoup package is installed. Interesting that
this upsets Venus, but not Feed Parser when invoked directly (from the
Venus source).
Systems: Ubuntu 9.04, python 2.6, both systems upgrades rather than
fresh installs. The "install python-beautifulsoup, break Venus" pattern
seems pretty reliable though.
-Mary
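
When a failure depends on whether an optional package is installed, it can help to check which copy of a module the interpreter would pick up. The sketch below is an assumption-laden illustration: it uses importlib.util.find_spec, which exists in Python 3.4 and later (not in the Python 2.6 used in this thread), and locate_module is a hypothetical helper name.

```python
import importlib.util


def locate_module(name):
    """Return the location a top-level module would load from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # module is not importable in this environment
    return spec.origin


# "BeautifulSoup" is the module name seen in the traceback; "json" is a
# standard-library module used here only for comparison.
for name in ("BeautifulSoup", "json"):
    print(name, "->", locate_module(name))
```

A result of None for BeautifulSoup would correspond to the configuration in which the error was not seen.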
From rubys at intertwingly.net Thu Jun 25 23:00:39 2009
From: rubys at intertwingly.net (Sam Ruby)
Date: Thu, 25 Jun 2009 09:00:39 -0400
Subject: Venus: parse errors on many feeds
In-Reply-To: <20090625120601.GF30426@gertrude.home.puzzling.org>
References: <20090619023355.GD16552@gertrude.home.puzzling.org>
<4A42B23C.4060308@intertwingly.net>
<20090625012623.GA3407@comp-pc019.ics.mq.edu.au>
<4A435FCA.3030906@intertwingly.net>
<20090625120601.GF30426@gertrude.home.puzzling.org>
Message-ID: <3d4032300906250600m74db1225o5c16a65db548634a@mail.gmail.com>
On Thu, Jun 25, 2009 at 8:06 AM, Mary Gardiner wrote:
> On Thu, Jun 25, 2009, Sam Ruby wrote:
>> Is there anything unusual about your installation? Are others seeing
>> this problem?
>
> It's a little hard to answer such a question (there are 'usual' installs
> of Linux distros?)... however...
:-)
> Having done some further investigation (ie, uninstalling every Python
> package that has to do with HTML parsing), it appears to happen only
> when the python-beautifulsoup package is installed. Interesting that
> this upsets Venus, but not Feed Parser when invoked directly (from the
> Venus source).
>
> Systems: Ubuntu 9.04, python 2.6, both systems upgrades rather than
> fresh installs. The "install python-beautifulsoup, break Venus" pattern
> seems pretty reliable though.
Confirmed. Temporary workaround:
try:
    raise Exception # import BeautifulSoup
except:
    BeautifulSoup = None
> -Mary
- Sam Ruby
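
The workaround above forces the optional import to fail so that the name is bound to None and the dependent step is skipped. The same pattern can be sketched more generally as below; this is a hedged illustration written for Python 3, and optional_import and parse_microformats are hypothetical names, not functions from feedparser or Venus.

```python
import importlib


def optional_import(name, enabled=True):
    """Import `name` if enabled and available; otherwise return None."""
    if not enabled:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Deliberately disabled, mirroring the temporary workaround in this thread.
BeautifulSoup = optional_import("BeautifulSoup", enabled=False)


def parse_microformats(html):
    """Hypothetical stand-in for an optional parsing step."""
    if BeautifulSoup is None:
        return None  # optional dependency unavailable or disabled: skip
    return BeautifulSoup.BeautifulSoup(html)


print(parse_microformats("<p>hello</p>"))
```

Catching only ImportError, rather than using a bare except, keeps unrelated errors visible while still letting the optional step be switched off.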