[Devel] RSS, titles, entities

Murray Cumming murrayc at murrayc.com
Wed May 26 23:38:19 EST 2004


On Tue, 2004-05-25 at 14:00 +1000, Jeff Waugh wrote:
> <quote who="Murray Cumming">
> 
> > Has anyone made more progress with porting planet to feedparser3? I see
> > that the API has changed a bit - some methods that we use now seem to be
> > marked as private (they have a _ at the start).
> 
> I had a quick look. We tend to tweak the internals and route around
> feedparser a bit, so the new, simpler API makes it a bit harder. Mark'll be
> happy to help out, I'm sure.
> 
> > The current feedparser in planet incorrectly escapes character entities
> > (such as &uuml;) in item titles, so I thought it would make sense to fix
> > it in feedparser 3 if necessary.
> 
> This is not a bug. <title> elements in RSS are *not* html or encoded html,
> so you can't put entities in them.

Thanks. I'm willing to believe this, but I can't find anything about it
in the spec.

>  You can, however, use utf-8 characters
> instead. I would be reasonably happy to accept an un-fuck-me patch to make
> this work, for feeds that insist on using entities in non-html elements.
> 
> RSS means forever working around breakage...

Yes, because a lot of feeds from a lot of blog software seem to produce
these elements.

But I think this would be something for feedparser rather than planet. I
have patched my local feedparser.py for this, but won't try to send
anything upstream until we've ported to feedparser 3.
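
For illustration only, here is a minimal sketch of the kind of workaround
being discussed: turning character entities in a title back into the
characters they stand for. This is not the actual local patch to
feedparser.py. It assumes a modern Python with the standard library's
html.unescape, and unescape_title is a hypothetical helper name, not part
of feedparser's API.

```python
import html


def unescape_title(title: str) -> str:
    """Replace character entities in a plain-text title with real characters.

    Hypothetical helper for illustration; not a feedparser function.
    Handles named entities (&uuml;) and numeric ones (&#252;, &#xfc;).
    """
    return html.unescape(title)


if __name__ == "__main__":
    # A title from a feed that put an HTML entity in a non-HTML element.
    print(unescape_title("Gr&uuml;&szlig;e aus M&uuml;nchen"))
```

Something like this would only be applied to elements that are meant to be
plain text, such as <title>, so that real markup elsewhere is left alone.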

-- 
Murray Cumming
murrayc at murrayc.com
www.murrayc.com





