[Devel] RSS, titles, entities

Murray Cumming murrayc at murrayc.com
Wed May 26 23:38:19 EST 2004

On Tue, 2004-05-25 at 14:00 +1000, Jeff Waugh wrote:
> <quote who="Murray Cumming">
> > Has anyone made more progress with porting planet to feedparser3? I see
> > that the API has changed a bit - some methods that we use now seem to be
> > marked as private (they have a _ at the start).
> I had a quick look. We tend to tweak the internals and route around
> feedparser a bit, so the new, simpler API makes it a bit harder. Mark'll be
> happy to help out, I'm sure.
> > The current feedparser in planet incorrectly escapes character entities
> > (such as &uuml;) in item titles, so I thought it would make sense to fix
> > it in feedparser 3 if necessary.
> This is not a bug. <title> elements in RSS are *not* html or encoded html,
> so you can't put entities in them.

Thanks. I'm willing to believe this, but I can't find anything about it
in the spec.
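For context on the spec question: RSS is an XML format, and XML itself predefines only five entities (&lt;, &gt;, &amp;, &quot;, &apos;). Unless a DOCTYPE in the feed declares it, &uuml; is an undefined entity, so a strict XML parser rejects the document. The literal UTF-8 character, or a numeric character reference such as &#252;, is accepted. The sketch below uses Python's standard xml.etree.ElementTree to show this; the helper name parse_title is hypothetical and is not part of planet or feedparser.

```python
import xml.etree.ElementTree as ET

def parse_title(xml_text):
    """Hypothetical helper: return the text of a <title> element,
    or None if a strict XML parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

# A named HTML entity is undefined in plain XML, so the parser rejects it.
print(parse_title("<title>M&uuml;ller</title>"))
# The literal character, or a numeric character reference, is well-formed.
print(parse_title("<title>Müller</title>"))
print(parse_title("<title>M&#252;ller</title>"))
# &amp; is one of XML's five predefined entities.
print(parse_title("<title>Tom &amp; Jerry</title>"))
```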

>  You can, however, use utf-8 characters
> instead. I would be reasonably happy to accept an un-fuck-me patch to make
> this work, for feeds that insist on using entities in non-html elements.
> RSS means forever working around breakage...

Yes, because a lot of feeds from a lot of blog software seem to put
entities in these elements.

But I think this would be something for feedparser rather than planet. I
have patched my local feedparser.py for this, but won't try to send
anything upstream until we've ported to feedparser 3.
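One way such a workaround could look, assuming a title that reaches the page-generation step with HTML entity references still present as literal text (e.g. "M&uuml;ller"): decode the entities first, then escape exactly once for HTML output, so that &uuml; is not turned into &amp;uuml;. This is a minimal sketch using Python 3's standard html module; title_for_html is a hypothetical name, and this is not the local feedparser.py patch mentioned above.

```python
import html

def title_for_html(raw_title):
    """Hypothetical helper: prepare a plain-text feed title for an HTML page."""
    # Step 1: decode any HTML entity references the feed left in the title.
    text = html.unescape(raw_title)
    # Step 2: escape &, < and > exactly once for inclusion in HTML.
    return html.escape(text, quote=False)

print(title_for_html("M&uuml;ller"))      # Müller
print(title_for_html("Tom &amp; Jerry"))  # Tom &amp; Jerry
print(title_for_html("a < b"))            # a &lt; b
```

The trade-off is that a title whose text genuinely contains entity-like sequences gets decoded as well, so this only makes sense for the non-html elements where feeds misuse entities.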

Murray Cumming
murrayc at murrayc.com
