[Devel] RSS, titles, entities
Murray Cumming
murrayc at murrayc.com
Wed May 26 23:38:19 EST 2004
On Tue, 2004-05-25 at 14:00 +1000, Jeff Waugh wrote:
> <quote who="Murray Cumming">
>
> > Has anyone made more progress with porting planet to feedparser3? I see
> > that the API has changed a bit - some methods that we use now seem to be
> > marked as private (they have a _ at the start).
>
> I had a quick look. We tend to tweak the internals and route around
> feedparser a bit, so the new, simpler API makes it a bit harder. Mark'll be
> happy to help out, I'm sure.
>
> > The current feedparser in planet incorrectly escapes character entities
> > (such as &uuml;) in item titles, so I thought it would make sense to fix
> > it in feedparser 3 if necessary.
>
> This is not a bug. <title> elements in RSS are *not* html or encoded html,
> so you can't put entities in them.
Thanks. I'm willing to believe this, but I can't find anything about it
in the spec.
> You can, however, use utf-8 characters
> instead. I would be reasonably happy to accept an un-fuck-me patch to make
> this work, for feeds that insist on using entities in non-html elements.
>
> RSS means forever working around breakage...
Yes, because a lot of feeds from a lot of blog software seem to produce
these elements.
But I think this would be something for feedparser rather than planet. I
have patched my local feedparser.py for this, but won't try to send
anything upstream until we've ported to feedparser 3.
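
For illustration only, here is a rough sketch of the kind of workaround
discussed above. It is not the actual local patch and it does not use any
feedparser API; unescape_title is a hypothetical helper, and it assumes a
modern Python where the standard library's html.unescape is available. It
turns character entities left in a title string into the Unicode
characters they name:

```python
import html


def unescape_title(raw_title: str) -> str:
    """Replace character references such as &uuml; or &#252; with the
    characters they name. Hypothetical helper, not part of feedparser."""
    # html.unescape makes a single pass, so a double-escaped title such as
    # "&amp;uuml;" becomes "&uuml;" rather than "ü".
    return html.unescape(raw_title)


if __name__ == "__main__":
    print(unescape_title("M&uuml;nchen &amp; K&ouml;ln"))
```

Titles that already contain plain utf-8 characters pass through unchanged,
so the helper should be safe to apply to well-behaved feeds too.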
--
Murray Cumming
murrayc at murrayc.com
www.murrayc.com