Limit Posting Length

Sam Ruby rubys at
Thu Jun 8 00:50:38 EST 2006

Baz wrote:
> On 6/7/06, Sam Ruby <rubys at> wrote:
>> If that doesn't meet your needs, truncating HTML is a bit harder
>> than you might think.  Imagine an HTML table.  If you are merely
>> counting words, then you will likely end up truncating in the middle
>> of a row.
> Sam: Mike was referring back to a post I made in January. I took
> the approach of counting 'content' (including whitespace[1])
> towards the allowed length while maintaining a stack of opened tags.
> When I reached the length limit, I added an ellipsis and closed any
> open tags. You're correct that I wouldn't have handled tables
> correctly, but it's not so bad: you'll just be missing cells on one
> row.
> I was extending the sanitizer in feedparser, which is aware of which
> tags should be empty and which should not. It wasn't 100% correct,
> since I didn't bother checking whether the content was HTML or XHTML,
> and it's horrible from an I18N point of view[2]. It was just a rough
> cut for review.
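
Baz's approach (count text characters, whitespace included, against the limit; keep a stack of open tags; on hitting the limit, append an ellipsis and close whatever is still open) might look roughly like the following. This is only a sketch, not Baz's actual code, which extended feedparser's sanitizer: it assumes Python 3's standard html.parser module, and truncate_html and _Truncator are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (HTML void elements), so they are
# never pushed onto the open-tag stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Copy HTML until `limit` characters of text have been emitted."""

    def __init__(self, limit, ellipsis="\u2026"):
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.ellipsis = ellipsis
        self.out = []
        self.open_tags = []  # stack of elements opened but not yet closed
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self._render_start(tag, attrs))
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, never push it.
        if not self.done:
            self.out.append(self._render_start(tag, attrs))

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # ignore stray closers
        # Pop up to and including the matching tag, closing anything left open.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) <= self.remaining:
            # Whitespace counts toward the limit, as in Baz's version.
            self.out.append(escape(data, quote=False))
            self.remaining -= len(data)
            return
        # Limit reached: keep what fits, add the ellipsis, close open tags.
        self.out.append(escape(data[:self.remaining], quote=False) + self.ellipsis)
        self.remaining = 0
        self.done = True
        self.close_open_tags()

    def close_open_tags(self):
        while self.open_tags:  # innermost first
            self.out.append("</%s>" % self.open_tags.pop())

    @staticmethod
    def _render_start(tag, attrs):
        parts = [tag] + ['%s="%s"' % (k, escape(v or "")) for k, v in attrs]
        return "<%s>" % " ".join(parts)


def truncate_html(html, limit, ellipsis="\u2026"):
    """Truncate `html` to about `limit` text characters, keeping tags balanced."""
    parser = _Truncator(limit, ellipsis)
    parser.feed(html)
    parser.close()
    parser.close_open_tags()  # also closes anything the source left open
    return "".join(parser.out)


print(truncate_html("<p>Hello <b>world</b> again</p>", 8))
```

Run on a table, it shows the behaviour Baz describes: the markup stays balanced, but later cells are simply missing. For example, truncate_html("<table><tr><td>abc</td><td>def</td></tr></table>", 4) yields <table><tr><td>abc</td><td>d…</td></tr></table>.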

* Note to Jeff: please release a Planet 1.0 ASAP so that we can *
* begin collaborating on features such as these.  Thank you.    *

The code is quite good, and I'll gladly integrate it into my branch of
Planet.  A config.ini option could specify the truncation length
(overridable on a per-feed basis); all one would then need to do is
replace <TMPL_VAR content> with <TMPL_VAR summary> in your various
.tmpl files.
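
The per-feed override could be resolved with Python's standard configparser, assuming a Planet-style config.ini where [Planet] holds global settings and each feed is a section named by its URL. The option name summary_length and the helper truncation_length_for are hypothetical; neither is an existing Planet setting or function.

```python
import configparser

# Hypothetical config.ini: "summary_length" is an assumed option name.
SAMPLE = """
[Planet]
name = Example Planet
summary_length = 300

[http://example.com/feed.xml]
name = Example Blog
summary_length = 150

[http://example.org/atom.xml]
name = Another Blog
"""


def truncation_length_for(config, feed_url, default=None):
    """Per-feed summary_length if set, else the [Planet] value, else default."""
    if config.has_option(feed_url, "summary_length"):
        return config.getint(feed_url, "summary_length")
    if config.has_option("Planet", "summary_length"):
        return config.getint("Planet", "summary_length")
    return default


config = configparser.ConfigParser()
config.read_string(SAMPLE)
print(truncation_length_for(config, "http://example.com/feed.xml"))  # 150 (per-feed override)
print(truncation_length_for(config, "http://example.org/atom.xml"))  # 300 (global fallback)
```

Feeds without their own setting, including ones not listed at all, fall back to the global value, so only the exceptions need to be configured.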

How do people want to handle images?  One approach would be to simply 
substitute the alt text when it is provided and to omit the image 
entirely when it is not.
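
The image handling suggested above could run as a pre-pass before truncation, so any alt text counts toward the length like ordinary text. A minimal sketch with Python 3's html.parser; images_to_alt is a hypothetical name, and this simple pass also drops comments and any other markup it has no handler for.

```python
from html import escape
from html.parser import HTMLParser


class _ImageToAlt(HTMLParser):
    """Re-emit HTML with each <img> replaced by its alt text, or dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # omit the image entirely when alt is missing or empty
                self.out.append(escape(alt, quote=False))
            return
        self.out.append(self.get_starttag_text())  # other tags pass through unchanged

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # treat <img/> and <br/> the same way

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def images_to_alt(html):
    parser = _ImageToAlt()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(images_to_alt('<p>See <img src="a.png" alt="a chart"> and <img src="b.png"></p>'))
# <p>See a chart and </p>
```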

- Sam Ruby

> -Baz
> [1] Whitespace in 'pre' tags and textareas is meaningful, obviously,
> and I'm sure there are other cases. To keep the code simple I just
> assumed all whitespace was meaningful.
> [2] There's the RTL/LTR issue, plus deciding what counts as a word in
> non-Western text, and making sure combining characters are not seen
> as punctuation, etc. It would really take a knowledgeable user to
> provide guidance on this, which isn't me.
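
On the narrower point in footnote [2] about combining characters, one rough heuristic is to classify characters by Unicode general category, treating letters (L*), numbers (N*) and combining marks (M*) as word characters so that a decomposed accent stays inside its word. This is only a sketch using Python's standard unicodedata module; count_words and is_word_char are hypothetical helpers, and they do nothing for scripts written without spaces, which need real word segmentation (for example, the word-boundary rules in Unicode Standard Annex #29).

```python
import re
import unicodedata


def is_word_char(ch):
    """Letters, numbers and combining marks (categories L*, N*, M*)."""
    return unicodedata.category(ch)[0] in "LNM"


def count_words(text):
    """Count maximal runs of word characters.

    Naive: anything else (spaces, punctuation) is a separator, so scripts
    written without spaces, such as Chinese or Thai, are not segmented.
    """
    words = 0
    in_word = False
    for ch in text:
        if is_word_char(ch):
            if not in_word:
                words += 1
            in_word = True
        else:
            in_word = False
    return words


decomposed = "nai\u0308ve"  # "naive" with U+0308 COMBINING DIAERESIS on the i
print(count_words(decomposed))               # 1
print(len(re.findall(r"\w+", decomposed)))   # 2: \w does not match the combining mark
```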

