Limit Posting Length

Baz brian.ewins at gmail.com
Wed Jun 7 23:14:59 EST 2006


On 6/7/06, Sam Ruby <rubys at intertwingly.net> wrote:
> If that doesn't meet your needs, the idea of truncating HTML is a bit
> harder than you might think.   Imagine a HTML table.  If you are merely
> counting words, then you will likely end up truncating in the middle of
> a row.

Sam: Mike was referring to a post I made back in January. I took
the approach of counting 'content' (including whitespace[1])
towards the allowed length, and maintained a stack of opened tags.
When I reached the length limit, I added an ellipsis and closed the
open tags. You're correct that I wouldn't have handled tables
correctly, but it's not so bad: you'll just be missing cells on one row.
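
For illustration only, here is a minimal sketch of that approach. It is
not the feedparser patch: it assumes Python's standard html.parser, and
the names TruncatingParser, truncate_html and VOID_TAGS are hypothetical.
Content characters (whitespace included) count towards the limit, open
tags go on a stack, and at the limit an ellipsis is added and the stack
is unwound.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a small hand-written set of empty (void) elements,
# not the list feedparser's sanitizer uses.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TruncatingParser(HTMLParser):
    """Hypothetical helper: copies markup through until `limit` content chars are seen."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit   # allowed content length, whitespace included
        self.count = 0       # content characters emitted so far
        self.stack = []      # tags opened but not yet closed
        self.out = []        # output fragments
        self.done = False    # set once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never go on the stack.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return
        # Close anything left open inside this tag, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        room = self.limit - self.count
        if len(data) <= room:
            self.out.append(escape(data, quote=False))
            self.count += len(data)
            return
        # Limit reached: keep what fits, add an ellipsis, close open tags.
        self.out.append(escape(data[:room], quote=False) + "...")
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())
        self.done = True


def truncate_html(html, limit):
    parser = TruncatingParser(limit)
    parser.feed(html)
    parser.close()
    # Close anything the input itself left open.
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)
```

Called as truncate_html("<p>Hello <b>big world</b> again</p>", 9), the
sketch cuts inside the <b> element and closes both tags. On a table it
cuts mid-row, so the remaining cells of that row are simply missing.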

I was extending the sanitizer in feedparser, which is aware of which
tags should be empty. It wasn't 100% correct, since I didn't
bother checking whether the content was HTML, XHTML, etc., and it's
horrible from an I18N point of view[2]; it was just a rough cut for
review.

-Baz

[1] Whitespace in 'pre' tags and textareas is meaningful, obviously, and
I'm sure there are other cases. To keep the code simple I just assumed
all whitespace was meaningful.
[2] There's the rtl/ltr issue, plus deciding what is a word in
non-western text, and making sure combining characters are not seen as
punctuation, etc. It would really take a knowledgeable user to provide
guidance on this, which isn't me.
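
As a narrow illustration of the combining-character point, the sketch
below keeps combining marks attached to their base character when
cutting a string. It assumes Python's standard unicodedata module, and
cut_without_splitting is a hypothetical name. It only catches marks with
a nonzero canonical combining class, so it is not full grapheme or word
segmentation and does nothing for the rtl/ltr issue.

```python
import unicodedata


def cut_without_splitting(text, limit):
    """Hypothetical helper: cut at `limit` code points, but keep trailing combining marks."""
    end = max(0, min(limit, len(text)))
    # Extend the cut past any combining marks that modify the last kept character.
    while 0 < end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return text[:end]
```

For example, cutting "e\u0301tude" at 1 keeps the accent with its "e"
instead of leaving a bare "e".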

