Limit Posting Length
brian.ewins at gmail.com
Wed Jun 7 23:14:59 EST 2006
On 6/7/06, Sam Ruby <rubys at intertwingly.net> wrote:
> If that doesn't meet your needs, the idea of truncating HTML is a bit
> harder than you might think. Imagine a HTML table. If you are merely
> counting words, then you will likely end up truncating in the middle of
> a row.
Sam: Mike was referring back to a post I made in January. I took
the approach of counting 'content' (including whitespace)
towards the allowed length, and maintained a stack of opened tags.
When I reached the length limit, I added an ellipsis and closed the
open tags. You're correct that I wouldn't have handled tables
correctly, but it's not so bad; you'll just be missing cells on one row.
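For anyone who wants to see the shape of that approach, here is a minimal sketch. It is not the code from the January post: it assumes Python 3's standard html.parser rather than the feedparser sanitizer, and the names Truncator, truncate_html and VOID_TAGS are made up for illustration. It counts every character of text (whitespace included) towards the limit, keeps a stack of open tags, and on hitting the limit appends an ellipsis and closes whatever is still open.

```python
from html import escape
from html.parser import HTMLParser

# Assumed list of tags that never take a closing tag (HTML void elements).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Truncator(HTMLParser):
    """Hypothetical helper: passes markup through until `limit` characters
    of text content have been seen, then closes the tags still open."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit     # allowed characters of content
        self.seen = 0          # content characters emitted so far
        self.open_tags = []    # stack of currently open tags
        self.out = []          # output fragments
        self.done = False      # True once the limit has been hit

    def close_all(self):
        # Pop the stack, emitting a closing tag for each open element.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in attrs)
        self.out.append("<%s%s>" % (tag, rendered))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        room = self.limit - self.seen
        if len(data) <= room:
            self.out.append(escape(data, quote=False))
            self.seen += len(data)
            return
        # Limit reached: keep what fits, add an ellipsis, close open tags.
        self.out.append(escape(data[:room], quote=False) + "...")
        self.close_all()
        self.done = True


def truncate_html(html, limit):
    parser = Truncator(limit)
    parser.feed(html)
    parser.close()       # flush any buffered text
    parser.close_all()   # close tags the input itself left open
    return "".join(parser.out)
```

With a limit of 4, a two-cell table row comes out as "<table><tr><td>one</td><td>t...</td></tr></table>": the markup stays balanced, but any cells after the cut are simply missing, which is the table limitation Sam pointed out.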
I was extending the sanitizer in feedparser, which is aware of which
tags should be empty or not. It wasn't 100% correct, since I didn't
bother checking whether the content was HTML/XHTML etc., and it's
horrible from an I18N point of view; it was just a rough cut for
Whitespace in 'pre' tags and textareas is meaningful, obviously, and
I'm sure there are other cases. To keep the code simple I just assumed
all whitespace was meaningful.
On the I18N side, there's the RTL/LTR issue, plus deciding what is a
word in non-western text, and making sure combining characters are not
seen as punctuation, etc. It would really take a knowledgeable user to
provide guidance on this, which isn't me.
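The combining-character point, at least, can be illustrated with a small sketch. It assumes Python's standard unicodedata module, and count_words is a made-up name. It treats letters and digits as word characters and lets combining marks extend the current word instead of ending it. It does nothing about RTL text or scripts that don't separate words with spaces, so it is only a partial answer.

```python
import unicodedata


def count_words(text):
    """Count runs of letters/digits; combining marks extend the current run."""
    words = 0
    in_word = False
    for ch in text:
        major = unicodedata.category(ch)[0]  # 'L' letter, 'N' number, 'M' mark, ...
        if major in "LN":
            if not in_word:
                words += 1
            in_word = True
        elif major == "M" and in_word:
            # A combining mark belongs to the preceding character,
            # so it must not be treated as a word break.
            continue
        else:
            in_word = False
    return words
```

A check based only on isalpha() would split "nai\u0308ve" (decomposed "naïve") into two words, because the combining diaeresis is not alphabetic; the version above counts it as one.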