Using regexp_filter (venus)

Sam Ruby rubys at intertwingly.net
Wed Mar 14 01:27:49 EST 2007


Amit Chakradeo (अमित चक्रदेव) wrote:
> Hi,
> 
>    I am trying to filter out some items which contain the string 
> GolfNow.com. The examples on the documentation page (URI style) mention 
> that the option to be passed is exclude, but the code looks at the key 
> "--exclude".

shell/py.py adds the -- before the option name, so you should not 
include it yourself.
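
To illustrate the point about the -- prefix, here is a minimal sketch of how 
a URI-style filter setting could be expanded into a command line.  This is 
not the actual shell/py.py code; the function name filter_to_argv and the 
use of urllib.parse are assumptions made for the example.

```python
from urllib.parse import parse_qsl

def filter_to_argv(filter_spec):
    """Turn 'script.py?key=value' into ['script.py', '--key', 'value'].

    Hypothetical helper: it only illustrates that the "--" is added by
    the caller of the filter, not typed by the user in the config.
    """
    script, _, query = filter_spec.partition("?")
    argv = [script]
    for key, value in parse_qsl(query, keep_blank_values=True):
        argv.append("--" + key)  # prefix added here
        argv.append(value)
    return argv

print(filter_to_argv(r"regexp_sifter.py?exclude=GolfNow\.com"))
print(filter_to_argv(r"regexp_sifter.py?--exclude=GolfNow\.com"))
```

Under these assumptions, writing --exclude in the config would yield an 
option named ----exclude, which the filter would not recognize.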

> I tried using the following ways, but both of them did not work (I still 
> see articles containing the GolfNow.com string!)
> filters= regexp_sifter.py?exclude=GolfNow\.com
> 
> filters= regexp_sifter.py?--exclude=GolfNow\.com
> 
> But if I pass in the options on command line it seems to work:
> cat cache_item_file | python regexp_sifter.py  --exclude GolfNow\.com
> (no output which is good)
> 
> ???

I ran some tests, and the code seems to be working.

Filters are applied when spidering; adding a filter later won't remove 
anything that is already in the cache.

Could that be the point of confusion?  If so, that can be fixed for most 
cases: spider could be modified to actively delete cache entries 
which have been filtered.  This would only remove entries which are 
actually present in a feed, and only for feeds that actually change.
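
A rough sketch of what such a change to spider might look like follows.  
Nothing here exists in the code today: prune_filtered, its arguments, and 
the idea of one cache file per entry name are all assumptions made for 
illustration.

```python
import os

def prune_filtered(entries, is_excluded, cache_dir):
    """Delete cached files for entries that the filter now rejects.

    entries     -- (cache_filename, entry_text) pairs from a freshly
                   fetched feed (hypothetical structure)
    is_excluded -- callable returning True when the filter drops an entry
    cache_dir   -- directory holding one file per cached entry
    """
    removed = []
    for filename, text in entries:
        path = os.path.join(cache_dir, filename)
        # Only entries still present in the feed are ever examined, so
        # older cached items that have left the feed are not touched.
        if is_excluded(text) and os.path.exists(path):
            os.remove(path)
            removed.append(filename)
    return removed
```

A cached item that no longer appears in its feed would survive this pass, 
which is the limitation described above.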

> Thanks!
> Amit

- Sam Ruby

