Using regexp_filter (venus)
rubys at intertwingly.net
Wed Mar 14 01:27:49 EST 2007
Amit Chakradeo (अमित चक्रदेव) wrote:
> I am trying to filter out some items which have the string
> GolfNow.com. The examples on the documentation page (URI style) mention
> that the options to be passed is exclude but the code looks at the key
shell/py.py adds the -- before the option, which means you should not add it yourself.
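As a rough illustration of why the second form fails, here is a minimal sketch of how a URI-style filter entry could be turned into a command line. This is not the actual shell/py.py code: the function name filter_to_argv is hypothetical, and it uses Python 3's urllib.parse for brevity.

```python
from urllib.parse import parse_qsl


def filter_to_argv(filter_spec):
    """Split 'script.py?key=value' into an argv-style list.

    Hypothetical helper, not Venus code: each query key gets a
    leading '--', mirroring what shell/py.py is described as doing.
    """
    script, _, query = filter_spec.partition("?")
    argv = [script]
    for key, value in parse_qsl(query, keep_blank_values=True):
        argv.append("--" + key)  # the '--' is added here, not by the user
        if value:
            argv.append(value)
    return argv


# Writing the option without dashes gives the expected command line.
good = filter_to_argv(r"regexp_sifter.py?exclude=GolfNow\.com")
# good == ['regexp_sifter.py', '--exclude', 'GolfNow\\.com']

# Writing the dashes yourself doubles them up.
bad = filter_to_argv(r"regexp_sifter.py?--exclude=GolfNow\.com")
# bad == ['regexp_sifter.py', '----exclude', 'GolfNow\\.com']
```

Under that assumption, only the first form in the config ends up equivalent to the command line that worked by hand.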
> I tried using the following ways, but both of them did not work (I still
> see articles containing the GolfNow.com string!)
> filters= regexp_sifter.py?exclude=GolfNow\.com
> filters= regexp_sifter.py?--exclude=GolfNow\.com
> But if I pass in the options on command line it seems to work:
> cat cache_item_file | python regexp_sifter.py --exclude GolfNow\.com
> (no output which is good)
I ran some tests, and the code seems to be working.
Filters are applied when spidering, so adding a filter later won't remove
anything that is already in the cache.
Could that be the point of confusion? If so, that can be fixed for most
cases: spider could be modified to actively delete cache entries
which have been filtered. This would only remove entries which are
actually present in a feed, and only for feeds that actually change.
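To make the limitation concrete, here is a minimal sketch of what such a cleanup pass might look like. None of this is existing spider code: prune_filtered, the one-file-per-entry cache layout, and the file names are all assumptions made for the example, and the exclude pattern is assumed to be matched with a regular-expression search.

```python
import os
import re
import tempfile


def prune_filtered(cache_dir, feed_entries, exclude_pattern):
    """Delete cached files for feed entries that match the exclude pattern.

    feed_entries maps a cache file name to the entry text, and holds
    only the entries present in the feed that was just fetched.
    Hypothetical helper, not part of Venus.
    """
    exclude = re.compile(exclude_pattern)
    removed = []
    for name, text in feed_entries.items():
        path = os.path.join(cache_dir, name)
        if exclude.search(text) and os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed


# Demo: three cached entries, but only two still appear in the feed.
with tempfile.TemporaryDirectory() as cache_dir:
    for name in ("golf_ad", "normal_post", "old_golf_ad"):
        with open(os.path.join(cache_dir, name), "w") as f:
            f.write("placeholder")
    feed_entries = {
        "golf_ad": "Tee times at GolfNow.com",
        "normal_post": "Nothing to see here",
    }
    removed = prune_filtered(cache_dir, feed_entries, r"GolfNow\.com")
    remaining = sorted(os.listdir(cache_dir))

# removed == ['golf_ad']
# remaining == ['normal_post', 'old_golf_ad']
```

The old_golf_ad file survives because it is no longer in the feed, so the pass never sees it. Entries like that would still have to be removed from the cache by hand.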
- Sam Ruby