SV: Re: venus -n option
antonio_eggberg at yahoo.se
Tue Mar 27 03:24:15 EST 2007
--- Morten Høybye Frederiksen <morten at wasab.dk> wrote:
> On 3/25/07, Sam Ruby <rubys at intertwingly.net> wrote:
> > My general advice is to treat the cache as the resource, not the xml.
> > If, for example, you had a program which built a small database of file
> > names and hashes of the content for each, subsequent runs would be able
> > to tell which files are new and which files had changed.
> FWIW: For my WordPress plugin I simply record the modification
> time and size of each file in the cache, and on subsequent runs only
> reparse the files that are changed. Since I don't check the content, I
> might miss out on some updates, but it's a lot faster not having to
> load each entry.
OK, but it only works with WP :-) or did I miss something? This could
solve my problem. Is it available as open source? For me, parsing takes
about half of the time.
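
To make the idea quoted above concrete, here is a rough sketch in Python
of the "record modification time and size" check, independent of
WordPress. The function name, the JSON index file and its location are
my own assumptions for illustration, not anything taken from the plugin
or from Venus. As noted in the quote, a change that keeps both size and
mtime identical would be missed.

```python
import json
import os


def changed_files(cache_dir, index_path):
    """Return names of files in cache_dir whose (mtime, size) differ
    from what was recorded on the previous run.

    index_path is a hypothetical JSON file holding the previous
    signatures; keep it outside cache_dir so it is not scanned itself.
    """
    try:
        with open(index_path) as f:
            seen = json.load(f)
    except (OSError, ValueError):
        seen = {}  # first run, or unreadable index: treat everything as new

    current = {}
    changed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        signature = [st.st_mtime_ns, st.st_size]
        current[name] = signature
        if seen.get(name) != signature:
            changed.append(name)  # new or modified since the last run

    # Remember the signatures for the next run.
    with open(index_path, "w") as f:
        json.dump(current, f)
    return changed
```

Only the names returned would need to be reparsed; everything else can
be skipped without opening the file.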
On a different issue, I wonder whether any thought has been given to
adaptive crawling of feeds. In my use case, approximately 70% of the
blogs are not updated even once a day, whereas a few regular news
sites are updated every hour. So I don't want to visit that 70% of the
sites on every run. My thought is this: imagine you crawl every hour,
so:
hour 1 : crawl a feed --> No update
hour 2 : crawl again the same feed --> no update
push the crawl with +2 hours (a config option)
hour 5 : crawl again --> no update
push the crawl another +2 hours (so it will be crawled 4 hours from now)
The same goes for feeds that get updated often, but in the opposite
direction. You could also have a general "check all" option that checks
every feed regardless, but that could be run once a month, activated by
the user.
I think this would minimize network load.
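
The scheme above could be sketched roughly like this in Python. It is
only an illustration under my own assumptions: the function and
parameter names are made up (they are not Venus options), the interval
grows after every crawl without an update rather than after two, and
the 24-hour cap is arbitrary. The step of 2 hours is the "+2 hours"
config option from the example.

```python
def next_interval(interval, updated, step=2, minimum=1, maximum=24):
    """Return how many hours to wait before crawling a feed again.

    interval -- hours waited before the crawl that just finished
    updated  -- True if that crawl found new or changed entries
    step     -- hours added (no update) or removed (update) each time
    minimum, maximum -- hypothetical bounds on the interval
    """
    if updated:
        # Active feed: move back toward crawling every `minimum` hours.
        return max(minimum, interval - step)
    # Quiet feed: push the next crawl further out, up to the cap.
    return min(maximum, interval + step)


if __name__ == "__main__":
    # Simulate one feed: three crawls without updates, then two with.
    interval = 1
    for updated in (False, False, False, True, True):
        interval = next_interval(interval, updated)
        print("updated=%s -> next crawl in %d hours" % (updated, interval))
```

The monthly "check all" run would simply ignore the stored interval and
crawl every feed.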
Is there anything like that already? The activity_threshold option is a
bit blurry to me. Does it mean those feeds are never crawled again?
Just some thoughts.
>  http://www.wasab.dk/morten/blog/archives/2006/10/22/wp-venus