Spent the last week or two totally reworking SWISH::Prog. I reorganized the class layout to mirror the aggregator/parser/indexer/searcher paradigm I described some time ago. It has started to look a little like KinoSearch in that respect, with the addition of the aggregators and parsers (which are, of course, Swish-e’s contribution to IR).
After several days of mulling over and experimenting with how best to write the spider, I have decided to use WWW::Mechanize along with WWW::Rules and write it from scratch. Then I’ll provide backwards API compatibility for the Swish-e 2.4 spider.pl script config files, callbacks, etc. This proved easier than a direct port, and it allows me to provide extensible caching, queueing, and user_agent classes rather than hardcoding everything in a single script or library. I toyed with WWW::CheckSite, but making it work with the aggregator API required so many gymnastics that it finally became easier to just write the spider myself. It was a good programming exercise as well. 🙂
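To make the "extensible classes instead of one hardcoded script" idea concrete, here is a rough sketch of a spider whose queue, cache, and fetcher are swappable parts. It is written in Python purely for illustration and is not the SWISH::Prog code or API. Every name in it (`FifoQueue`, `MemoryCache`, `Spider`, `SITE`, the `example.test` URLs) is made up for this example, and the fetcher is a stand-in dictionary lookup rather than a real user agent. Robots rules and error handling are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FifoQueue:
    """Hypothetical queue component: URLs come out in the order they went in."""
    def __init__(self):
        self._items = deque()

    def put(self, url):
        self._items.append(url)

    def get(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class MemoryCache:
    """Hypothetical cache component: remembers which URLs were already seen."""
    def __init__(self):
        self._seen = set()

    def add(self, url):
        self._seen.add(url)

    def has(self, url):
        return url in self._seen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class Spider:
    """Crawls one host; the fetcher, queue, and cache are all injectable."""
    def __init__(self, fetch, queue=None, cache=None):
        self.fetch = fetch  # callable: url -> HTML string, or None on failure
        self.queue = queue if queue is not None else FifoQueue()
        self.cache = cache if cache is not None else MemoryCache()

    def crawl(self, start_url, max_pages=10):
        host = urlparse(start_url).netloc
        self.queue.put(start_url)
        self.cache.add(start_url)
        docs = []
        while len(self.queue) and len(docs) < max_pages:
            url = self.queue.get()
            html = self.fetch(url)
            if html is None:
                continue
            docs.append((url, html))  # a real aggregator would hand this to a parser
            extractor = LinkExtractor()
            extractor.feed(html)
            for href in extractor.links:
                link = urljoin(url, href)
                # Stay on the starting host and skip anything already queued.
                if urlparse(link).netloc == host and not self.cache.has(link):
                    self.cache.add(link)
                    self.queue.put(link)
        return docs


# Stand-in "web": a dict lookup plays the role of the user agent.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://other.test/">x</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.test/b": "no links here",
}
spider = Spider(fetch=SITE.get)
urls = [url for url, _ in spider.crawl("http://example.test/")]
```

Swapping in a different crawl order, an on-disk cache, or a real HTTP client then means passing a different object to `Spider`, without touching the crawl loop itself.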