an eddy in the bitstream
This article and this article were very helpful. And so was the fact that we use Template Toolkit and a shared template directory for all sites.
Just uploaded SWISH-Prog 0.03 and Search-Tools 0.02 to the CPAN. They join SWISH-API-More, SWISH-API-Stat and SWISH-API-Object on the CPAN this week.
Whew.
One of the things I like most about using other IR libraries as backends is that many of them offer bindings for several other languages. So PHP, Ruby, Python, etc. users can get search capability right away.
Of course, if the indexing program is written in Perl instead of C, that adds Perl as a requirement. But if the indexing API is well documented, there should be nothing to stop anyone from implementing it in scripting languages other than Perl.
Take Xapian, for example. It has bindings for nearly every major scripting language. So if you don’t like the way Swish-e implements indexing, there’s nothing to stop you from writing your own indexer in your favorite language. At that point you’re not really using Swish-e anymore, but you could mix and match depending on your needs: use Swish-e’s spider and SWISH::Filter with your own parser and indexer, for example.
The general idea right now is to get the core C libraries functional, at least SwishParser and SwishConfig, and then start working on a replacement for the “swish-e” command line program. I intend to write the replacement in Perl, since that will be much easier to write, and the performance hit should be small, coming mostly from startup costs. I’ll use SWISH::Prog to handle the basic spider/filesystem stuff, as well as config parsing.
Funny: I don’t think I had that in mind when I originally started SWISH::Prog but it now seems like a totally obvious fit.
SWISH::Prog::Config just underwent some major surgery. It can now parse version 2 config files using the excellent Config::General, and can convert them to the current SwishConfig XML format.
I’ll probably start with a Xapian backend, since that’s fairly stable (though UTF-8 support won’t be official until 1.0). I need to write SWISH::Index and SWISH::Search APIs (though the latter will likely look just like SWISH::API).
Everything in due time.
Thoughts on the Swish-e project (http://swish-e.org/).
Understand why folks like Swish-e 2
Fast. Easy to configure. Flexible. Keep it that way.
Understand why folks crave Swish-e 3
Folks like Fast, Easy and Flexible. They want to bring those qualities to bear on more difficult challenges.
* I18n demands multi-byte charset support. UTF-8 is the accepted standard. Swish-e 2 is stuck with single-byte charsets.
* Data sets are huge these days. Swish-e 2 doesn’t scale well past a few million documents.
* Huge data sets mean lots of time spent indexing. Stable incremental index support (add, update, delete) is a must. Swish-e 2 has incremental support, but it is buggy and the code is opaque.
* It’s a polylingual world. Swish-e lacks bindings for modern scripting languages beyond Perl.
C == Fast but C == Slow
C, like other compiled languages, provides the best speed. The core code base for Swish-e is all in C. C is the *lingua franca* of the open source world.
But C is harder to write than most scripting languages, and C code takes much longer to develop and debug (and thus to maintain). So projects that involve C attract fewer developers from the community, and fewer developers generally means slower development. A couple of good C developers can turn out good code quickly, but as with all OSS, maintenance and legacy become big issues: community (people) issues that become software issues. See the maintenance issue below.
Modular == cool. Monolithic != cool.
Swish-e 2 revolves around the swish-e command line tool, a monolithic program that parses, indexes and searches. Splitting search out into its own library, libswish-e, was a good first step. Let’s continue in that direction by splitting the parser, indexer and searcher into separate, modular components (libraries). That increases Flexibility (probably at some cost to Fast).
Maintainable code is a feature
What happens when you get hit by a bus? Or get bored? Or move on? Or change careers and take up that basket weaving profession you’ve always secretly craved? Who will maintain your code? Did you document what you wrote? Did you check in your latest changes? Are your comments clear? Don’t fall into the trap of thinking the code is the documentation. Swish-e is more than the development team *du jour*; folks will keep using it after you’re gone. OSS projects all suffer from this problem: check out the orphaned projects on sf.net.
Community is a feature
Getting folks involved is one of the joys (and struggles) of OSS projects. Making it easy for folks to get involved, whether contributing documentation, tests, patches or good beer, is a good way to keep the fun in your own involvement.
Since writing C is a skill that fewer people have, encourage folks to write tests (using TAP, the Test Anything Protocol), documentation, and how-tos.
And consider how much code actually needs to be written in C. Many of Swish-e’s strong parts are actually Perl scripts that support and supplement the core C program.
Don’t reinvent the wheel
There are lots of search tools out there. Information retrieval is a hot subject right now. Swish-e 2 has some cool features. It’s Fast, Easy and Flexible. But it doesn’t do everything folks want it to.
However, other projects are strong where Swish-e is weak. There is quality open source IR code out there that handles UTF-8, supports incremental indexing and scales well. Good programmers are lazy. Let’s use other folks’ code to get the features we want.
Play to your strengths
Those other IR projects might be weak where Swish-e 2 is strong. They might be Slow, Hard or Inflexible in key areas. Let’s figure out what makes Swish-e Fast, Easy and Flexible and concentrate on making those parts of the code easy to integrate with the quality pieces from those other IR projects. Remember: modular is cool.
This is supposed to be Fun
Remember?
© 2025 peknet