an eddy in the bitstream


Xapian

Bill pointed me at Xapian as a potential direction for a better Swish-e. I like what I’ve seen so far. Xapian is a C++ library for probabilistic information retrieval, supports UTF encoding, and provides lots of language bindings via SWIG. Nice. I’ll post more as I play more.

CrayDoc

I’ve been putting together a history of CrayDoc for a presentation to the local MidWest XML users’ group.

Turns out the product goes back over 10 years. I inherited it in 2001, when I started in my current job at Cray. There was a time when Cray was owned by SGI, and during that time the documentation server was not known as CrayDoc — it was Dynaweb, a third party product. But starting in 2002, with the shipment of CrayDoc 1.0 (though I guess that versioning is erroneous, now that I know the history), CrayDoc is again a home-grown product. Versions 1 and 2 were 100% Perl, though now we use the SWISH-E search engine, which is written in C.

My work on CrayDoc has been a real education in CGI programming, HTML, databases, and code design. When the presentation is done, I’ll post it here for posterity.

UPDATE 11/22/2004: posted here (PDF).

SWISH-E and ranking algorithms

I’ve been actively making noise on the swish-e discussion list for over a year now. It’s a great open source indexing and searching tool. Love it. Loooove it. How’s that for Geek Love?

Part of the power of swish-e (the product is UPPERCASE, the command is lower, and I’m a lazy typist…) is in the libxml2 parser from the GNOME project. That thing flies. I’ve since started using the libxml2 tools in my other work as well.

Part of my work with swish-e has been in improving the ranking algorithm. I found a wealth of info on that subject, thanks in part to the success of Google, which makes it easy to find information about what makes Google work so well. How’s that for the tail wagging the dog? Or something like that.

Anyway, this has led me down the road of natural language query and methods of relevance ranking. Pretty dense stuff. My wee brain starts to twist and shudder. But I found this a good start and this even more helpful.

I have an email in to the developer about the open source status of the NITLE Semantic Engine, which looks like a really interesting idea. The author wrote this article about vector ranking, which I found very lucid.
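
For anyone who wants to see the vector ranking idea in concrete terms, here is a minimal sketch in Python. It is not how swish-e or the NITLE Semantic Engine actually ranks results; it is just a textbook TF-IDF plus cosine similarity toy. The function names (tokenize, tf_idf_vectors, cosine, rank), the whitespace tokenizer, and the smoothed IDF formula are all my own assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; real engines do far more.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: a smoothed IDF; rarer terms get a larger weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    query_vec = {
        term: tf * idf[term]
        for term, tf in Counter(tokenize(query)).items()
        if term in idf
    }
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    docs = [
        "swish-e indexes xml documents",
        "cray supercomputers run fast",
        "ranking xml search results",
    ]
    for score, index in rank("xml ranking", docs):
        print(f"{score:.3f}  {docs[index]}")
```

The point of the sketch is that the query and each document become vectors in the same term space, and relevance is just how closely they point in the same direction. A document that shares more of the query's rarer terms ends up with a higher score.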


© 2025 peknet
