I’ve been putting together a history of CrayDoc for a presentation to the local MidWest XML users’ group.

Turns out the product goes back more than 10 years. I inherited it in 2001, when I started my current job at Cray. There was a time when Cray was owned by SGI, and during that time the documentation server was not known as CrayDoc; it was Dynaweb, a third-party product. But since 2002, with the shipment of CrayDoc 1.0 (though I guess that version number is misleading, now that I know the history), CrayDoc has again been a home-grown product. Versions 1 and 2 were 100% Perl, though now we also use the SWISH-E search engine, which is written in C.

My work on CrayDoc has been a real education in CGI programming, HTML, databases, and code design. When the presentation is done, I’ll post it here for posterity.

UPDATE 11/22/2004: posted here (PDF).

Decision ’04

every blog is an election blog, right? politics, shmolitics. but this usa presidential election seems to matter more than most. as if the fate of the world hinges on it.

i wonder, though, if what we lefties are hoping for is not so much a way to avoid a conservative future but a way to forgive the weak-willed past. every dem who talks big now sided with the poor evidence of a puppet president, hungry for war, 2 years ago. our lone voice in the wilderness went down in a freakish plane accident about the same time. did we not protest loud enough before? did we not make our voices heard? did we not say ‘no’ with enough persistent force?

or is it just that half our fellow americans think gwb is doing a fine job as it is? that’s what boggles my mind.

and here’s what grieves me the most: no one is talking publicly about why “terrorists” (a pejorative normally spelled s-c-a-p-e-g-o-a-t) would attack the usa in the first place. that seems to me to be the most important issue. what is it about america’s way in the world that angers so many non-americans? and how might all americans be implicated in that discussion?

perhaps that’s why it never happens.

SWISH-E and ranking algorithms

I’ve been actively making noise on the swish-e discussion list for over a year now. It’s a great open source indexing and searching tool. Love it. Loooove it. How’s that for Geek Love?

Part of the power of swish-e (the product is UPPERCASE, the command is lower, and I’m a lazy typist…) is in the libxml2 parser from the GNOME project. That thing flies. I’ve since started using the libxml2 tools in my other work as well.

Part of my work with swish-e has been in improving the ranking algorithm. I found a wealth of info on that subject, thanks in part to the success of google — which makes it easy to find information about what makes google work so well. How’s that for the tail wagging the dog? Or something like that.

Anyway, this has led me down the road of natural language query and methods of relevance ranking. Pretty dense stuff. My wee brain starts to twist and shudder. But I found this a good start and this even more helpful.

I have an email in to the developer about the open source status of the NITLE Semantic Engine, which looks like a really interesting idea. The author wrote this article about vector ranking, which I found very lucid.
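For anyone else whose brain twists at the vector idea, here is a minimal sketch of generic vector-space ranking: each document and the query become tf-idf weighted term vectors, and documents are ranked by the cosine of the angle between their vector and the query's. This is the textbook technique only, not swish-e's ranking code or the NITLE engine; the tokenizer, the smoothed idf formula, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document, plus the idf table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: how many docs contain each term
    # Smoothed idf: rare terms weigh more; terms in every doc still get weight > 0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vecs, idf


def cosine(a, b):
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return [(score, doc_index), ...] sorted best match first."""
    vecs, idf = tfidf_vectors(docs)
    # Weight the query with the corpus idf; terms not in the corpus are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "Cray supercomputer documentation server",
        "Perl regular expressions and HTML parsing",
        "search engine ranking with Perl",
    ]
    for score, i in rank("perl ranking", docs):
        print(f"{score:.3f}  {docs[i]}")
```

Here the third document matches both query terms and ranks first, the second matches only the common term "perl", and the first scores zero. Dividing by the vector lengths is what keeps long documents from winning just by repeating words.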

MOTU for sale

UPDATE: it sold. good luck to the next owner; make lots of good recordings.

Shameless consumerism. I’m selling the audio system used to record several projects, including the Brett Larson debut record and several House of Mercy recordings (with Peter Rasmussen at the helm). I’ve listed it on eBay, but you can also find pics here.

HTML Highlighting

I can’t count the hours I’ve spent hacking at a foolproof highlighter for HTML. But I’m nearing a really good approximation of foolproof. I’ve posted HTML::HiLiter to the Perl CPAN.

The really hard thing about this was creating a regular expression fast enough to be useful but accurate enough to work 99% of the time. I ended up building on the HTML::Parser module, which is ‘fast enough’ and very powerful, thanks to its embedded C code and some good design. I’ve also looked at HTML::Tree, but because HTML::Parser was a standard module in Perl 5.6.x, it makes more sense to me right now to build on a widespread standard. That increases the chance that folks will find HTML::HiLiter useful.
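The core trick of parser-based highlighting is that the parser tells you which bits are markup and which are text, so search terms get wrapped only in text and never inside tags, attributes, or scripts. Here is a minimal sketch of that idea using Python's standard `html.parser.HTMLParser`, an event-driven parser in the same spirit as Perl's HTML::Parser. It is not HTML::HiLiter's code or API; the `Highlighter` class, the `highlight()` helper, and the `hilite` CSS class are hypothetical names, and it assumes word-like search terms.

```python
import html
import re
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Wrap search terms in <span class="hilite"> inside text nodes only."""

    SKIP = {"script", "style"}  # never highlight inside these elements

    def __init__(self, terms):
        super().__init__(convert_charrefs=True)  # text arrives with &amp; etc. decoded
        terms = [t for t in terms if t]
        if not terms:
            raise ValueError("need at least one search term")
        # One alternation, longest terms first, whole words, case-insensitive.
        alts = sorted((re.escape(t) for t in terms), key=len, reverse=True)
        self.pattern = re.compile(r"\b(?:" + "|".join(alts) + r")\b", re.IGNORECASE)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, copied as-is

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")  # note: tag name comes back lowercased

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # script/style content is passed through raw
            return
        # Match against decoded text, then re-escape each piece so entities survive.
        pieces, pos = [], 0
        for m in self.pattern.finditer(data):
            pieces.append(html.escape(data[pos:m.start()], quote=False))
            pieces.append('<span class="hilite">%s</span>' % html.escape(m.group(0), quote=False))
            pos = m.end()
        pieces.append(html.escape(data[pos:], quote=False))
        self.out.append("".join(pieces))


def highlight(markup, terms):
    h = Highlighter(terms)
    h.feed(markup)
    h.close()  # flush any buffered text
    return "".join(h.out)


if __name__ == "__main__":
    print(highlight('<p title="perl">I love Perl &amp; C</p>', ["perl"]))
    # prints: <p title="perl">I love <span class="hilite">Perl</span> &amp; C</p>
```

Matching against decoded text and re-escaping afterwards avoids the classic regex bug of highlighting a term that happens to appear inside an entity like `&amp;` or inside an attribute value.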

The most recent version (0.11) is due to be posted soon. I’m excited about it: I’ve improved the speed and accuracy, and added several features to support my other recent project, SWISH::HiLiter, an extension to the SWISH::API class.

Both these projects are open source and grew out of my work on CrayDoc at Cray. It was a huge project for me, and a real learning experience: character encodings, HTML syntax, and the power of Perl regular expressions. I’d wager that my Perl skills increased 500% as a result of this project.

If you use it, let me know what you think.

perldoc 5.8.1

Just added the Perl docs for v5.8.1 to the docs/ section. This is, of course, my favorite programming language… Now the docs are searchable via the main search tool. No more waiting if is down.

glibc docs

I added the latest glibc docs to the docs section. Mostly because I needed a quick searchable reference as I teach myself C. Of course, I found out afterwards that glibc is not supported on Mac OS X, so it proved kind of moot. But at least the reference is handy and it was a good exercise in usability. I’ll probably use that method again.