Swish3 Status 19 Sept 2008

A long hiatus: a full summer, and then some contract work.

Benchmarks of the latest tokenization algorithm show that pure-ASCII tokenization is about 20% faster and UTF-8 tokenization is about 2% slower. So I’ll take it.

The benchmark used perl/docmaker.pl to generate 100 random “docs” in both encodings (ASCII and random UTF-8). The docs were then timed with swish_lint, with and without the -t option.

I also recently fixed some failing tests on Linux and a memory warning.
