an eddy in the bitstream


Swish3 Proposal

Thoughts on Swish-e version 3.

Assumptions

* In order to keep Swish-e fast and portable, some key parts need to be written in a compiled language like C.

* C developers are increasingly hard to recruit to OSS projects like Swish-e.

* C is slower to develop in and more difficult to maintain than non-compiled languages like Python or Perl.

* To encourage more code contributors and make the project useful to more people, the core C parts should become library modules with well-defined, documented APIs. This makes the code more maintainable and flexible, and allows integration of other IR libraries like Xapian.
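To make the last point concrete, here is a minimal sketch of what a library module with a small, documented API could look like in C. Every name in it (`sketch_wordlist` and its functions) is hypothetical and invented for illustration; it is not the libswish3 API. The idea it shows is the opaque handle: callers only see the function declarations, so the internals can change without breaking them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the libswish3 API. */

/* ---- Public interface: callers only ever see an opaque handle ---- */
typedef struct sketch_wordlist sketch_wordlist;

/* Create an empty list; returns NULL if out of memory. */
sketch_wordlist *sketch_wordlist_new(void);
/* Append a copy of word; returns 0 on success, -1 on error. */
int sketch_wordlist_add(sketch_wordlist *wl, const char *word);
/* Number of words stored. */
size_t sketch_wordlist_count(const sketch_wordlist *wl);
/* Word at index i, or NULL if i is out of range. */
const char *sketch_wordlist_get(const sketch_wordlist *wl, size_t i);
/* Release the list and every word it owns. */
void sketch_wordlist_free(sketch_wordlist *wl);

/* ---- Private implementation: free to change without breaking callers ---- */
struct sketch_wordlist {
    char **words;
    size_t count;
    size_t capacity;
};

sketch_wordlist *sketch_wordlist_new(void) {
    return calloc(1, sizeof(sketch_wordlist));
}

int sketch_wordlist_add(sketch_wordlist *wl, const char *word) {
    if (wl == NULL || word == NULL) return -1;
    if (wl->count == wl->capacity) {
        /* Grow the pointer array geometrically. */
        size_t new_cap = wl->capacity ? wl->capacity * 2 : 8;
        char **grown = realloc(wl->words, new_cap * sizeof(*grown));
        if (grown == NULL) return -1;
        wl->words = grown;
        wl->capacity = new_cap;
    }
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, word, len);
    wl->words[wl->count++] = copy;
    return 0;
}

size_t sketch_wordlist_count(const sketch_wordlist *wl) {
    return wl ? wl->count : 0;
}

const char *sketch_wordlist_get(const sketch_wordlist *wl, size_t i) {
    return (wl && i < wl->count) ? wl->words[i] : NULL;
}

void sketch_wordlist_free(sketch_wordlist *wl) {
    if (wl == NULL) return;
    for (size_t i = 0; i < wl->count; i++) free(wl->words[i]);
    free(wl->words);
    free(wl);
}
```

A Perl or Python binding would only need to wrap these five functions, which is the kind of seam that lets non-C contributors work on everything above the library.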

Core C Libraries

*NOTE: The following list is no longer accurate. libswish3 combines all these into one library.*

SwishUtils (libswishu)

Common shared functions for things like IO, string handling, times, errors, memory and hashing.

I’ve started this one.

SwishConfig (libswishc)

Parse config files into in-memory data structures, and read/write index config headers.

I’ve started this one.

SwishParser (libswishp)

Parse documents into properties and a wordlist.

I’ve started this one.

SwishIndex (libswishi)

Store properties and wordlists.

TODO.

SwishSearch (libswishs)

Parse queries and fetch results from an index.

Could be a reworking of the existing libswish-e so that it expects UTF-8 (which SwishUtils supports).

Characters script

Inspired by Simon Cozens’ secret software idea, here’s a script in my ~/bin dir that I use often. It prints the glyphs and decimal code points for printable ASCII and for any other range of code points you specify (Latin-1 by default), with UTF-8 output. I find it especially useful when writing XML/HTML and I need the number for a numeric character reference.

    #!/usr/bin/perl
    #
    # Copyright 2005 perl@peknet.com
    # Released under the Free Beer License
    #
    # print chart of chars and matching nums
    # just latin1 by default
    # otherwise, specify start/stop numerals at cmd line
    #
    # NOTE the ANSI color stuff is unused

    use strict;
    use warnings;
    use Term::ANSIColor;

    binmode STDOUT, ':utf8';

    print ' ';
    my $on    = color('bold');
    my $off   = color('reset');
    my $c     = 0;
    my $start = shift @ARGV || 161;
    my $stop  = shift @ARGV || 255;

    # printable ASCII first, then the requested range
    for (33 .. 126, $start .. $stop) {
        my $n = $_;
        if ($_ < 100) {
            $n = " $n";    # pad two-digit numbers so columns line up
        }
        print("$n ", chr($_), ' ');
        if (++$c == 6) {    # six entries per row
            print "\n ";
            $c = 0;
        }
    }
    print "\n";
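The script leans on Perl's `:utf8` output layer to turn each number into the bytes of a glyph. For anyone curious what that step involves, here is a rough C sketch of encoding a single code point as UTF-8. The helper name `encode_utf8` is my own invention for illustration, and this is not how Perl implements it internally; unlike Perl's lax layer, the sketch rejects surrogates and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-8.
 * out must have room for at least 5 bytes (4 data bytes + NUL).
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF.
 */
size_t encode_utf8(uint32_t cp, char *out) {
    size_t n;
    if (cp < 0x80) {
        /* 7 bits: plain ASCII, one byte */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        /* UTF-16 surrogates are not valid scalar values */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

Calling `encode_utf8(233, buf)` fills `buf` with the two bytes 0xC3 0xA9, the UTF-8 form of the character the chart lists next to 233.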
