Thoughts on Swish-e version 3.

Assumptions

* In order to keep Swish-e fast and portable, some key parts need to be written in a compiled language like C.

* C developers are increasingly hard to recruit to OSS projects like Swish-e.

* C code is slower to develop and harder to maintain than code in non-compiled languages like Python or Perl.

* To encourage more code contributors and make the project useful to more people, the core C parts should become library modules with well-defined, documented APIs. This makes the code more maintainable and flexible, and allows integration of other IR libraries like Xapian.

Core C Libraries

*NOTE: The following list is no longer accurate. libswish3 combines all these into one library.*

SwishUtils (libswishu): Common shared functions for things like IO, string handling, times, errors, memory and hashing.

I’ve started this one.

SwishConfig (libswishc): Parse config files into in-memory data structures, and read/write index config headers.

I’ve started this one.
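
As a rough illustration of "parse config files into in-memory data structures", here is a minimal sketch in C. It is not the libswishc API: the `config_table` type, the `config_parse` and `config_get` functions, the fixed size limits, and the assumed "Key value" one-directive-per-line format with `#` comments are all hypothetical, chosen only to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 32   /* hypothetical limits, for the sketch only */
#define MAX_FIELD   64

/* Hypothetical in-memory form of one config directive. */
typedef struct {
    char key[MAX_FIELD];
    char value[MAX_FIELD];
} config_entry;

typedef struct {
    config_entry entries[MAX_ENTRIES];
    size_t count;
} config_table;

/* Parse "Key value" lines from text into table.
 * Blank lines and lines starting with '#' are skipped.
 * Over-long keys or values are truncated to MAX_FIELD - 1 bytes.
 * Returns the number of entries stored. */
size_t config_parse(const char *text, config_table *table)
{
    assert(text != NULL && table != NULL);
    table->count = 0;
    const char *p = text;

    while (*p != '\0' && table->count < MAX_ENTRIES) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        const char *s = p;
        const char *end = p + len;

        /* Skip leading whitespace on the line. */
        while (s < end && isspace((unsigned char)*s)) s++;

        if (s < end && *s != '#') {
            config_entry *e = &table->entries[table->count];
            size_t k = 0, v = 0;

            /* Key: everything up to the first whitespace. */
            while (s < end && !isspace((unsigned char)*s)) {
                if (k < MAX_FIELD - 1) e->key[k++] = *s;
                s++;
            }
            e->key[k] = '\0';

            /* Value: rest of the line, trimmed on both sides. */
            while (s < end && isspace((unsigned char)*s)) s++;
            while (end > s && isspace((unsigned char)end[-1])) end--;
            while (s < end) {
                if (v < MAX_FIELD - 1) e->value[v++] = *s;
                s++;
            }
            e->value[v] = '\0';
            table->count++;
        }
        p = eol ? eol + 1 : p + len;
    }
    return table->count;
}

/* Return the value of the first entry matching key, or NULL if absent. */
const char *config_get(const config_table *table, const char *key)
{
    assert(table != NULL && key != NULL);
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].key, key) == 0)
            return table->entries[i].value;
    }
    return NULL;
}
```

A caller would fill a `config_table` once with `config_parse` and then look up directives by name with `config_get`. A real library would more likely use a hash (as SwishUtils is meant to provide) instead of a linear scan over a fixed array.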

SwishParser (libswishp): Parse documents into properties and a wordlist.

I’ve started this one.
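
To make the "wordlist" half of that concrete, the following is a minimal sketch in C of turning a document buffer into a list of lowercased words with their positions. It is not the libswishp API: `word_list`, `wordlist_parse`, `wordlist_contains` and the size limits are hypothetical names, and the tokenizer assumes plain ASCII text where a word is any run of alphanumeric characters. Property extraction and UTF-8 input are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS    64   /* hypothetical limits, for the sketch only */
#define MAX_WORD_LEN 32

typedef struct {
    char text[MAX_WORD_LEN];
    size_t position;      /* 1-based word position within the document */
} word_entry;

typedef struct {
    word_entry words[MAX_WORDS];
    size_t count;
} word_list;

/* Split doc into lowercased alphanumeric words and record each word's
 * position. Words longer than MAX_WORD_LEN - 1 bytes are truncated.
 * Returns the number of words stored. */
size_t wordlist_parse(const char *doc, word_list *list)
{
    assert(doc != NULL && list != NULL);
    list->count = 0;
    size_t position = 0;
    const char *p = doc;

    while (*p != '\0' && list->count < MAX_WORDS) {
        /* Skip separators (anything that is not a letter or digit). */
        while (*p != '\0' && !isalnum((unsigned char)*p)) p++;
        if (*p == '\0') break;

        word_entry *w = &list->words[list->count];
        size_t n = 0;
        while (*p != '\0' && isalnum((unsigned char)*p)) {
            if (n < MAX_WORD_LEN - 1)
                w->text[n++] = (char)tolower((unsigned char)*p);
            p++;
        }
        w->text[n] = '\0';
        w->position = ++position;
        list->count++;
    }
    return list->count;
}

/* Return 1 if word (already lowercased) occurs in list, else 0. */
int wordlist_contains(const word_list *list, const char *word)
{
    assert(list != NULL && word != NULL);
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->words[i].text, word) == 0)
            return 1;
    }
    return 0;
}
```

With this sketch, the input "Hello, Swish-e world!" yields four words (hello, swish, e, world), since the hyphen is treated as a separator. Which characters count as word characters is exactly the kind of decision a real parser would take from the config.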

SwishIndex (libswishi): Store properties and wordlists.


SwishSearch (libswishs): Parse queries and fetch results from an index.

This could be a reworking of the existing libswish-e so that it expects UTF-8 (which SwishUtils supports).
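
One small example of what "expecting UTF-8" changes in C code: string length in bytes no longer equals length in characters. The sketch below counts code points by skipping UTF-8 continuation bytes. `utf8_length` is a hypothetical helper, not a SwishUtils function, and it assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a NUL-terminated, valid UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does not match that pattern starts a new code point. */
size_t utf8_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a pure ASCII string the result matches `strlen`. For a string containing a two-byte character such as U+00E9, the result is one less than the byte count.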