I can’t count the hours I’ve spent hacking at a foolproof highlighter for HTML. But I’m nearing a really good approximation of foolproof. I’ve posted HTML::HiLiter to the Perl CPAN.
The really hard thing about this was creating a regular expression that is fast enough to be useful but accurate enough to work 99% of the time. I ended up using the HTML::Parser module, which is ‘fast enough’ and very powerful, due to the embedded C code and some good design. I’ve also looked at HTML::Tree but because HTML::Parser was a standard module in Perl 5.6.x it makes more sense to me right now to use a widespread standard. It increases the chance that folks might find HTML::HiLiter useful.
The most recent version (0.11) is due to get posted soon. I’m excited about it: I’ve improved the speed and accuracy, and added several features to help support my other recent project: SWISH::HiLiter — an extension to the SWISH::API class.
Both these projects are open source and come out of my Cray work on CrayDoc. A huge project for me, and a real learning experience: character encodings, HTML syntax, and the power of Perl regular expressions. I’d wager that my Perl skills increased %500 as a result of this project.
If you use it, let me know what you think.
Leave a Reply
You must be logged in to post a comment.