Marvin’s got some good remarks on Perl’s UTF-8 regexp support vis-à-vis tokenizing strings. They’re timely, since I’ve lately been spending (or wasting) time in libswish3’s C tokenizing functions. My goal was to replace them with Perl regexp matching, but given Marvin’s remarks, that may have been premature.