With Unicode, strings can look identical while differing slightly in which code points they use. For example, the é in Café can be encoded as <U+0065 U+0301> (an e followed by a combining acute accent) or as the single precomposed code point <U+00E9>.

The solution is to use Unicode normalization, which is supported in every major programming language. With a composing normalization form such as NFC, both versions of Café are normalized to use U+00E9.

In the best situation the application inserting data into the database does the normalization, but that is often not the case. This leads to the following issue: if you search for Café in its normalized form, the search won't return non-normalized entries.

I made a proof-of-concept parser plugin which indexes the normalized version of words.

A very short demo:

mysql> CREATE TABLE test1 (id int auto_increment primary key,
    -> txt TEXT CHARACTER SET utf8mb4, fulltext (txt));
Query OK, 0 rows affected (0.30 sec)

mysql> CREATE TABLE test2 (id int auto_increment primary key,
    -> txt TEXT CHARACTER SET utf8mb4, fulltext (txt) WITH P...