Resolving charset encoding mix-ups / mojibake

Hi everyone,

I think I missed an earlier reply. Use whatever character set you like, as long as it’s consistent everywhere. We generally recommend UTF-8 with one of the utf8 collations (utf8_general_ci, utf8_unicode_ci, or one of the newer utf8mb4 types), since most modern MySQL servers either default to it or use it implicitly.

When you see an error like that, it means MySQL can’t compare two strings of text because their collations are different. One is utf8_general_ci and the other is latin1, and the latin1 collation is “implicit”: nobody set it explicitly, so it’s being used because that’s what MySQL defaults to in its environment. That default is set in the /etc/my.cnf file, and also in the various OJS character set and connection character set options in your config.inc.php file.
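If it helps to see which columns are the odd ones out, here is a rough Python sketch. It only builds a query against information_schema.COLUMNS and filters the rows you get back; it doesn't connect to anything. The function names are made up for illustration, and the %s placeholder assumes a driver such as PyMySQL or mysqlclient.

```python
def collation_audit_query(schema):
    """Return (sql, params) listing every text column's charset and collation.

    `schema` is your database name. The %s placeholder style is an
    assumption that matches PyMySQL / mysqlclient / mysql-connector.
    """
    sql = (
        "SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME "
        "FROM information_schema.COLUMNS "
        "WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL "
        "ORDER BY COLLATION_NAME, TABLE_NAME, COLUMN_NAME"
    )
    return sql, (schema,)


def odd_ones_out(rows, expected_prefix="utf8"):
    """Keep only rows whose collation (4th field) is not a utf8 variant."""
    return [row for row in rows if not row[3].startswith(expected_prefix)]


# Hypothetical rows, shaped like the query's result set.
sample_rows = [
    ("journals", "path", "utf8", "utf8_general_ci"),
    ("users", "username", "latin1", "latin1_swedish_ci"),
]
print(odd_ones_out(sample_rows))
```

With a real connection you would pass the sql and params to cursor.execute() and feed cursor.fetchall() to odd_ones_out().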

Fixing this sort of thing is not straightforward. Our approach is to export the MySQL database as a mysqldump file, then use tools like ftfy or iconv to convert the file to UTF-8. We then edit the file so that the collations and character sets on the tables are utf8, make sure that the SET NAMES command at the top of the dump is utf8, and then re-import (or import into a new MySQL database). Then see what you get. It’s rarely perfect: it usually involves trial and error, and sometimes manual editing of database records to fix characters that don’t look right.
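To give a feel for the dump-editing step, here is a minimal Python sketch using only the standard library. It is not our actual tooling: the function names are hypothetical, the utf8mb4 / utf8mb4_unicode_ci targets are just one reasonable choice, and the repair function only handles the single most common kind of mojibake (UTF-8 bytes that were read as latin-1 once). ftfy covers many more cases.

```python
import re


def repair_mojibake(text):
    """Undo one round of UTF-8 bytes having been decoded as latin-1.

    Text that doesn't fit that pattern is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Charset declarations as they typically appear in a mysqldump file.
_CHARSET = re.compile(
    r"(DEFAULT CHARSET=|CHARACTER SET |SET NAMES )latin1\b", re.IGNORECASE
)
_COLLATE = re.compile(r"(COLLATE[= ])latin1_\w+", re.IGNORECASE)


def retarget_dump_line(line, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Rewrite latin1 charset/collation declarations on one dump line."""
    line = _CHARSET.sub(lambda m: m.group(1) + charset, line)
    line = _COLLATE.sub(lambda m: m.group(1) + collation, line)
    return line


print(repair_mojibake("cafÃ©"))
print(retarget_dump_line("/*!40101 SET NAMES latin1 */;"))
print(retarget_dump_line(
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;"
))
```

Run something like this on a copy of the dump, line by line, and import the result into a scratch database first. Only apply repair_mojibake if the data really was double-encoded; otherwise changing the declarations alone may be enough.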

Cheers,
Jason
