I had to do two things to make sure characters were encoded correctly when OJS rendered pages in the browser.
First, check whether your database is correctly encoded. You can do this by querying a few tables and inspecting the data for garbled characters. If the database is not correctly encoded, you will need to export your data as latin1 and reimport it as utf8.
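To make the check more concrete, here is a minimal Python sketch of what badly encoded data tends to look like and why a latin1 export followed by a utf8 import can repair it. The function names are hypothetical and are not part of OJS or any library. It assumes the typical failure mode, where UTF-8 bytes were stored or read as if they were Latin-1. It also uses Python's latin-1 codec as a stand-in for MySQL's latin1, which is actually closer to Windows-1252, so treat it as an illustration rather than a migration tool.

```python
def repair_double_encoded(text: str) -> str:
    """Re-read the characters as Latin-1 bytes and decode those bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


def looks_double_encoded(text: str) -> bool:
    """Return True if text looks like UTF-8 bytes that were decoded as Latin-1."""
    try:
        repaired = repair_double_encoded(text)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as Latin-1, or not valid UTF-8 afterwards:
        # this is probably ordinary, correctly decoded text.
        return False
    return repaired != text


good = "Café résumé"
# Simulate the fault: UTF-8 bytes interpreted as Latin-1.
bad = good.encode("utf-8").decode("latin-1")

print(bad)                         # CafÃ© rÃ©sumÃ©
print(looks_double_encoded(bad))   # True
print(looks_double_encoded(good))  # False
print(repair_double_encoded(bad))  # Café résumé
```

If values pulled from your tables show sequences such as "Ã©" where you expect "é", that is a strong hint the data needs the export and reimport described above.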
To do this I used a PHP library and wrote a script. The most important part of the script is getting the encoding right:
'default-character-set' => 'utf8',
'connection-character-set' => 'latin1',
This ensures that your SQL dump is written with the correct character encoding.
When you reimport the dumped SQL, MySQL should be able to re-encode the imported data correctly.
The second thing I had to do was change the connection encoding. By default, OJS leaves this to the database, but because MySQL defaults the server to latin1 you will need to override it in the OJS configuration file (config.inc.php):
connection_charset = utf8
You can inspect your database's character set settings using:
SHOW VARIABLES WHERE variable_name like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Why character_set_server is used rather than character_set_connection, I have no idea. It probably requires more investigation.
These two changes solved my OJS encoding problem when moving from OJS 2.4 to 3.x. However, people are still copying and pasting Windows-1252 encoded text into the database, probably because some users are still on older versions of Windows and Word, so it looks like we have an ongoing problem. (I use Linux all the time, which defaults to utf8, so testing this can be difficult.)
For now, though, the OJS database should be correctly encoded as utf8.
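As a rough illustration of the Windows-1252 paste problem mentioned above, the following Python sketch shows why such bytes are not valid UTF-8 and one possible way to handle them. The function name is hypothetical, and it assumes you can get at the raw submitted bytes before they are stored, which may not match how OJS actually processes form input.

```python
def decode_submitted_bytes(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Windows-1252 when the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes such as 0x93 and 0x94 (curly quotes in Windows-1252) are not
        # valid on their own in UTF-8, so we end up here for pasted Word text.
        # errors="replace" covers the few byte values Windows-1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")


# Simulate text pasted from an older Word install: curly quotes and a euro sign.
pasted = "“Price: 10€”".encode("cp1252")

print(pasted)                          # b'\x93Price: 10\x80\x94'
print(decode_submitted_bytes(pasted))  # “Price: 10€”

# Text that is already UTF-8 passes through unchanged.
print(decode_submitted_bytes("é".encode("utf-8")))  # é
```

The fallback order matters: almost any byte sequence decodes "successfully" as Windows-1252, so UTF-8 has to be tried first.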