Character encoding OJS

After upgrade from OJS 3.0.1 to OJS I have issues with character encoding. Database in old version of OJS before update was
I changed it to utf8_general_ci
but I still have very garbled characters in citations and other parts where Spanish and other characters appear.
client_charset = utf-8
connection_charset = utf8
database_charset = utf8
charset_normalization = Off

Server connection collation is utf8_general_ci.
Please advise how to make the garbled characters readable. Tools such as iconv are not available.

I had a similar problem; I had an old 2.4 db which was migrated to 3.x. I tried iconv but without success (it was throwing errors), so I ended up using the PHP library neitanod/forceutf8 (on GitHub), whose Encoding::toUTF8() function fixes mixed encoded strings, to build a quick and dirty export/import tool. Using the lib I exported as latin1 (which is actually Windows-1252) and reimported as utf8.

Could you please explain in more detail how you did that?

I had to do two things to make sure characters were encoded correctly when OJS rendered pages in the browser.

Firstly, check whether your database is correctly encoded. You can do this by querying tables and checking the data. If your database is not correctly encoded you will need to export your data as latin1 and reimport it as utf8.
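If it helps with the "checking the data" step, here is a minimal Python sketch of a heuristic for spotting garbled values. It is an assumption on my part, not something from OJS or from the script discussed in this thread: the function name `looks_garbled` is hypothetical, and the pattern only covers the common case of UTF-8 bytes that were read as latin1/Windows-1252 (accented letters showing up as "Ã" or "Â" plus another symbol, curly quotes showing up as "â€" plus another symbol).

```python
import re

# Typical traces of UTF-8 read as Windows-1252:
#   U+00C2 (Â) or U+00C3 (Ã) followed by any non-ASCII character,
#   or the pair U+00E2 U+20AC (â€), which is how curly quotes and dashes start.
_GARBLED = re.compile(r"[\u00c2\u00c3][^\x00-\x7f]|\u00e2\u20ac")


def looks_garbled(text: str) -> bool:
    """Heuristic: True if the text contains a typical double-encoding trace."""
    return _GARBLED.search(text) is not None


def find_suspect_rows(rows):
    """Given (row_id, text) pairs, return the ids whose text looks garbled."""
    return [row_id for row_id, text in rows if looks_garbled(text)]
```

You would feed `find_suspect_rows` with values selected from the columns you care about (titles, abstracts, citations). Being a heuristic, it can miss unusual cases or flag legitimate text, so treat the result as a list of rows to eyeball rather than a verdict.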

For the export and reimport I used a PHP library and wrote a script. The most important part of the script is getting the encoding right:

    'default-character-set' => 'utf8',
    'connection-character-set' => 'latin1',

This will ensure your SQL dump contains correctly encoded characters.

When you reimport the dumped SQL, MySQL should be able to re-encode the imported data correctly.
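To see why dumping as latin1 and reimporting as utf8 can repair the text, here is a small Python walk-through of the bytes involved. It is only a sketch under an assumption: that the garbling came from UTF-8 bytes being treated as latin1 somewhere along the way (Python's `cp1252` codec stands in for MySQL's latin1 here). The function name `trace_round_trip` is made up for illustration.

```python
def trace_round_trip(original: str) -> dict:
    """Follow one string through the assumed garbling and the latin1/utf8 repair."""
    # 1. The application sends the text as UTF-8 bytes.
    utf8_bytes = original.encode("utf-8")
    # 2. Those bytes are treated as latin1, one character per byte.
    misread = utf8_bytes.decode("cp1252")
    # 3. The misread characters are stored in a utf8 column.
    stored = misread.encode("utf-8")
    # 4. Dumping over a latin1 connection turns each stored character
    #    back into a single byte, which recovers the original UTF-8 bytes.
    dumped = stored.decode("utf-8").encode("cp1252")
    # 5. Reimporting those bytes as utf8 gives the intended text.
    restored = dumped.decode("utf-8")
    return {
        "utf8_bytes": utf8_bytes,
        "misread": misread,
        "stored": stored,
        "dumped": dumped,
        "restored": restored,
    }


if __name__ == "__main__":
    for step, value in trace_round_trip("canción").items():
        print(f"{step:>10}: {value!r}")
```

The sample word is chosen so that every byte exists in Python's `cp1252` table; a few letters such as "Á" produce bytes that Python's codec rejects even though MySQL's latin1 accepts them.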

The second thing I had to do was change the connection encoding. By default, OJS leaves this to the database, but because MySQL defaults the server to latin1 you will need to override it:

connection_charset = utf8

You can get info about your db encoding using:

    SHOW VARIABLES WHERE variable_name LIKE 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+

I have no idea why character_set_server is used rather than character_set_connection. It’s something that probably requires more investigation.

These two changes solved my OJS encoding issue when moving from OJS 2.4 to 3.x. However, people are still copying and pasting Windows-1252 encoded text into the database, probably because some users are still on older versions of Windows and Word, so it looks like we have an ongoing problem. I’m using Linux all the time, which defaults to utf8, so testing this can be difficult.

For now, though, the OJS database should be correctly encoded as utf8.
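For the ongoing Windows-1252 problem mentioned above, one possible approach is to normalise incoming bytes before they are stored. The Python sketch below is just an illustration of that idea, not what OJS does and not how the forceutf8 library is implemented: the function name `to_utf8_text` is hypothetical, and it decides per whole value rather than repairing strings that mix both encodings.

```python
def to_utf8_text(raw: bytes) -> str:
    """Decode bytes as UTF-8 if they are valid, otherwise as Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume Windows-1252. The few bytes that
        # Windows-1252 leaves undefined become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")
```

Trying UTF-8 first is reasonably safe because accented Windows-1252 text is rarely valid UTF-8 by accident, but it remains a guess: bytes that happen to be valid in both encodings will always be read as UTF-8.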

@haydenyoung, I have the same problem trying to migrate one journal to OJS3. Would you care to share the script that you were using to convert the database?

Just in case anybody else is having the same problem, I’ve stumbled upon this script:

It converts latin1 characters stored in a utf8 database by going through binary:

CONVERT(CAST(CONVERT(`{$column}` USING latin1) AS binary) USING utf8)

You can use it on 2.4.x prior to migration. Just don’t forget to switch the connection to utf-8 afterwards.

It works.
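For anyone who wants to try the same conversion on a few values outside the database first, here is a rough Python analogue of the CONVERT(CAST(CONVERT(... USING latin1) AS binary) USING utf8) expression. It is a sketch, not the script referred to above: the names `mysql_latin1_bytes` and `repair_double_encoded` are made up, and unlike the SQL expression it returns the value unchanged when the conversion does not apply. It relies on the documented fact that MySQL's latin1 is Windows-1252 with the five undefined bytes mapped to the matching control characters.

```python
# Bytes that Windows-1252 leaves undefined but MySQL's latin1 maps to the
# control characters with the same code point.
MYSQL_LATIN1_EXTRAS = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def mysql_latin1_bytes(text: str) -> bytes:
    """Encode text the way MySQL's latin1 would (cp1252 plus five extras)."""
    out = bytearray()
    for ch in text:
        if ord(ch) in MYSQL_LATIN1_EXTRAS:
            out.append(ord(ch))
        else:
            # Raises UnicodeEncodeError for characters outside cp1252.
            out += ch.encode("cp1252")
    return bytes(out)


def repair_double_encoded(text: str) -> str:
    """Undo 'UTF-8 read as latin1' garbling; return text unchanged otherwise."""
    try:
        raw = mysql_latin1_bytes(text)   # CONVERT(... USING latin1), CAST AS binary
        return raw.decode("utf-8")       # CONVERT(... USING utf8)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters latin1 cannot hold, or the bytes
        # are not valid UTF-8: in both cases it was not double-encoded.
        return text
```

Leaving non-matching values untouched makes the function safe to run over a mix of clean and garbled values, which is worth testing on a copy of the data before running the SQL against a whole table.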