Character encoding OJS 3.1.1.2

After upgrading from OJS 3.0.1 to OJS 3.1.1.2, I have issues with character encoding. Before the update, the database of the old OJS version used latin1_swedish_ci. I changed it to utf8_general_ci, but I still get badly garbled characters in citations and other places where Spanish and other accented characters appear. My settings in config.inc.php are:

    client_charset = utf-8
    connection_charset = utf8
    database_charset = utf8
    charset_normalization = Off

Server connection collation is utf8_general_ci.
Please advise how to make the garbled characters readable. Tools such as iconv are not available.
Thanks

I had a similar problem: I had an old 2.4 database that was migrated to 3.x. I tried iconv without success (it threw errors), so I ended up using the PHP library neitanod/forceutf8 on GitHub (a PHP class, Encoding, whose Encoding::toUTF8() function, formerly forceUTF8(), fixes strings with mixed encodings) to build a quick-and-dirty export/import tool. Using the library, I exported the data as latin1 (which in MySQL is actually Windows-1252) and reimported it as utf8.
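The core repair idea can be sketched in Python. This is my own illustration, not the forceutf8 library's code or API: text that was saved as UTF-8 bytes but read back as Windows-1252 shows up as pairs like `Ã±`, and re-encoding such text as Windows-1252 and decoding the resulting bytes as UTF-8 restores it. `fix_word` and `fix_mixed` are hypothetical names, and working token by token is only a crude stand-in for handling strings that mix damaged and intact text.

```python
def fix_word(word: str) -> str:
    """Repair one token if it looks like UTF-8 bytes that were read as Windows-1252."""
    try:
        return word.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Leave the token alone: either it contains characters outside
        # Windows-1252, or its Windows-1252 bytes are not valid UTF-8
        # (true for correctly stored words such as "año").
        return word


def fix_mixed(text: str) -> str:
    """Repair each space-separated token independently, so damaged and
    intact words in the same string are both handled."""
    return " ".join(fix_word(word) for word in text.split(" "))


if __name__ == "__main__":
    print(fix_mixed("año Ã±o"))  # año ño
```

The heuristic can misfire on text that legitimately contains sequences such as `Ã©`, and it cannot undo damage whose second UTF-8 byte has no Windows-1252 mapping (for example the mojibake produced from `Á`), so spot-check the results.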

Could you please explain in more detail how you did that?

I had to do two things to get the character encoding right when OJS renders pages in the browser.

First, check whether your database is correctly encoded. You can do this by querying tables and inspecting the data. If it is not correctly encoded, you will need to export your data as latin1 and reimport it as utf8.
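One way to make "checking the data" more systematic is to flag values that look double-encoded. A minimal Python sketch, assuming you have already fetched `(id, text)` rows from the table you want to inspect (the row source and the function names are hypothetical): a value is suspicious if re-encoding it as Windows-1252 gives valid UTF-8 that differs from what is stored.

```python
from typing import Iterable, List, Tuple


def looks_double_encoded(text: str) -> bool:
    """True if `text` reads like UTF-8 bytes that were decoded as Windows-1252."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # not reversible this way, so probably stored correctly
    return repaired != text  # pure ASCII round-trips unchanged


def suspicious_rows(rows: Iterable[Tuple[int, str]]) -> List[int]:
    """Return the ids of rows whose text looks double-encoded."""
    return [row_id for row_id, text in rows if looks_double_encoded(text)]


if __name__ == "__main__":
    sample = [(1, "Introducción"), (2, "IntroducciÃ³n"), (3, "Abstract")]
    print(suspicious_rows(sample))  # [2]
```

A value that mixes intact and damaged text (for example `año Ã±o`) is not flagged by this check, because its Windows-1252 bytes are not valid UTF-8 as a whole.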

To do this, I used a PHP library and wrote a script. The most important part of the script is getting the encoding right:

    'default-character-set' => 'utf8',
    'connection-character-set' => 'latin1',

This ensures your SQL dump contains the correctly encoded characters.

When you reimport the dumped SQL, MySQL should re-encode the imported data correctly.
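Why the latin1 connection matters can be modelled in a few lines of Python. This is a simplified model, not MySQL itself: it assumes the old installation wrote UTF-8 bytes into latin1 columns (so a client reading the column as latin1 sees `Ã±` for `ñ`), and it treats MySQL's latin1 as Windows-1252. Dumping over a latin1 connection sends those stored bytes unchanged, so declaring them as utf8 on reimport recovers the text; dumping over a utf8 connection converts each latin1 character to UTF-8 first, and the damage survives the round trip.

```python
def dump_bytes(stored: str, connection_charset: str) -> bytes:
    """Model the bytes a dump writes for a latin1 column's value.

    `stored` is the value as the server sees it (latin1 characters).
    Over a latin1 connection no conversion happens; over a utf8 connection
    each latin1 character is converted to UTF-8.
    """
    if connection_charset == "latin1":
        return stored.encode("cp1252")  # MySQL's latin1 is Windows-1252
    if connection_charset == "utf8":
        return stored.encode("utf-8")
    raise ValueError(f"unsupported charset: {connection_charset}")


def reimport_as_utf8(dumped: bytes) -> str:
    """Model reimporting the dump while declaring its bytes to be utf8."""
    return dumped.decode("utf-8")


if __name__ == "__main__":
    stored = "Ã±"  # how a latin1 column shows the UTF-8 bytes of "ñ"
    print(reimport_as_utf8(dump_bytes(stored, "latin1")))  # ñ
    print(reimport_as_utf8(dump_bytes(stored, "utf8")))    # Ã±
```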

The second thing I had to do was change the connection encoding. By default, OJS leaves this to the database, but because MySQL defaults the server character set to latin1, you will need to override it in config.inc.php:

    connection_charset = utf8
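To confirm which charset settings an installation actually has, a short Python sketch can list every charset-related option in config.inc.php. It assumes the file parses as a plain INI file (`;` comments, `[section]` headers, `key = value` lines); the sample excerpt and its section name are illustrative only, so compare against your own file.

```python
import configparser


def charset_settings(ini_text: str) -> dict:
    """Return every option whose name contains "charset", keyed "section.option"."""
    # interpolation=None: values such as date formats may contain "%".
    # strict=False: tolerate duplicated keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    found = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            if "charset" in option:
                found[f"{section}.{option}"] = value
    return found


SAMPLE = """; illustrative excerpt, not a complete config.inc.php
[i18n]
locale = en_US
client_charset = utf-8
connection_charset = utf8
database_charset = utf8
charset_normalization = Off
"""

if __name__ == "__main__":
    for name, value in charset_settings(SAMPLE).items():
        print(f"{name} = {value}")
    # For a real installation: charset_settings(open("config.inc.php").read())
```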

You can get information about your database encoding using:

    SHOW VARIABLES WHERE variable_name LIKE 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
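If you save that output to a file, a small Python sketch can pick out the variables still set to latin1. The parser only handles the two-column ASCII table the mysql client prints, as above; the function names are hypothetical.

```python
def parse_variables(table: str) -> dict:
    """Turn the two-column ASCII table printed by the mysql client into a dict."""
    variables = {}
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, the query line and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":  # skip the header row
            variables[cells[0]] = cells[1]
    return variables


def latin1_variables(variables: dict) -> list:
    """Names of the variables whose value is a latin1 character set."""
    return sorted(name for name, value in variables.items() if value.startswith("latin1"))


SAMPLE = """
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_server     | latin1 |
+--------------------------+--------+
"""

if __name__ == "__main__":
    print(latin1_variables(parse_variables(SAMPLE)))
    # ['character_set_database', 'character_set_server']
```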

Why character_set_server is used instead of character_set_connection, I have no idea; it probably requires more investigation.

These two fixes solved my OJS encoding issue when moving from OJS 2.4 to 3.x. However, people are still copying and pasting Windows-1252-encoded text into the database (probably because some users are still on older versions of Windows and Word; I use Linux all the time, which defaults to UTF-8, so testing this can be difficult), so it looks like we have an ongoing problem.
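One way to catch part of that ongoing problem is to scan for C1 control characters (U+0080 to U+009F). These almost never appear in real text, but they are what Windows-1252 punctuation such as curly quotes and dashes turns into when its bytes are decoded as ISO-8859-1 instead of Windows-1252. A minimal Python sketch, assuming that is how the pasted text went wrong in your case (it will not catch other failure modes); the function names are hypothetical:

```python
import re

# C1 control characters: the ISO-8859-1 reading of Windows-1252 bytes 0x80-0x9F.
C1_CONTROLS = re.compile(r"[\x80-\x9f]")


def has_cp1252_residue(text: str) -> bool:
    """True if the text contains any C1 control character."""
    return C1_CONTROLS.search(text) is not None


def repair_cp1252_residue(text: str) -> str:
    """Reinterpret each C1 control character as the Windows-1252 byte it came from."""
    def fix(match: re.Match) -> str:
        raw = match.group().encode("latin-1")  # U+0093 -> byte 0x93, and so on
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 character.
            return match.group()
    return C1_CONTROLS.sub(fix, text)


if __name__ == "__main__":
    pasted = "\x93quoted\x94 text \x96 with a dash"
    print(has_cp1252_residue(pasted))
    print(repair_cp1252_residue(pasted))
```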

For now, though, the OJS database should be correctly encoded as utf8.

@haydenyoung, I have the same problem trying to migrate one journal to OJS3. Would you care to share the script that you were using to convert the database?

Just in case anybody else is having the same problem, I’ve stumbled upon this script:

It converts latin1 characters stored in a utf8 database by casting them through binary:

    CONVERT(CAST(CONVERT(`{$column}` USING latin1) AS binary) USING utf8)

You can use it on 2.4.x before migrating. Just don’t forget to update config.inc.php to use a UTF-8 connection afterwards.
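If there are many columns to fix, a small Python sketch can generate one UPDATE per column from that expression. The table and column names below are placeholders, not taken from the script or from the OJS schema; list the text columns of your own database. Run the statements on a copy first, since rows that already hold correct UTF-8 are likely to be damaged by this conversion.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def repair_statement(table: str, column: str) -> str:
    """Build an UPDATE applying the latin1 -> binary -> utf8 conversion to one column."""
    col = quote_identifier(column)
    expr = f"CONVERT(CAST(CONVERT({col} USING latin1) AS binary) USING utf8)"
    return f"UPDATE {quote_identifier(table)} SET {col} = {expr};"


if __name__ == "__main__":
    # Hypothetical (table, column) pairs; replace with your own schema's text columns.
    targets = [("example_settings", "setting_value"), ("example_citations", "raw_text")]
    for table, column in targets:
        print(repair_statement(table, column))
```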

It works.


@szmigieldesign, thank you so much for the script! It worked perfectly in my case as well.

I want to share my experience in case it might be helpful for someone in the future:

Although the script worked as intended, I noticed something peculiar: there were no visible changes in the database when viewed through phpMyAdmin (the strange characters were still visible). However, I can confirm that the script had a positive effect on the website: the Spanish version of the site now displays correctly, without any strange characters.

It’s worth mentioning that while the main article content appears problem-free, the Spanish discussions within each manuscript still show some odd symbols. This does not affect the main content, which is the most important part, and new Spanish conversations also display correctly. The discrepancy between the discussions and the manuscript texts might be linked to changes I made earlier to a few tables (probably collation changes; I can’t remember, since it was some time ago), though I have yet to confirm this.

P.S.: I upgraded from version 3.2.1-4 to 3.3.0-16. The installation was supposed to use UTF-8, but somehow the tables were being written in latin1.

Once again, thank you very much for your help!