Special Character Problems

  • Application Version
    OJS 3.1.2
  • Description of issue
    Special characters are no longer rendered with the intended encoding. In OJS 2, they displayed correctly.
  • Steps you took leading up to the issue
    Upgraded from OJS 2.3.6.0 > 2.4.8.5 > 3.1.2.0
  • What you tried to resolve the issue
    Verified that the config.inc.php settings remained the same, following other forum topics that stated that incorrect but consistent settings were better than changing them. Settings are:
    client_charset = utf-8
    connection_charset = Off
    database_charset = Off
  • Screenshots: none
  • Error log messages if applicable: none

An example is (shows in the database and on the OJS 3 page):
PÃ©rou Hermans
In OJS 2 it showed up in the database in the same way; however, it was presented on the page as:
Pérou Hermans
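
This kind of garbling typically arises when UTF-8 bytes are interpreted as Latin-1 somewhere between the application and the database. The following is a minimal Python sketch of that effect, not anything taken from OJS; it assumes cp1252 as a stand-in for MySQL's latin1, which is close to it but not identical.

```python
# Illustration only: UTF-8 bytes decoded as cp1252 produce mojibake.
original = "Pérou Hermans"

# The application writes UTF-8 bytes ...
utf8_bytes = original.encode("utf-8")

# ... but they are interpreted as cp1252 (assumed stand-in for MySQL latin1).
garbled = utf8_bytes.decode("cp1252")

# Reversing the same mistake recovers the intended text.
recovered = garbled.encode("cp1252").decode("utf-8")

print(garbled)    # PÃ©rou Hermans
print(recovered)  # Pérou Hermans
```

If the old setup garbled the text on the way in and un-garbled it on the way out, the page would look right even though the stored value does not, which would match the OJS 2 behaviour described here.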

A difference:
Old database had the default collation: latin1_swedish_ci
Old database had the default character set: latin1

New database had the default collation: utf8mb4_unicode_ci
New database had the default character set: utf8mb4

At the field level, on both sides, the author middle name (authors.middle_name, author_settings.setting_value) is utf8_general_ci, utf8

Happy to hear thoughts on a fix.
Best, +A

Hi @AndrewGearhart,

The difference in databases that you’ve noted is indeed the culprit. A long time ago, MySQL databases had a Latin1 default, and recent distros etc. correctly use UTF8. To fix this, you’ll need to work with mysql/mysqldump rather than with OJS’s configuration files. I’d encourage you to work with the MySQL client as well to ensure that, independent of OJS, the content in the database is in the format it is supposed to be. See e.g. “Convert mysql database from latin1 to utf8 the RIGHT way” by Dan Collis-Puro (I haven’t tried this, but at a glance it looks like the problem you’re trying to solve).
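
As a rough way to check individual values outside of OJS, something like the following Python sketch could help. The function name `repair_double_encoding` is hypothetical, and the sketch assumes the garbling is a single layer of UTF-8 read as cp1252; MySQL's latin1 also maps a few bytes that Python's cp1252 codec leaves undefined, so some values may not round-trip this way.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one layer of UTF-8-read-as-cp1252 mojibake, if present.

    Hypothetical helper for spot checks; it is not part of OJS or MySQL.
    """
    try:
        # Re-create the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way, so the value is probably already correct.
        return text


samples = ["PÃ©rou Hermans", "Pérou Hermans", "Hermans"]
for value in samples:
    print(repr(value), "->", repr(repair_double_encoding(value)))
```

Values that are already correct UTF-8 text, or plain ASCII, come back unchanged, so the same check can be run over a column to see which rows are affected before attempting a dump-and-reload.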

Once you have everything in your database correctly UTF-8 encoded, the recommended settings for OJS are in the forum thread “An unexpected error has occurred. Please reload the page and try again. POPUP” (post #8 by asmecher).

Be careful not to accept any new content while your encoding is garbled; if you get a few encodings mixed together, it’ll be very hard to disentangle.

Regards,
Alec Smecher
Public Knowledge Project Team