Strange Characters in abstracts since upgrading to OJS 3.2.1.1 (Bootstrap)

asmecher · September 24, 2020, 5:55pm

There are quite a few directions discussed in this thread, plus some upgrade history on your journal, and it’s tough to get an accurate sense of it all remotely. There’s also a risk that things get more complicated if your new and old data don’t agree on a consistent character encoding. I’d recommend picking a goal and working towards it.

What we recommend for the character set settings in config.inc.php is a normal UTF-8 configuration:

client_charset = utf-8
connection_charset = utf8

(In OJS 3.1.2 and older, there used to be a database_charset setting as well. Starting with 3.2.x, this is no longer used and can be ignored.)
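If you’d like to double-check these values, here is a minimal Python sketch (not part of OJS) that reads the text of config.inc.php and reports any charset setting that is missing or differs from the recommendation above. It assumes both settings sit in the [i18n] section, where the 3.2.x config template places them; the function name check_charset_settings is hypothetical.

```python
import configparser

# Values recommended in this thread for OJS 3.2.x (config.inc.php, [i18n] section).
RECOMMENDED = {"client_charset": "utf-8", "connection_charset": "utf8"}


def check_charset_settings(ini_text: str) -> dict:
    """Return {setting: actual value} for each recommended [i18n] setting that
    is missing (None) or differs from the recommended value."""
    # interpolation=None: don't treat "%" in other values as template syntax.
    # strict=False / allow_no_value=True: tolerate quirks in hand-edited files.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(ini_text)  # lines starting with ";" are treated as comments
    section = parser["i18n"] if parser.has_section("i18n") else {}

    mismatches = {}
    for key, wanted in RECOMMENDED.items():
        value = section.get(key)
        if value is not None:
            # Values may be quoted in PHP-style ini files; compare case-insensitively.
            value = value.strip().strip('"').lower()
        if value != wanted:
            mismatches[key] = value
    return mismatches
```

For example, check_charset_settings(Path("config.inc.php").read_text()) (with pathlib’s Path) should return an empty dict when both values match. A leftover database_charset line is deliberately not checked, since 3.2.x ignores it.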

Your MySQL database should also be set up to use UTF-8. On my installation (MariaDB), this query reports:

MariaDB [ojs-stable]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.003 sec)

I achieve this by adding a DEFAULT CHARACTER SET clause to the CREATE DATABASE statement when creating the database, e.g.: CREATE DATABASE my_database_name DEFAULT CHARACTER SET utf8;
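Keep in mind that the database default only applies to tables created afterwards, so an upgraded database can still contain older tables or columns in another character set. A rough way to check is sketched below in Python, assuming you run the query with whatever MySQL/MariaDB client or driver you already use and pass the resulting rows in. The helper non_utf8_columns is hypothetical, and it treats utf8, utf8mb3 (the name some newer server versions report for utf8) and utf8mb4 as UTF-8.

```python
# Run this query against the OJS database with your usual MySQL/MariaDB client
# or driver; it lists every character-based column in the current database
# together with its character set.
COLUMN_CHARSET_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND CHARACTER_SET_NAME IS NOT NULL
"""

# Charset names that all denote UTF-8 (utf8mb3 is how newer servers report utf8).
UTF8_CHARSETS = {"utf8", "utf8mb3", "utf8mb4"}


def non_utf8_columns(rows):
    """Given (table, column, charset) rows returned by COLUMN_CHARSET_QUERY,
    return the sorted list of rows whose charset is not UTF-8."""
    return sorted(
        (table, column, charset)
        for table, column, charset in rows
        if charset.lower() not in UTF8_CHARSETS
    )
```

An empty result suggests the column definitions are consistent; it says nothing about whether the bytes stored in them are correctly encoded.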

Even with everything set up properly for UTF-8, you may still have problems with accented characters if your existing data is not properly encoded. Dealing with this can be a pain: you’ll need to convert (transcode) your database contents. You might find guidance on that on Stack Overflow.
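A common source of “strange characters” is UTF-8 text that was stored or read as latin1 (which MySQL actually implements as Windows-1252), so that é turns into Ã© and ’ into â€™. Assuming that is what happened to your abstracts, which is only a guess from here, the Python sketch below shows the reverse transformation on individual strings. It also includes a hypothetical survey helper for spotting a column that mixes clean and mis-encoded values. It is meant for experimenting on exported values or a copy of the database, not as a drop-in fix.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one layer of "UTF-8 bytes decoded as cp1252" damage.

    The text is re-encoded as cp1252 (MySQL's "latin1") and the resulting bytes
    are decoded as UTF-8. If either step fails, the text is returned unchanged:
    that is typically what happens for strings that were already correct
    (e.g. "étude") and for strings containing characters cp1252 cannot represent.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def survey(values):
    """Count values that the repair would change ("mis-encoded") versus values
    it leaves alone ("clean"), to spot columns that mix the two."""
    counts = {"clean": 0, "mis-encoded": 0}
    for value in values:
        if repair_double_encoding(value) != value:
            counts["mis-encoded"] += 1
        else:
            counts["clean"] += 1
    return counts
```

For example, repair_double_encoding("Ã©tude") gives "étude", while an already-correct "étude" comes back unchanged because re-encoding it does not yield valid UTF-8. If survey reports both kinds in the same column, converting the whole column in one pass would likely damage the clean rows, which is exactly the kind of mixing to avoid.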

It’ll be tempting to just experiment with settings to see what works. You can do that, of course, but be careful not to mix different encodings into your database, or it’ll be a lot tougher to disentangle.

Regards,
Alec Smecher
Public Knowledge Project Team