I am trying to migrate from the OJS 2.4 series to OJS 3.x, but I want to clean the data before I do. I also need to know what the charset should be set to in order to avoid the issues described below.
Currently, I have the following settings on a Linux box:
The server charset is cp1252 West European (latin1)
The server connection collation is utf8mb4_unicode_ci
The OJS configuration has the following:
client_charset = utf-8
connection_charset = Off
database_charset = Off
charset_normalization = Off
As a result, I am having a Ghostbusters experience with some of the articles.
When an article has diacritical marks or other special characters in the title, abstract, or even the author’s bio statement, it displays fine on the website. As soon as I try to edit the submission, the title and abstract fields are empty. The text must be saved in the database, since it displays on the website, yet it disappears when I open the submission for editing.
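One possible explanation for the empty edit fields, offered as an assumption rather than something confirmed in this thread: PHP's htmlspecialchars() returns an empty string when its input contains an invalid sequence for the given encoding (typically UTF-8) unless ENT_SUBSTITUTE or ENT_IGNORE is set. A form that escapes a non-UTF-8 title that way could render it blank while the public page still shows it. The Python sketch below only mimics that contrast; `escape_for_form` and `display_leniently` are hypothetical helpers, not OJS code.

```python
import html


def escape_for_form(raw: bytes) -> str:
    """Hypothetical strict escaper: returns "" if raw is not valid UTF-8."""
    try:
        text = raw.decode("utf-8")  # strict decoding raises on invalid bytes
    except UnicodeDecodeError:
        return ""  # the whole field comes back empty
    return html.escape(text)


def display_leniently(raw: bytes) -> str:
    """Hypothetical lenient display path that reads the bytes as cp1252."""
    return raw.decode("cp1252", errors="replace")


# "François" stored as single-byte cp1252/latin1: b"Fran\xe7ois"
stored = "François".encode("cp1252")
print(repr(escape_for_form(stored)))  # ''
print(display_leniently(stored))      # François
```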
Second, I have noticed that words with diacritical marks do not always display correctly. For example, François may display as Franois on the frontend.
I have a lot of articles, and some have diacritical marks in their titles, abstracts, or author bio statements. I am looking for a way to fix this automatically.
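If some fields turn out to be double-encoded (UTF-8 bytes that were reinterpreted as cp1252, which is what MySQL calls latin1), a per-field repair can be attempted automatically. This is a minimal sketch under that assumption; `fix_double_encoding` is a hypothetical helper. It cannot recover text whose bytes were already dropped, such as "Franois", and it could misfire on legitimate text that happens to contain sequences like "Ã©", so results should be reviewed.

```python
def fix_double_encoding(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 mojibake, if that is what text contains.

    Assumption: the damaged text came from UTF-8 bytes decoded as cp1252.
    Text that does not round-trip cleanly is returned unchanged.
    """
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not cp1252-encodable, or the bytes are not valid UTF-8
    return repaired


print(fix_double_encoding("FranÃ§ois"))  # François
print(fix_double_encoding("François"))   # François (already correct, unchanged)
```

Correctly encoded text fails the UTF-8 decode step and is left alone, which is why the function can be run over every field rather than only the suspect ones.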
What should the configuration settings be to fix these issues? Should I add utf8mb4_unicode_ci and cp1252 West European (latin1) to the configuration?
I would suggest working outside of OJS, e.g. with iconv, mysqldump, etc., to make sure your database is properly UTF-8-encoded. You should be able to find guidance on this on StackOverflow.com, for example; search for keywords like transcode, mysql, and utf-8. Don’t assume that your database is in Latin1 just because that’s your default encoding; you may need to verify what form the database is in via the MySQL command-line client.
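As a rough illustration of "verify first, then transcode", here is a Python sketch that could run on a dump file on your own computer (for example, one exported through phpMyAdmin). It labels each line as ASCII, valid UTF-8, or not UTF-8, and re-encodes only the non-UTF-8 lines, roughly what `iconv -f cp1252 -t utf-8` does but without touching lines that are already UTF-8. The helper names are hypothetical, and treating every non-UTF-8 line as cp1252 is an assumption. A line that mixes both encodings would be converted as a whole and its UTF-8 parts double-encoded, which is why running `survey` first matters.

```python
from collections import Counter
from typing import Iterable


def classify(raw: bytes) -> str:
    """Label a chunk of bytes as 'ascii', 'utf-8', or 'not-utf-8'."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"  # plausibly single-byte cp1252/latin1
    return "utf-8"


def to_utf8(raw: bytes) -> bytes:
    """Re-encode a line as UTF-8, leaving lines that are already valid UTF-8 alone.

    Assumption: any non-UTF-8 line is cp1252 (MySQL's "latin1").
    Bytes that cp1252 leaves undefined fall back to a latin-1 reading.
    """
    if classify(raw) != "not-utf-8":
        return raw
    try:
        text = raw.decode("cp1252")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # latin-1 accepts every byte value
    return text.encode("utf-8")


def survey(lines: Iterable[bytes]) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify(line) for line in lines)


sample = [b"plain ascii", "François".encode("utf-8"), "François".encode("cp1252")]
print(survey(sample))
print(to_utf8(sample[2]) == sample[1])  # True
```

To apply it to a real dump, read the file in binary mode (e.g. `open(path, "rb").readlines()`), check the counts from `survey`, and only then write out the `to_utf8` results.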
Regards,
Alec Smecher
Public Knowledge Project Team
I will look into your recommendation. I do not have access to the command line, and I do not know the commands for working with it. I am trying to sort this out so that I can upgrade today.
I checked phpMyAdmin. The collation for the OJS database is listed as latin1_swedish_ci. The server connection collation is shown as utf8mb4_unicode_ci (there is an option to change this).
Most of the tables in the OJS database have latin1_swedish_ci in the Collation column and InnoDB in the Type column.
The remaining tables, listed below, have utf8_general_ci in the Collation column and MyISAM in the Type column: