Charset/encoding problem after upgrading from OJS 3.1.2.1 to 3.2.0.3

First of all, thank you for the continuous development of OJS. Version 3.2 seems to be a major step forward.

Yesterday I updated our installation to v3.2.0.3. Today I found that some Polish characters are not displayed correctly (question marks appear instead of the correct symbols).


The question marks also appear in the database, so I guess something went wrong during the upgrade. The problem affects only the publication_settings table, so mostly titles, abstracts and references.
Moreover, the contributors field in some submissions is empty.

The upgrade was first tested on my local server, and everything seemed fine there. In both cases it was a full-package upgrade followed by the web-based upgrade of the db. The only difference is that, for whatever reason, the database encoding on my production server is set to latin1_swedish_ci (this is the default, and I am not sure whether my provider allows me to change it; the encoding of every table is still utf8_general_ci). This has never caused any problems in past versions of the system.

I am planning to restore the database and start the upgrade from scratch. What should I look out for to avoid these issues?

Additional information that may be helpful: my previous attempts to upgrade OJS to v3.2.0 failed because of this error: 1, 2.

On a side note, I would like to raise a possible problem with editing the “metadata” of archived issues. At first I wanted to manually fix all the corrupted symbols, so I started with one of the older issues. I didn’t want to create a new version of each article for such a reason, so I unpublished the issue first. After that, however, all articles that had been added to the issue were gone (they are no longer tied to the issue). While editing them from the “submissions” tab I noticed that their status is still “published”, and I cannot unpublish them, send them to production, assign them to an issue, or edit any of their metadata. Maybe I should have unpublished each article in the issue separately (not the whole issue), or maybe the problem is that the articles were submitted via the Import/Export Quick Submit plugin (possibly even in v2.4.x of the installation), but I don’t think this is the expected behavior of the new versioning feature.

EDIT: I wrote above that while the db encoding is latin1_swedish_ci, the encoding of every table is utf8_general_ci. This is not true. I have just noticed that some tables use latin1_swedish_ci as well: email_templates_settings, publications, publication_settings, publication_galleys, publication_galley_settings and publication_categories. (This refers to the db after the upgrade.)
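
For anyone who wants to check the same thing, a minimal sketch of a query that lists the tables not using a utf8-based collation, assuming MySQL or MariaDB with read access to information_schema. The schema name my_database_name is a placeholder, not the real name of the OJS database.

```sql
-- List every table in the schema whose collation is not utf8-based.
-- 'my_database_name' is a placeholder (assumption); substitute the OJS schema name.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_database_name'
  AND TABLE_COLLATION NOT LIKE 'utf8%'   -- also lets utf8mb4_* collations pass
ORDER BY TABLE_NAME;
```

If the result contains only the tables introduced or rebuilt by the upgrade, that would be consistent with those tables having inherited the database default rather than the utf8 setting of the older tables.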

EDIT2: Typos and repeats.

Hi @p_urbanczyk,

When you created your database, did you create it with a UTF-8 character set? On my machine, I use…

CREATE DATABASE my_database_name DEFAULT CHARACTER SET utf8;
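
If the database already exists, a rough equivalent (a sketch only, assuming MySQL or MariaDB and an account that is allowed to alter the database) would be to change its default character set. Note that this only affects tables created afterwards; existing tables keep whatever character set they already have.

```sql
-- Change the default character set and collation of an existing database.
-- Applies to tables created from now on; existing tables are not converted.
-- 'my_database_name' is a placeholder, as in the CREATE DATABASE example.
ALTER DATABASE my_database_name
  CHARACTER SET utf8
  COLLATE utf8_general_ci;
```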

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

thanks for the quick reply. My production server is hosted by one of the biggest French hosting providers (I am not sure whether naming the company is allowed here) and, unfortunately, I have very little control over it. The problem is certainly on my side (at least the upgrade problem), and tomorrow I’ll check what I can do about it. For now, I’ve just reverted the upgrade.

I think it might even be this issue, since the connection character set in my db is also set to utf8mb4_unicode_ci. Anyway, I am pretty sure that changing one of the two (the character set or the connection character set) before an upgrade would do.

In its current (reverted) state, my database looks like this:

  • character encoding is set to latin1_swedish_ci
  • every single table encoding is utf8_general_ci
  • connection character set is set to utf8mb4_unicode_ci
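
These three values can be read directly from the server. A minimal sketch, assuming MySQL or MariaDB; my_database_name is a placeholder for the actual schema name, and the session variables reflect whichever client runs the queries, which may differ from what OJS itself negotiates.

```sql
-- Default character set and collation of the database (placeholder name).
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'my_database_name';

-- Character set and collation variables of the current connection,
-- including character_set_connection and collation_connection.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
```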

It seems that I need to put it in order with the means I have at my disposal. Thank you for your assistance. I am open to any suggestions. I won’t do anything about it until tomorrow.

Same problem here: all Cyrillic letters in the titles turn into ??? ???

Hi @jonovski,

Cross-posted here: Multiple languages in ojs3.2

It’s best to stick to a single thread, in order to avoid having the same conversation with different people in multiple places.

Thanks,
Alec Smecher
Public Knowledge Project Team