Charset/encoding problem after upgrading from OJS 3.1.2.1 to 3.2.0.3

First of all, thank you for the continuous development of OJS. Version 3.2 seems to be a major step forward.

Yesterday I updated our installation to v.3.2.0.3. Today I found out that some Polish characters are not displayed correctly (question marks appear instead of the correct symbols).
The question marks also appear in the database, so I guess something went wrong during the upgrade process. The problem concerns only the publication_settings table, so mostly titles, abstracts and references.
Moreover, the contributors field in some submissions is empty.

The update process was first tested on my local server, and everything seems to be fine there. In both cases it was a full-package upgrade with a web-based upgrade of the database. The only difference is that, for whatever reason, the database encoding on my production server is set to latin1_swedish_ci. That is the default, and I am not sure whether my provider allows me to change it; the encoding of every table is still utf8_general_ci. The thing is, it has never caused any problems in past versions of the system.

I am planning to revert the database and start the update from scratch. I would like to ask what I should look for to avoid these issues.

Additional information that may be helpful here: my previous attempts to upgrade OJS to v.3.2.0 failed because of this error: 1, 2.

On a side note, I would like to raise a possible problem with editing the “metadata” of archived issues. At first I wanted to manually edit all corrupted symbols, so I started with one of the older issues. I didn’t want to create a new version of the articles for such a reason, so I unpublished the issue first. After that, however, all articles in the issue were gone (they are no longer tied to the issue). While editing them from the “submissions” tab I noticed that their statuses are still “published”: I cannot unpublish them, send them to production and assign them to an issue, or edit any of their metadata. Maybe I should have unpublished each article in the issue separately (not the whole issue), or maybe the problem is that the articles were submitted via the Import/Export Quick Submit plugin (possibly even in v.2.4.x of the installation), but I don’t think this is the expected behavior of the new versioning feature.

EDIT: I wrote above that while db encoding is latin1_swedish_ci, the encoding for every table is utf8_general_ci. This is not true. I’ve just noticed that some tables have latin1_swedish_ci encoding as well. Those are: email_templates_settings, publications, publication_settings, publication_galleys, publication_galley_settings and publication_categories. (I am talking about db after the upgrade).
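
For anyone who wants to check this on their own installation, a query along the following lines could list the tables whose default collation is not a UTF-8 one. It is only a sketch: `my_database_name` is a placeholder for the actual schema name, and it assumes a MySQL or MariaDB server.

```sql
-- List tables whose default collation is not a utf8 variant
-- (the pattern 'utf8%' also matches utf8mb4 collations).
-- 'my_database_name' is a placeholder; replace it with your OJS schema.
SELECT table_name, table_collation
FROM information_schema.tables
WHERE table_schema = 'my_database_name'
  AND table_type = 'BASE TABLE'
  AND table_collation NOT LIKE 'utf8%'
ORDER BY table_name;
```

Note that a table's collation is only the default for its columns; individual columns can carry a different collation, which `information_schema.columns` reports in its `collation_name` column.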

EDIT2: Typos and repeats.

Hi @p_urbanczyk,

When you created your database, did you create it with a UTF-8 character set? On my machine, I use…

CREATE DATABASE my_database_name DEFAULT CHARACTER SET utf8;

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

Thanks for the quick reply. My production server is hosted by one of the biggest French hosting providers (I am not sure whether using the company name is allowed here) and, unfortunately, I have very little control over it. The problem is certainly on my side (at least the updating problem), and tomorrow I’ll check what I can do about it. For now, I’ve just reverted the update.

I think that it might even be this issue, since the connection character set in my db is also set to utf8mb4_unicode_ci. Anyway, I am pretty sure that changing one of these two, the character set or the connection character set, before the upgrade would do.

In the current state of my (reverted) database:

  • character encoding is set to latin1_swedish_ci
  • every single table encoding is utf8_general_ci
  • connection character set is set to utf8mb4_unicode_ci
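
These three values can also be read directly from the server. The statements below are a sketch for MySQL or MariaDB, with `my_database_name` as a placeholder for the real schema name:

```sql
-- Default character set and collation of the database itself.
-- 'my_database_name' is a placeholder.
SELECT default_character_set_name, default_collation_name
FROM information_schema.schemata
WHERE schema_name = 'my_database_name';

-- Character sets and collations in effect for the current connection and server.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
```

The `SHOW VARIABLES` output reflects the session that runs it, so values seen in phpMyAdmin may differ from those of the connection OJS opens.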

It seems that I need to put it in order with the means I have at my disposal. Thank you for your assistance. If you have any notes on this, I am open to suggestions. I won’t do anything about it until tomorrow.

Same problem here: all Cyrillic letters in the titles turn into ??? ???

Hi @jonovski,

Cross-posted here: Multiple languages in ojs3.2 - #8 by asmecher

It’s best to stick to a single thread, in order to avoid having the same conversation with different people in multiple places.

Thanks,
Alec Smecher
Public Knowledge Project Team

Hi all,

We have run into the same issue, and we only found it after upgrading our dev, test and prod systems, since to notice it you need to look at an article that contains special characters in the title.

We have been using a fresh installation of OJS since 3.1.0.1 and did one upgrade to 3.1.2.1 with no such issues. All our tables were utf8_general_ci, so when we added the collation setting in config.inc.php we were not prepared to run into this. New tables were created with latin1_swedish_ci, though, and we got all the “???” mentioned in the comments of @p_urbanczyk and @jonovski.

The database seems to have been created with latin1_swedish_ci. We are wondering why the last upgrade did not cause similar issues. Maybe no new tables were created from 3.1.0.1 to 3.1.2.1.

So we will probably do something like the following to clean up the whole mess: create a new database and set its collation to utf8_general_ci in the process; export the data from the old database and import it into the new one; then rename both databases to switch from one to the other. We were thinking about just changing the collation within phpMyAdmin, but are not sure about possible side effects.
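
For reference, changing the database default in plain SQL could look like the sketch below (`my_database_name` is a placeholder). In MySQL this statement only changes the default used for tables created afterwards; existing tables and the data in them are left untouched, so it would not repair characters that have already been stored as question marks.

```sql
-- Change the default character set and collation for NEW tables only.
-- 'my_database_name' is a placeholder; existing tables keep their settings.
ALTER DATABASE my_database_name
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
```

Running this before an upgrade, rather than after, would be the point: tables that the upgrade creates would then inherit utf8 instead of latin1_swedish_ci.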

Wondering if anybody is already beyond this issue and has handled it differently.

Thanks,
Gabriele

I had the same issue after a Softaculous update. It changed field collations to latin1_swedish_ci, and this messed up the special characters in titles and keywords. My quick fix was to duplicate the fields in the database, change the collation to utf8_general_ci, copy the data back in and then correct the special characters. (I didn’t have many to correct.) The relevant fields for me were called setting_value, and the tables were controlled_vocab_entry_settings (for keywords), publication_settings (for titles) and announcement_settings (for announcements).
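
A rough sketch of that kind of column duplication, shown for publication_settings only, might look like this. The column name `setting_value_utf8` and the `TEXT` type are assumptions for illustration; check the actual column definition first, and work on a backup.

```sql
-- 1. Add a utf8 copy of the column.
--    'setting_value_utf8' is a hypothetical name; TEXT is an assumed type.
ALTER TABLE publication_settings
  ADD COLUMN setting_value_utf8 TEXT CHARACTER SET utf8 COLLATE utf8_general_ci;

-- 2. Copy the data; MySQL converts between the two column character sets
--    on assignment. Characters already stored as '?' stay as '?'.
UPDATE publication_settings
SET setting_value_utf8 = setting_value;

-- 3. List rows containing a literal question mark as candidates for
--    manual correction (this also matches legitimate question marks).
SELECT *
FROM publication_settings
WHERE setting_value_utf8 LIKE '%?%';
```

Once the values have been corrected, the new column would still have to replace the original one, for example by dropping `setting_value` and renaming the copy.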