Character issue, missing text and characters

I am trying to migrate from OJS 2.4 series to OJS 3.X, but want to clean the data before I do. I also need to know what the charset should be set to in order to avoid the issues mentioned below.

Currently, I have the following settings on a Linux box:

The server charset is cp1252 West European (latin1)
The server connection collation is utf8mb4_unicode_ci

The OJS configuration has the following:
client_charset = utf-8
connection_charset = Off
database_charset = Off
charset_normalization = Off

So, I am having a ghostbusters experience on some of the articles.

When an article has diacritical marks or other characters in the title, abstract, or even the author’s bio statement, it displays fine on the website. As soon as I try to edit the submission, the title and abstract fields are empty. The text is saved in the database since it displays on the website, but when you try to edit the submission, it disappears.

Second, I have also noticed that words with diacritical marks do not always display correctly. For example, François may display as Franois on the frontend.

I have a lot of articles, and some have diacritical marks in titles, abstracts, and sometimes bio statements. I am looking for a way to fix this automatically.

What should the setting be in the configuration to fix these issues? Should I add utf8mb4_unicode_ci and cp1252 West European (latin1) in the configuration?

How do I fix this?

Please, help!

-Newone

Can anyone please respond?

Hi @newone,

I would suggest working outside of OJS, e.g. with iconv, mysqldump, etc., to make sure your database is properly UTF-8-encoded. You should be able to find guidance on this on Stack Overflow, for example; search for keywords like "transcode mysql utf-8". Don’t assume that your data is in Latin1 just because that is your default encoding; you may need to verify what encoding the database actually holds using the MySQL command-line client.
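
As a rough illustration of what "verifying the encoding" means, here is a minimal Python sketch. It is not part of OJS, and the function name `classify_bytes` and the sample strings are hypothetical. It assumes you can get at the raw bytes of a field (for example from a dump file opened in binary mode) and only guesses between three common situations: real UTF-8, single-byte latin1/cp1252, and UTF-8 that was mistakenly re-encoded a second time.

```python
def classify_bytes(raw: bytes) -> str:
    """Guess how a byte string taken from a database dump is encoded.

    This is a heuristic for illustration, not a guarantee.
    """
    if raw.isascii():
        return "ascii"  # identical in latin1, cp1252 and UTF-8
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes such as 0xE7 (cp1252 "ç") are not valid UTF-8 on their own.
        return "latin1-or-cp1252"
    try:
        # If the decoded text maps back to cp1252 bytes that are again
        # valid UTF-8, the data was most likely encoded twice.
        text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded"


name = "François"
as_cp1252 = name.encode("cp1252")                      # b'Fran\xe7ois'
as_utf8 = name.encode("utf-8")                         # b'Fran\xc3\xa7ois'
as_double = as_utf8.decode("cp1252").encode("utf-8")   # UTF-8 encoded twice

for label, raw in [("cp1252", as_cp1252), ("utf-8", as_utf8), ("double", as_double)]:
    print(label, raw, classify_bytes(raw))

# Reading cp1252 bytes as UTF-8 while discarding invalid bytes loses the "ç",
# which matches the "Franois" symptom described above.
print(as_cp1252.decode("utf-8", errors="ignore"))
```

If most of the non-ASCII fields come back as `latin1-or-cp1252`, the data really is stored in a single-byte encoding and needs transcoding; if they come back as `utf-8`, the table's declared collation is misleading and a blind conversion could damage the text.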

Regards,
Alec Smecher
Public Knowledge Project Team

Thanks Alec @asmecher,

I will look into your recommendation. I do not have access to the command line, and I do not know the commands for working on the command line. I am trying to sort it out so that I can update today.

I checked phpMyAdmin. The collation for the OJS database is listed as latin1_swedish_ci. The server connection collation is shown as utf8mb4_unicode_ci (there is an option to change this).

Most of the tables in the OJS database have latin1_swedish_ci listed in the Collation column, and in the Type column, it is InnoDB.

The remaining tables, listed below, have utf8_general_ci in the Collation column, and these tables have MyISAM in the Type column:

user_interests
processes
metadata_description_settings
metadata_descriptions
filter_settings
filters
external_feed_settings
external_feeds
controlled_vocab_entry_settings
controlled_vocab_entries
controlled_vocabs
citation_settings
citations
books_for_review_settings
books_for_review_authors
books_for_review
article_notes

So, I have two different collations in one database. I will look into it and report back.