Database Character Set Issue when upgrading

Hi All,

I am running OJS version 3.2.1.1 and I am trying to upgrade to the latest version 3.3.0.11

Up until now my OJS config.inc.php settings relating to charset have been defined as follows:

[i18n]
locale = en_US
client_charset = utf-8
connection_charset = Off
database_charset = Off
charset_normalization = Off

My database has the following charset / collation values:

SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| latin1                   | latin1_swedish_ci    |
+--------------------------+----------------------+

If I dump the DB - a typical table looks like this:

DROP TABLE IF EXISTS `access_keys`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `access_keys` (
  `access_key_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `context` varchar(40) NOT NULL,
  `key_hash` varchar(40) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `assoc_id` bigint(20) DEFAULT NULL,
  `expiry_date` datetime NOT NULL,
  PRIMARY KEY (`access_key_id`),
  KEY `access_keys_hash` (`key_hash`,`user_id`,`context`)
) ENGINE=InnoDB AUTO_INCREMENT=406 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

During previous version 3.X minor / patch upgrades - this issue has not happened.

The issue: After upgrading - the front-end displays certain special characters like -, ', ", etc as â�~B��~@~\ or �~B

This can be seen when cat’ing the dump file or when editing it using vim on linux as well, but of course not in the DB when inspecting via the mysql command of via phpMyAdmin.

The issue here (I believe) is encoding / wrong charset / collation - I am just not sure how to correct it. :frowning:

I have tried updating the charset and collation options of all tables in PHPMyAdmin to utf8mb4 and utf8mb4_general_ci. I have also set the DB to these values on a global level. I have also found this script which I adapted to try and force converts the content in each table to the correct charset Converting mysql string data form latin1 to utf8 for utf8 data stored in utf8 tables via latin1 connection · GitHub but that also did not yield the correct results.

(Of course these tests were done in a separate DB and upgrades done in a separate hosting env.)

Is there something simple that I am missing here?
Has anyone had similar issues and can provide some advice?

My next step is to find a MySQL guru who can help investigating this, but I thought I would ask here first. Thanks!

TLDR; You need to move to utf8mb4.
You have some posts in the forum talking about this.