[OJS 3.3.0.8] Error encoding character after Upgrade to OJS 3.3.0.8

Hi everybody,
I need help solving this character-encoding problem. Several articles show wrong characters such as: ’ or [“. Do you have any idea how to correct this error?

I have the setting like this:

;;;;;;;;;;;;;;;;;;;;;
; Database Settings ;
;;;;;;;;;;;;;;;;;;;;;

; Database collation
collation = utf8_general_ci

;;;;;;;;;;;;;;;;;;;;;;;;;
; Localization Settings ;
;;;;;;;;;;;;;;;;;;;;;;;;;

; Client output/input character set
client_charset = utf-8

; Database connection character set
connection_charset = utf8

Thanks for every support.
Bye
Tiziano

Hi @Tiziano ,

did you check the collation in your database? Are you actually using utf8_general_ci on every table?

Hi @gonzalognzl, yes, every table has utf8_general_ci.

Hi Tiziano,

Try the following:

Login to your installation database

$ mysql your-database -u your-database-user -p

Run this query to check your character set:

mysql> SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'your-database' AND COLLATION_NAME IS NOT NULL AND COLLATION_NAME != 'utf8_general_ci';

If the above query returns any tables with collations other than ‘utf8_general_ci’, those collations will need to be converted to ‘utf8_general_ci’:

mysql> ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
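If many tables come back from the INFORMATION_SCHEMA query, you can generate the conversion statements instead of typing them one by one. A minimal sketch in Python (the table names below are hypothetical placeholders; in practice, use the list the query above returned):

```python
# Hypothetical table names; replace with the list returned by the
# INFORMATION_SCHEMA query above.
tables = ["submissions", "publication_settings"]

for table in tables:
    # Print one conversion statement per table; the output can be
    # saved to a file and fed back into the mysql client.
    print(f"ALTER TABLE `{table}` "
          f"CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;")
```

As always before running ALTER TABLE on a production database, take a backup first.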

Please let me know if you need any further assistance.

Best regards,

Josh Noronha (he/his)
Systems Specialist
PKP|PS Support Team

Subscribe to the PS Bulletin for quarterly updates on
PKP software, support, and community events

Hi @jnoronha, thanks for your help. I followed your advice, and I can confirm what I suspected: all the tables are indeed utf8_general_ci. I still need your help, because this looks like an upgrade bug. We host many journals with OJS, and for the others, after I upgraded to version 3.3.0.8, all the characters are correct. I cannot tell what the difference is.

Bye
Tiziano

Hi @Tiziano,
did you check the database collation as well? There is not only a collation for each table, but also one for the database as a whole.
(Gabriele)

Hi @ojs_univie, yes, all the database has the same collation. :slight_smile:

Bye
Tiziano

When you write “all the database” it still sounds to me like you are referring to the tables. Here is an SQL query showing what I mean:

USE db_name;
SELECT @@character_set_database, @@collation_database;

edit: source MySQL :: MySQL 8.0 Reference Manual :: 10.3.3 Database Character Set and Collation
(Gabriele)

Yes, I ran the command, and here is the result:

[Screenshot of the query result, 2021-11-30 13:32]

What do the broken characters look like in the database? Are they okay there? Which jobs are affected?
(Gabriele)

I have the same problem, and I don’t know how to fix it. Can anyone help? @Tiziano

Hi @thelaris, @ojs_univie, no solution has been found yet. Our technicians are still working on it, but there seems to be a bug in the upgrade. Looking at the data with a hex editor, I see this:

000006D0 61 74 69 63 61 20 63 6C 69 6E 69 63 61 20 C3 83
000006E0 C2 A8 20 6E 65 63 65 73 61 72 69 6F 20 70 69
000006F0 C3 83 C2 B9 20 6C 61 76 6F 72 6F 20 66 61 6D 69

The wrong sequences are:
C3 83 C2 A8, which should be è (lowercase e with grave accent).
C3 83 C2 B9, which should be ù (lowercase u with grave accent).

They are valid UTF-8 sequences, or rather they can be interpreted as valid sequences: in a two-byte UTF-8 sequence the first byte has its two most significant bits set to 11 (here each sequence starts with C, which has bits 7 and 6 set to 1), and the second byte has its two most significant bits set to 10 (here the second bytes start with A, 8 or B, which fits). But four bytes are being used to represent each character, and that is wrong. The good news is that we know what should be there instead of the garbage. So I looked it up here: https://www.utf8-chartable.de/. How is è encoded in UTF-8? C3 A8. And ù? C3 B9.

So:
C3 A8 has become C3 83 C2 A8
C3 B9 has become C3 83 C2 B9

If I remove the “middle” 83 C2 from both sequences, I get the right encoding. How the extra bytes were inserted I don’t know, but it looks like the result of a bug, and suspicion obviously falls on the upgrade procedure.
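This byte pattern is the classic signature of double-encoded UTF-8: the original UTF-8 bytes were read as latin-1 and then re-encoded as UTF-8. If that diagnosis is right, the original text can be recovered by reversing the round trip. A quick check in Python, using the bytes from the hex dump above:

```python
# The garbled four-byte sequence from the hex dump (should be "è").
bad = bytes.fromhex("c383c2a8")

# Undo the double encoding: decode as UTF-8, re-encode as latin-1,
# then decode the resulting bytes as UTF-8 again.
fixed = bad.decode("utf-8").encode("latin-1").decode("utf-8")
print(fixed)  # è
```

The same round trip turns C3 83 C2 B9 back into ù. Note this only repairs text that was double-encoded exactly once; run it against a database dump, verify the result, and only then reimport.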

How can I fix this problem?
Thanks for the support

Bye
Tiziano

Many journals have this problem after updating OJS to 3.3.0.8. I do not know if they fixed it in 3.3.0.10. @Tiziano @ojs_univie

There must be some kind of latin1 in play here somewhere. I am 100% sure about that bit. I would recommend the following:

I would really like to take your word for it that the collations are all fine, but it would be good to see some proof of that. Maybe provide the output of a query where a broken character can be seen as well.
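The latin1 suspicion matches the byte pattern from the hex dump exactly: reading correct UTF-8 bytes as latin-1 and re-encoding them as UTF-8 reproduces the garbage byte for byte. A minimal demonstration in Python:

```python
good = "è".encode("utf-8")  # b'\xc3\xa8', the correct two-byte encoding

# Misread the UTF-8 bytes as latin-1, then re-encode as UTF-8 --
# the mistake a latin1 connection or dump step would make.
mangled = good.decode("latin-1").encode("utf-8")
print(mangled.hex(" "))  # c3 83 c2 a8 -- the sequence from the hex dump
```

This suggests checking every point where latin1 could sneak in: the connection_charset during the upgrade, the character set used by any mysqldump/restore step, and the server’s default character_set_server.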

(Gabriele)