Problems displaying UTF-8 encoded latin characters

After upgrading from version 2.4.6 to 3.3.0.5, special characters such as ñ, à… do not display correctly and come out something like this: Psicomotricidad, Movimiento y Emoción.

I’ve verified that client_charset and connection_charset are both set to UTF-8. If I copy the printed text and run it through the PHP function utf8_decode, it is printed correctly.

i18n relevant configuration

[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = utf-8

; Database connection character set
connection_charset = utf8

Is there any way to fix this issue with a configuration parameter or re-encode the database without having to edit every entry?

Hi @evaldescu

It sounds like your database was originally in latin1, and you switched it to utf8 during the upgrade. Can you open your database, via PHPMyAdmin or something similar, and check the character encodings on the database tables?

What happens if you try using latin1 as a client character set?

Cheers,
Jason

Thanks @jnugent!

Yes, I can access the database via PHPMyAdmin, and I can tell you the following:

All the tables are in either utf8_general_ci or utf8mb4_general_ci

Also, I tried using latin1 as client_charset, and everything breaks even worse.

Using UTF-8

image

Using latin1

image

Hi @evaldescu

If you still have your 2.4.6 config.inc.php file handy, can you share the [i18n] section in it? Just that part please, not the whole file with your database credentials.

Best
Jason

Hi @jnugent

This is i18n for the 2.4.6 install:

[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = utf-8

; Database connection character set
; Must be set to “Off” if not supported by the database server
; If enabled, must be the same character set as “client_charset”
; (although the actual name may differ slightly depending on the server)
connection_charset = Off

; Database storage character set
; Must be set to “Off” if not supported by the database server
database_charset = Off

; Enable character normalization to utf-8 (recommended)
; If disabled, strings will be passed through in their native encoding
; Note that client_charset and database collation must be set
; to “utf-8” for this to work, as characters are stored in utf-8
charset_normalization = Off

Aha. I think what is happening here is that although your client character set for 2.4.6 is utf-8, your connection character set was defaulting to latin1 because it’s not specified. Maybe your database_charset as well. You could try setting connection_charset to Off in your OJS 3 site too and see how that goes.
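
To illustrate what I suspect is going on, here is a minimal Python sketch. It is not OJS code: the sample word is borrowed from your example, the variable names are only for illustration, and it assumes the text left the application as UTF-8 but was read as latin1 somewhere along the connection.

```python
# Sketch of the suspected double-encoding. Nothing here touches OJS or MySQL.
original = "Emoción"

# Step 1: the application hands over UTF-8 bytes (ó becomes two bytes).
utf8_bytes = original.encode("utf-8")

# Step 2: a latin1 connection treats every byte as one character,
# so the two bytes of ó turn into two separate characters.
misread = utf8_bytes.decode("latin-1")
print(misread)  # EmociÃ³n

# Step 3: reversing the misreading recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired)  # Emoción
```

Step 3 is roughly what utf8_decode does in PHP, which would explain why running the printed text through it gives you the right characters.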

Cheers,
Jason

Nope, it didn’t work:

i18n with UTF-8
[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = utf-8

; Database connection character set
;connection_charset = utf8_unicode_ci
;database_charset = utf8_unicode_ci
charset_normalization = Off

image

i18n using latin1

;;;;;;;;;;;;;;;;;;;;;;;;;
; Localization Settings ;
;;;;;;;;;;;;;;;;;;;;;;;;;

[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = latin1

; Database connection character set
;connection_charset = utf8_unicode_ci
;database_charset = utf8_unicode_ci
charset_normalization = Off

image

Okay, so what I’m concerned about now is that, if the settings weren’t the same during the upgrade process, your content has become mix-encoded. Fixing this can be a huge hassle, so I’d probably recommend restoring 2.4.6 at this point, carefully reviewing your existing database configuration, and then running the upgrade again.

At PKP|PS we get clients migrating to us with databases like this occasionally, and we have good luck with FTFY, but it requires a database SQL file to work on, and your mileage may vary.

https://ftfy.readthedocs.io/en/latest/#
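
If you just want a rough idea of what such a repair does to a dump file, here is a much cruder, standard-library-only sketch. It is a stand-in for FTFY, not how FTFY works: the function names `repair_line` and `repair_dump` are made up for this example, and it assumes a UTF-8 dump in which the damage is one layer of UTF-8 read as latin1. A line is changed only if it survives the round trip cleanly; anything else is written back untouched.

```python
def repair_line(line: str) -> str:
    """Undo one layer of UTF-8-read-as-latin1 damage, or return the line as is."""
    try:
        # Correctly encoded accented text fails here and is left alone;
        # pure ASCII comes back unchanged.
        return line.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line


def repair_dump(src_path: str, dst_path: str) -> int:
    """Copy a SQL dump line by line, repairing what can be repaired.

    Returns the number of lines that were changed.
    """
    changed = 0
    # newline="" keeps the dump's original line endings intact.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            fixed = repair_line(line)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed
```

Two caveats. A line that mixes damaged and healthy accented text fails the round trip and stays as it was, which is exactly the mix-encoded case where FTFY's `fix_text` does better. Also, MySQL's latin1 is closer to Windows-1252 than to strict ISO-8859-1, so characters such as curly quotes may need different handling. Always work on a copy of the dump.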

Best,
Jason

Thanks! That did the trick. Now everything looks as expected.
