Problem encoding German special characters/umlauts when using client_charset = utf-8, database issue [OJS 3.3.0.5]

Dear OJS community,

when upgrading from OJS version 3.2.1.2 to 3.3.0.5, there is a problem encoding German special characters/umlauts.

Screenshot 2021-07-06 091233

This is most likely related to the fact that I had to change the value of the connection_charset function in config.inc.php from previously off to utf8. This is because the upgrade did not work with the value off. Changing the value to off after the upgrade with utf8 is not possible, because the web page content then could not be loaded.

The real problem is that our database already contains characters in wrong encoding. Here is an example of the email_templates_default_data table:

Screenshot 2021-07-06 140916

The collation of the database was originally set to the MYSQL default setting latin1_swedish_ci, I changed it to utf8_general_ci about a year ago.

So the special characters/umlauts inserted incorrectly in the database are output correctly on the website with the setting connection_charset = off in config.inc.php. Only with the mandatory change to connection_charset = off due to the upgrade they are output incorrectly.

By the way, we use the encoding setting client_charset = utf-8 for the client output/input character set.

Now my question: do I have to correct the wrongly encoded database entries by script? Or is there another possibility?

I’ve already tried this using an Emacs Lisp script – I’m more familiar with elisp than PHP – to replace individual incorrect ISO characters with UTF characters in the database. Replacing the characters seems to have worked, but unfortunately recovering the database with the now correct encoded characters using the backup restore function in cpanel has been unsuccessful.
Here is my very basic elisp script:

(defun encode ()
(interactive)

(beginning-of-buffer)
(save-excursion (query-replace-regexp "ü" "ü"))
(save-excursion (query-replace-regexp "ä" "ä"))
(save-excursion (query-replace-regexp "ö" "ö"))
(save-excursion (query-replace-regexp "Ö" "Ö"))
(save-excursion (query-replace-regexp "ß" "ß"))
(save-excursion (query-replace-regexp "Ã " "à"))
(save-excursion (query-replace-regexp "á" "á"))
(save-excursion (query-replace-regexp "â" "â"))
(save-excursion (query-replace-regexp "ã" "ã"))
(save-excursion (query-replace-regexp "ù" "ù"))
(save-excursion (query-replace-regexp "ú" "ú"))
(save-excursion (query-replace-regexp "û" "û"))
(save-excursion (query-replace-regexp "Ù" "Ù"))
(save-excursion (query-replace-regexp "Ú" "Ú"))
(save-excursion (query-replace-regexp "Û" "Û"))
(save-excursion (query-replace-regexp "Ãœ" "Ü"))
(save-excursion (query-replace-regexp "ò" "ò"))
(save-excursion (query-replace-regexp "ó" "ó"))
(save-excursion (query-replace-regexp "ô" "ô"))
(save-excursion (query-replace-regexp "è" "è"))
(save-excursion (query-replace-regexp "é" "é"))
(save-excursion (query-replace-regexp "ê" "ê"))
(save-excursion (query-replace-regexp "ë" "ë"))
(save-excursion (query-replace-regexp "À" "À"))
(save-excursion (query-replace-regexp "Á" "Á"))
(save-excursion (query-replace-regexp "Â" "Â"))
(save-excursion (query-replace-regexp "Ã" "Ã"))
(save-excursion (query-replace-regexp "Ä" "Ä"))
(save-excursion (query-replace-regexp "Ã…" "Å"))
(save-excursion (query-replace-regexp "Ç" "Ç"))
(save-excursion (query-replace-regexp "È" "È"))
(save-excursion (query-replace-regexp "É" "É"))
(save-excursion (query-replace-regexp "Ê" "Ê"))
(save-excursion (query-replace-regexp "Ë" "Ë"))
(save-excursion (query-replace-regexp "ÃŒ" "Ì"))
(save-excursion (query-replace-regexp "Í" "Í"))
(save-excursion (query-replace-regexp "ÃŽ" "Î"))
(save-excursion (query-replace-regexp "Ï" "Ï"))
(save-excursion (query-replace-regexp "Ñ" "Ñ"))
(save-excursion (query-replace-regexp "Ã’" "Ò"))
(save-excursion (query-replace-regexp "Ó" "Ó"))
(save-excursion (query-replace-regexp "Ô" "Ô"))
(save-excursion (query-replace-regexp "Õ" "Õ"))
(save-excursion (query-replace-regexp "Ø" "Ø"))
(save-excursion (query-replace-regexp "Ã¥" "å"))
(save-excursion (query-replace-regexp "æ" "æ"))
(save-excursion (query-replace-regexp "ç" "ç"))
(save-excursion (query-replace-regexp "ì" "ì"))
(save-excursion (query-replace-regexp "í" "í"))
(save-excursion (query-replace-regexp "î" "î"))
(save-excursion (query-replace-regexp "ï" "ï"))
(save-excursion (query-replace-regexp "ð" "ð"))
(save-excursion (query-replace-regexp "ñ" "ñ"))
(save-excursion (query-replace-regexp "õ" "õ"))
(save-excursion (query-replace-regexp "ø" "ø"))
(save-excursion (query-replace-regexp "ý" "ý"))
(save-excursion (query-replace-regexp "ÿ" "ÿ"))
(save-excursion (query-replace-regexp "€" "€"))

)

Any hints and suggestions would be greatly appreciated!

Best regards,
Michael

Hi @adm_sub,

This doesn’t sound like exactly the same issue you’re facing, but I wonder if the solution for fixing the character encoding problems from this post may be helpful.

Regards,

Erik
PKP Team

Hi @ewhanson,

please excuse my late reply.

I finally managed to convert the latin1 characters in our database to UTF-8 characters. I did not use the method mentioned above, but instead used the following SQL command and executed it for each affected table in the database.

SQL command for the table email_templates_default_data:

UPDATE email_templates_default_data SET 
    subject=convert(cast(convert(subject using  latin1) as binary) using utf8),
    body=convert(cast(convert(body using  latin1) as binary) using utf8),
    description=convert(cast(convert(description using  latin1) as binary) using utf8) 
WHERE 1

This method took a few minutes, but was effective and did not produce any other errors.

Kind regards,
Michael

1 Like

This topic was automatically closed after 5 days. New replies are no longer allowed.