Cyrillic text unreadable in DB, but front end and indexing are OK

Hello all.
I am sorry if this question is stupid. I have searched the docs and the forum but have not found an exact answer. I need advice rather than urgent help.
My main concern (and actual question) is: how dangerous is the situation described below?
So, the description:
We have used OJS since 2012 (https://ojs.tdmu.edu.ua/). The current version is 3.2.1.2.
Everything looks and works OK.
However, while preparing the update to OJS 3.3, I paid closer attention to how content is actually stored in the DB, and found that ALL Cyrillic text in the DB is unreadable.
Example article: ВИВЧЕННЯ УРБАНІСТИЧНОЇ ЛЕКСИКИ ТА ПОНЯТТЯ ЛОКУСУ НА ЗАНЯТТЯХ ІЗ ДИСЦИПЛІНИ «УКРАЇНСЬКА МОВА ЯК ІНОЗЕМНА» (НА ПРИКЛАДІ ТЕМИ «У МІСТІ. УРОК 1») | Медична освіта
In the DB:
s11674-p16414-DB
On the front end:
s11674-p16414-art
Google Scholar and CrossRef indexing work OK as well.
The current DB is MySQL 5.7; the collation is utf8_unicode_ci and the engine is InnoDB.
Settings in config.php:
collation = utf8_unicode_ci
client_charset = utf-8
connection_charset = utf-8
A detailed examination of the DB showed me that some tables and/or fields have different collations:
utf8_general_ci or even latin1_swedish_ci
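
A check like this can also be scripted. The following is only a minimal Python sketch, not part of OJS or the ojs-tools recipe: the query reads MySQL's information_schema.COLUMNS view, while the helper name group_mismatches and the sample rows are hypothetical and made up for illustration.

```python
from collections import defaultdict

# Query against MySQL's information_schema; run it with any MySQL client
# or driver, passing the schema (database) name as the parameter.
COLLATION_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s
  AND COLLATION_NAME IS NOT NULL
"""

def group_mismatches(rows, expected="utf8_unicode_ci"):
    """Group (table, column, collation) rows whose collation differs from expected."""
    mismatches = defaultdict(list)
    for table, column, collation in rows:
        if collation != expected:
            mismatches[collation].append(f"{table}.{column}")
    return dict(mismatches)

# Hypothetical rows, shaped like the result of COLLATION_QUERY.
sample_rows = [
    ("journals", "path", "utf8_unicode_ci"),
    ("submission_settings", "setting_value", "utf8_general_ci"),
    ("sessions", "data", "latin1_swedish_ci"),
]

for collation, columns in group_mismatches(sample_rows).items():
    print(collation, columns)
```

Grouping by collation makes it easy to see at a glance which columns deviate from the collation set in config.php.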
On my dev server I tried the recipe from
ojs-tools/fix-database-encoding.md at master · kaitlinnewson/ojs-tools · GitHub
I managed to convert the entire DB to utf8_unicode_ci, but nothing changed: the front end looks OK and the text in the DB is unreadable.
My main concern (and actual question) is how dangerous this situation is. Are there any risks? What should I expect?

Hi @semtecher,

Indeed, this can be a problem. I assume that as long as the text can be decoded correctly, this is not critical and the data can still be converted to utf8. OJS 3.3 is stricter about how it handles data, and mixed collations can lead to a fatal error during the upgrade.

Hello @Vitaliy. Thank you for your attention.
The update to 3.3 is my main concern.
We use virtualization, backups, and an extra dev environment, so running experiments is not a problem (apart from the time).
Actually, I ran one (described above), but nothing changed: the front end looks OK and the text in the DB is unreadable (when accessed directly via the console).

The symbols on the screenshot look like Cyrillic characters encoded in latin1, although I’m not an expert in encodings. Maybe it’s a double encoding.
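
To illustrate what that kind of mismatch looks like, here is a minimal Python sketch. It is an assumption about the cause, not a diagnosis of this database: it takes UTF-8 bytes and misreads them as latin1, which produces the typical "Ð..." garbage, and then shows that the step is reversible. The function names garble and repair are made up for the example. Note also that Python's latin-1 codec is ISO-8859-1, while MySQL's latin1 is closer to cp1252, so the exact symbols may differ from what the console shows.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Undo garble(): turn the latin1 characters back into bytes and read them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

original = "Медична освіта"

once = garble(original)    # single mis-decoding
twice = garble(once)       # "double encoding": the same mistake applied twice

print(repr(once))
print(repair(once) == original)
print(repair(repair(twice)) == original)
```

If the same wrong conversion happens on both write and read, the two errors cancel out, which would be consistent with a front end that looks fine while the raw table contents look broken.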

If it’s not something related to problems displaying utf8 on the machine’s console, and your inspection of the tables shows that it is indeed a problem with the database collation and encoding, then upgrade problems are likely. I can foresee several places where errors are possible when upgrading to 3.3.0.

I’d suggest making a test upgrade even if all looks normal, as I always do irrespective of the situation. It helps to prepare for the upgrade on the production instance.

I’d also look at similar issues described on SO, e.g.: https://stackoverflow.com/a/38363567/6711224

@Vitaliy thank you for the advice.
The problem is definitely not in the console; the UA/RU locales are installed…
I will soon prepare a new dev server and try to upgrade to 3.3. Thank you for the link; I will check it and try to deal with this as well.
Once, I restored the DB from a backup on Linux and it was OK (front end and DB as in the pictures above). In contrast, any attempt to restore the DB on Windows (my local PC) results in unreadable Cyrillic text ON THE FRONT END as well (!).
This is one more reason why I started this topic.
By contrast, I do not have such issues with the DBs of other systems I manage (LMS Moodle, for example).