Strange Characters in abstracts since upgrading to OJS 3.2.1.1 (Bootstrap)

I am seeing some strange characters since upgrading. They were definitely not there before. For example:
“mathematisingâ€
We can’t
how to produce ‘validity from within’

and worse in abstracts in foreign languages.

What can I do?

Hi @gail,

Your character set configuration probably got changed when you upgraded – see the client_charset and connection_charset options in config.inc.php. As a result OJS is seeing the data in a different way, resulting in garbled encodings. Did the settings change when you upgraded?
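As a rough illustration of how this kind of garbling arises, here is a minimal Python sketch. It assumes the text is stored as UTF-8 bytes and then read back as Windows-1252 (cp1252), which is one common mismatch; whether that is the exact pair involved on your server is an assumption. The function name misread_utf8 is made up for the example.

```python
def misread_utf8(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset.

    Bytes that the wrong charset cannot map are shown as U+FFFD.
    """
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


# A right single quotation mark (U+2019) is three bytes in UTF-8: E2 80 99.
# Read as cp1252, each of those bytes becomes its own character.
print(misread_utf8("We can\u2019t"))  # We canâ€™t
```

A closing double quote (U+201D) ends in byte 9D, which cp1252 does not define. That may be why a garbled closing quote can appear as only two visible characters.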

Regards,
Alec Smecher
Public Knowledge Project Team

Here are the CURRENT AND OLD settings:

CURRENT

;;;;;;;;;;;;;;;;;;;;;;;;;
; Localization Settings ;
;;;;;;;;;;;;;;;;;;;;;;;;;

[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = utf-8

; Database connection character set
; Must be set to “Off” if not supported by the database server
; If enabled, must be the same character set as “client_charset”
; (although the actual name may differ slightly depending on the server)
connection_charset = utf8

OLD:
;;;;;;;;;;;;;;;;;;;;;;;;;
; Localization Settings ;
;;;;;;;;;;;;;;;;;;;;;;;;;

[i18n]

; Default locale
locale = en_US

; Client output/input character set
client_charset = utf-8

; Database connection character set
; Must be set to “Off” if not supported by the database server
; If enabled, must be the same character set as “client_charset”
; (although the actual name may differ slightly depending on the server)
connection_charset = Off

; Database storage character set
; Must be set to “Off” if not supported by the database server
database_charset = Off

; Enable character normalization to utf-8 (recommended)
; If disabled, strings will be passed through in their native encoding
; Note that client_charset and database collation must be set
; to “utf-8” for this to work, as characters are stored in utf-8
charset_normalization = Off

I changed connection_charset (the “Database connection character set” option) to Off and that seems to have fixed it. Thanks, Alec.

Hi @gail,

Good, your new and old OJS settings are now consistent and the system should continue to function as it did. You’re probably double-encoding UTF8 characters in your database, however, and might want to look into transcoding it at some point in the future. There are some threads in this forum about it, and also on e.g. stackoverflow.com.
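To make "double-encoding" concrete, here is a small Python sketch. It assumes that UTF-8 bytes were treated as Windows-1252 text and then converted to UTF-8 a second time on the way into storage. The helper name is hypothetical, and your database may have reached its current state by a different route.

```python
def store_double_encoded(text: str) -> bytes:
    """Return the bytes that end up stored when UTF-8 text is encoded twice."""
    utf8_bytes = text.encode("utf-8")       # first, correct encoding
    misread = utf8_bytes.decode("cp1252")   # bytes wrongly treated as cp1252 text
    return misread.encode("utf-8")          # second encoding of the misread text


apostrophe = "\u2019"  # right single quotation mark
print(apostrophe.encode("utf-8").hex(" "))         # e2 80 99
print(store_double_encoded(apostrophe).hex(" "))   # c3 a2 e2 82 ac e2 84 a2
```

Read back as UTF-8, those eight bytes display as three separate characters rather than one apostrophe, which matches the kind of garbling shown at the top of this thread.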

Regards,
Alec Smecher
Public Knowledge Project Team

Why, when I changed the setting below to Off, did I get locked out of the system? I had to restore from a backup, which was very tiresome. How can I get rid of the strange characters without causing this problem?

; Database connection character set
; Must be set to “Off” if not supported by the database server
; If enabled, must be the same character set as “client_charset”
; (although the actual name may differ slightly depending on the server)
connection_charset = Off

Hi @gail,

Can you describe more specifically what you mean by “locked out”?

Regards,
Alec Smecher
Public Knowledge Project Team

I could only access the basic dashboard and could no longer access submissions or edit the website. It wasn’t just a matter of restoring the config.inc.php file. I ended up having to reinstall the whole website to rectify the problems.

Hi @gail,

What were you able to ascertain about it? Were there any logged error messages, either in the browser or on the server side (PHP log)?

Regards,
Alec Smecher
Public Knowledge Project Team

No, I can’t remember, and I dare not try to recreate them! It is a systemic problem, as the issue occurs on a number of different pages, for example the international advisory board, the referencing guidelines and some abstracts (not in the PDFs). It affects writing in other languages as well as some punctuation marks in English. These were not there before the OJS upgrade, so I welcome help on how to fix this.

Hi @gail,

Unfortunately I can’t debug without further information – if by some chance you do encounter the problem again, I can help work through it.

Regards,
Alec Smecher
Public Knowledge Project Team

I have just copied some abstracts back into the landing page for the articles. In one I replaced the English and Greek abstracts but the strange characters remain. For example: Too Many Mind | Murmurations: Journal of Transformative Systemic Practice
I unpublished the paper, saved and republished, emptied cache and reloaded. What can I check or enable to see why these problems persist?

Hi @gail,

There are quite a few directions discussed in this thread and now some upgrade history on your journal, and it’s tough to get an accurate sense of it all remotely – and there’s a risk that things are getting more complicated if your new and old data don’t agree on a consistent character encoding. I’d recommend picking a goal and working towards it.

What we recommend for character set configuration is a normal UTF8 configuration:

client_charset = utf-8
connection_charset = utf8

(In OJS 3.1.2 and older, there used to be a database_charset setting as well. Starting with 3.2.x, this is no longer used and can be ignored.)
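If it helps to double-check the two settings without reading the file by eye, here is a hedged Python sketch that parses an [i18n] fragment and reports whether both values name UTF-8. It works on a pasted fragment rather than the full config.inc.php, since I have not verified that the standard-library parser handles every line of the real file. The function and constant names are invented for the example.

```python
import configparser

# Illustrative fragment only; paste your own [i18n] section here.
SAMPLE_I18N = """
[i18n]
; Client output/input character set
client_charset = utf-8
; Database connection character set
connection_charset = utf8
"""


def normalize(charset: str) -> str:
    """Fold spelling variants such as 'UTF-8' and 'utf8' together."""
    return charset.strip().lower().replace("-", "").replace("_", "")


def is_recommended_utf8_config(ini_text: str) -> bool:
    """True if client_charset and connection_charset both name UTF-8."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["i18n"]
    client = normalize(section.get("client_charset", ""))
    connection = normalize(section.get("connection_charset", "Off"))
    return client == "utf8" and connection == "utf8"


print(is_recommended_utf8_config(SAMPLE_I18N))  # True
```

With connection_charset = Off the function returns False, which flags that the file differs from the configuration recommended above. It says nothing about how the existing data in the database is encoded.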

In MySQL, your database should also be set up to use UTF8. On mine, it reports:

MariaDB [ojs-stable]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.003 sec)

I achieve this when creating a database by using a DEFAULT CHARACTER SET clause on the CREATE DATABASE statement, e.g.:

CREATE DATABASE my_database_name DEFAULT CHARACTER SET utf8;

Having everything set up properly for UTF-8, you may still have problems with accented characters if your existing data is not properly encoded. Dealing with this can be a pain – you’ll need to convert (transcode) your database contents. You might find guidance on that on Stackoverflow.com.
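The idea behind transcoding can be sketched in Python on a single string. This is not a database migration tool, only an illustration under the assumption that the garbling came from UTF-8 bytes being read as Windows-1252 (with Latin-1 used for the few bytes Windows-1252 leaves undefined). The function name is hypothetical, and anything like this should only ever be tried on a backup copy.

```python
def repair_mojibake(garbled: str) -> str:
    """Try to undo one round of UTF-8-read-as-cp1252 garbling.

    Returns the input unchanged if it does not look garbled in that way.
    """
    raw = bytearray()
    try:
        for ch in garbled:
            try:
                raw += ch.encode("cp1252")
            except UnicodeEncodeError:
                # Covers control characters such as U+009D, which cp1252
                # does not define but Latin-1 does.
                raw += ch.encode("latin-1")
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way, so assume the text was already correct.
        return garbled


print(repair_mojibake("We can\u00e2\u20ac\u2122t"))  # We can’t
```

The fallback matters when old and new data disagree: correctly encoded text (for example Greek, or a plain "café") fails the round trip and is left alone. A check like this is only a heuristic, though, and a string that legitimately contains a sequence resembling garbled UTF-8 would be altered.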

It’ll be tempting to just experiment with settings to see if they work – which you can do, of course, but be careful not to mix different encodings into your database, or it’ll be a lot tougher to disentangle.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi Alec, yes, client_charset = utf-8 and connection_charset = utf8 are set correctly in the config.inc.php file.

But I don’t think I am looking in the right place for the next part. I went into MySQL and found the correct database, but it only offered check or repair. The check output is just a list, with nothing resembling what you mentioned.
Thanks
Gail (still keeping going!)