Charset problems (ISO-8859-1 x UTF-8)

Hello all,

I have started administering an OJS portal that has gone a long time without upgrades or maintenance. As a result, the system has problems with accented characters, which is typical of an incorrect charset configuration.

The portal runs on FreeBSD, whose default charset is ISO-8859-1, and the MySQL tables are also ISO-8859-1.
PHP's default charset is ISO-8859-1 as well.

Currently, config.inc.php is as follows:

locale = pt_BR
client_charset = iso-8859-1
connection_charset = iso-8859-1
database_charset = iso-8859-1
charset_normalization = Off

OJS pages are displayed with broken accents, but publications are not.
I believe the database contains records in both UTF-8 and ISO-8859-1, because these settings have been changed several times.
How can I get this right? Should I migrate everything to UTF-8 and convert the database?
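If the database really does hold a mix of UTF-8 and ISO-8859-1 records, one rough way to find out is to classify each raw column value. The following is a heuristic Python sketch, not an OJS tool: the function name `guess_encoding` and the mojibake pattern are illustrative assumptions, and fetching the raw bytes (e.g. over a binary-safe MySQL connection) is left out.

```python
import re

# Heuristic mojibake shape: a character that would be a UTF-8 lead byte
# (U+00C2..U+00F4) followed by one in the continuation-byte range
# (U+0080..U+00BF), i.e. UTF-8 bytes once decoded as ISO-8859-1 and saved again.
_MOJIBAKE = re.compile("[\u00c2-\u00f4][\u0080-\u00bf]")


def guess_encoding(raw):
    """Roughly classify the raw bytes of one column value (heuristic, not definitive)."""
    if raw.isascii():
        return "ascii"  # identical in both charsets, nothing to fix
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 can decode any byte sequence, so invalid UTF-8 is assumed to be Latin-1.
        return "iso-8859-1"
    if _MOJIBAKE.search(text):
        return "utf-8 (possibly double-encoded)"
    return "utf-8"
```

Running this over a table and counting the results gives a picture of how mixed the data is before deciding on a conversion strategy. Uppercase text such as "SÃO PAULO" is not flagged, because "Ã" is followed by an ordinary letter rather than a continuation-range character.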

Very soon I will need to migrate this site to another server running Debian, whose default charset is UTF-8.

Regards,

Renato L. Sousa

Hi @rensousa,

When you say that publications are not showing correctly, can you describe what you mean?

Thanks,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

I'm referring to how accented characters are displayed in Brazilian Portuguese (pt-BR).
I was able to fix it with the settings below:

locale = pt-BR
client_charset = utf-8
connection_charset = utf8
database_charset = utf8
charset_normalization = On

It took me a while to realize that I needed to clear the data cache for the settings to take effect.

Regards,

Renato


Hi guys,
We are facing character encoding issues when trying to upgrade from 3.1.1.2 to 3.1.2.1. When we make a copy of the 3.1.1.2 database and run OJS 3.1.2.1 against it, even before running the upgrade, all Spanish special characters such as ó or ñ are replaced by garbled sequences (for example, ó shows up as Ã³). On 3.1.1.2 it works fine. Did you change anything related to this in the upgrade? Is anyone else facing similar issues?

We’ve tried every combination of settings, clearing the cache files each time, but none of them works. These are the config.inc.php settings:

locale = es_ES
client_charset = utf-8
connection_charset = utf-8
database_charset = utf-8
charset_normalization = utf-8

Here you can see what I mean: http://icono14.net/ojs-3121/index.php/icono14/index

It is taking us a lot of time and we haven’t been able to upgrade the journal. Could you please help us?

Thanks
Daniel Becerra
ICONO14

Hi @celuloide,

Did you try changing utf-8 to utf8 where it’s required? Look at e.g. Charset problems (ISO-8859-1 x UTF-8): the inconsistencies are important! Different libraries that OJS uses expect UTF-8 to be spelled in different ways.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
Yes, I did! I’ve tried every single combination, clearing the cache in between.
On our OJS 3.1.1-2, this same setup works well.

Thanks for your help,

Daniel Becerra
ICONO14

Hi @celuloide,

The settings posted above by @rensousa are correct. You have invalid values for both connection_charset and database_charset (they should be utf8, not utf-8). The charset_normalization setting has been removed, so it isn’t doing anything.
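A small sanity check for the utf8 vs. utf-8 trap could look like the following. It is a hypothetical Python helper (`check_charsets` is not part of OJS) that assumes the charset keys live in the `[i18n]` section of config.inc.php and that the file parses as plain INI; the expected values are the ones confirmed in this thread.

```python
import configparser

# Values confirmed in this thread (the settings posted by @rensousa).
# Note the deliberate spelling difference: "utf-8" for the client side,
# "utf8" (MySQL's name) for the connection and database.
RECOMMENDED = {
    "client_charset": "utf-8",
    "connection_charset": "utf8",
    "database_charset": "utf8",
}


def check_charsets(ini_text, section="i18n"):
    """Return a list of human-readable problems found in config.inc.php text."""
    # interpolation=None: don't treat "%" in values as interpolation syntax.
    # strict=False: tolerate duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    problems = []
    for key, expected in RECOMMENDED.items():
        actual = parser.get(section, key, fallback=None)
        if actual is None:
            problems.append(f"{key} is missing")
        elif actual.strip().strip('"').lower() != expected:
            problems.append(f"{key} = {actual} (expected {expected})")
    if parser.has_option(section, "charset_normalization"):
        problems.append("charset_normalization is no longer used and can be removed")
    return problems


if __name__ == "__main__":
    sample = """
[i18n]
locale = es_ES
client_charset = utf-8
connection_charset = utf-8
database_charset = utf-8
charset_normalization = utf-8
"""
    for problem in check_charsets(sample):
        print(problem)
```

Run against the ICONO14 settings, it flags connection_charset, database_charset and the leftover charset_normalization line, while client_charset passes.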

The invalid values are passed to the third-party ADODB library. I’m not sure how it behaves when it gets a setting it doesn’t understand, but I suspect it connects using the database’s default character set.

I would suggest taking a complete backup before you tinker with character sets, since it’s really easy to mix two configurations together by experimenting, but very hard to untangle them once that has happened. Consistency is key.

If you set everything as it’s supposed to be but you’re still seeing garbled characters like Ã³, then the content is most likely stored with the wrong encoding in the database itself. This is more of a database management issue than an OJS issue, so you might have better luck looking e.g. on Stackoverflow.com, or try a tool like ftfy.

If you use the configuration recommended above and your database contents are correctly encoded, then everything should work; if not, it’ll be one of these two problems.
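For the "double-encoded" case (UTF-8 bytes that were read as Windows-1252/ISO-8859-1 and saved again as UTF-8), the repair is to reverse that one step. ftfy's `fix_text` handles this and many other cases; the following is only a minimal Python sketch of the core idea, with a hypothetical function name. Take a backup first and apply it only to values you have confirmed are double-encoded.

```python
def fix_double_encoding(text):
    """Undo one round of UTF-8 bytes being mis-read as Windows-1252 / ISO-8859-1."""
    # Try Windows-1252 first (it covers characters such as "‘" that appear in
    # mojibake like "Ã‘"), then plain ISO-8859-1 for bytes cp1252 leaves undefined.
    for legacy in ("cp1252", "iso-8859-1"):
        try:
            return text.encode(legacy).decode("utf-8")
        except UnicodeError:
            # Either the text can't be expressed in this charset, or the resulting
            # bytes aren't valid UTF-8: this round-trip doesn't apply.
            continue
    return text  # leave anything we can't reverse untouched
```

For example, `fix_double_encoding("publicaÃ§Ã£o")` returns "publicação", while already-correct text such as "São Paulo" is returned unchanged, because its re-encoded bytes are not valid UTF-8.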

Regards,
Alec Smecher
Public Knowledge Project Team

Thank you, @asmecher,
It is quite possible that our database has contents with the wrong encoding, but the same database on another server, with the settings I posted before, seems to work fine: http://icono14.dysing.es/ojs/index.php/icono14/index
However, if I use @rensousa’s settings, mojibake shows up. Does this mean anything to you?

Regards,

Daniel Becerra
ICONO14

Hi @celuloide,

I would suggest looking at the process you’re using to move the database between servers: maybe a missing DEFAULT CHARACTER SET utf8 clause on the CREATE DATABASE statement, or a missing --default-character-set parameter on mysqldump (off the top of my head)? If you can identify one of the settings that’s causing you grief, e.g. in the journal_settings table, one way to check that its contents are the same on both servers is to call the SQL LENGTH function on it on each. That may help determine whether the contents got garbled during the transfer.
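MySQL’s LENGTH() counts bytes (CHAR_LENGTH() counts characters), so a value that was double-encoded in transit reports a larger LENGTH() on the target server. The Python sketch below is a hypothetical helper, assuming the column stores UTF-8 and that any double encoding went through ISO-8859-1; it computes both byte counts you might expect for a given piece of text, so you can tell which one each server reports.

```python
def expected_lengths(text):
    """Return (correct, double_encoded) byte counts, i.e. what MySQL LENGTH() would report.

    Assumes the column stores UTF-8 and that any double encoding happened by
    reading UTF-8 bytes as ISO-8859-1 and writing them back as UTF-8.
    """
    correct = text.encode("utf-8")
    # Each non-ASCII byte becomes one Latin-1 character, which then takes 2 bytes in UTF-8.
    double = correct.decode("iso-8859-1").encode("utf-8")
    return len(correct), len(double)
```

For example, `expected_lengths("Comunicación")` returns (13, 15): if one server reports 13 and the other 15 for the same row, the transfer most likely double-encoded it. Pure ASCII values give the same number twice, so pick a setting that contains accented characters.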

Just to reiterate: if you have utf-8 where you should have utf8, that’s wrong and may cause problems; but if you have the same mistake consistently between OJS2 and OJS3, they should both behave the same.

Regards,
Alec Smecher
Public Knowledge Project Team