Charset encoding confusion

I have a server running OCS 2.3.6 in Portuguese (pt_BR). On the production site all characters are displayed correctly, but when I try to generate a report in OCS, the output contains strange characters, like “ÇÃO”. I think they’re being displayed as Latin1.

My config.inc.php shows:
client_charset = utf-8
connection_charset = Off
database_charset = Off
charset_normalization = On

The collation of my DB seems to be UTF-8, but the data inside is displayed as Latin1.

I know it is strange, but I think we changed some configuration related to this once, and maybe we made a mistake without realising it at the time.

I need to be able to print the report with the correct charset. What would be the best way to fix that?

Thanks in advance.

Hi @Israel87,

I think this’ll depend on your PHP and MySQL configuration, but usually when connection_charset and database_charset are set to Off the data will be stored in Latin1, regardless of the database’s collation. They generally should be…

connection_charset = utf8
database_charset = utf8

(Note that client_charset is utf-8 including a hyphen – this is intentional.)
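To illustrate why a mismatch like this produces garbled text, here is a minimal Python sketch, not anything taken from OCS itself. It assumes the text is sent as UTF-8 bytes and then read back as MySQL's latin1, which is based on Windows-1252 (`cp1252` in Python). The function name `misread_as_cp1252` is made up for the example.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "ÇÃO"
garbled = misread_as_cp1252(original)

# Each accented letter takes two bytes in UTF-8, so a single-byte charset
# such as cp1252 shows two characters where there should be one.
print(len(original), len(garbled))
print(garbled)
```

If the same wrong interpretation is applied consistently when writing and when reading, the site can look fine while any tool that reads the database directly sees the garbled form.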

If you change these and find that your OCS website is now spitting out weird encodings, it’s because the database contained invalid double-encodings and needs to be dumped, transcoded, and reloaded. A tool like iconv should be able to help with this. Beware of operating with a new configuration without cleanly transcoding your database – that’ll result in a mixture of encodings that’s very hard to clean up.
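As a rough sketch of what the transcoding step does to each piece of text, the Python below reverses one round of double encoding. It assumes the damage came from UTF-8 bytes being treated as Windows-1252 (MySQL's latin1); `repair_double_encoding` is a hypothetical helper, not part of OCS or iconv.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one round of 'UTF-8 read as cp1252' double encoding, if present."""
    try:
        # Turn each character back into the single byte it came from,
        # then read the resulting bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look double-encoded, so leave it untouched.
        return text


double_encoded = "\u00c3\u2021\u00c3\u0192O"  # how "ÇÃO" looks after double encoding
print(repair_double_encoding(double_encoded))
print(repair_double_encoding("ÇÃO"))  # already clean, returned unchanged
```

The fallback is only a heuristic: a string that is correct as stored but happens to form valid UTF-8 when re-encoded would be changed too, so it is worth checking the result on a copy of the data first.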

Regards,
Alec Smecher
Public Knowledge Project Team

I still couldn’t fix the way my DB is storing the data. I have changed config.inc.php to:

client_charset = utf-8
connection_charset = utf8
database_charset = utf8
charset_normalization = On

When I run “show variables like 'character%';” and “show variables like 'collation%';”, MySQL shows:

character_set_client, utf8
character_set_connection, utf8
character_set_database, utf8
character_set_filesystem, binary
character_set_results, utf8
character_set_server, utf8
character_set_system, utf8
character_sets_dir, /usr/share/mysql/charsets/

collation_connection, utf8_general_ci
collation_database, utf8_general_ci
collation_server, utf8_general_ci

The collations of the fields are utf8_general_ci. The dump of the DB shows that the default charset of the tables is utf8.

But the data from OCS is still being stored as latin1. I have already cleared the templates and data, but it didn’t change anything. The only thing I haven’t done, which some resources I found suggest, is to issue “SET NAMES utf8” before each connection to the DB. Is this really necessary?

Is there anything I am missing?

Hi @Israel87,

I think your configuration is now correct, but the data is already in the database in the wrong encoding. I suspect you’ll have to run a database dump through iconv (or equivalent) and reload it to the DB again in order to fix it. Try searching this forum for iconv for further details.
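For anyone without iconv to hand, the same kind of conversion can be sketched in Python. This is only an illustration under assumptions: the dump is a UTF-8 file whose text was double-encoded through MySQL's latin1 (Windows-1252), and the file names, the table name `t` and the function `transcode_dump` are all made up for the example.

```python
import os
import tempfile


def transcode_dump(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 dump and write each character back out as its cp1252 byte."""
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            open(dst_path, "wb") as dst:
        for line in src:
            # Strict encoding: a character with no cp1252 byte raises an
            # error instead of being silently dropped or replaced.
            dst.write(line.encode("cp1252"))


# Small demonstration on a throwaway file rather than a real dump.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dump.sql")
    dst = os.path.join(tmp, "dump_fixed.sql")
    with open(src, "w", encoding="utf-8") as f:
        # "\u00c3\u2021\u00c3\u0192O" is the double-encoded form of "ÇÃO".
        f.write("INSERT INTO t VALUES ('\u00c3\u2021\u00c3\u0192O');\n")
    transcode_dump(src, dst)
    with open(dst, encoding="utf-8") as f:
        fixed = f.read()

print(fixed)
```

Work on a copy of the dump and inspect the output before reloading it. If the conversion raises an error, the file probably contains a mixture of encodings and needs a closer look.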

Regards,
Alec Smecher
Public Knowledge Project Team