Authors' names with umlaut characters are not being displayed

All - looking for suggestions on how to solve this problem. If you need more information please let us know.

Thank you
We have OCS version 2.3.5.0 running on Debian 7 with PHP 5, Apache 2, and MySQL.

Our OCS admin sent this email describing the problem.
I’ve received a report that special characters in authors’ names (ä, ë, ï, ü, õ, é, ã, etc.) are preventing some authors’ names from being displayed.

For example, the fourth paper in https://mbgocs.mobot.org/index.php/tdwg/2012/schedConf/presentations should be:

The BioVeL Data Refinement Workflow for Occurrence Data

Anton Güntsch, Cherian Mathew, Vera Hernandez Ernst, Matthias Obst, Sarah Bourlat, Alan R Williams, Yde de Jong, Alex Hardisty

but is showing as:

The BioVeL Data Refinement Workflow for Occurrence Data

, Cherian Mathew, Vera Hernandez Ernst, Matthias Obst, Sarah Bourlat, Alan R Williams, Yde de Jong, Alex Hardisty

because the last name of the first author contains a u-umlaut character.

When I review the system info, it shows:
i18n
locale en_US
client_charset utf-8
connection_charset Off
database_charset Off
charset_normalization On

Debian returns this output.
locale -a
C
C.UTF-8
en_US.utf8
POSIX

Hi @Mike,

The first thing I’d suggest is disabling charset_normalization. This was introduced back when UTF-8 support in various parts of the ecosystem was spotty, and it has become reliable since then.

Generally speaking, it’s best to have connection_charset and database_charset set to utf8. (Note: client_charset should be utf-8, but connection_charset and database_charset would be utf8 without the dash – that’s intentional!) However, if your database contents are not properly UTF-8 encoded, you might disrupt a working but weird configuration in a way that would require a database transcoding (using tools like iconv or ftfy) to correct.
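
As a rough illustration of what "not properly UTF-8 encoded" can mean, here is a minimal Python sketch that classifies the raw bytes of a stored name. The function name classify_bytes is hypothetical (it is not part of OCS), the check is only a heuristic, and it assumes you have already exported the raw bytes of a few affected author names from the database.

```python
def classify_bytes(raw: bytes) -> str:
    """Heuristic guess at how a stored name was encoded (illustrative only)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes such as 0xFC (Latin-1 "ü") are not valid UTF-8 on their own.
        return "not valid UTF-8 (possibly Latin-1)"
    try:
        # If the decoded text can be turned back into bytes via Latin-1 and
        # those bytes are again valid UTF-8, the data was likely encoded twice.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "valid UTF-8"
    return "double-encoded UTF-8" if repaired != text else "valid UTF-8"


name = "Güntsch"
utf8_bytes = name.encode("utf-8")        # what a clean UTF-8 database holds
latin1_bytes = name.encode("latin-1")    # what a Latin-1 database holds
double_bytes = utf8_bytes.decode("latin-1").encode("utf-8")  # mojibake case

for label, raw in [("utf-8", utf8_bytes),
                   ("latin-1", latin1_bytes),
                   ("double", double_bytes)]:
    print(label, raw, classify_bytes(raw))
```

If names come back as anything other than valid UTF-8, that suggests the contents would need transcoding before connection_charset and database_charset are switched to utf8. Take a database backup before changing either the data or the settings.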

Thanks,
Alec Smecher
Public Knowledge Project Team

Alec,

Thank you for the detailed explanation. It is much appreciated.

Mike