OAI validation and problem with utf8conditioner

Hello,
I understand something is wrong with the UTF-8 encoding of the XML output on our OJS install, but when I look for utf8conditioner to get more details, the site that hosted it no longer works and searching Google does not help.
Can you please advise on repair steps for somebody who is not a PHP coder?
Thank you

REQUEST https://ojs.journals.cz/?verb=Identify GET
WARN Malformed response: not well-formed (invalid token) at line 131, column 91, byte 6721 at /usr/lib64/perl5/vendor_perl/XML/Parser.pm line 187. . The most common reason for malformed responses is illegal bytes in UTF-8 streams (e.g. the inclusion of Latin1 characters with codes>127 without creating proper UTF-8 mutli-byte sequences). You might find the utf8conditioner, found on the OAI tools page helpful for debugging.

FAIL Failed to parse Identify response
FAIL ABORT: Failed to parse Identify response from server at base URL ‘https://ojs.journals.cz/’.
The OAI-PMH data provider with base URL https://ojs.journals.cz/ has failed initial validation. Problems reported must be corrected before validation can continue.

Sun Jan 27 18:06:01 2019

We have the same problem. When I try to validate one of our 150 journals on this site - OAI-PMH Data Provider Validation and Registration - I get the same answer:
Malformed response: mismatched tag at line 20, column 2, byte 1405 at /usr/lib64/perl5/vendor_perl/XML/Parser.pm line 187. . The most common reason for malformed responses is illegal bytes in UTF-8 streams (e.g. the inclusion of Latin1 characters with codes>127 without creating proper UTF-8 mutli-byte sequences). You might find the utf8conditioner, found on the OAI tools page helpful for debugging.
I too would like some advice.
Best
Niels Erik

Hi @nef,

The underlying toolset that your OJS installation relies on (MySQL, PHP, etc.) is pretty lenient about invalid UTF-8 characters, which means bad data is accepted upstream without complaint and you only encounter errors once it reaches a stringent environment, such as the validator's XML parser. The best solution is to find and correct the invalid data. It usually arises from imports, database administration, etc.
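To illustrate the kind of problem the validator message describes, here is a minimal Python sketch (not part of OJS or the OAI tools; the helper name `is_valid_utf8` is made up for this example). It shows that the same character stored as a single Latin-1 byte is rejected by a strict UTF-8 decoder, while its proper multi-byte UTF-8 form is accepted:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# The same character, "é", encoded two different ways.
latin1_bytes = "é".encode("latin-1")  # one byte with a code above 127
utf8_bytes = "é".encode("utf-8")      # a two-byte UTF-8 sequence

print("Latin-1 bytes valid as UTF-8:", is_valid_utf8(latin1_bytes))
print("UTF-8 bytes valid as UTF-8:  ", is_valid_utf8(utf8_bytes))
```

A lenient tool may store and return the Latin-1 byte unchanged; a strict XML parser stops at it.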

I’d suggest examining your OAI data to see if you can find where the bad data is appearing, and then tracking it from there into one of your database tables. If you can identify the bad content by looking at the OAI interface, but aren’t sure where that data exists in the database, I can potentially help.
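As a rough stand-in for the unavailable utf8conditioner, the following Python sketch lists where invalid UTF-8 bytes occur in a saved OAI response. It is only an illustration under stated assumptions: the function name `find_invalid_utf8` and the `context` parameter are invented for this example, and it checks byte validity only, so it will not catch XML structure errors such as the "mismatched tag" message above.

```python
def find_invalid_utf8(data: bytes, context: int = 30):
    """Return (byte_offset, line_number, surrounding_bytes) for each invalid sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice, so add pos.
            bad = pos + err.start
            line = data.count(b"\n", 0, bad) + 1
            snippet = data[max(0, bad - context): bad + context]
            problems.append((bad, line, snippet))
            pos = pos + err.end  # resume scanning after the bad bytes
    return problems


# Example with made-up data: one Latin-1 byte, one valid sequence, one stray byte.
sample = b"<a>caf\xe9</a>\n<b>ok \xc3\xa9</b>\n<c>\xff</c>"
for offset, line, snippet in find_invalid_utf8(sample):
    print(f"byte {offset}, line {line}: {snippet!r}")
```

To run it against real data, save the response from the OAI URL to a file (for example with a browser or `curl`), read it in binary mode with `open(path, "rb").read()`, and pass the bytes to the function. The text around each reported offset should help identify which title, abstract, or setting holds the bad characters.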

Regards,
Alec Smecher
Public Knowledge Project Team