Rebuilding search index - PDF characters

I just discovered that rebuilding the search index fails when the PDF file contains characters like these:

𝒂𝒏𝒊𝒎𝒂𝒍𝒆𝒔
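For reference, a quick Python check (an illustration only, not part of OJS) suggests these are the Spanish word "animales" written in Unicode Mathematical Bold Italic letters, starting at U+1D482. These code points lie outside the Basic Multilingual Plane, so each one takes 4 bytes in UTF-8. NFKC normalization folds them back to plain Latin letters:

```python
import unicodedata

# "animales" spelled with MATHEMATICAL BOLD ITALIC SMALL letters, written as escapes
# so the example does not depend on how the forum renders the glyphs.
word = "\U0001D482\U0001D48F\U0001D48A\U0001D48E\U0001D482\U0001D48D\U0001D486\U0001D494"

# Show the code point, official name and UTF-8 bytes of the first two characters.
for ch in word[:2]:
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)} -> {ch.encode('utf-8')!r}")

# NFKC applies the <font> compatibility mappings, giving ordinary Latin letters.
plain = unicodedata.normalize("NFKC", word)
print(plain)  # animales
```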

I have this configuration in config.inc.php in OJS 3.3.0-15:

index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
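One possible workaround, sketched under assumptions rather than offered as a documented OJS feature: append a small filter to the end of the `index[application/pdf]` pipeline that applies NFKC normalization and then drops any character that still needs 4 bytes in UTF-8. The script name and path (`/usr/local/bin/strip_astral.py`) are hypothetical. The sketch also assumes OJS passes the whole command string, pipes included, to the shell, which the existing `| /usr/bin/tr` stage suggests.

```python
#!/usr/bin/env python3
"""Hypothetical stdin -> stdout filter for the OJS PDF index pipeline.

Example (hypothetical) config.inc.php line:
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' ' | /usr/bin/python3 /usr/local/bin/strip_astral.py"
"""
import sys
import unicodedata


def to_bmp(text: str) -> str:
    # Fold styled letters such as U+1D482 back to plain Latin where Unicode defines a mapping.
    text = unicodedata.normalize("NFKC", text)
    # Drop whatever is still outside the Basic Multilingual Plane (4 bytes in UTF-8).
    # Assumption: the keyword column only accepts characters of up to 3 bytes.
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


if __name__ == "__main__":
    # Read raw bytes so malformed UTF-8 from the extractor is replaced rather than fatal.
    data = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    sys.stdout.buffer.write(to_bmp(data).encode("utf-8"))
```

With this filter, a word like the one above would be indexed as plain "animales" rather than being rejected. Emoji and other non-BMP symbols that have no NFKC mapping are simply removed.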

But rebuilding the search index still fails.

What can I do?

Hi @lcmartinezru,

Just a heads up: I moved your reply into a new topic here, as it was distinct from the topic you originally posted under. If you have a similar but separate issue, it is best to create a new forum topic rather than reply to an existing one.

Best regards,

Roger
PKP Team

Hello Roger,

Actually, I posted it there because it produced a similar error during indexing, and I was able to determine that it occurred when trying to index the word 𝒂𝒏𝒊𝒎𝒂𝒍𝒆𝒔 ("animales" in Unicode mathematical bold italic letters) from the PDF file.

PHP Fatal error: Uncaught PDOException: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9D\x92\x82\xF0\x9D…' for column revistas.submission_search_keyword_list.keyword_text at row 1 in /var/www/html/lib/pkp/lib/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:119
Stack trace:
#0 /var/www/html/lib/pkp/lib/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php(119): PDOStatement->execute(NULL)
#1 /var/www/html/lib/pkp/lib/vendor/laravel/framework/src/Illuminate/Database/Connection.php(489): Doctrine\DBAL\Driver\PDOStatement->execute()
#2 /var/www/html/lib/pkp/lib/vendor/laravel/framework/src/Illuminate/Database/Connection.php(664): Illuminate\Database\Connection->Illuminate\Database\{closure}('INSERT INTO sub…', Array)
#3 /var/www/html/lib/pkp/lib/vendor/laravel/framework/src/Illuminate/Database/Connection.php(631): Illuminate\Database\Connection->runQueryCallback('INSERT INTO sub…', Array, Object(Closure))
#4 /var/www/html/lib/pkp/lib/vendor/laravel/framework/src/ in /var/www/html/lib/pkp/lib/vendor/laravel/framework/src/Illuminate/Database/Connection.php on line 671
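The bytes quoted in the error can be decoded to confirm which character MySQL rejected. Below is a minimal Python check. It assumes, without confirmation in this thread, that `submission_search_keyword_list.keyword_text` uses MySQL's 3-byte `utf8` (`utf8mb3`) character set, which cannot store 4-byte UTF-8 sequences. The helper name `fits_utf8mb3` is hypothetical:

```python
# The first four bytes quoted in the 1366 "Incorrect string value" message.
rejected = b"\xF0\x9D\x92\x82"

ch = rejected.decode("utf-8")
print(f"U+{ord(ch):05X}", len(rejected), "bytes")  # U+1D482 4 bytes


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8 (i.e. lies in the BMP)."""
    return all(ord(c) <= 0xFFFF for c in text)


print(fits_utf8mb3("animales"))  # True
print(fits_utf8mb3(ch))          # False
```

If that assumption holds, a `utf8mb4` column would accept such characters, while a `utf8mb3` column will keep rejecting them however the PDF text is extracted.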

Now I’m trying to rebuild the search index without the PDF extraction tool to see what happens.