Garbage being inserted in submission_search_keyword_list

Hello,

This week I have been running rebuildSearchIndex on a test instance of OJS-3.3.0-14, and I noticed that many strange keywords are being inserted into submission_search_keyword_list. Here is a sample:

mysql> select * from submission_search_keyword_list where length(keyword_text) = 40;
+------------+------------------------------------------+
| keyword_id | keyword_text                             |
+------------+------------------------------------------+
|    3425799 | 002f6abpnnnnmumznidmmn2mmki1pwgi6dnr9jko |
|    3584507 | 003s7jmpzi1pirvyrqjidvv2fjeveljc9rmgp8wp |
|    3440779 | 009a1rbqrvntltepjwnza2tdq22jfvt8nevb7nds |
|    3507587 | 00b9yrf7u4uitdqf4no1piclefnszhifpibfoq2j |
|    3583888 | 00bdd31o1vc9teza8u6v2ijp8uazp06jves01wh5 |
|    3418260 | 00fswnv8wje3iujagbzhft2rtbi86wqhyo6ouoxi |
|    3365819 | 00kwv9vp568ovv7moylwtojd3n8jtkfztj797fll |
|    3489429 | 00luoriceyg2hhpddi3ejonmndfn8rm86b85xok1 |
|    3569491 | 00mokgtxkhcbletaiginki0utmyktputkac6zxlh |
|    3531000 | 00o3iexe3qfz3hh2bjmlyabjqvp8kor31oltgs67 |
|    3350769 | 00pruda1iguwnpsyzdael5ievuxmcm08wvncf9td |
|    3491861 | 00szqlexdgagqm25jqru0nzhdfnlusogf0b9ii9w |
|    3429041 | 00tpv48msooe7qibcn6kh300rvbdmqfreevk5cob |
|    3432865 | 00vhyx21lpc0ojssmvexwp2igtegn3hf9wrndnod |
|    3375869 | 00wdakpswveypgksyckecxk2ogeo4aktb49wwyvc |
|    3490502 | 00x9kz85kks1pd5fdqqqka0cjnrjbjsgvqtniysr |
|    3248853 | 01032018-edital-6-2018esidencia-pedagogi |
|    3549473 | 011rwqzlliu2yorljiuhyrspyvrmu7pvjlvuu1ir |
|    3584201 | 019mmapnqjnrddadzsyz1ayvmc1wm1xmmspnsjlg |
|    3506055 | 019ob7a32tnuvpwgftfqttuu3eru1mru3stbzzre |
|    3568287 | 01bzcvmqpwavbj7uatbvpigvtd09yx27iwz1so1l |
|    3413995 | 01cephr0icyx0umslolxbikn7p3dofrgt9k3b94k |
|    3409035 | 01dbrpeqvrmrtule3xxnhxt1cjw3b2z2rao64iil |
|    3307423 | 01dgjkqnnejq4nmkhmhgc7u7ucdayqymqgrzdutb |
|    3567918 | 01gsuc7uiieegkgkk8mumr3mqiqyzcw4812zldnj |
|    3554814 | 01hgtnunknbnfnombgemgqkg8n19xg0wmw0ixskx |

There are almost 90,000 records like this so far. I suspect they come from faulty PDFs, which we certainly have. The output of rebuildSearchIndex is now about 2700 lines long and shows several errors.

Is there a way to run rebuildSearchIndex so that it shows which files are defective? I ran a script that uses pdftotext to check the files, but I’m not sure it caught all the broken ones.
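
For reference, a check of this kind can be sketched roughly as below. This is a minimal sketch, not the exact script I ran and not anything OJS does internally. It assumes that pdftotext (from poppler) is on the PATH and that the directory to scan is passed as the first argument. The function names and the "30 or more alphanumeric characters mixing letters and digits" rule are my own assumptions, chosen only because they match the keywords shown above.

```python
import re
import subprocess
import sys
from pathlib import Path

# Assumption: a "garbage" token is a long run of letters and digits with no
# separators, like the 40-character keywords in the table above.
TOKEN_RE = re.compile(r"[A-Za-z0-9]{30,}")


def looks_like_garbage(token: str) -> bool:
    """Return True for long alphanumeric tokens that mix letters and digits."""
    if not TOKEN_RE.fullmatch(token):
        return False
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return has_digit and has_alpha


def suspicious_tokens(text: str) -> list[str]:
    """Split text on whitespace and keep the tokens that look like garbage."""
    return [t for t in text.split() if looks_like_garbage(t)]


def check_pdf(path: Path) -> tuple[bool, str]:
    """Run pdftotext on one file and return (ok, reason)."""
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(["pdftotext", str(path), "-"], capture_output=True)
    if result.returncode != 0:
        return False, f"pdftotext exit code {result.returncode}"
    if result.stderr.strip():
        return False, "pdftotext printed warnings"
    text = result.stdout.decode("utf-8", errors="replace")
    bad = suspicious_tokens(text)
    if bad:
        return False, f"{len(bad)} suspicious tokens, e.g. {bad[0]}"
    return True, "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_pdfs.py /path/to/files_dir
    for pdf in sorted(Path(sys.argv[1]).rglob("*.pdf")):
        ok, reason = check_pdf(pdf)
        if not ok:
            print(f"{pdf}\t{reason}")
```

A file is flagged if pdftotext fails, if it prints warnings, or if the extracted text contains tokens of the kind shown in the table. The last case matters because a PDF can convert without any error and still yield unreadable text.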

If broken files are not the cause, what else could cause this kind of value to be inserted into submission_search_keyword_list?

Regards,

Oberdan