I recently realized that in several of my OJS journals (all on 3.3.0-13) the full-text search is not working at all. Metadata can be found, however.
In two of these journals, I checked the config.inc.php and made sure that the indexing of PDF files is not commented out. I also checked that the server gives the appropriate output when calling the pdftotext command given in the config.inc.php. Furthermore, I checked that the PDFs contain extractable text. All is set!
Then I ran php tools/rebuildSearchIndex.php, which takes ages because there are 2,500+ articles in one of the journals. Still, when searching specifically for a keyword that occurs only in the PDFs, I get an empty result. I also cleared the template cache and the data cache in the administration area. Still the same result.
When I check the database of one of the journals for the search term that is in the full text, I get an empty result back:
select * from submission_search_keyword_list where keyword_text like "%Sukzessionsstadien%";
Empty set (0.01 sec)
A query for a keyword in the metadata, however, gives the expected result:
select * from submission_search_keyword_list where keyword_text like "%Bremen%";
+------------+--------------+
| keyword_id | keyword_text |
+------------+--------------+
|     277326 | -bremen      |
|     247427 | bremen       |
|     248205 | bremens      |
+------------+--------------+
3 rows in set (0.02 sec)
Are there any further steps I have to take to get the search working?
As luck would have it, I possibly have a workaround for you. Please try this change. You will need to edit the following file and replace the four occurrences of PKPString::regexp_replace with generic calls to preg_replace.
The issue might be that the former method expects valid UTF-8 data, and the server tools that generate the file stream may not be providing it. You'll need to reindex your content after making this change, using the same tools/rebuildSearchIndex.php script you ran earlier.
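Separately, to test the encoding hypothesis on a given PDF, a rough check could look like the sketch below. It pipes the extracted text through iconv, which stops with an error at the first byte sequence that is not valid UTF-8. The helper names is_valid_utf8 and check_pdf are made up for this example, and the plain pdftotext call is an assumption; substitute the exact command from your config.inc.php.

```shell
#!/bin/sh
# Sketch only: is_valid_utf8 and check_pdf are hypothetical helper names.

# Succeeds (exit status 0) if the data on stdin is valid UTF-8.
# iconv exits non-zero at the first invalid byte sequence.
# Note: empty input also counts as valid.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Extract the text of one PDF and report whether it is valid UTF-8.
# Assumes pdftotext is on the PATH; "-" sends the text to stdout.
# The pipeline's status is that of is_valid_utf8, so a PDF that
# yields no text at all is reported as "ok" as well.
check_pdf() {
    if pdftotext "$1" - | is_valid_utf8; then
        echo "ok: $1"
    else
        echo "invalid UTF-8: $1"
    fi
}
```

Calling check_pdf with the path to one of the affected galley files should then tell you whether the extracted stream is the problem. If it reports invalid UTF-8 for the PDFs whose terms are missing from the index, that would support the explanation above.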
Will PKPString::regexp_replace be updated in future releases of the 3.3 LTS version to work as intended with non-UTF-8 server tools? My fear is that with the next OJS update this modification in the code will be gone on my side and the search will not index new PDFs appropriately. Could you tell me about your plans?