In short: OJS 2 searches PDF files correctly, but OJS 3 finds only some of the words.
Details:
On the same server I have OJS 2.4.8.5 and 3.1.2.4 installed for testing.
I have the same article in both OJS versions. I added the article using the QuickSubmit plugin.
Both OJS versions use the same “pstotext” and “pdftotext” files.
Fragments of the config.inc.php file (identical in both OJS versions):
OJS 2 indexed the test article fully and finds all the words in it.
OJS 3 indexed the test article only partially and finds only some of the words in it.
I would suggest running the PDF text extraction tool manually on your PDF to see what text is being extracted. Take the command from the configuration file and replace the %s with the path and filename to the PDF you expect to see indexed.
Depending on how your PDFs are being generated, the tool may have a hard time extracting text from them.
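The manual check described above could be scripted roughly as follows. This is only a sketch: the `TEMPLATE` string is a hypothetical example of a pdftotext command line, not the one from this installation, so copy the real command from your own config.inc.php. The helper names `build_command` and `extract_text` are made up for illustration.

```python
import shlex
import subprocess

# Hypothetical command template (an assumption); use the one from your config.inc.php.
TEMPLATE = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s -"


def build_command(template: str, pdf_path: str) -> str:
    """Replace the %s placeholder with the shell-quoted path to the PDF."""
    return template.replace("%s", shlex.quote(pdf_path))


def extract_text(template: str, pdf_path: str) -> str:
    """Run the extraction command through the shell and return what it prints."""
    result = subprocess.run(
        build_command(template, pdf_path),
        shell=True,           # the configured command may contain pipes
        capture_output=True,
        check=True,           # raise if the tool exits with an error
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text(TEMPLATE, "/path/to/article.pdf")` with a real path should return the text that the tool extracts, which can then be compared with the words the search does or does not find.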
Regards,
Alec Smecher
Public Knowledge Project Team
The server administrator performed the tests.
The pdftotext tool was run on the PDF of the article I described in my first post.
The administrator ran it once on the copy of the file in the OJS 2 directory and once on the copy in the OJS 3 directory.
In both cases an identical TXT file was created (both files had the same md5 checksum).
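For anyone who wants to repeat the checksum comparison, here is a minimal sketch. The function names `md5_of_file` and `same_content` are invented for this example, and the paths you pass in would be your own extracted TXT files.

```python
import hashlib


def md5_of_file(path: str) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

If `same_content` returns True for the two TXT files, the extraction step produced the same text for both installations, so any difference in search results would have to arise after extraction.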
One of these files can be downloaded from here:
The result is the same as on my server: OJS 2 indexes all words, and OJS 3 indexes only some words.
I think this test shows that the problem is not related to my server, my installation or my configuration.