[OJS 3.1.1-2] Issue with full-text search


We have the newest version of OJS installed and are trying to get the full-text search running. We have already taken the following steps:

  • enable section for pdftotext in config.inc.php
  • test pdftotext with galley file
  • run php tools/rebuildSearchIndex.php
  • clear cache with rm -rf cache/_db/*
  • also searched database for specific words that only appear in PDFs (no result)

Running rebuildSearchIndex.php gives us the following errors, but also results in a message that all articles were indexed:

PHP Warning:  Declaration of PKPUsageEventPlugin::getEnabled() should be compatible with LazyLoadPlugin::getEnabled($contextId = NULL) in /srv/www/.../html/lib/pkp/plugins/generic/usageEvent/PKPUsageEventPlugin.inc.php on line 386
PHP Warning:  Declaration of SubmissionFileDAO::fromRow($row) should be compatible with PKPSubmissionFileDAO::fromRow($row, $fileImplementation) in /srv/www/.../html/classes/article/SubmissionFileDAO.inc.php on line 23

Do you have any advice on what we could try next to get the search running?

Thanks in advance for your help.

@ojsbsb, were you able to get past this? I am having the same problem.

Check that, so I was initially using
index[application/pdf] = "/usr/bin/pstotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
to index the pdfs and was not getting any results. Once I switched to:
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
things looked much better.
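
For anyone following along: one way to check whether an index[...] command actually produces text is to run it by hand against a single galley file and count the words that come out. Here is a minimal shell sketch of that idea. The helper name extracted_word_count is made up for illustration, and it assumes that OJS fills in %s with the path of the file being indexed.

```shell
#!/bin/sh
# Sketch: run an index[...] style command template against one file and
# print the number of words it produces. 0 means nothing would be indexed.
# Assumption: %s in the template stands for the file path.
# Assumption: the path contains no single quotes and the template contains
# no other % signs or backslashes.

extracted_word_count() {
    template=$1   # e.g. the quoted value of index[application/pdf]
    file=$2       # path to a galley file

    # Wrap the path in single quotes so spaces in it survive the shell.
    quoted=$(printf "'%s'" "$file")

    # Use the template itself as the printf format to fill in %s.
    cmd=$(printf "$template" "$quoted")

    # Run the pipeline, hide error output, and count the extracted words.
    sh -c "$cmd" 2>/dev/null | wc -w | tr -d ' '
}
```

Called as extracted_word_count "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '" /path/to/galley.pdf (the galley path is a placeholder), a result of 0 would suggest that the tool is missing or cannot read that file, which is worth ruling out before rebuilding the index again.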

Thanks for your reply, @jbutler! We were using pdftotext from the beginning, but I agree it is confusing that pstotext is listed in config.inc.php as the first example of a PDF indexing tool, since there is a separate PostScript section below it.

So your indexing now works completely? On re-visiting our issue, it seems like

  • metadata is indexed from all articles
  • full-text seems only to be indexed from the most recent issue

May I ask what version of OJS you are using, and what version of database and poppler-tools (the package pdftotext usually is installed with)?

Hi, yes, indexing looks good now. It is interesting that it’s only grabbing the full text from the most recent issue. Maybe a permissions issue…
Debian, currently running ojs | poppler-utils 0.26.5-2 | poppler-data 0.4.7-1
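
If it does turn out to be permissions, a rough way to check is to list every file under the OJS files directory that the current user cannot read. This is only a sketch: the helper name list_unreadable_files is made up, and it should be run as the same user that runs rebuildSearchIndex.php, since running it as root will report nothing.

```shell
#!/bin/sh
# Sketch: print every regular file under a directory that the current
# user cannot read. Run it as the user that rebuilds the search index.
# Assumption: file names contain no newlines.

list_unreadable_files() {
    dir=$1   # e.g. the directory set as files_dir in config.inc.php

    find "$dir" -type f | while IFS= read -r f; do
        # -r tests read permission for the user running this script.
        [ -r "$f" ] || printf '%s\n' "$f"
    done
}
```

Any galley files from older issues that show up in the output would be candidates for the missing full text.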