We have the newest version of OJS installed and are trying to get the full-text search running. We have already taken the following steps:
- enable section for pdftotext in
- test pdftotext with galley file
- clear cache with
rm -rf cache/_db/*
- also searched database for specific words that only appear in PDFs (no result)
Running rebuildSearchIndex.php prints the following PHP warnings, but it also ends with a message that all articles were indexed:
PHP Warning: Declaration of PKPUsageEventPlugin::getEnabled() should be compatible with LazyLoadPlugin::getEnabled($contextId = NULL) in /srv/www/.../html/lib/pkp/plugins/generic/usageEvent/PKPUsageEventPlugin.inc.php on line 386
PHP Warning: Declaration of SubmissionFileDAO::fromRow($row) should be compatible with PKPSubmissionFileDAO::fromRow($row, $fileImplementation) in /srv/www/.../html/classes/article/SubmissionFileDAO.inc.php on line 23
Do you have any advice on what we could try next to get the search running?
Thanks in advance for your help.
@ojsbsb, were you able to get past this? I am having the same problem.
Check that: I was initially using
index[application/pdf] = "/usr/bin/pstotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
to index the pdfs and was not getting any results. Once I switched to:
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
things looked much better.
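For anyone who wants to check the helper command outside of OJS, here is a minimal Python sketch. It assumes that the `%s` in the config line is replaced by the path of the galley file and that the result is run through a shell. The function names `build_index_command` and `extract_text` are made up for this example and are not part of OJS.

```python
import shlex
import subprocess

# Template copied from the index[application/pdf] config line;
# adjust the binary paths to match your system.
PDF_INDEX_TEMPLATE = (
    "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
)


def build_index_command(template: str, file_path: str) -> str:
    """Fill the %s placeholder with the shell-quoted file path.

    Assumption: this mirrors how the placeholder is substituted;
    the real substitution happens inside OJS.
    """
    return template.replace("%s", shlex.quote(file_path))


def extract_text(template: str, file_path: str) -> str:
    """Run the helper pipeline and return whatever it writes to stdout."""
    command = build_index_command(template, file_path)
    result = subprocess.run(
        command,
        shell=True,            # the template contains a pipe, so a shell is needed
        capture_output=True,
        encoding="utf-8",
        errors="replace",      # do not fail on stray bytes in the output
    )
    return result.stdout
```

Calling `extract_text(PDF_INDEX_TEMPLATE, "/path/to/a/galley.pdf")` (the path is a placeholder) should return the text of the PDF. If it returns an empty string, as it did for me with pstotext, the indexer has nothing to work with either.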
Thanks for your reply, @jbutler! We were using pdftotext from the beginning, but I agree it is confusing that pstotext is listed in config.inc.php as the first example of a PDF indexing tool, since there is a separate PostScript section below that.
So your indexing now works completely? On revisiting our issue, it seems that:
- metadata is indexed from all articles
- full text seems to be indexed only from the most recent issue
May I ask which version of OJS you are using, and which versions of the database and of poppler-tools (the package that usually provides pdftotext)?
Hi, yes, indexing looks good now. It is interesting that it's only grabbing the full text from the most recent issue. Maybe a permissions issue…
Debian, currently running ojs 126.96.36.199 | poppler-utils 0.26.5-2 | poppler-data 0.4.7-1
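On the permissions theory, a rough way to test it is to walk the files directory and list every PDF the current user cannot read. This is only a sketch: `find_unreadable_pdfs` is a name invented for this example, and the directory to pass in is whatever your installation uses for uploaded files. It needs to be run as the same user that runs rebuildSearchIndex.php, otherwise the result says nothing about what the indexer can see.

```python
import os
from typing import List


def find_unreadable_pdfs(files_dir: str) -> List[str]:
    """Return PDFs (and directories) under files_dir that the current user cannot read."""
    unreadable: List[str] = []

    def record_walk_error(err: OSError) -> None:
        # os.walk skips directories it cannot list; record them instead.
        if err.filename:
            unreadable.append(str(err.filename))

    for dirpath, _dirnames, filenames in os.walk(files_dir, onerror=record_walk_error):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            # os.access checks the permissions of the user running this script.
            if not os.access(path, os.R_OK):
                unreadable.append(path)
    return sorted(unreadable)
```

An empty list means every PDF under that directory is readable for that user, which would point away from permissions and towards something else, such as the helper command itself.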