[OJS 3] Full text search not working

Full text search doesn’t appear to be working in OJS 3.0.

I’ve ensured that

index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"

is uncommented in config.inc.php, and I’ve run php tools/rebuildSearchIndex.php in conjunction with rm -rf cache/_db/* to rebuild the index and clean the cache.

I’ve confirmed that I have pdftotext available and that the command runs fine when run manually.

When I enter a unique word/phrase from a PDF, I get no results. This is with an unmodified/default OJS installation.

Is there something I’m missing, or is this a known issue?

Jacob

Hi @kardeiz,

Make sure you flush your data cache or search for a term that’s not been searched for within 24 hours. OJS uses cached queries for keyword searches.

Regards,
Alec Smecher
Public Knowledge Project Team

Thanks for the reply. As noted above, I ran rm -rf cache/_db/* and I just now tried deleting all the other cache components. I also tried some new terms, but I’m still seeing No Results.

Hi @kardeiz,

I would suggest exploring the full-text index in the database a little. Check to see that the keyword you’re searching for is in the submission_search_keyword_list table (the keyword_text column); look up the corresponding keyword_id, and look from there for the same keyword_id in the submission_search_object_keywords table. That’ll give you (using object_id) a list of entries in submission_search_objects that the keyword appears in.

Regards,
Alec Smecher
Public Knowledge Project Team
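
The lookup described above amounts to a three-table join. Below is a minimal Python sketch of it, run against an in-memory SQLite database standing in for the OJS MySQL schema. Only the table names and the keyword_text, keyword_id, and object_id columns come from this thread; the submission_id and type columns, the sample rows, and the helper objects_for_keyword are assumptions made for illustration.

```python
import sqlite3


def objects_for_keyword(conn, keyword):
    """Return the search objects that contain the given keyword.

    Follows the chain keyword_text -> keyword_id -> object_id.
    """
    sql = """
        SELECT o.object_id, o.submission_id, o.type
        FROM submission_search_keyword_list AS k
        JOIN submission_search_object_keywords AS ok
          ON ok.keyword_id = k.keyword_id
        JOIN submission_search_objects AS o
          ON o.object_id = ok.object_id
        WHERE k.keyword_text = ?
        ORDER BY o.object_id
    """
    return conn.execute(sql, (keyword,)).fetchall()


# Toy stand-in for the real schema; columns beyond those named in the
# thread (submission_id, type) and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submission_search_keyword_list (
        keyword_id INTEGER PRIMARY KEY, keyword_text TEXT);
    CREATE TABLE submission_search_object_keywords (
        object_id INTEGER, keyword_id INTEGER);
    CREATE TABLE submission_search_objects (
        object_id INTEGER PRIMARY KEY, submission_id INTEGER, type TEXT);
    INSERT INTO submission_search_keyword_list VALUES (1, 'ocean'), (2, 'zymurgy');
    INSERT INTO submission_search_objects VALUES (10, 1, 'title'), (11, 1, 'galley');
    INSERT INTO submission_search_object_keywords VALUES (10, 1), (11, 1), (11, 2);
""")

# A word that occurs only in the galley text should map to the galley object.
print(objects_for_keyword(conn, "zymurgy"))  # [(11, 1, 'galley')]
```

An empty result for a word that exists only in the PDF body would suggest that the galley text was never indexed, as opposed to a stale query cache.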

Alec, thanks for your continued support.

That table is empty except for words from title, author, abstract, and type (?).

I just retried everything from scratch and still encounter the issue.

Steps to replicate:

  1. Download OJS 3.0.0 from https://pkp.sfu.ca/ojs/ojs_download/.
  2. Uncomment line in config.inc.php with command for using pdftotext for application/pdf.
  3. Create new database and accessible files folder.
  4. Launch php server with php -S localhost:3000 in directory.
  5. Follow install steps to define database etc.
  6. Create new journal and issue (only completing required fields).
  7. Create new article. I uploaded the PDF in the submission stage (probably not necessary).
  8. As admin, go to article’s production tab. Add PDF as Galley and schedule for publication.
  9. Ensure issue is published.
  10. Search for unique term produces no results; MySQL query on submission_search_keyword_list produces no relevant rows.

Am I missing something?

Also, I noticed that full text search does not appear to be working at http://journals.sfu.ca/present3/index.php/boot3/, but that could be because that journal isn’t set to full text index.

Hi @kardeiz,

Does using tools/rebuildSearchIndex.php result in the database getting populated?

Regards,
Alec Smecher
Public Knowledge Project Team

No it does not.

Thanks!

Did you confirm that this command runs as the web user from the command line? Sometimes filesystem permissions mean that you can run the command, but apache or nobody or www cannot.
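
To test the filter outside OJS, the configured command can be run by hand with a file path in place of %s. The sketch below does that from Python. It assumes the path is shell-quoted and substituted for the %s placeholder, which may not match exactly how OJS invokes the filter; build_command and extract_text are hypothetical helper names.

```python
import shlex
import subprocess

# The filter string from config.inc.php, as quoted in the original post.
PDF_FILTER = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"


def build_command(filter_template, path):
    """Substitute a shell-quoted file path for the first %s placeholder."""
    return filter_template.replace("%s", shlex.quote(path), 1)


def extract_text(filter_template, path):
    """Run the filter through the shell and return (stdout, stderr).

    The exit status of a pipeline is that of its last command, so it is
    not checked here; empty stdout together with a message on stderr is
    the more telling sign that the first command in the pipeline failed.
    """
    result = subprocess.run(
        build_command(filter_template, path),
        shell=True,
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr
```

Calling extract_text(PDF_FILTER, "/path/to/galley.pdf") (a placeholder path) from a session running under the web server's account should return the PDF text in the first element. Empty output with an error in the second element would point to a permissions or path problem on that account.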

In my followup testing, I’ve been serving the site using the php dev server, which runs as me (and I can run pdftotext).

Thanks!

Hi @kardeiz,

I was able to reproduce the problem locally and tracked it down to the use of a wrong constant. See pkp/pkp-lib issue #2079 on GitHub ("Full-text indexing not working") for links to a patch (both links are equivalent, just applied to different branches). Can you apply that patch, rebuild the search index, flush the data cache, and test again?

Regards,
Alec Smecher
Public Knowledge Project Team

Awesome! I can confirm that that fix does resolve my issue. Thanks for the quick response.

Hi @kardeiz,

Thanks for helping debug! This will be released in the next OJS 3.x release.

Regards,
Alec Smecher
Public Knowledge Project Team