OJS 3.1.2.4 - problem with rebuildSearchIndex

I want to rebuild the Search Index.

After running the command “tools/rebuildSearchIndex.php”, the following messages appeared:

user@wwwnew:/home/www/ojs3/html$ /usr/local/php_7.2.25/bin/php tools/rebuildSearchIndex.php

Clearing index … done

Indexing “Mechanik SC TEST” … PHP Warning: Declaration of SubmissionDisciplineEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL) should be compatible with ControlledVocabEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL, $filter = NULL) in /home/www/ojs3/html/lib/pkp/classes/submission/SubmissionDisciplineEntryDAO.inc.php on line 20

PHP Warning: Declaration of SubmissionSubjectEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL) should be compatible with ControlledVocabEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL, $filter = NULL) in /home/www/ojs3/html/lib/pkp/classes/submission/SubmissionSubjectEntryDAO.inc.php on line 44

4 articles indexed

Now “search” no longer finds words from some of the articles that I added before rebuilding the index.

For the article that I added after rebuilding the index, “search” works fine.

All articles are PDF documents.

Regards
Wojtek

Additional information.

I checked the tables: submission_search_keyword_list, submission_search_objects and submission_search_object_keywords. In these tables, only a few words remain for the articles that I added before rebuilding the index.
I checked that “search” found these words.
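
For anyone wanting to repeat this check, the three tables can be joined to list the words indexed for one submission. The sketch below uses an in-memory SQLite database with made-up rows purely to illustrate the join. The column names (keyword_id, keyword_text, object_id, submission_id, type, pos) are assumptions based on the usual OJS 3.1 schema and should be verified against the real database; the IDs and rows are hypothetical.

```python
import sqlite3

# Assumed (not verified) subset of the OJS search index schema.
SCHEMA = """
CREATE TABLE submission_search_keyword_list (
    keyword_id INTEGER PRIMARY KEY, keyword_text TEXT);
CREATE TABLE submission_search_objects (
    object_id INTEGER PRIMARY KEY, submission_id INTEGER, type INTEGER);
CREATE TABLE submission_search_object_keywords (
    object_id INTEGER, keyword_id INTEGER, pos INTEGER);
"""

# Join objects -> object_keywords -> keyword_list for one submission.
QUERY = """
SELECT DISTINCT k.keyword_text
FROM submission_search_objects o
JOIN submission_search_object_keywords ok ON ok.object_id = o.object_id
JOIN submission_search_keyword_list k ON k.keyword_id = ok.keyword_id
WHERE o.submission_id = ?
ORDER BY k.keyword_text
"""


def indexed_words(conn, submission_id):
    """Return the distinct indexed words for one submission, sorted."""
    return [row[0] for row in conn.execute(QUERY, (submission_id,))]


def demo_connection():
    """Build an in-memory database with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO submission_search_keyword_list VALUES (?, ?)",
        [(1, "endpoint"), (2, "protection"), (3, "symantec")],
    )
    conn.executemany(
        "INSERT INTO submission_search_objects VALUES (?, ?, ?)",
        [(10, 8, 1), (11, 9, 1)],
    )
    conn.executemany(
        "INSERT INTO submission_search_object_keywords VALUES (?, ?, ?)",
        [(10, 1, 0), (10, 3, 1), (11, 2, 0)],
    )
    return conn


if __name__ == "__main__":
    print(indexed_words(demo_connection(), 8))
```

Against a real installation the same SELECT could be run in the MySQL or PostgreSQL client, once before and once after the rebuild, with the placeholder replaced by the submission ID.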

Regards
Wojtek

Hi @WSMH,

Do you have the PDF text extraction tools configured in your config.inc.php?

Regards,
Alec Smecher
Public Knowledge Project Team

Yes, I have them configured.
I have the following entries in the config.inc.php file:

index[application/pdf] = "/usr/bin/pstotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"

index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
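
One way to see what such a command extracts outside OJS is to run it by hand on a single PDF, from the same shell and as the same user as the rebuild tool. The following is a minimal sketch, assuming the %s placeholder is simply replaced by the file path; how OJS itself quotes the path is not shown in this thread, and the helper names are made up for illustration.

```python
import shlex
import subprocess

# Command template as in the config.inc.php entry, with plain ASCII quotes.
PDF_COMMAND = (
    "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
)


def build_command(template, file_path):
    """Replace the first %s placeholder with the shell-quoted file path."""
    return template.replace("%s", shlex.quote(file_path), 1)


def count_words(text):
    """Count whitespace-separated tokens in the extracted text."""
    return len(text.split())


def extract_text(template, file_path):
    """Run the pipeline through the shell and return (exit code, text, errors).

    The exit code is that of the last command in the pipeline (tr), so
    problems in pdftotext are more likely to show up in the error output.
    """
    result = subprocess.run(
        build_command(template, file_path),
        shell=True,
        capture_output=True,
        check=False,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    errors = result.stderr.decode("utf-8", errors="replace")
    return result.returncode, text, errors
```

Calling extract_text(PDF_COMMAND, path) with the path of one of the uploaded galley files and passing the text to count_words gives a rough figure to compare with the number of words that end up in the index for that article.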

If I add a new article (QuickSubmit), it is indexed and “search” works fine.
After rebuilding the indexes, what I wrote in the first post happened.

Regards
Wojtek

New information.

The server administrator has run “rebuildSearchIndex” with administrator privileges.
This time there were no error messages:

root@wwwnew:/home/www/ojs3/html# /usr/local/php_7.2.25/bin/php tools/rebuildSearchIndex.php

Clearing index … done

Indexing “Mechanik SC TEST” … 5 articles indexed

But some entries have been removed from the tables and now “search” does not find some words in the latest article (before rebuilding the indexes, the “search” function in this article worked correctly).

The number of entries in the tables before rebuilding the indexes:

submission_search_keyword_list - 1810
submission_search_objects - 41
submission_search_object_keywords - 5456

And after rebuilding the indexes:

submission_search_keyword_list - 1343
submission_search_objects - 41
submission_search_object_keywords - 3483
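
To put the two sets of counts side by side, here is a small sketch that computes how many rows each table lost in the rebuild. The numbers are the ones listed above; the function name is only illustrative.

```python
# Row counts (before, after) the second index rebuild, taken from this post.
COUNTS = {
    "submission_search_keyword_list": (1810, 1343),
    "submission_search_objects": (41, 41),
    "submission_search_object_keywords": (5456, 3483),
}


def change(before, after):
    """Return (rows lost, percentage lost) for one table."""
    lost = before - after
    return lost, 100.0 * lost / before


if __name__ == "__main__":
    for table, (before, after) in COUNTS.items():
        lost, pct = change(before, after)
        print(f"{table}: {before} -> {after} ({lost} rows lost, {pct:.1f}%)")
```

This gives 467 rows (25.8%) lost from the keyword list and 1973 rows (36.2%) lost from the object-keyword links, while the number of indexed objects is unchanged. So every object is still present after the rebuild, but with fewer words attached.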

Before rebuilding the indexes, I noted 3 keywords: “Symantec”, “Endpoint” and “Protection”. All three were in the “submission_search_keyword_list” table and “search” found those words.

After rebuilding the indexes, there is only “Endpoint” in the table. “Symantec” and “Protection” have been removed. The “search” function only finds “Endpoint”.
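
The same comparison can be made for the whole keyword list rather than three hand-picked words. The sketch below assumes the keyword text has been exported twice (before and after a rebuild) into two plain lists. It lowercases entries before comparing, which is an assumption about how the index stores words, and the function name is hypothetical.

```python
def missing_after_rebuild(before, after):
    """Return the words present before the rebuild but absent afterwards."""
    before_set = {word.strip().lower() for word in before if word.strip()}
    after_set = {word.strip().lower() for word in after if word.strip()}
    return sorted(before_set - after_set)


if __name__ == "__main__":
    # The three words noted in this post, before and after the rebuild.
    before = ["Symantec", "Endpoint", "Protection"]
    after = ["Endpoint"]
    print(missing_after_rebuild(before, after))
```

For the three words above this reports "protection" and "symantec" as lost. Run on full exports, the resulting list might show whether the missing words share something, such as coming from the PDF text rather than from the title or abstract.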

Summary:

  1. Before the first index rebuilding, I had 4 articles. In all four articles, the “search” function worked correctly.

  2. After the first index rebuilding, most of the words disappeared from the database and the “search” function did not work properly.

  3. I added a new (fifth) article. The “search” function worked correctly in this article.

  4. After the second index rebuilding, most of the words in the fifth article disappeared from the database. Now the “search” function works incorrectly in all five articles.

The pictures show the first 16 records of the “submission_search_keyword_list” table (in alphabetical order) before and after the second index rebuild. All records shown in the pictures refer to the same (fifth) article.

Regards
Wojtek

Hi @WSMH,

Submission keywords (that is, the “Keywords” metadata field) are currently not indexed and not searchable in OJS 3.x. I’ve filed that here: https://github.com/pkp/pkp-lib/issues/5388

That appears to be the major part of your concern, correct?

Regards,
Alec Smecher
Public Knowledge Project Team

Thank you for your response.

Here is the “fifth” article: http://ojs3.mechanik.media.pl/index.php/m_sc_test/article/view/8
There are only two words in the “Keywords” metadata field: “słowa” and “kluczowe”. The words “Symantec”, “Endpoint” and “Protection” have never been in this field. None of the words that were removed from the database during the index rebuild have ever been in this field (see the image in the previous post).

Regards
Wojtek

Edit.

I see that in the previous post I wrote: “I noted 3 keywords:”.
My mistake. I should have written: “I noted 3 words from the article:”.
Sorry for the lack of precision.