3.1.1.2 No search results for some journals

We’re running OJS 3.1.1.2.

I have rebuilt the indexes using tools/rebuildSearchIndex.php and the config.inc.php is configured to index full text:

; PDF
; index[application/pdf] = "/usr/bin/pstotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"

; PostScript
index[application/postscript] = "/usr/bin/pstotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"
; index[application/postscript] = "/usr/bin/ps2ascii %s | /usr/bin/tr '[:cntrl:]' ' '"

; Microsoft Word
index[application/msword] = "/usr/bin/antiword %s"
; index[application/msword] = "/usr/bin/catdoc %s"

All of the above applications are correctly installed. However, we’re getting mixed results when searching. For example, searching our journal, the South African Journal of Science, for keywords such as “Science” returns no results. In fact, no matter what we search for, we get nothing.

Other journals are returning results but certain keywords which appear in submissions are not being found.

I noticed that the number of records in the various search tables halved after the reindex.

The reindex was carried out successfully about 2 weeks ago.

I’m wondering if there is something I’m missing when building the search index.

Any help much appreciated.

Thanks

Hayden

I’ve done some further research using the search query:

SELECT
o.submission_id,
MAX(s.context_id) AS journal_id,
MAX(i.date_published) AS i_pub,
MAX(ps.date_published) AS s_pub,
COUNT(*) AS count
FROM
submissions s,
published_submissions ps,
issues i,
submission_search_objects o NATURAL JOIN submission_search_object_keywords o0 NATURAL JOIN submission_search_keyword_list k0
WHERE
s.submission_id = o.submission_id AND
ps.submission_id = s.submission_id AND
i.issue_id = ps.issue_id AND
k0.keyword_text = ? AND i.journal_id = ?
GROUP BY o.submission_id
ORDER BY count DESC
LIMIT 500

replacing the ?s with a keyword and journal_id with the troublesome journal.

It appears that the submission_search_objects table is never populated with any of the published submissions from the problematic journal. Is submission_search_objects populated when running the rebuild index command-line tool?

Hi @haydenyoung,

Are you sure the reindex tool ran successfully? If you’re using e.g. a secure shell connection to your server, it can terminate, which will stop any processes it’s running. You might want to look into using nohup to run the reindex script.

Regards,
Alec Smecher
Public Knowledge Project Team
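
For readers following along, here is a minimal sketch of the nohup approach suggested above. The helper name run_detached, the log file location and the install path in the usage comment are assumptions for illustration, not part of OJS.

```shell
#!/bin/sh
# Hypothetical helper: run a long job detached from the terminal so that a
# dropped SSH session does not stop it.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    log_file="$1"
    shift
    # nohup makes the command ignore SIGHUP; stdout and stderr both go to
    # the log file, and stdin is detached from the terminal.
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    # Print the PID of the background job so it can be checked later.
    echo "$!"
}

# Example call (paths are assumptions; adjust to your install):
#   cd /var/www/html && run_detached /tmp/reindex.log php tools/rebuildSearchIndex.php
```

The printed PID can later be passed to ps -p to confirm the job is still running after logging back in.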

Hi @asmecher

Thanks for the recommendation.

Yes, I detached the indexing job from the terminal using nohup. I also tried Ctrl+Z, then jobs, then disown to run the job and then detach it. I logged out and back in a few times to check that the job was still running and hadn’t exited prematurely.

From my findings above, I think the job runs fine but the above SQL query doesn’t return results for one particular journal. However, I’m not sure why.

Hi @haydenyoung,

You can use the debug option in config.inc.php to get OJS to dump all SQL queries it’s executing. (Be warned, it’ll dump them to the browser for anyone who hits the site while the option is enabled, and the SQL dump will also interfere with AJAX requests while it’s enabled, so use the option judiciously.)

Query results are cached for 24 hours via ADODB’s caching mechanism, so if you’re not seeing the same results via the search interface that you expect to see from the database, or if you’re not seeing database queries that you expect to be logging, try a different keyword that you haven’t recently searched.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher

Thanks again for the recommendations. Okay, SQL debugging is now turned on.

Not sure if it is related but I’m getting the following error:

Cannot create /var/www/html/cache/_db/fa
/var/www/html/cache/_db/fa/adodb_fa63a228540e5d643f1bcddd166ac708.cache cache failure: /var/www/html/cache/_db/fa/adodb_fa63a228540e5d643f1bcddd166ac708.cache file/URL not found (this is a notice and not an error) 

Which is strange since permissions match the web server:

$ ls -dl /var/www/html/cache
drwxr-xr-x 6 www-data www-data 36864 Aug 24 09:00 /var/www/html/cache

Creating /var/www/html/cache/_db does fix the problem and the db cache file is created although this does not fix the missing search results.

The search query:

SELECT o.submission_id, MAX(s.context_id) AS journal_id, MAX(i.date_published) AS i_pub, MAX(ps.date_published) AS s_pub, COUNT(*) AS count FROM submissions s, published_submissions ps, issues i, submission_search_objects o NATURAL JOIN submission_search_object_keywords o0 NATURAL JOIN submission_search_keyword_list k0 WHERE s.submission_id = o.submission_id AND s.status = 3 AND ps.submission_id = s.submission_id AND i.issue_id = ps.issue_id AND i.published = 1 AND k0.keyword_text = 'science' AND i.journal_id = '8' GROUP BY o.submission_id ORDER BY count DESC LIMIT 500

Running this with the journal id and the search term directly on the mysql database also returns no results.

I’m wondering why the rebuild index script isn’t pulling keywords for this particular journal. Could it be related to the missing _db directory in the cache?
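
For reference, the _db workaround mentioned above could be sketched in shell roughly as follows. The function name ensure_db_cache_dir and the optional owner argument are assumptions for illustration; the path and the www-data user in the usage comment are taken from the output earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: make sure the ADODB query cache directory exists
# under the OJS cache directory.
# Usage: ensure_db_cache_dir CACHE_DIR [OWNER]
ensure_db_cache_dir() {
    cache_dir="$1"
    owner="$2"
    # Create cache/_db if it is missing (no error if it already exists).
    mkdir -p "$cache_dir/_db" || return 1
    # Optionally hand the directory to the web server user (needs root).
    if [ -n "$owner" ]; then
        chown "$owner" "$cache_dir/_db" || return 1
    fi
    # Succeed only if the directory is now present.
    [ -d "$cache_dir/_db" ]
}

# Example call (run as root; values taken from the post above):
#   ensure_db_cache_dir /var/www/html/cache www-data:www-data
```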

Hi @haydenyoung,

You can manually specify a journal for reindexing on the command line:

php tools/rebuildSearchIndex.php myJournalPath

…where myJournalPath is the path of the journal (e.g. the part of the URL identifying the journal).

I’d suggest trying to reindex that journal specifically, making sure that the command isn’t aborted prematurely. If you’re not sure whether it’s completing, you could try adding a line like

echo "Done.\n";

…to the end of the execute function in tools/rebuildSearchIndex.php. Then watch for the Done. message to be sure that the script has completed successfully.

Regards,
Alec Smecher
Public Knowledge Project Team
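
If the script's output is redirected to a log file, the check for the Done. message described above could be automated along these lines. This is only a sketch: the function name reindex_completed and the log path are assumptions, and it relies on the echo line having been added to the script as suggested.

```shell
#!/bin/sh
# Hypothetical helper: report whether a reindex log ends with the "Done."
# marker that was manually added to tools/rebuildSearchIndex.php.
# Usage: reindex_completed LOGFILE
reindex_completed() {
    # Look only at the last line of the log; it must end with "Done."
    tail -n 1 "$1" | grep -q 'Done\.$'
}

# Example call (log path is an assumption):
#   if reindex_completed /tmp/reindex.log; then
#       echo "Reindex finished"
#   else
#       echo "Reindex incomplete or still running"
#   fi
```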

Hi @asmecher

Again, thanks for the recommendation.

Okay, I ran the indexing on the troublesome journal:

php5.6 /var/www/html/tools/rebuildSearchIndex.php sajs

and got:

search.cli.rebuildIndex.indexingByJournalNotSupported

Then tried it on a journal which I know is successfully indexing:

php /var/www/html/tools/rebuildSearchIndex.php jesa

but got same message:

search.cli.rebuildIndex.indexingByJournalNotSupported

So it looks like it’s dying here:

What part of the code handles the keyword extraction and sql insertion? I might be able to wrap it in a test harness to work out why it doesn’t like this particular journal.

Hi @haydenyoung,

Oops, my mistake – the option to re-index on a per-journal basis is only supported using the Lucene search index (currently undergoing rewrite in OJS 3.x), not the built-in search index that you’re using.

Is it feasible to rebuild the entire index, with the “Done” message added? If the message appears, that would at least ensure that the script completed successfully before debugging further.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher

Sorry for not getting back to you sooner. I ran the indexer on both our staging site and the production site. There are about 10M to 16M search terms according to the database, so the reindex took some time (about 5 days on our staging server, which is a small Amazon EC2 instance). It looks like the indexing has worked, which is great. However, I did notice in the logs that the indexer sometimes bombs out with some kind of “waiting too long for query” error from MySQL. If the error shows up again I’ll add it here.

Thanks for your help. If any other issues arise I’ll report back.

Thanks

Hayden

Hi there, hi @haydenyoung,
we are currently seeing a similar problem. We run a multi-journal OJS installation (3.1.0.1), and search works for all of our journals that publish articles submitted through the standard workflow. However, search is not working for one of our journals, where we publish articles by native XML import of whole issues. Could this be related to the problems described in the earlier posts?
Cheers,
Sleipnir

Edit: @bozana
I put you here because in the question Problem with search after upgrading to OJS 3.1.0.1 you asked for different workflows publishing articles. Maybe you could also help here :grinning:

You need to rebuild the search index. You can find instructions on how to do that on the forum.

When you import articles via XML, they are not added to the search index by default.