OJS searching - convert to use alternate documents?

We have a site where we are posting collections of articles, not new original publications of content. Many sites that do this just include reference information and access (URL) links if available (arXiv, INSPIRE, …), but we want users to easily be able to access and work with the actual content of the papers.

However, we do not have permission to repost the original source content of every article, and we do not want to pursue it, since that would be difficult and expensive. Instead, like these other sites, we post the metadata for each article as a galley, and users can then choose that galley and get a URL for the original content.

This then makes access almost as convenient as if we were actually posting it but fits within legal copyright limits.

However, that means that searching on the site covers only the content that is actually online: the title, authors, summary/abstract, and keywords (are they searched?). It does not include the actual article content, which is a major restriction.

I am wondering how this is all handled in OJS. When one would include a galley of the original content, I am wondering if all such content is kept in a separate repository from the article descriptions and then accessed from the actual article if requested. If so, that would mean that indexing and searching would be over that single collected central repository, which is defined somewhere in the code or settings.

In that case, would it be possible for us to retarget the search and indexing code to use our own local repository instead? That repository is not published, due to the above restrictions, but it can contain local copies of the materials.

If it were possible to coerce the OJS search mechanisms into using our local repository, that would achieve the main goal of content searching, and it would be easier than creating our own local (Lucene-ish) index/search system over our repository, for which we would add a new button on the main page (or subsume the current search button).
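For comparison, the stand-alone alternative (our own local index/search system) might look roughly like the minimal sketch below. It is only an illustration, not anything OJS provides: the `LocalIndex` class, the article ids, and the example.org URLs are all made up, and a real system would use Lucene or similar rather than an in-memory dictionary. The point is that the full text is only indexed, and a search returns just the link to the original source.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Hypothetical in-memory inverted index over locally held article text."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of article ids
        self.links = {}                   # article id -> URL of the original source

    def add(self, article_id, source_url, full_text):
        """Index the full text, but keep only the source URL for display."""
        self.links[article_id] = source_url
        for term in tokenize(full_text):
            self.postings[term].add(article_id)

    def search(self, query):
        """Return (article id, source URL) pairs containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted((aid, self.links[aid]) for aid in hits)


# Made-up records standing in for local copies of two papers.
index = LocalIndex()
index.add("a1", "https://example.org/paper-1", "Lattice gauge theory on a torus")
index.add("a2", "https://example.org/paper-2", "Gauge bosons and symmetry breaking")
print(index.search("gauge theory"))
```

The search result never exposes the stored text itself, only the id and the external link, which is the behaviour we are after.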

Any/all inputs and comments welcome.

TIA

Hi @guthrie,

Interesting use case - thanks for sharing. Would you mind indicating which specific version of OJS you’re working from? @bozana - would you (or another dev team member) be able to advise on this?

-Roger
PKP Team

Sorry to have omitted this: 3.4.05 version.

PS: It seems like there could be other similar approaches, e.g. specify the source article link as is done with a galley now and use that for the standard indexing and search, but do not present the actual content to the user, or present it only as an access link URL.

This would mean some other type of metadata attribute for an article, which means: here’s a link to the article source, but indirect and searchable but not to be directly displayed.

(Search fragments are allowed under fair use.)

Any follow-up ideas or suggestions?

Hi @guthrie,

There are a couple of things getting in the way of making this easy…

  1. OJS doesn’t interact with “remote galleys” aside from linking users to the external resource – it never fetches the remote resource for local use. So it’s excluded from the search engine, except for the submission’s metadata and other things that OJS does have locally.
  2. We’ve rewritten the search code for 3.6, so time invested into modifying the search code for 3.3, 3.4 or 3.5 will be a dead end that’ll have to get rewritten for 3.6 and newer. (The good news is that the 3.6 search engine implementation is much cleaner and would be a better target to work with.)

One way of going about this, I suppose, would be to work with the Lucene plugin, so that OJS uses that as the basis for searching – and then use an external toolset to inject the contents of externally-hosted resources into the Lucene back-end that OJS is using. But that’s not a trivial amount of work.
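As a very rough sketch of what the injection step could look like, assuming the back-end is a Solr server that accepts Solr's standard JSON update format with atomic `set` updates: the field name `galley_full_text`, the document id, the core name `ojs`, and the localhost URL below are all hypothetical placeholders, not the plugin's actual schema. You would need to check the plugin's Solr schema for the real field and id conventions, and whether that schema supports atomic updates at all.

```python
import json
import urllib.request


def build_update(article_id, full_text, text_field="galley_full_text"):
    """Build one Solr atomic-update document.

    The "set" modifier replaces the value of a single field.
    Both the id format and the field name are assumptions.
    """
    return {"id": article_id, text_field: {"set": full_text}}


def build_request(solr_update_url, docs):
    """Wrap a list of update documents in a JSON POST request (not sent here)."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical id, text, and endpoint, for illustration only.
docs = [build_update("hypothetical-article-42", "Text extracted from the local copy")]
req = build_request("http://localhost:8983/solr/ojs/update?commit=true", docs)

# Sending is left commented out so the sketch has no side effects:
# urllib.request.urlopen(req)
```

One caveat with this approach: anything injected this way could be overwritten if OJS later rebuilds or reindexes the same document, so the external tool would likely need to re-run after a reindex.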

Sorry, I think there are no easy answers here!

Regards,
Alec Smecher
Public Knowledge Project Team

Thanks, I think I got it.

Thank you for the ideas and explanation.

I hadn’t understood your point #1 before - that galleys are not fetched and served locally, only on demand. I don’t understand your term “remote galleys” - could you elaborate on that in this context?

For source access: that is basically what we want to do, so it seems we could just use the standard OJS system. It would not mean reposting any content without legal permission; it just gives an indirect link where users can access the content themselves. That is the same thing we are doing now, except that we make the link visible for the user to click, with the same result.

Yes?

For searching, you comment: “an external toolset to inject the contents of externally-hosted resources into the Lucene back-end that OJS is using…”

Isn’t that the same as what I had proposed, where the actual article source contents are not served locally, but only indexed by the search sub-system, which then serves search results?

(Does it locally cache them?)

I suppose what I need to understand better is this:

“OJS doesn’t interact with “remote galleys” (?) aside from linking users to the external resource – it never fetches the remote resource for local use.”

If I point to the article source with a galley that is the URL of the source, how is that treated by the Lucene plugin, or is it? Are all galleys treated the same?

TIA

Hi @guthrie,

By “remote galley”, I mean a galley for which you specify a URL to where it’s hosted externally, as opposed to uploading it to OJS.

When using remote galleys, OJS never downloads the PDF (or whatever format is hosted at the URL). Regardless of whether you use Lucene or OJS’s built-in search, OJS never fetches (or indexes) the file.

I make reference to Lucene because there will be tools from that ecosystem to ingest and index PDFs, and it might be possible to feed your external content into the system that way. It’ll take some Lucene expertise, though – something I’m in short supply of!

Regards,
Alec Smecher
Public Knowledge Project Team
