PDF.js viewer fails when proxied [fix and solution]

ojs

#1

This post is a summary of an issue related to PDF.js and EZproxy, the OCLC software used by libraries to proxy resources. It’s somewhat more verbose than it needs to be so that it includes all the likely search terms for this issue.

In this example I’ll use the OJS server http://mapress.com/ but the issue affects many for-pay installs of OJS when used through EZproxy. I’ll be using my institutional EZproxy install. For the example resource:

http://mapress.com/j/zt/article/view/zootaxa.4150.4.1/7282

We proxy it as

http://mapress.com.helicon.vuw.ac.nz/j/zt/article/view/zootaxa.4150.4.1/7282

but the iframe is being rewritten as:

<iframe src="http://mapress.com.helicon.vuw.ac.nz/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=http%3A%2F%2Fmapress.com%2Fj%2Fzt%2Farticle%2FviewFile%2Fzootaxa.4150.4.1%2F7282" width="100%" height="100%" style="min-height: 500px;" allowfullscreen webkitallowfullscreen></iframe>

Clearly the second instance of the host name in the URL (the one inside the URL-encoded file parameter) isn’t being rewritten. When the JavaScript tries to fetch that non-rewritten URL, the request is treated as a cross-site security issue and fails hard with the message "PDF.js v1.0.907 (build: e9072ac) Message: Unexpected server response (0) while retrieving PDF "http://mapress.com/j/zt/article/viewFile/zootaxa.4150.4.1/7282"."
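To make the mismatch concrete, here is a minimal PHP sketch that decodes the file parameter of the rewritten iframe URL and compares the two host names. The function name embeddedFileHost is hypothetical and only for illustration; it is not part of OJS, PDF.js, or EZproxy.

```php
<?php
// Hypothetical helper (illustration only): return the host embedded in the
// viewer's "file" query parameter, or null if it is missing or relative.
function embeddedFileHost(string $viewerUrl): ?string
{
    $query = parse_url($viewerUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params); // parse_str() URL-decodes the values
    if (!isset($params['file']) || !is_string($params['file'])) {
        return null;
    }
    $host = parse_url($params['file'], PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// The iframe src as rewritten by EZproxy in the example above.
$viewerUrl = 'http://mapress.com.helicon.vuw.ac.nz/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html'
    . '?file=http%3A%2F%2Fmapress.com%2Fj%2Fzt%2Farticle%2FviewFile%2Fzootaxa.4150.4.1%2F7282';

$outerHost = parse_url($viewerUrl, PHP_URL_HOST); // proxied host
$innerHost = embeddedFileHost($viewerUrl);        // host inside file=

// The hosts differ, so the browser treats the PDF request as cross-origin.
var_dump($outerHost === $innerHost);
```

The same check should be usable against any proxied galley page to see whether a stanza is rewriting the encoded host name.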

The short-term fix for EZproxy admins is to add the following two lines to the EZproxy stanza for this resource, so that the encoded host name in the file parameter is rewritten as well:

Find file=http%3A%2F%2Fmapress.com
Replace file=http%3A%2F%2F^Pmapress.com^

The longer-term solution would be to switch from using absolute to using relative URLs when passing the info to PDF.js. Is this likely to be possible?


#2

Would EZproxy’s DJ or HJ directives help here in the general case, rather than replacing specific hostnames?

It seems more like a proxy issue than an OJS issue, but I admit I am partial to the solution of using relative URLs whenever possible.

This would mean adjusting the $pdfUrl either at assignment:


Or altering it just before passing it to pdfjs:
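As a rough illustration of the second option, the sketch below drops the scheme and host from an absolute URL before it is URL-encoded into the viewer's file parameter. This is not the plugin's actual code: toHostRelativeUrl is a hypothetical helper, the viewer path is copied from the example in post #1, and it assumes PDF.js will resolve a host-relative file value against the proxied origin.

```php
<?php
// Hypothetical helper (not part of OJS): drop the scheme and host from an
// absolute URL so that only the path, query, and fragment remain.
function toHostRelativeUrl(string $absoluteUrl): string
{
    $parts = parse_url($absoluteUrl);
    if ($parts === false) {
        return $absoluteUrl; // leave malformed input untouched
    }
    $relative = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $relative .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $relative .= '#' . $parts['fragment'];
    }
    return $relative;
}

// Example value taken from the failing request in post #1.
$pdfUrl = 'http://mapress.com/j/zt/article/viewFile/zootaxa.4150.4.1/7282';

// Build the iframe src with a host-relative file parameter.
$viewerSrc = '/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file='
    . urlencode(toHostRelativeUrl($pdfUrl));

echo $viewerSrc, PHP_EOL;
```

With no host name left inside the query string, there is nothing for the proxy to miss when it rewrites the page.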


#3

The DJ (DomainJavaScript) and HJ (HostJavaScript) directives implement the basic logic of “rewrite the host names in bitstreams identified as JavaScript as well as bitstreams identified as HTML.” They are very useful when navigation or redirects are done using JavaScript. Neither rewrites things that look like host names in the path or query parts of a URL.

So they cannot be used to fix this issue, alas.

I agree that this is not an OJS issue, but it is an issue for a number of OJS users and/or their clients.


#4

We are also seeing EZproxy set the wrong URL for the iframe. Instead of fixing the absolute URL, we decided to use a relative URL:
$relativeUrl = '/index.php/' . $journal->getPath() . '/article/download/' . $article->getBestArticleId() . '/' . $galley->getBestGalleyId();
This variable is calculated within the plugin class, and replaces the $pdfUrl in the iframe src attribute.

Would this cause any issues? @ctgraham or someone else who knows.

Sean


#5

You may be able to do this for your specific install, but it won’t generalize to other installs where RESTful URLs remove the index.php, or where the journal path is omitted, etc.

In the general case, you would probably want to remove the base URL from the fully formed URL. There is a core function to assist with this: Core::removeBaseUrl.

That said, if your install will always use the /index.php/ prefix to a journal path, your code should be fine.


#6

Thanks @ctgraham! We will certainly try to use the removeBaseUrl function on $pdfUrl instead.


#7

@ctgraham I just tried using Core::removeBaseUrl on the URL generated by the PKPPageRouter within the PDFJsViewer plugin.
I’ve noticed that the core function did not generate the “/index.php/” part of the URL, even though our installation uses it.
I’ll investigate further.


#8

Hmmm… it looks like this is an intentional feature of this function:

This might be more complex than I had hoped. Apart from Core::removeBaseUrl, I’m not aware of a function in the code that will construct a relative URL.


#9

I’m thinking there should be a way for PHP to detect whether “/index.php/” has been configured (or simply whether it is part of the URL), and maybe the core function could be modified to remove that part of the URL conditionally instead of always removing it.


#10

Whether or not OJS’s “index.php” is part of the URL is configured with the restful_urls option in config.inc.php.

You can use Config::getVar('general', 'restful_urls') to check this setting. Because various other functions depend on Core::removeBaseUrl() as-is, we won’t be able to modify this function without either adding a new parameter or auditing all current usages.
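A minimal sketch of how that setting could drive a relative URL without touching Core::removeBaseUrl() is shown below. To stay self-contained it takes the flag as a parameter; in a real install the caller would presumably pass the value of Config::getVar('general', 'restful_urls'). The function relativeGalleyUrl is hypothetical, the URL layout is copied from post #4, and the sketch assumes OJS is installed at the web root (no base path such as /j/) with path info enabled.

```php
<?php
// Hypothetical helper (not an OJS function). $restfulUrls is assumed to be the
// value of Config::getVar('general', 'restful_urls') in a real install.
function relativeGalleyUrl(
    bool $restfulUrls,
    string $journalPath,
    string $articleId,
    string $galleyId
): string {
    // With restful_urls on, "index.php" is not part of the URL.
    $prefix = $restfulUrls ? '' : '/index.php';

    return $prefix . '/' . implode('/', [
        rawurlencode($journalPath),
        'article',
        'download',
        rawurlencode($articleId),
        rawurlencode($galleyId),
    ]);
}

// Example values taken from the URLs earlier in this thread.
echo relativeGalleyUrl(false, 'zt', 'zootaxa.4150.4.1', '7282'), PHP_EOL;
echo relativeGalleyUrl(true, 'zt', 'zootaxa.4150.4.1', '7282'), PHP_EOL;
```

Keeping the flag as an argument leaves the existing callers of Core::removeBaseUrl() unaffected, which sidesteps the audit mentioned above.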


#11

We were able to fix our issue by carefully inspecting our EZproxy configuration and fixing bad configs.
There was a typo in a stanza that affected OJS proxying. After fixing that typo, the base URL was correctly generated by this plugin. No modification to the plugin was necessary.