PDF.js viewer fails when proxied [fix and solution]

This post summarizes an issue involving PDF.js and EZproxy, the OCLC software that libraries use to proxy resources. It's somewhat more verbose than it needs to be, so that it includes all the likely search terms for this issue.

In this example I'll use the OJS server http://mapress.com/, but the issue affects many subscription (for-pay) OJS installs when they are accessed through EZproxy. I'll be using my institution's EZproxy install. For the example resource:

http://mapress.com/j/zt/article/view/zootaxa.4150.4.1/7282

We proxy it as

http://mapress.com.helicon.vuw.ac.nz/j/zt/article/view/zootaxa.4150.4.1/7282

but the iframe is being rewritten as:

<iframe src="http://mapress.com.helicon.vuw.ac.nz/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=http%3A%2F%2Fmapress.com%2Fj%2Fzt%2Farticle%2FviewFile%2Fzootaxa.4150.4.1%2F7282" width="100%" height="100%" style="min-height: 500px;" allowfullscreen webkitallowfullscreen></iframe>

Clearly the second instance of the host name in the URL (the one inside the URL-encoded file= parameter) isn't being rewritten. When the PDF.js JavaScript tries to fetch that non-rewritten URL, the request is treated as a cross-site security issue and the viewer fails hard with the message "PDF.js v1.0.907 (build: e9072ac) Message: Unexpected server response (0) while retrieving PDF "http://mapress.com/j/zt/article/viewFile/zootaxa.4150.4.1/7282"."
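To see the mismatch concretely, here is a small JavaScript sketch (runnable in Node or a browser console) that parses the iframe src quoted above and compares the viewer's host with the host in the decoded file= parameter. The function name inspectViewerSrc is just for illustration.

```javascript
// Illustrative helper: which host serves the viewer, and which host will
// the viewer try to fetch the PDF from?
function inspectViewerSrc(src) {
  const viewer = new URL(src);
  // searchParams.get() percent-decodes the file= value for us.
  const pdf = new URL(viewer.searchParams.get("file"));
  return {
    viewerHost: viewer.host,
    pdfHost: pdf.host,
    crossOrigin: viewer.origin !== pdf.origin,
  };
}

// The iframe src from the proxied page above.
const src =
  "http://mapress.com.helicon.vuw.ac.nz/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html" +
  "?file=http%3A%2F%2Fmapress.com%2Fj%2Fzt%2Farticle%2FviewFile%2Fzootaxa.4150.4.1%2F7282";

const r = inspectViewerSrc(src);
console.log(r.viewerHost, r.pdfHost, r.crossOrigin);
// prints: mapress.com.helicon.vuw.ac.nz mapress.com true
```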

The short-term fix for EZproxy admins is to add these two lines to the EZproxy stanza for this resource, so that the encoded host name passed to PDF.js is rewritten as well:

Find file=http%3A%2F%2Fmapress.com
Replace file=http%3A%2F%2F^Pmapress.com^
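As a rough illustration of why the stanza has to target the encoded form, the sketch below mimics two literal text substitutions on the page: an ordinary host rewrite, which only matches http://mapress.com/, and the Find/Replace pair above. It assumes, for this example only, that ^Pmapress.com^ ends up as the proxied host mapress.com.helicon.vuw.ac.nz. PROXY_SUFFIX and both function names are hypothetical, and this is not how EZproxy is implemented internally.

```javascript
// Illustration only: plain string substitutions standing in for EZproxy's
// rewriting. ASSUMPTION: ^Pmapress.com^ becomes mapress.com.helicon.vuw.ac.nz.
const PROXY_SUFFIX = ".helicon.vuw.ac.nz"; // hypothetical, taken from this example

// Ordinary host rewrite: only matches the literal "http://mapress.com/".
function rewriteLiteralHost(html) {
  return html.split("http://mapress.com/").join(`http://mapress.com${PROXY_SUFFIX}/`);
}

// The stanza's Find/Replace: matches the percent-encoded host after file=.
function rewriteEncodedFileParam(html) {
  return html
    .split("file=http%3A%2F%2Fmapress.com")
    .join(`file=http%3A%2F%2Fmapress.com${PROXY_SUFFIX}`);
}

const page =
  '<iframe src="http://mapress.com/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html' +
  '?file=http%3A%2F%2Fmapress.com%2Fj%2Fzt%2Farticle%2FviewFile%2Fzootaxa.4150.4.1%2F7282"></iframe>';

const naive = rewriteLiteralHost(page);
// The viewer URL is now proxied, but the encoded host after file= is untouched.
console.log(naive.includes("file=http%3A%2F%2Fmapress.com%2F")); // true

const fixed = rewriteEncodedFileParam(naive);
console.log(fixed.includes("file=http%3A%2F%2Fmapress.com.helicon.vuw.ac.nz%2F")); // true
```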

The longer-term solution would be to switch from absolute to relative URLs when passing the PDF's location to PDF.js. Is this likely to be possible?
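A minimal sketch of the relative-URL idea, assuming the viewer resolves the file= value relative to its own page (as browsers do for relative request URLs): keep only the path and query of the PDF URL, and the browser resolves it against whichever host served the viewer, proxied or not. toRootRelative is a hypothetical helper.

```javascript
// Hypothetical helper: drop scheme and host, keep path + query + fragment.
function toRootRelative(absoluteUrl) {
  const u = new URL(absoluteUrl);
  return u.pathname + u.search + u.hash;
}

const pdfUrl = "http://mapress.com/j/zt/article/viewFile/zootaxa.4150.4.1/7282";
const relative = toRootRelative(pdfUrl);
console.log(relative);
// /j/zt/article/viewFile/zootaxa.4150.4.1/7282

// Resolved against the proxied viewer page, the request stays on the proxy host.
const viewerPage =
  "http://mapress.com.helicon.vuw.ac.nz/j/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html";
console.log(new URL(relative, viewerPage).href);
// http://mapress.com.helicon.vuw.ac.nz/j/zt/article/viewFile/zootaxa.4150.4.1/7282
```

In a real template the relative path would still go through encodeURIComponent before being appended after file=.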


Would EZProxy’s DJ or HJ directives be of help here for the general case rather than replacing specific hostnames?

It seems more like a proxy issue than an OJS issue, but I admit I am partial to using relative URLs whenever possible.

This would mean adjusting $pdfUrl either where it is assigned or just before it is passed to PDF.js.

The DJ (Domain JavaScript) and HJ (Host JavaScript) directives implement the basic logic of "rewrite the host names in bitstreams identified as JavaScript as well as bitstreams identified as HTML." They are very useful when navigation or redirects are done in JavaScript. Neither rewrites things that look like host names in the path or query parts of a URL.

So they cannot be used to fix this issue, alas.

I agree that this is not an OJS issue, but it is an issue for a number of OJS users and/or their clients.

We are also seeing EZproxy set the wrong URL for the iframe. Instead of fixing the absolute URL, we decided to use a relative URI:
$relativeUrl = '/index.php/' . $journal->getPath() . '/article/download/' . $article->getBestArticleId() . '/' . $galley->getBestGalleyId();
This variable is calculated within the plugin class and replaces $pdfUrl in the iframe's src attribute.

Would this cause any issues? @ctgraham or someone else who knows.

Sean

You may be able to do this for your specific install, but it can't be generalized to other installs where RESTful URLs remove the index.php or the journal path, etc.

In the general case, you would probably want to remove the base URL from the fully formed URL. There is a core function to assist with this:

That said, if your install will always use the /index.php/ prefix to a journal path, your code should be fine.

Thanks @ctgraham! We will try applying the removeBaseUrl function to $pdfUrl instead.

@ctgraham I just tried using Core::removeBaseUrl on the URL generated by the PKPPageRouter within the PDFJsViewer plugin.
I noticed that the result does not include the "/index.php/" part of the URL, even though our installation uses it.
I'll investigate further.

Hmmm… it looks like this is an intentional feature of this function:

This might be more complex than I had hoped. Other than Core::removeBaseUrl, I don't know of a function in the codebase that constructs a relative URL.

I'm thinking there should be a way for PHP to detect whether "/index.php/" has been configured (or simply whether it is part of the URL), and maybe the core function could be modified to remove that segment conditionally instead of always.

Whether or not OJS’s “index.php” is part of the URL is configured with the restful_urls option in config.inc.php.

You can use Config::getVar('general', 'restful_urls') to check this setting. Because various other functions depend on Core::removeBaseUrl() as-is, we won’t be able to modify this function without either adding a new parameter or auditing all current usages.
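The conditional logic described above might look like this. In OJS itself the flag would come from Config::getVar('general', 'restful_urls') in PHP; this JavaScript sketch just takes it as a parameter. buildRelativeDownloadUrl and its basePath argument (the path part of base_url, e.g. "/j" for an install under mapress.com/j) are hypothetical, the /article/download/ route is copied from the snippet earlier in the thread, and the example IDs are made up.

```javascript
// Hypothetical helper (not an OJS function): builds a root-relative galley
// download URL, adding /index.php only when restful_urls is off.
function buildRelativeDownloadUrl({ basePath, restfulUrls, journalPath, articleId, galleyId }) {
  const scriptPrefix = restfulUrls ? "" : "/index.php";
  // Encode each path segment so IDs with unusual characters stay intact.
  const segments = [journalPath, "article", "download", articleId, galleyId].map((s) =>
    encodeURIComponent(String(s))
  );
  return `${basePath}${scriptPrefix}/${segments.join("/")}`;
}

// Install at the web root with restful_urls = Off:
console.log(buildRelativeDownloadUrl({
  basePath: "", restfulUrls: false, journalPath: "jps", articleId: 12, galleyId: 34,
}));
// /index.php/jps/article/download/12/34

// Install under /j with restful_urls = On:
console.log(buildRelativeDownloadUrl({
  basePath: "/j", restfulUrls: true, journalPath: "zt", articleId: "zootaxa.4150.4.1", galleyId: 7282,
}));
// /j/zt/article/download/zootaxa.4150.4.1/7282
```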


We were able to fix our issue by carefully inspecting our EZProxy configuration and fixing bad configs.
There was a typo in a stanza that affected OJS proxying. After fixing that typo, the base URL was correctly generated by this plugin. No modification for the plugin was necessary.

Hi Sean,

Can you share with us details of how you fixed this? We are having a similar issue. Thank you!

Hi @radjr, replying on Sean’s behalf. Our fix involves instructing the subscriber institution to add stanzas to their EZproxy config that look something like this (based on the journal’s domain):

T JPS
U https://jps.library.utoronto.ca
HJ https://jps.library.utoronto.ca

We’ve had a subscriber institution confirm that this has resolved the issue on their end. Hope this helps!

By way of an update: if you are on OJS 3.1.2 and the previous stanzas no longer work, it may be due to the URL encoding issue described in pkp/pkp-lib GitHub issue #5204, "Subscribers can't access PDF.js via EZProxy due to URL encoding".

A temporary fix could include a find-and-replace of the encoded part in your stanza. If you do this, you can provide the working stanza to OCLC for inclusion in their database. An example of our makeshift fix is the OCLC Support stanza "Iter: Gateway to the Middle Ages and Renaissance".

Hi Everyone,

I know this thread is older, but I'd like to add to it in 2021 because last week a journal subscriber opened a ticket about URLs breaking again due to EZproxy. We found that the merged solution in this thread was not enough to solve it. Our additional fix modifies the JavaScript in the pdfJsViewer/templates/display.tpl template file.

For this change to apply across the board, display.tpl had to be replaced in every one of our themes that used pdfJsViewer/templates/display.tpl. I've also commented on the closed issue in the hope that it can be reopened and this solution contributed. Thank you, take care and stay safe.

Best Regards,
Rachel

<script type="text/javascript">
    // Build the iframe's src in JS instead of Smarty so that EZproxy-using sites
    // can find our domain in $pdfUrl and apply their rewrites to it.
    $(document).ready(function() {ldelim}
        var urlBase = "{$pluginUrl}/pdf.js/web/viewer.html?file=";
        var pdfUrl = "{$pdfUrl}";
        var encodedPdfUrl = encodeURIComponent(pdfUrl);
        // Un-encode only the scheme, so "https://host" appears literally and EZproxy can match the host name.
        encodedPdfUrl = encodedPdfUrl.replace("https%3A%2F%2F", "https://").replace("http%3A%2F%2F", "http://");
        $("#pdfCanvasContainer > iframe").attr("src", urlBase + encodedPdfUrl);
    {rdelim});
</script>
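The string handling in the script above can be checked outside Smarty and jQuery. The sketch below mirrors it as a plain function (buildViewerSrc is a hypothetical name, and the example.org URLs are placeholders) and shows what ends up in the iframe src, plus what a viewer that decodes its query string in the standard way gets back from file=.

```javascript
// Mirrors the template's logic: encode the whole PDF URL, then restore the
// scheme so the host name is visible to EZproxy's rewriting.
function buildViewerSrc(urlBase, pdfUrl) {
  let encoded = encodeURIComponent(pdfUrl);
  encoded = encoded.replace("https%3A%2F%2F", "https://").replace("http%3A%2F%2F", "http://");
  return urlBase + encoded;
}

const src = buildViewerSrc(
  "https://journal.example.org/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=",
  "https://journal.example.org/index.php/demo/article/download/1/2"
);
console.log(src);
// https://journal.example.org/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=https://journal.example.org%2Findex.php%2Fdemo%2Farticle%2Fdownload%2F1%2F2

// Standard query-string decoding recovers the original PDF URL.
console.log(new URL(src).searchParams.get("file"));
// https://journal.example.org/index.php/demo/article/download/1/2
```

Only the scheme is left unencoded; the host stays literal because encodeURIComponent does not touch letters, digits, dots or hyphens, while the path slashes remain %2F.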

@wangra, in cases where a prior issue has already been closed or resolved but additional edge cases or use cases need to be addressed, please feel free to open a new GitHub issue and submit a PR against it.

Thanks for your contribution!
