HTML galleys empty after upgrade to OJS 3.2.1-1

I upgraded OJS from 3.1.2 to 3.2.1-1. PDF files are displayed correctly, but HTML files are empty. I can open and view the HTML file via Admin interface->Workflow->Publication->Galleys->HTML, but not in the journal.

Hi @Nerijus_Baliunas,

I’d suggest checking your PHP error log for details. Are you using any additional plugins (besides the ones that come with OJS 3.2.1-1), e.g. themes?

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher @Nerijus_Baliunas,
I am having the same issue with OJS 3.2.1-1, i.e. some HTML galleys don’t render in the public view but download fine from the admin area.

A few other points:

  • This is only happening for some HTML galleys. Most are loading fine
  • The galley can be downloaded from Admin interface->Workflow->Publication->Galleys->HTML
  • There are no error messages in the PHP/Apache logs. The request shows as 200 in the Apache access log. I tried turning on display errors in config.inc.php, but no message is displayed
  • I looked in the DB tables: the file is correctly marked as text/html. The various IDs also seem to match up, i.e. galley_id, file_id, assoc_id
  • I can see the file fine in the OJS files directory. If I copy the file to the OJS public folder, it is served fine and rendered in the browser
  • In the galley view, the response for the iframe src is entirely empty, i.e. 0 bytes returned
  • No obvious errors in the browser developer tools, i.e. both the wrapper HTML page and the iframe show a successful 200 response. No errors in the console
  • We’re using the Manuscript theme, but switching to the default theme doesn’t help
  • We haven’t installed any plugins other than those in the plugin gallery (e.g. we enabled the DOI plugin)

Any advice much appreciated, as I’m drawing a blank here, especially with no error messages to go on.

Thanks
Eoghan

Hi again,
I did some more digging, and the problem seems to arise during the processing of the HTML. Specifically, the preg_replace_callback() call under the comment “Perform replacement for ojs:// URLs” seems to strip all content and return an empty string for $contents (https://github.com/pkp/ojs/blob/master/plugins/generic/htmlArticleGalley/HtmlArticleGalleyPlugin.inc.php#L169). No error is thrown, which would explain the lack of errors in the PHP log. When the call to preg_replace_callback() is commented out, the HTML galley renders fine.

Some other bits of info that might be useful:

  • This issue doesn’t arise on my dev VM running Ubuntu 18.04 and PHP 7.2 with the exact same HTML galleys loaded into OJS
  • The problematic production system is CentOS 7 and PHP 7.4
  • Some (but not all) of the problematic galleys are quite big (5MB+), as the HTML contains base64-encoded images. However, they all render fine on the Ubuntu server, and some large galleys (5MB+) also render fine on CentOS, so file size alone doesn’t seem to determine whether the issue occurs

I’d appreciate advice on how to handle this. Is there any problem with commenting out this preg_replace_callback? How often do URLs of the form ojs:// appear in galleys? Obviously, I’d rather not modify PHP files locally if at all possible, but this issue is affecting quite a few HTML galleys.

Let me know if you need more information or would like a sample HTML file etc.
Cheers,
Eoghan

Hi @eocarragain,

The ojs:// URLs will only be needed if you’ve written your own HTML files to make use of them – which is unlikely. I’m interested in what’s causing this, but unfortunately it’s a very specific combination of configuration to recreate it and I don’t have it here. If you’re able to track it down further, please follow up! I would expect any of the common causes (out of memory, UTF8 mis-encoding, function disabled for security, etc) to leave a trail of crumbs in the error log.

Regards,
Alec Smecher
Public Knowledge Project Team

Thanks @asmecher. Good to know I can safely remove that block. If I get a chance I’ll see if I can narrow the problem down further. Yes, my initial thought was OOM since the files are so big, but memory_limit is actually set very high in php.ini, and as you say you’d expect to see that in the error log. I did wonder whether something in the base64 strings was causing an issue for the regex, but you wouldn’t really expect that to result in no content at all.
Will let you know if we find more, since it appears to have affected @Nerijus_Baliunas too.

Eoghan

Hi @asmecher @Nerijus_Baliunas,
Another update:

  • A var_dump showed that $contents was actually null after the call to preg_replace_callback(), indicating a (silent) PCRE error
  • Google brought me to the Stack Overflow question “regex - php preg_replace returning null”, which led me to https://www.myintervals.com/blog/2008/01/25/wtf-preg_replace-returns-null/
  • As in that blog post, preg_last_error() returned 2 (PREG_BACKTRACK_LIMIT_ERROR), indicating that the PCRE backtrack limit had been exceeded
  • In my php.ini the pcre.backtrack_limit setting was commented out, meaning it was defaulting to 1000000
  • I increased this to 10000000, which fixed the problem and the galley rendered
  • However, I also noticed that pcre.jit was set to 0. On my Ubuntu setup, pcre.jit was commented out, i.e. left at its default of 1. Commenting out pcre.jit (or setting it to 1) on the production server also fixed the problem, even with pcre.backtrack_limit at its default
  • In the end I went with just commenting both out (i.e. pcre.backtrack_limit=1000000 and pcre.jit=1)
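For reference, the final configuration described above corresponds to the php.ini lines below. Both values are PHP's built-in defaults, so leaving the lines commented out has the same effect. The effective values can be checked with `php -i | grep pcre`, bearing in mind that the CLI and the web server's PHP may read different ini files.

```ini
; PCRE settings the production server ended up with (both are PHP defaults).
; pcre.jit=0 was the setting that caused preg_replace_callback() to hit the
; backtrack limit on some large HTML galleys in this thread.
pcre.backtrack_limit=1000000
pcre.jit=1
```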

A few things:

  • I inherited this VM, but pcre.jit=0 may well be a CentOS or Remi RPM PHP default, so it may catch other people out
  • It might be an idea to check whether $contents is null after the call to preg_replace_callback() and either a) fall back to the value $contents had before the call, or b) at least throw a meaningful error that gets logged in the PHP error log.
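A minimal sketch of that null check in PHP. The helper name safePregReplaceCallback() is hypothetical; it is not the code in the OJS plugin or in any later commit, and it simply takes whatever pattern and callback the plugin passes. It implements option a) and also logs the PCRE error code, so the failure leaves a trace in the PHP error log instead of producing a silently empty galley.

```php
<?php
// Hypothetical helper (not part of OJS): run preg_replace_callback() and,
// if PCRE fails, log why and fall back to the unmodified input.
function safePregReplaceCallback(string $pattern, callable $callback, string $contents): string
{
    $result = preg_replace_callback($pattern, $callback, $contents);
    if ($result === null) {
        // preg_replace_callback() returns null on failure; preg_last_error()
        // reports the cause, e.g. PREG_BACKTRACK_LIMIT_ERROR (value 2).
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted (check pcre.backtrack_limit and pcre.jit)'
            : "PCRE error code $code";
        error_log("HTML galley: URL replacement failed: $reason");
        return $contents; // option a): keep the original HTML rather than serve nothing
    }
    return $result;
}
```

Returning the original HTML means any ojs:// URLs are left unrewritten, which seems preferable to serving an empty galley; option b) would replace the `return $contents;` with a thrown exception.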

Happy to test a patch etc.

For now I’m happy because I have been able to revert the PHP file, so I don’t have to worry about it at the next upgrade, but it would be good to handle this edge case more generally. The length of the HTML file (and the inline base64 images) was likely a factor, as it probably caused the backtrack limit to be exceeded in some cases. That means it probably won’t affect many people with more typical HTML galleys, but even so it would be nice to handle it gracefully.

Hope this helps. Cheers,
Eoghan

Hi @eocarragain,

Good spotting! That’s a weird one (of the sort that turns people off PHP).

I’m not sure why this caused problems with calls to preg_replace_callback and not e.g. preg_replace, which is used several times with similarly constructed regular expressions in nearby code. A bit of local experimentation does suggest that the JIT option decreases the backtracking requirements for some regular expressions.

So for the moment I’d lean towards simply checking and reporting errors as a baseline; if others start hitting these limits, we can look at implementation changes.

To add warnings, this should work: “Watch for preg match limit https://forum.pkp.sfu.ca/t/html-galleys-em” (pkp/ojs@474bce7 on GitHub)

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
That looks like a good approach. Cheers
Eoghan