OJS 2.4.8: Word documents corrupted upon Download

Hi,
I have problems with OJS 2.4.8.1. When I download submitted Word documents they are corrupt. But if I copy the same files directly from the server using scp they are fine. These are the clues I have so far:

  • There are several journals in the same installation, most of them under the same domain, but the two that have problems are on separate domains with some rewrites and a reverse proxy.

  • In one of the files I tested the direct copy is 50 KB and the corrupt copy 75 KB.

  • I sometimes get the following message in the error log that seems to coincide with clicking download (but doesn’t always show):

      ojs2 has produced an error\n  Message: USER WARNING: Smarty error: unable to read resource: "article/pdfInterstitial.tpl"
    
  • As far as I can tell there are no problems with PDF and RTF files.

Any ideas what could be at play here?

Simon

If you edit the details of one of the Word galleys what is the “File Type”?

User Home → Editor → (select article) → Edit → Layout → (choose galley) → Edit → File Type

There wasn’t any file under Layout, but I uploaded the same Word document, and it was assigned file type

 application/vnd.openxmlformats-officedocument.wordprocessingml.document

and I get the same error when I try to download it.

Sorry, I wasn’t very clear. Under the Layout section, there may not have been any Layout Versions, but there was a Galley Format, right?

What File Type was/is listed there?

If it is also “application/vnd.openxmlformats-officedocument.wordprocessingml.document”, is the uploaded file a docx? What is the nature of the corruption that happens in the download?

Can you access the site bypassing the reverse proxy? Does the same problem happen?

So the problem isn’t that users can’t download the files, but that the editors can’t view submission files, i.e. we haven’t gotten to the layout part yet. This is what I have under Layout for one of the submissions that causes problems:

If I change the base_url for the journal to the same one used by the journals that work, the problem goes away. I wasn’t the one who set up the rewrites and the reverse proxy, so I don’t know much about the intricacies involved here.

When I download the file it is 25K larger than the original, as mentioned, and when I try to open it in Word, LibreOffice or Google Docs, all of them complain that the file can’t be opened. I inspected the files using ‘less’ and the readable parts look similar (so I assume the offending part has been appended to the original file).
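Rather than eyeballing the two copies in ‘less’, the "something was appended" assumption can be tested directly. The following is a minimal sketch, not part of OJS; the function name `compare_copies` is made up for illustration. It reports both sizes, the offset of the first differing byte, and whether the download is simply the original with extra bytes at the end.

```python
def compare_copies(original: bytes, downloaded: bytes) -> dict:
    """Summarise how a downloaded copy differs from the original bytes."""
    # Walk both byte strings up to the shorter length and stop at the
    # first mismatch; if there is none, the offset equals that length.
    limit = min(len(original), len(downloaded))
    first_diff = next(
        (i for i in range(limit) if original[i] != downloaded[i]), limit
    )
    return {
        "original_size": len(original),
        "downloaded_size": len(downloaded),
        "first_difference": first_diff,
        # True only if the download is the original plus trailing bytes.
        "pure_append": len(downloaded) > len(original)
        and first_diff == len(original),
    }
```

Read the scp copy and the browser download with `pathlib.Path(...).read_bytes()` and pass both in. If `pure_append` is false and `first_difference` is small, the file was altered throughout rather than extended at the end.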

Ah, ok. This seems to point toward a problem with the proxy as opposed to OJS.

It is unlikely that a rewrite rule could cause this, but you might want to post your rules here to see if we can catch something out of the ordinary.

My bet is on the proxy, however.

A docx is really just a ZIP file of an XML document and supporting files. If you provide an original file and a corrupted file to whoever supports your proxy software, they might be able to identify what is going on.
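Because a docx is a ZIP archive, Python's standard `zipfile` module can tell whether a downloaded copy is still structurally intact. This is a minimal sketch; `check_docx` is a hypothetical helper name, and the check for `word/document.xml` assumes a normal Word document layout.

```python
import io
import zipfile


def check_docx(data: bytes) -> str:
    """Return a short verdict on whether the bytes form a usable docx."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return "not a ZIP archive"
    try:
        with zipfile.ZipFile(buffer) as archive:
            # testzip() reads every member and returns the first one
            # whose CRC check fails, or None if all members are fine.
            bad_member = archive.testzip()
            if bad_member is not None:
                return f"CRC error in member: {bad_member}"
            if "word/document.xml" not in archive.namelist():
                return "valid ZIP, but word/document.xml is missing"
    except zipfile.BadZipFile as error:
        return f"damaged ZIP archive: {error}"
    return "ok"
```

Running it on both the scp copy and the downloaded copy gives a concrete result to pass on to whoever maintains the proxy.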

Regarding this occasional error:

I’m not sure where the viewPDFInterstitial call would be coming from. I see a handler for it, and I see the possibility of the routing to the handler, but I’m not clear on how that call would be generated from the codebase. Can you get a referring URL for these interstitial requests?

Ah, ok. This seems to point toward a problem with the proxy as opposed to OJS.

Ok, that would explain why I couldn’t find any mention of the problem online. Thanks for your time anyway. I’ll report back if we find a solution worth sharing.

I’m not sure where the viewPDFInterstitial call would be coming from. I see a handler for it, and I see the possibility of the routing to the handler, but I’m not clear on how that call would be generated from the codebase. Can you get a referring URL for these interstitial requests?

No URL, unfortunately, but I checked the log again and the error occurred several times during the night, so it is probably not related to the other error. The log entry that co-occurred with my other error came from an IP address other than mine (I should have checked that).

Still, something seems to be wrong in the codebase because the ArticleHandler class calls $templateMgr->display('article/pdfInterstitial.tpl') (line 261) but the file pdfInterstitial.tpl is missing in templates/article in the branch ojs-stable_2_4_8. I have the same warning over 30 times in the log from the last two days.

Yes, I agree. I was hoping that, if your access logs indicated a referrer for the viewPDFInterstitial call, we might identify whether this is a legacy OJS URL that is no longer supported (but for which you still have some links in the wild) or something different.

After many hours of trial and error I finally found that mod_xml2enc (which is used by mod_proxy_html) caused the conflict. The problem disappears if I simply disable it.
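mod_xml2enc exists to handle character-set conversion for proxied content, so one plausible explanation (an assumption, not something confirmed in this thread) is that the binary file was treated as ISO-8859-1 text and re-encoded as UTF-8. In that conversion every byte of 0x80 or above becomes two bytes, which would fit a compressed file growing from about 50 KB to about 75 KB. A minimal sketch to test that hypothesis against an original and a corrupted copy; `looks_reencoded` is a made-up name:

```python
def looks_reencoded(original: bytes, downloaded: bytes) -> bool:
    """True if the download equals the original read as ISO-8859-1
    and written back out as UTF-8."""
    # latin-1 maps every byte 0x00-0xFF to one code point, so the
    # decode step never fails, even on arbitrary binary data.
    return downloaded == original.decode("latin-1").encode("utf-8")
```

If this returns false, the corruption has a different shape and the hypothesis above does not apply.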


Hi, I have the same problem. The uploaded file was 500 KB, and downloading it with scp works without problems. Downloaded from OJS 3, it is only 76 KB and is reported as unloadable or corrupted. I have checked, and I don’t have mod_xml2enc. My PHP version is 5.6.13. My error log contains the following:

PHP Strict Standards: Declaration of FileApiHandler::authorize() should be compatible with PKPHandler::authorize($request, &$args, $roleAssignments, $enforceRestrictedSite = true) in …
PHP Strict Standards: Declaration of SubmissionFileDAO::fromRow() should be compatible with PKPSubmissionFileDAO::fromRow($row, $fileImplementation) in …
PHP Strict Standards: Declaration of DevelopedByBlockPlugin::getSeq() should be compatible with BlockPlugin::getSeq($contextId = NULL) in …
PHP Strict Standards: Declaration of ArticleHandler::authorize() should be compatible with PKPHandler::authorize($request, &$args, $roleAssignments, $enforceRestrictedSite = true) in …
PHP Warning: Cannot use a scalar value as an array in …

What could be the problem? Is it related to plugins? Would deleting plugins help? There are many errors, though. I previously used 2.4.8 and have just upgraded to OJS 3.0.2. Any help will be appreciated. Thanks

Hi @tdmy,

Have you looked at the contents of the corrupted document to see if any hints are there? Look for example at the very start of the file and the very end of the file to see whether there’s an indication of an error message or similar.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi, thanks for replying to my message. The downloaded files cannot be opened at all, both PDF and Word, so I can’t check the beginning or the end. Is there a way to check even though the file can’t be opened? I can open small files, around 100 KB.

You can “open with” Windows notepad or a similar text editor to see if the file is not really a PDF or if the PDF contains a textual error at the beginning or end of the file.
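If a text editor struggles with a large binary file, the same check can be scripted. The sketch below looks only at the first and last bytes: a PDF should begin with `%PDF-` and normally has an `%%EOF` marker near the end. The name `describe_pdf_bytes` and the 1024-byte window are arbitrary choices for illustration.

```python
def describe_pdf_bytes(data: bytes, window: int = 1024) -> dict:
    """Inspect the start and end of a file that is supposed to be a PDF."""
    head = data[:window]
    tail = data[-window:]
    return {
        "size": len(data),
        "starts_with_pdf_header": head.startswith(b"%PDF-"),
        "has_eof_marker": b"%%EOF" in tail,
        # latin-1 never fails to decode, so any plain-text error
        # message at the end of the file stays readable here.
        "tail_text": tail[-200:].decode("latin-1"),
    }
```

A file that starts with the PDF header but has no `%%EOF` marker was most likely cut off in transit. A readable PHP or server error message in `tail_text` would instead point back at the application.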

Hi, I opened 1 pdf file. I get this until the end:

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-MY) /StructTreeRoot 83 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 9/Kids[ 3 0 R 43 0 R 45 0 R 47 0 R 54 0 R 56 0 R 58 0 R 65 0 R 69 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 9 0 R/F3 11 0 R/F4 13 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/XObject<</Image15 15 0 R/Image17 17 0 R/Image19 19 0 R/Image21 21 0 R/Image23 23 0 R/Image25 25 0 R/Image27 27 0 R/Image29 29 0 R/Image31 31 0 R/Image33 33 0 R/Image35 35 0 R/Image37 37 0 R/Image39 39 0 R/Image41 41 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 6346>>
stream
[binary stream data omitted]

That definitely starts out as a PDF. If it doesn’t end with a textual error message that can be attributed to OJS, and if you don’t see an associated error message in your PHP error log, this unfortunately points to a problem at the webserver or network level.

One other thing to check would be to look at your access log to see what HTTP status is associated with the request.
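To pull the status, the number of bytes sent, and the referrer out of an access log, a short parser can help. This sketch assumes the Apache "combined" log format, so adjust the pattern if your server logs differently. The names `LOG_PATTERN` and `parse_line` are hypothetical.

```python
import re

# Assumes Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str):
    """Return the fields of one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache logs "-" when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields
```

For a download request, compare `size` with the size of the file on disk. A 200 status with far fewer bytes than the file contains suggests the response was cut short after OJS started sending it.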

Thanks, I will check the access log. I have other systems running on the same server (PHP with the Symfony framework), and I worry that they might conflict with OJS too.