Broken .pdf files when viewed/downloaded from OJS 2.4.8.1 – missing EOF marker

Dragomir · April 13, 2018, 12:17pm

Hello,

Some of my previously uploaded .pdf files open in the browser, but others don’t. For the problematic ones, I receive a “Failed to load PDF document” error when I try to view the .pdf from OJS. What’s more, when I download the .pdf file to my computer, Adobe Acrobat reports: “There was an error opening the document. The file is damaged and could not be repaired.”

So, one could assume that the .pdf in question really is broken. However, if I navigate to the folder where that specific .pdf file resides and open it from File Manager in cPanel, the file loads in the browser without any issues. There are also no issues if I download that same file directly from my hosting account.

In Notepad++, I also compared the .pdf downloaded directly from my hosting account with the one downloaded from OJS. The only difference was at the end of the files:

The .pdf file downloaded directly from my hosting server had the following lines at the end:

…

trailer
<<
/Info 43 0 R
/Root 1 0 R
/Size 44
>>
startxref
630562
%%EOF

While the one that was downloaded from OJS had the following as the last line:

…

trailer

It looks like something in OJS cuts off some of my .pdf files, resulting in a missing EOF marker. Do you know what could be the issue here?
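The manual comparison above can also be scripted. What follows is only a minimal sketch, assuming both copies of the file are available locally; the function names and the 1024-byte window are illustrative choices, not part of OJS or of any PDF library. It checks whether the end of a file still contains the `startxref` keyword and the `%%EOF` marker, and reports how many leading bytes two copies share.

```python
from pathlib import Path


def pdf_tail_report(path, window=1024):
    """Report whether the last `window` bytes contain the PDF end-of-file structures."""
    data = Path(path).read_bytes()
    tail = data[-window:]  # the whole file if it is shorter than `window`
    return {
        "size": len(data),
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


def common_prefix_length(a, b):
    """Return the number of leading bytes that two byte strings have in common."""
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    return limit


# Example use (file names are placeholders for the two downloaded copies):
#   ojs = Path("from-ojs.pdf").read_bytes()
#   direct = Path("from-server.pdf").read_bytes()
#   print(pdf_tail_report("from-ojs.pdf"))
#   print(common_prefix_length(ojs, direct), len(ojs), len(direct))
```

If the shared prefix is as long as the shorter file, the shorter copy is a plain truncation of the longer one. If the two copies diverge near the start instead, extra bytes were probably inserted before the PDF data.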

Thank you.

Regards,
D

Dragomir · April 16, 2018, 7:23am

Also, I’ve just tested the integrity of these files with the following online PDF repair tool: PDF Tools Online - Repair PDF. According to that tool, I get the following results:

When I upload the file that was downloaded through OJS, here’s what I get:

The file is corrupt and cannot be repaired, but possibly recovered
    2 pages recovered.

Errors:

Open file.
0x80410108 - E - The end-of-file marker was not found.
    - File: from-ojs.pdf
0x8041010A - E - The 'startxref' keyword or the xref position was not found.
    - File: from-ojs.pdf
0x80410108 - E - The end-of-file marker was not found.
    - File: from-ojs.pdf
Recover XREF table.
0x8041010E - E - The file trailer dictionary is missing or invalid.
    - File: from-ojs.pdf
Analyze Objects.
Analyze Outlines.
Analyze Pages.
0x80410113 - E - The file is corrupt and cannot be repaired. Some of the contents can possibly be recovered.
    - Page No.: 2
    - File: from-ojs.pdf
Recover Pages.
Save output file.
Close file.
3-Heights(TM) PDF repair tool, evaluation license valid until unbounded

And this is the result for the same file that was downloaded directly from the repository directory of my hosting account:

Successful completion
    The file was successfully analyzed, and no corruptions have been detected.

Hope this sheds some more light on my case.

Regards,
D

asmecher · April 16, 2018, 3:11pm

Hi @Dragomir,

Is it possible that in editing a PHP script you may have introduced whitespace accidentally to the beginning or end of a file? You can check for these using a tool like diff.
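One way to look for this kind of stray whitespace is sketched below. It rests on assumptions that this thread does not confirm: that the extra bytes sit outside the PHP tags of some `.php` file, either before the opening `<?php` or after a closing `?>`, and that the OJS code lives under a directory you pass in. A plausible mechanism, also unconfirmed here, is that such bytes are sent ahead of the PDF data, so the download ends early once the declared length is reached. The function names are illustrative and not part of OJS.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def stray_output(path):
    """List reasons why a PHP file might emit bytes outside its PHP tags."""
    data = Path(path).read_bytes()
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("UTF-8 byte order mark at start of file")
    elif data[:1].isspace():
        problems.append("whitespace at start of file")
    stripped = data.rstrip()
    trailing = data[len(stripped):]
    # PHP discards a single newline directly after the closing tag,
    # so only anything beyond that is reported.
    if stripped.endswith(b"?>") and trailing not in (b"", b"\n", b"\r\n"):
        problems.append("extra whitespace after closing ?>")
    return problems


def scan_tree(root):
    """Map each suspicious .php file under `root` to its list of problems."""
    findings = {}
    for path in sorted(Path(root).rglob("*.php")):
        if path.is_file():
            problems = stray_output(path)
            if problems:
                findings[str(path)] = problems
    return findings


# Example use (the path is a placeholder for the OJS installation directory):
#   for name, problems in scan_tree("/path/to/ojs").items():
#       print(name, problems)
```

A hit is only a candidate: a file that intentionally starts with markup would also be listed, so each result still needs to be checked by hand.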

Regards,
Alec Smecher
Public Knowledge Project Team

Dragomir · April 17, 2018, 10:42am

Hi @asmecher,

I agree that this could be the cause. However, all of these .pdf files are accessible from the same section of the journal (http://mysite.com/ojs/index.php/journal-name/issue/view/219). As all of them have the same link structure, I assume that they should all be using the same file templates…? Yet some of these .pdf files load normally through OJS and some don’t…

Could you please suggest where to look exactly?

Thank you.

Regards,
D

asmecher · April 17, 2018, 3:48pm

Hi @Dragomir,

It’ll be nearly impossible to find the added whitespace by looking for it manually – try the diff tool I suggested above. It’s a standard tool so you should be able to find plenty of guidance about how to use it by googling around.
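For readers who would rather script the comparison than run diff by hand, here is a rough sketch. It assumes an unmodified copy of the same OJS release has been unpacked next to the live installation; that setup is an assumption, since the thread does not say what to diff against. The function name and both paths are placeholders. It lists the files whose contents differ byte for byte, which narrows down where to look with diff.

```python
import filecmp
from pathlib import Path


def changed_files(pristine_root, live_root, pattern="*.php"):
    """Return relative paths of files whose bytes differ between two trees."""
    pristine_root = Path(pristine_root)
    live_root = Path(live_root)
    changed = []
    for original in sorted(pristine_root.rglob(pattern)):
        if not original.is_file():
            continue
        relative = original.relative_to(pristine_root)
        candidate = live_root / relative
        if not candidate.is_file():
            continue  # file absent from the live tree; not a content difference
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time.
        if not filecmp.cmp(original, candidate, shallow=False):
            changed.append(relative.as_posix())
    return changed


# Example use (both paths are placeholders):
#   for name in changed_files("/tmp/ojs-2.4.8-1-clean", "/path/to/ojs"):
#       print(name)
```

Files that were edited on purpose, such as local customizations, will show up too, so the list is a starting point rather than a verdict.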

Regards,
Alec Smecher
Public Knowledge Project Team