Papers uploaded for review in OCS can't be read when downloaded

I can’t read papers uploaded for review in OCS 2.3.6

On the server under uploads is …
submission/review/6-4-2-RV.docx
file 6-4-2-RV.docx: Microsoft Word 2007+

When it's downloaded and opened in LibreOffice I get:
K########lu#I################_rels/.rels��MK#A
file 6-4-2-RV.docx: data

It's the right size in kB but not readable.

In config.inc.php these are commented out.
; Microsoft Word
; index[application/msword] = "/usr/bin/antiword %s"
; index[application/msword] = "/usr/bin/catdoc %s"
and
mime_database_path = /path/to/magic.mime
That's not commented out at all.

Mike

Hi @MikeL,

Is it possible that you’ve modified one of the .php scripts in OCS, either intentionally or accidentally? When the situation you describe arises, it usually means that a blank line has been accidentally added to the beginning or end of a .php script. There are too many of these to check manually, so I’d suggest using a standard tool like diff to compare your installation against the stock version of OCS.
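A minimal sketch of that comparison, assuming GNU or BSD diff and that the unmodified release has been unpacked next to the live installation. The function name list_changed_files and the directory names in the usage line are placeholders, not anything OCS provides.

```shell
#!/bin/bash
# Sketch only: list files that differ between a stock tree and an installed tree.
list_changed_files() {
    local stock_dir="$1" install_dir="$2"
    # -r recurses into subdirectories; -q prints only the names of differing files.
    # diff exits non-zero when it finds differences, so that status is ignored here.
    diff -rq "$stock_dir" "$install_dir" || true
}

# Usage (hypothetical paths):
#   list_changed_files ocs-stock ocs-live
```

Files reported as differing, or as present in only one tree, are the candidates to open and check for stray lines around the PHP tags.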

Regards,
Alec Smecher
Public Knowledge Project Team

Hi

Good suggestion. I had not thought of that. I did a test (script below) but that did not show any problems.
All php scripts seem OK.

#!/bin/bash
# This is what the output should look like if there are no blank lines 
# before or after the php script tags:
#   --------- ocs_test/classes/conference/Conference.inc.php -----------
#   <?php
#   ?>
#   ----
for f in $(find ocs_test -name '*.php'); do
    orig=$(echo "$f" | sed 's/ocs_test/ocs-2.3.6/')
    diff "$orig" "$f" > /dev/null
    result=$?
    if [ $result -eq 1 ]; then
        echo "--------- $f -----------"
        #diff $orig $f 
        head -n1 $f
        tail -n1 $f
        echo '----'
    fi
done

I loaded this:

<?php
echo mime_content_type('php.gif') . "\n";
echo mime_content_type('test.php');
?>

and I get a blank page.
Is there some package I’m missing on my server which supplies MIME info?
What should my settings be in config.inc.php?

On the server the Word docs are fine, if I download them via scp and view them they are OK.

Submissions start in a few days!

@MikeL, the way I am reading your first message is that your config.inc.php literally reads:

mime_database_path = /path/to/magic.mime

That will likely be a problem. Try commenting out this line to allow PHP to autodetect; otherwise, replace the "/path/to/magic.mime" with the actual path. You may need your system administrator to tell you what that path is.

Hi

I tried commenting it out but still the same.
I have also used this:
mime_database_path = /usr/share/mime/magic
as there is a file there called magic.
Still have the problem.

Then I found /etc/magic.mime so I have now set:
mime_database_path = /etc/magic.mime

Maybe I have to restart something?
I tried restarting Apache and running service php5-fpm restart.
Still the same problem.

No restart is required after changing the mime_database_path setting, but for the MIME type to be applied in the metadata, you would need to try uploading and downloading a new file while the setting is in place. Changing the setting won't fix the metadata of files that were already uploaded (if that is in fact the problem).

In your paper_files table in the database, do you have any entries in the file_type field? If blank, and if you enter a valid mime type here (see Microsoft mime types for examples), does the download work?

Okies. database entries are
| file_name      | file_type                                                               |
| 5-13-1-SM.odt  | application/vnd.oasis.opendocument.text                                 |
| 5-14-1-RV.odt  | application/vnd.oasis.opendocument.text                                 |
| 6-15-1-DR.docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |

so it looks like file_type is OK.
I will try a new file upload.

A new file uploads fine, but again when I download it, it's stuffed.
The db entry for that is this:
6-4-4-RV.docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document
This was a doc saved as a word file from LibreOffice.

If the file_type entries are being calculated correctly, the MIME type is not the problem nor a solution.

It will be something more akin to @asmecher's initial suggestion of a code problem. OCS mediates the file downloads, so if the file is verified as correct on the server (via SFTP/scp) then something in the code is mangling it during the download.

You mention the file size looks correct in the download. Can you diff against the original to see what changes in the contents? Is this problem reproducible via a publicly accessible site?
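One way to make that comparison concrete, sketched with standard tools (head, od, cmp). The function name first_bytes_hex and the file names in the usage lines are made up for illustration. A .docx file is a zip archive, so an intact copy should begin with the bytes 50 4b 03 04 ("PK" followed by 0x03 0x04); anything in front of that was added somewhere along the download path. Sizes shown in kB are rounded, so a few extra bytes will not show up there.

```shell
#!/bin/bash
# Sketch only: print the first N bytes of a file as one hex string (default 8).
first_bytes_hex() {
    local file="$1" count="${2:-8}"
    # od -An drops the offset column, -v disables line folding,
    # -tx1 prints each byte as a two-digit hex pair.
    head -c "$count" "$file" | od -An -v -tx1 | tr -d ' \n'
    echo
}

# Usage (hypothetical file names):
#   first_bytes_hex original.docx
#   first_bytes_hex downloaded.docx
#   cmp original.docx downloaded.docx   # reports the first differing byte
```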

Okies, file sizes look OK.
-rw-r--r-- 1 mikel mikel 4.5K Aug 3 22:32 6-4-4-RV.docx
-rw-r--r-- 1 mikel mikel 4.5K Aug 3 00:43 caving_mars_2.docx

$ file 6-4-4-RV.docx
6-4-4-RV.docx: data
$ file caving_mars_2.docx
caving_mars_2.docx: Microsoft Word 2007+

Public site is xxx
You can create an account and do an upload or I can make you a Director.

There are three extra carriage returns at the start of the file on download. This is almost certainly a case such as described earlier where a PHP file has some extra lines outside of the <?php ?> tags.

There is the off-chance this may not be at the head of a file. E.g.:

<?php
// copyright
// comment local change
?>



<?php
my_local_function();
// normal code
....

Ah I will check again.
Should I look for comments directly after the opening tag?

These three lines appear before all of your output on the site which I viewed.

Look closely at the root index.php file and, if it is not there, at any core file (bootstrap? PKPApplication?) which exhibits any locals.

Any added whitespace must be inside the <?php ?> tags.
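A rough way to scan a whole tree for content ahead of the opening tag, as a sketch. It assumes every .php file is meant to start with the five bytes "<?php", which may not hold everywhere (a script starting with a #! line or a short open tag would be flagged too). The function name find_bad_openers is invented for this example.

```shell
#!/bin/bash
# Sketch only: list PHP files whose first five bytes are not "<?php".
# A leading blank line, space, or UTF-8 byte-order mark would all show up here.
find_bad_openers() {
    find "$1" -type f -name '*.php' | while IFS= read -r f; do
        if [ "$(head -c 5 "$f")" != '<?php' ]; then
            echo "$f"
        fi
    done
}

# Usage (hypothetical path):
#   find_bad_openers /path/to/ocs
```

Unlike a diff against the stock release, this also covers files that exist only in the live installation.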

I checked lib/pkp/includes/bootstrap.inc.php
and it's fine. No whitespace outside tags.
What do you mean by PKPApplication? and locals?

The PKPApplication class is in lib/pkp/classes/core/PKPApplication.inc.php.

Given how widespread the problem is, the unintentional whitespace is probably in one of the first files loaded.

Local modifications (locals) would be any differences between the current code and the distributed code. Your script above should highlight most of them.

There are some whitespace differences, e.g. in lib/pkp/classes/core/PKPApplication.inc.php.
At the ends of files some of mine are:

    }
    ?>

and original ones are:

    }

    ?>

My extra white space is inside the tags still.

Is it possible these closing tags have a newline (or CRLF) following the tag, where the original file ended without the newline (or CRLF) character?
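For what it is worth, the PHP manual says a closing tag absorbs a single newline that immediately follows it, so one newline after ?> should be harmless while a second one would be sent to the browser. A sketch for counting those extra newlines at the end of a file follows. It only looks at LF characters in the last 64 bytes and ignores spaces, tabs and CR; the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch only: print how many LF characters follow a final "?>" beyond the
# single one that PHP absorbs. Prints 0 if the file does not end in "?>".
closing_tag_extra_newlines() {
    local tail_text total kept extra
    tail_text=$(tail -c 64 "$1")          # $(...) strips all trailing LFs
    case "$tail_text" in
        *'?>') ;;                          # file ends with a closing tag
        *) echo 0; return ;;               # no closing tag at the very end
    esac
    total=$(tail -c 64 "$1" | wc -c)      # bytes including trailing LFs
    kept=$(printf '%s' "$tail_text" | wc -c)
    extra=$(( total - kept - 1 ))
    if [ "$extra" -lt 0 ]; then extra=0; fi
    echo "$extra"
}

# Usage (hypothetical path):
#   closing_tag_extra_newlines lib/pkp/classes/core/PKPApplication.inc.php
```

Any file for which this prints a number above 0 is worth opening in an editor that shows line endings.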

I don't think it's a newline, as my script would show:

<?php
?>

----

and it shows

<?php
?>
----

It's always easy to miss things like that though, and some files have been customised and had some PKP OCS patches applied…

An old recommendation for PHP used to be omitting the closing tags because it was hard to ensure there wasn’t whitespace after the tag.

You might try that on some of these files where changes abut the close tag. For example, if you delete the close tag from PKPApplication.inc.php, does the number of newlines in the output decrease from 3 to 2?
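To put a number on that before and after each change, one option is to save a download and count the line-break bytes at its start. A sketch, with the function name and the file name in the usage line as placeholders:

```shell
#!/bin/bash
# Sketch only: count LF/CR bytes at the very start of a file (first 64 bytes).
count_leading_linebreaks() {
    # One hex pair per line, then count pairs until the first non-line-break byte.
    head -c 64 "$1" | od -An -v -tx1 | tr -s ' ' '\n' |
        awk 'NF { if (!done && ($1 == "0a" || $1 == "0d")) n++; else done = 1 }
             END { print n + 0 }'
}

# Usage (hypothetical file name):
#   count_leading_linebreaks downloaded.docx
```

If the count drops by one after removing a particular closing tag, that file was contributing one of the stray newlines.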