Duplicate PDFs through Quick Submit Plugin

We are observing that uploading articles through the Quick Submit plugin results in two copies of each uploaded PDF. We found this out because the disk usage after uploading to OJS was double what it should have been. Browsing through the files folder (e.g. articles/303/submission), we see that there are two PDFs of the same paper even though it was uploaded once.

This is using a lot of disk space. Has anyone experienced the same? Is it by design? If we want to delete the duplicates, is there an easy way, or do we have to go folder by folder? Is there a way to prevent OJS from making a duplicate copy?

Hi @pcansf,

The Quick Submit plugin emulates the submission/workflow process, and this involves creating a submission file, and then using the same file as a galley upload.

This would have been required behavior in OJS 2.x, but I’m not sure it’s necessary any longer for OJS 3.x. (The relevant line of code is in quickSubmit/QuickSubmitForm.inc.php at commit 48c75bc12804195225d4261c8f3d37ea9c4edbeb in the pkp/quickSubmit repository on GitHub.)

So you may be able to remove or comment out that area of the code and get the behavior you want for new submissions, but it’s untested and may take some refinement.

This isn’t a huge priority for us, as storage space usually isn’t a defining cost for OJS users, and most content is entered through the normal workflow, not the Quick Submit plugin.

Regards,
Alec Smecher
Public Knowledge Project Team

Thank you Alec. I will review the code and see if we are comfortable with changing it. In the meantime, is there a way to differentiate between the file used as a Galley vs Submission so that we delete the correct file?

Storage is kind of a big deal for us right now since we moved 10 years of archives to OJS :slight_smile:

Hi @pcansf,

Rather than deleting the files from the filesystem outright, you might consider using symbolic links to keep both filenames in place while using a single physical copy. See e.g. “Is there an easy way to replace duplicate files with hardlinks?” on Unix & Linux Stack Exchange for details. (I’d suggest doing this for your back-issues but not when using the workflow for new content.)
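
To make the idea concrete, here is a minimal Python sketch of that kind of de-duplication. It is an illustration only, not part of OJS or the Quick Submit plugin, and it has not been tested against a real OJS files directory. It uses hard links (as in the linked Stack Exchange question) rather than symbolic links, and it assumes the whole files directory lives on one filesystem that supports them. The function names (`find_duplicates`, `hardlink_duplicates`) and the `.dedup-tmp` suffix are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # Only consider regular files; skip symlinks.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            groups[(size, file_digest(path))].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def hardlink_duplicates(root, dry_run=True):
    """Replace duplicates with hard links to one copy.

    Returns the number of bytes that are (or, with dry_run, would be) freed.
    """
    saved = 0
    for paths in find_duplicates(root):
        keep = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keep, dup):
                continue  # already linked to the same physical copy
            saved += os.path.getsize(dup)
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the filename never goes missing.
                tmp = dup + ".dedup-tmp"  # hypothetical temporary suffix
                os.link(keep, tmp)
                os.replace(tmp, dup)
    return saved
```

Calling `hardlink_duplicates("/path/to/files_dir")` (a placeholder path) with the default `dry_run=True` only reports how many bytes could be reclaimed; pass `dry_run=False` to actually create the links. Take a backup first. Hard-linked names share one set of contents, so anything that later modifies one of the files in place would change both.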

Regards,
Alec Smecher
Public Knowledge Project Team

Alright. Thanks for your advice and sharing the link.

For each PDF in our v3.1 install we have folders (under the files directory) for copyedit, original, and proof. In our case the original folder is empty. I just deleted a single copyedit PDF from our files using FTP, and I note that the PDF (presumably the galley proof version) still loads fine. We always use Quick Submit, so are there any risks associated with what we did? It is tempting to remove all the copyedit versions.

Hi @nickpanes,

You can delete the files from the copyedit area without affecting the published version, which is in proof. But generally I’m not fond of interventions in the files area – you’ll be leaving OJS trying to refer to a nonexistent file, and we might e.g. add an integrity check tool in the future that would (rightly) complain about this.

Regards,
Alec Smecher
Public Knowledge Project Team

OK, noted. My frequent presence on this site reflects the fact that I am not too techy, but one observation: I suspect many first-timers to online publishing just want the publishing element, as we do. We are putting up PDFs made from a hard copy publication. A stripped-down version without all the editorial process would, in my opinion, be very popular with users who have not taken the leap yet. Version 4? Quick Submit as standard, the payment module as an option, no editorial processes, and (of course) only one PDF per article.

Hi @nickpanes,

OJS is at heart a workflow tool, and we’ve generally viewed the Quick Submit plugin as a way to get back-issue content into OJS as part of adopting it for future content. Are you not interested at all in using OJS for the workflow?

Regards,
Alec Smecher
Public Knowledge Project Team

I guess my thought is that this is partly a different market. You are not in it for the money, so it is an opportunity you can let go. However, many organizations are well behind the curve with online publishing. Our organisation attracts members of a certain age who still like to hold a solid book and read it. As long as we publish the hard copies we will probably keep the workflow offline. Our real enthusiasts have 88 years of our physical journals, but some copies are very rare and finding things in them is laborious. On our digital archive they can search 11,000 pages of PDFs in less than half a second, and for a small fee so can members of the public.

Digitisation of existing documents and books is a massive ongoing market, assisting research and study in all sorts of fields. However, publishing them online is not that easy. Frankly, as a non-tech I should be able to load a piece of software to create a website, then index and display the content with fast searches and controlled access. There are a few rather expensive hosted online subscriber services that come close, but really there is nothing out there at reasonable cost to do this. As a non-tech I found implementing v3.1 very trying. With our limited requirements, had there been a “WordPress plus” out there with an attached database and the limited functionality I mentioned, we would probably have gone that route.

Hi @nickpanes,

Can you identify the areas where you’re getting hung up, besides e.g. having two copies of the PDFs? “WordPress plus” is not our primary goal, but if there are specific impediments, they might be of interest to the community. Is it a matter of learning the software (e.g. where to find the Quick Submit tool), or are there ongoing areas you expect to find challenging even after investing the time in learning OJS?

Regards,
Alec Smecher
Public Knowledge Project Team