Request for comments: Deprecate the "Native Import Export Plugin"

jonasraoni · May 25, 2022, 10:03am

Problems

The file format isn’t stable: the output for OJS 3.2.x might not be compatible with OJS 3.3.x.
Maintaining the plugin isn’t cheap: any data structure modification requires the plugin to be synchronized, but sometimes that doesn’t happen and it lags behind.
The code should probably be refactored to make it easier for us and others to maintain/extend (e.g. part of its logic, the filter configuration, resides in the database).

What’s holding us back from deprecating it?

Users currently rely on it as an interchange format to import/export data to/from other systems.
The PKP Preservation Network currently depends on it.

Alternatives

Having an interchangeable file format is very useful! Perhaps we could create a new plugin to import/export using a standard format. I guess OJS and OPS could make use of JATS (the pkp/jatsTemplate plugin on GitHub, a basic JATS document template generator for OJS, could be handy here). What about OMP (ONIX?!)?
Delegate the task of importing/exporting data to the API (currently not ready for that).

Best,
Jonas

jonasraoni · May 25, 2022, 10:21am

Given that the format isn’t guaranteed to be compatible across versions, and that we already don’t store all the information we have (e.g. reviewers), I think it’s better to replace it with a standard format (well documented, compatible with other systems, with requirements that other people have already reasoned about, etc.).

For now I think we should leave the plugin barely functional, until the alternatives are ready, and then fully drop it in a future release.

Best,
Jonas

ctgraham · May 25, 2022, 3:04pm

The existing plugin provides a “bare minimum” export path for clients who do not have filesystem/database-level access at their service providers. We should keep a manager-accessible function within the web UI, as opposed to a sysadmin-only one.

mpbraendle · May 25, 2022, 10:33pm

Reply to Problems

Different (native) import/export XML formats aren’t really a problem as long as there is XSLT. It is probably a matter of comparing the XML Schemas of the various application versions and then creating the corresponding XSLT. Since XSLT can be made modular, there can be a generic transformation that is valid for all 3.x versions, plus sub-transformations that handle the differences between 3.x and 3.x+1, 3.x+1 and 3.x+2, and so on. The plugin (whatever you call it) could be generic, but it should be able to recognize the version of the import file and choose the corresponding XSLTs for the target import version.
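A minimal sketch of the version-stepped chain, in Python. The standard library has no XSLT processor, so plain functions stand in for the per-version stylesheets here (with lxml installed, `lxml.etree.XSLT` could apply real ones). The `version` attribute on the root element and both schema changes are hypothetical, and only the chained sub-transformations are shown, not the generic one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

Version = Tuple[int, int]


# Hypothetical upgrade steps, one per minor-version jump. In the proposal each
# step would be an XSLT stylesheet; plain functions stand in for them here.
def upgrade_3_1_to_3_2(root: ET.Element) -> None:
    # Hypothetical schema change: <keywords> was renamed to <subjects>.
    for el in list(root.iter("keywords")):
        el.tag = "subjects"


def upgrade_3_2_to_3_3(root: ET.Element) -> None:
    # Hypothetical schema change: <article> gained a "status" attribute.
    for el in root.iter("article"):
        el.set("status", el.get("status", "published"))


# Maps the version a document is in to the step that lifts it one minor version.
STEPS: Dict[Version, Callable[[ET.Element], None]] = {
    (3, 1): upgrade_3_1_to_3_2,
    (3, 2): upgrade_3_2_to_3_3,
}


def parse_version(text: str) -> Version:
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def upgrade(xml_text: str, target: Version) -> ET.Element:
    """Chain the steps that take a document from its declared version to `target`."""
    root = ET.fromstring(xml_text)
    declared = root.get("version")  # hypothetical version stamp on the root
    if declared is None:
        raise ValueError("export file does not declare its version")
    current = parse_version(declared)
    if current > target:
        raise ValueError(f"cannot downgrade from {declared}")
    while current < target:
        step = STEPS.get(current)
        if step is None:
            raise ValueError(f"no transformation from {current[0]}.{current[1]}")
        step(root)
        current = (current[0], current[1] + 1)
    root.set("version", f"{target[0]}.{target[1]}")
    return root


doc = '<issue version="3.1"><article><keywords>XSLT</keywords></article></issue>'
print(ET.tostring(upgrade(doc, (3, 3)), encoding="unicode"))
```

With one step per minor version, each release only has to ship the step from its predecessor, and a gap in the chain fails loudly instead of producing a half-converted file.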
Backward compatibility with older 3.x versions must be guaranteed, whether you provide a generic import/export plugin with a standard format that supports all 3.x application versions, or you provide transformations that convert an old 3.x format into the standard format.

Reply to Alternatives

As I understand it, JATS describes only articles (article is the root element), or am I wrong? For importing/exporting full journal/issue data, however, we need a format that describes a journal and its volumes, issues, and groups of articles at a higher level. Embedded articles may be in JATS format, but remember that JATS versions change as well.
Can PDFs be reliably (I mean reliably) converted to JATS XML? I don’t think so. I tried two free tools on the Web: one produced garbage, the other produced readable but sometimes scrambled text. Being able to transfer full-text content (mostly PDF) is a MUST requirement, however, and this restricts the possible formats.
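To illustrate both requirements together (a container above the article level, and full text carried without conversion), here is a sketch of a hypothetical journal/issue wrapper that embeds an unmodified JATS article plus its PDF encoded as base64. The wrapper element names are made up for illustration and do not correspond to any existing schema; base64 is just one common way to carry binary files inside XML.

```python
import base64
import xml.etree.ElementTree as ET


def build_container(journal_title: str, issue_id: str,
                    jats_article: str, pdf_bytes: bytes) -> ET.Element:
    """Wrap one JATS article and its PDF in a hypothetical journal/issue container."""
    # <journal>, <issue>, <item> and <galley> are made-up names, not an existing schema.
    journal = ET.Element("journal", {"title": journal_title})
    issue = ET.SubElement(journal, "issue", {"id": issue_id})
    item = ET.SubElement(issue, "item")
    # The JATS <article> is embedded unchanged, so its dtd-version stays visible.
    item.append(ET.fromstring(jats_article))
    # The PDF is carried as-is (base64-encoded) rather than converted to JATS XML.
    galley = ET.SubElement(item, "galley",
                           {"mime-type": "application/pdf", "encoding": "base64"})
    galley.text = base64.b64encode(pdf_bytes).decode("ascii")
    return journal


def extract_pdf(container: ET.Element) -> bytes:
    """Recover the original PDF bytes from the first galley in the container."""
    galley = container.find("./issue/item/galley")
    if galley is None or galley.text is None:
        raise ValueError("no PDF galley found")
    return base64.b64decode(galley.text)


jats = ('<article dtd-version="1.2"><front><article-meta><title-group>'
        '<article-title>On XSLT</article-title></title-group></article-meta>'
        '</front></article>')
pdf = b"%PDF-1.4 placeholder bytes"
container = build_container("Example Journal", "vol1-no1", jats, pdf)
print(extract_pdf(container) == pdf)
```

Because the embedded article keeps its own `dtd-version`, a future importer could still tell which JATS version it is reading, independently of the wrapper's version.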

marc · May 26, 2022, 8:06am

The need to move entire journals between OJS has been with us since the beginning.

Like Clinton, I think that (in an ideal scenario) this is something that journal managers should be able to do, without needing the intervention of administrators.

The main limitations of the native XML import/export plugin are related to changes in the data model, but also to collisions between the source and target OJS configurations.
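To make such collisions visible before anything is written to the target journal, an import could first run a pre-flight check of every reference in the export against the target configuration. A minimal sketch, assuming for illustration that articles name their section in a `section_ref` attribute and files name their genre in a `genre` attribute; the `REFERENCES` table and the `preflight` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

# Hypothetical mapping: (element tag, attribute) in the export -> the target
# configuration list that the attribute value must resolve against.
REFERENCES: Dict[Tuple[str, str], str] = {
    ("article", "section_ref"): "sections",
    ("submission_file", "genre"): "genres",
}


def preflight(export_xml: str, target: Dict[str, Set[str]]) -> List[str]:
    """List references in the export that the target journal cannot resolve."""
    root = ET.fromstring(export_xml)
    problems: List[str] = []
    for (tag, attr), kind in REFERENCES.items():
        known = target.get(kind, set())
        for el in root.iter(tag):
            value = el.get(attr)
            if value is not None and value not in known:
                problems.append(f"<{tag} {attr}='{value}'>: not found in target {kind}")
    return problems


export = ('<articles><article section_ref="ART"/><article section_ref="REV">'
          '<submission_file genre="Article Text"/></article></articles>')
target_config = {"sections": {"ART"}, "genres": {"Article Text"}}
for problem in preflight(export, target_config):
    print(problem)
```

Reporting all mismatches up front, instead of failing on the first one mid-import, also fits the idea of clearer error feedback discussed below.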

Moreover, as Martin explains, the plugin only exports the published article (data from the editorial process are not included).

I would be in favour of looking for temporary solutions in the short term (as journal mobility is not going to disappear) while we think about more definitive solutions.

In this GitHub issue I briefly discuss motivations and possible courses of action: “Thoughts about how to improve native XML import/export plugin” (pkp/pkp-lib issue #7898).

TL;DR: from easiest (and most urgent) to most complex, my list would be:

Add information about the version used to create the export, so that warnings can be shown when source and target versions don’t match.
Improve the information offered by the native import/export plugin when an error is found (e.g. better display? a link to a DIG document? clearer error feedback? suggested solutions to common errors?)
Extend the current plugin (or reformulate it as API+plugin) so it works better between different versions (e.g. modify the XML to rename tags).
Extend the current plugin (or reformulate it) to include article history in the export.
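The first item could look roughly like this: the export stamps the creating version on the root element, and the import compares it with the running version before doing anything else. This is a sketch only; the `exported_from` attribute and the helper names are hypothetical, and comparing on major.minor mirrors the 3.2.x vs 3.3.x example from the first post.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # "3.3.0.8" -> (3, 3, 0, 8)
    return tuple(int(part) for part in text.split("."))


def stamp(root: ET.Element, running: str) -> None:
    """On export: record which application version produced the file."""
    root.set("exported_from", running)  # hypothetical attribute name


def version_warnings(export_xml: str, running: str) -> List[str]:
    """On import: warn when the file's version differs from the running one."""
    root = ET.fromstring(export_xml)
    source = root.get("exported_from")
    if source is None:
        return ["The file does not say which version created it; the import may fail."]
    src, dst = parse_version(source), parse_version(running)
    if src[:2] == dst[:2]:
        return []  # same major.minor: assume the formats are compatible
    if src > dst:
        return [f"File created by {source}, which is newer than this installation ({running})."]
    return [f"File created by {source}; this installation runs {running}, "
            "so some elements may have changed."]


root = ET.Element("articles")
stamp(root, "3.2.1.4")
for warning in version_warnings(ET.tostring(root, encoding="unicode"), "3.3.0.8"):
    print(warning)
```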

Martin’s XSLT-based solution sounds great, but I can’t estimate how hard it would be or what implications it would have (could it be done with Saxon, or is there another FLOSS solution for those transformations?).

By the way, in the distant future, with time, resources and a base of 25,000 journals, I don’t think it would be outrageous for PKP to consider defining a standard for the encapsulation of entire journals.