Feedback needed: Dataverse plugin for OPS/OJS 3

Dear all,

We are pleased to inform you that an MVP version of the Dataverse plugin for OPS 3.x is now available.

For this initial MVP, the aim was to reimplement the plugin based on the version for OJS 2.x. This MVP works only with OPS; we intend to make it compatible with OJS in future releases – any help is much appreciated!

Since the task here was to produce a new implementation of the plugin, we did not focus on adaptations; however, considering how much OJS has changed, a few may be needed. For example, the Supplementary Files of OJS 2.x have been replaced by a list of customizable components.

We would like to request feedback from the community before moving forward.

Below we list our main points of attention; we would welcome your feedback on each.

Metadata interoperability

How should metadata interoperability between OPS/OJS and Dataverse behave? Currently, the metadata is copied over to Dataverse once the submission in OPS/OJS is completed, but later metadata changes in OPS/OJS or Dataverse are not reflected in the other system.

Is that the expected behavior, or should metadata be updated when it changes? If so, when is the best time to do it: as soon as it changes in either system, or only if/when the submission is accepted?
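To make the current behavior concrete, here is a rough sketch of the metadata copy that happens at submission time. It maps basic article metadata onto the JSON structure the Dataverse native API expects when creating a dataset (POST /api/dataverses/{alias}/datasets, standard "citation" metadata block). The function name and the exact field mapping are our illustrative assumptions, not the plugin's actual code:

```python
# Hypothetical mapping of OPS/OJS submission metadata to a Dataverse
# native-API dataset payload. Field names follow the standard "citation"
# metadata block; the mapping itself is an assumption for illustration.

def build_dataset_json(title, authors, description, contact_name, contact_email):
    """Build the datasetVersion payload from article metadata."""
    def primitive(name, value):
        return {"typeName": name, "typeClass": "primitive",
                "multiple": False, "value": value}

    author_fields = [{"authorName": primitive("authorName", a)} for a in authors]

    return {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "displayName": "Citation Metadata",
                    "fields": [
                        primitive("title", title),
                        {"typeName": "author", "typeClass": "compound",
                         "multiple": True, "value": author_fields},
                        {"typeName": "datasetContact", "typeClass": "compound",
                         "multiple": True,
                         "value": [{
                             "datasetContactName": primitive("datasetContactName", contact_name),
                             "datasetContactEmail": primitive("datasetContactEmail", contact_email),
                         }]},
                        {"typeName": "dsDescription", "typeClass": "compound",
                         "multiple": True,
                         "value": [{"dsDescriptionValue":
                                    primitive("dsDescriptionValue", description)}]},
                        {"typeName": "subject", "typeClass": "controlledVocabulary",
                         "multiple": True, "value": ["Other"]},
                    ],
                }
            }
        }
    }

# Example submission metadata (fictional):
payload = build_dataset_json(
    "Replication data for: Example preprint",
    ["Silva, Ana", "Souza, João"],
    "Dataset accompanying the preprint.",
    "Silva, Ana", "ana.silva@example.org",
)
```

A one-way copy like this is exactly why later edits on either side drift apart: nothing in the payload ties the two records together for future updates.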

Data synchronization

Currently, datasets uploaded via OJS/OPS are transferred to Dataverse, but a copy remains in OPS/OJS, available for download in the user interface as components of the preprint. Is this the expected behavior, or should everything be centered on Dataverse? If not, how should we deal with extra-large file uploads (can OPS/OJS handle them)? And what about versioning and file edits?

Our recommendation:

Given that duplicating data and metadata adds a great deal of complexity to the OJS/OPS integration with Dataverse, we recommend avoiding it.

With this approach, Dataverse would be the only source of information about the datasets, and any operation that changes the state of these resources would be forwarded to Dataverse. In this way, we intend to keep the integration loosely coupled, referencing the resources held in Dataverse via links and replicating as little information (data or metadata) as possible in OJS/OPS.
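A minimal sketch of what "link only" could mean in practice: OPS/OJS stores just a persistent identifier for each dataset and derives any display or API URLs from it on demand. The normalization rules and function names below are our assumptions for illustration:

```python
# Sketch of the link-only approach: OPS/OJS keeps only a dataset PID
# and builds URLs from it when needed. Normalization rules are assumed.

def normalize_pid(raw):
    """Reduce common DOI spellings to the canonical 'doi:10.x/y' form."""
    raw = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if raw.lower().startswith(prefix):
            raw = raw[len(prefix):]
    return f"doi:{raw}"

def citable_url(pid):
    """Resolvable link shown next to the preprint in the public interface."""
    return "https://doi.org/" + normalize_pid(pid)[len("doi:"):]

def landing_api_url(base_url, pid):
    """Dataverse native-API lookup for the referenced dataset."""
    return f"{base_url}/api/datasets/:persistentId?persistentId={normalize_pid(pid)}"
```

Because only the identifier is stored, every state-changing operation naturally has to go through Dataverse, which is the loose coupling described above.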

Previously published research data

While we think it is important to maintain the integration with Dataverse, allowing research data to be submitted via OJS/OPS along with the manuscript, another change from the original plugin that we would like to make is the ability to receive research data that was previously published elsewhere – either in a Dataverse other than the one linked to the OJS/OPS installation, or in a different system altogether, such as Zenodo. Eventually, a submission may also include research data from more than one repository.

Our recommendation:

Allow authors to provide links to datasets published in other data repositories.


This initiative is led by SciELO Brazil as part of its strategy to bring Open Science practices to the global research communication workflow. The development is being carried out by Lepidus. Once again, we appreciate any volunteers who would like to participate.

You can learn more about the current state of this development by watching the presentations that Lepidus gave about the plugin in November at the Open Publishing Fest 2021:

November 19 - presentation in English

November 16 - presentation in Portuguese


Hi all,

I help work on the Dataverse software as part of the team at Harvard and have some experience with the OJS 2.x plugin and discussions about it. Thanks for the opportunity to share some thoughts on these great points. I'm not able to use the OPS plugin, so some of my feedback will lack the benefit of hands-on experience with the MVP.

I agree that dataset files should not be stored in OPS; otherwise, as you wrote, syncing issues become difficult to manage – perhaps even more so for preprints, where researchers might update their datasets more often than they would while an article and its dataset are being reviewed by a journal.

When you write that you intend to replicate “as little information (data or metadata) as possible in OJS/OPS,” I think this is also tied to your questions in the “Metadata interoperability” section, right? It partly depends on how much information/metadata about the dataset is being kept and displayed in OPS. But regarding how much information about preprints should be kept and displayed in Dataverse repositories, I’m thinking that the most important piece of information is the preprint’s persistent ID or IDs, so I have questions about that below. Sometime in the future, I imagine that the type of relationship between the preprint and the dataset will also be important, since it’s what DataCite and Crossref need in order to track and publish citation counts, but this isn’t information that is being recorded in the Dataverse software right now.

Regarding PIDs in OPS, does each version of a preprint get its own persistent ID? In Dataverse repositories, dataset versions don't get their own persistent IDs; a dataset persistent ID, such as the DOI, always points to the latest published version. So would someone looking at a version of a preprint, say version 2 of 5, need to track down the corresponding version of the associated dataset in a Dataverse repository? If so, the only way to reach an earlier version of a dataset in a Dataverse repository is to follow the DOI, open the landing page's version tab, and click on the version number. So in addition to the dataset PID, it might be important to record and display the dataset version number, so that someone looking at a version of a preprint can get to the right version of the dataset.
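A rough sketch of what recording PID plus version number could look like, assuming the Dataverse native API's dataset-versions endpoint (GET /api/datasets/:persistentId/versions/{version}); the function name and record shape are illustrative:

```python
# Sketch: a preprint version stores the dataset PID *and* version number,
# so the matching dataset version can be addressed directly via the
# Dataverse native API's versions endpoint (assumed here).

def dataset_version_url(base_url, pid, version=":latest-published"):
    """URL for one version of a dataset; version like '2.0' or ':latest-published'."""
    return (f"{base_url}/api/datasets/:persistentId/versions/{version}"
            f"?persistentId={pid}")

# The record a preprint version might keep (fictional values):
link = {"pid": "doi:10.1234/ABCDEF", "version": "2.0"}
url = dataset_version_url("https://demo.dataverse.org", link["pid"], link["version"])
```

Without the stored version number, a reader landing on the DOI would always see the latest published version, which may not match the preprint version they came from.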

And I agree about letting preprint depositors indicate that a dataset deposited in another repository, like Zenodo, is related to the preprint. If my questions about matching preprint versions with dataset versions have any merit, the nice thing about Zenodo is that it registers a PID for each published version, so people can click a dataset link and be taken straight to the right version of the dataset.

I hope this is helpful!
Julian


Hi Julian,

Thank you very much for this valuable feedback from someone on the Dataverse software team.

We’re happy to know that we are on the same page regarding not storing the datasets in OPS. This is a major behavioral difference between the original plugin and the new one, so we’re glad we are in agreement here.

It partly depends on how much information/metadata about the dataset is being kept and displayed in OPS.

Actually, in OPS we are only displaying the link to the dataset.

When authors submit their preprint (or journal article), they can choose to also upload datasets, which are sent to the Dataverse linked to that server or journal. The metadata entered for the article during the submission process is used to populate the metadata fields in Dataverse.

But if the dataset already exists in Dataverse, then authors only need to provide the link, and it will be displayed in the public interface next to the article.

Regarding PIDs in OPS, does each version of a preprint get its own persistent ID? In Dataverse repositories, each dataset version doesn’t get their own persistent ID and a dataset persistent ID, such as the DOI, always points to the latest published version.

It’s the same for OPS: only one DOI, pointing to the latest version.

Finally, we’re also happy that we agree on capturing links from data repositories other than Dataverse.

Thank you, Julian!


Hi all,

Thanks and congratulations to all involved! We’ll do some internal exploration and hope to have more to add; meanwhile I’ll be watching this thread with interest, so feedback and input are welcome.

Regards,
Alec Smecher
Public Knowledge Project Team


Thanks for your feedback @juliangautier

The issues raised around versioning are also quite valuable!

So in addition to the dataset PID, it might be important to record and display the dataset version number, so that someone looking at a version of a preprint can get to the right version of the dataset.

Yes, it’s necessary. We will take it into account in the implementation.

If my questions about matching preprint versions with dataset versions has any merit, the nice thing about Zenodo is that it registers PIDs for each published version, so people can click on a dataset link and be taken straight to the right version of the dataset.

Sure. This makes things a bit easier for the user, who won’t need to look up the version manually in the repository.