Hi @kmccurley,
I’ll try to respond to this piece by piece:
> The whole purpose of long term preservation is to write data in a format that will still be readable 10, 20, or a hundred years later…
Agreed in principle, with some mitigating context. When the PKP|PN got its start a decade ago, JATS wasn’t as prominent a candidate for archiving. PKP|PN chose the XML import/export format, with a plan to use XSL stylesheets to forward-port older XML to the current format. This made it quick to launch and get out into the wild. The PKP|PN is a “dark archive”, intended to fill an archiving gap for the large body of OJS journals that have no institutional support. Unlike something like the Wayback Machine, for example, it’s designed with the expectation that someone from PKP will step in when a journal goes down and stand an archived copy back up.
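As a rough illustration of the forward-porting idea, here is a minimal sketch. PKP|PN planned to do this with XSL stylesheets; to keep the example self-contained, this version uses Python’s standard-library `xml.etree.ElementTree` instead of XSLT. The element names (`old_title`, `old_abstract`, `title`, `abstract`) and the `version` attribute are hypothetical and do not reflect the actual native XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names in an older export format to
# their names in the current format. Illustrative only, not the real schema.
RENAMES = {"old_title": "title", "old_abstract": "abstract"}
CURRENT_VERSION = "2"


def forward_port(xml_text: str) -> str:
    """Upgrade a document in the (hypothetical) v1 format to v2.

    Plays the role an XSL stylesheet would: visit every element, rename
    any element whose name changed, and stamp the new format version.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in RENAMES:
            elem.tag = RENAMES[elem.tag]
    root.set("version", CURRENT_VERSION)
    return ET.tostring(root, encoding="unicode")


legacy = '<article version="1"><old_title>On Archiving</old_title></article>'
print(forward_port(legacy))
# → <article version="2"><title>On Archiving</title></article>
```

The appeal of this approach is that each format revision only needs one transformation step, and an old archived export can be chained through those steps to reach whatever format is current when it is restored.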
We have had some brief conversations about a shift to a more preservation-friendly format, and JATS is a natural contender, but those haven’t been conclusive. After a decade on the same infrastructure, we’d probably want to change some other things about PKP|PN given the opportunity and funding. Meanwhile, the current design serves a need.
> Clearly the strategy of trying to maintain the native XML import/export plugin isn’t serving the need. When I looked at how I would import documents into OJS, I couldn’t find a plausible strategy.
The biggest use case for the XML import/export tools is the one they were originally written for: back-issue import.
We’ve opted not to expand the current XML import/export toolset beyond published content (e.g. to support more entities like peer reviews), despite significant demand, because we would like to bring import/export, user interface, and third-party integration needs together around the REST API. That will be much more maintainable than a sprawling XML toolset maintained for its own sake. As a result, while we do make quality-of-life improvements here and there, the XML implementation is fairly stagnant (especially around things like error handling and entity relationships). It’s still frequently used for back-issue migration, but it’s known to be fussy and incomplete. We are happy to review proposals for change, and there are third parties working on related items (see e.g. pkp/pkp-lib issue #3261 on GitHub, “Extend native import/export plugin to include additional entities”).
Extending import/export has been explored at a few sprints; you might find context in past sprint reports.
> …when creating a publication, supportAgencies is an array of strings with no structure. This means there are no ROR or fundref IDs, no department, no country, no grantID, etc. Others are far ahead on this.
There is a ROR plugin and a Funding plugin.
> …when creating a publication, the citations are also just raw strings without structure.
There is third-party work on this that I’ve seen demoed and that we hope to integrate into OJS 3.5. Some details are here: https://projects.tib.eu/komet/en/
> Is inline mathematics allowed in either MATHML or LaTeX format?
In published articles, you can do this with tools such as MathJax (https://www.mathjax.org/) or LaTeX.js (https://latex.js.org). Submission titles support limited formatting as of OJS 3.4.0, but I’m not aware of a way to include formulae in titles that would play well with downstream services.
> disciplines are just a list of strings, which apparently ignores any existing structured hierarchy of taxonomies like the library of congress, NCBI, ACM, AMS, or those maintained by other disciplines.
This is explored in pkp/pkp-lib issue #4932 on GitHub, “Support browsing by keyword or subject”, and in a couple of related issues. Long story short, we haven’t been able to find a global vocabulary that we can just adopt wholesale: it needs to be well translated, openly licensed, and applicable to the community at large. That’s probably an impossible combination, so we would like to add better generalized support for swapping in vocabularies.
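To make “swapping in vocabularies” a little more concrete, here is a minimal sketch of what a pluggable vocabulary interface could look like. Everything in it (`Term`, `Vocabulary`, `InMemoryVocabulary`, `suggest_subjects`, and the sample IDs and labels) is hypothetical and is not an existing OJS API. It only shows the design idea: application code talks to an interface, and each journal chooses which controlled vocabulary (translated labels, optional hierarchy) sits behind it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Term:
    """A controlled-vocabulary term with a stable ID and localized labels."""
    term_id: str
    labels: Dict[str, str]  # locale code -> label
    parent_id: Optional[str] = None  # allows hierarchical taxonomies


class Vocabulary(Protocol):
    """Hypothetical interface that any vocabulary backend would implement."""

    def search(self, text: str, locale: str) -> List[Term]: ...

    def label(self, term_id: str, locale: str) -> Optional[str]: ...


class InMemoryVocabulary:
    """Toy backend; a real one might wrap an external, openly licensed scheme."""

    def __init__(self, terms: List[Term]) -> None:
        self._terms = {t.term_id: t for t in terms}

    def search(self, text: str, locale: str) -> List[Term]:
        # Case-insensitive substring match on the label for the given locale.
        needle = text.casefold()
        return [t for t in self._terms.values()
                if needle in t.labels.get(locale, "").casefold()]

    def label(self, term_id: str, locale: str) -> Optional[str]:
        term = self._terms.get(term_id)
        return term.labels.get(locale) if term else None


def suggest_subjects(vocab: Vocabulary, text: str, locale: str) -> List[str]:
    """Depends only on the interface, so the backend can be swapped freely."""
    return [t.term_id for t in vocab.search(text, locale)]


demo = InMemoryVocabulary([
    Term("x:math", {"en": "Mathematics", "fr": "Mathématiques"}),
    Term("x:algebra", {"en": "Algebra", "fr": "Algèbre"}, parent_id="x:math"),
])
print(suggest_subjects(demo, "alg", "fr"))  # → ['x:algebra']
```

Because the stored value is a stable term ID rather than a free-text string, a journal could in principle switch backends or add translations without rewriting its existing metadata.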
> when creating a contributor, affiliations also has no structure. At least this is now an array of strings instead of a single string.
Yes, we’ve explored this in pkp/pkp-lib issue #7135 on GitHub, “Need to support multiple author affiliations”.
> It may also be related to the fact that you’re trying to map everything to a PHP object, which then stores elements into a column in a relational database. Going forward, the complete mapping of fields to columns in tables of a relational database doesn’t scale well. I can only imagine what it must cost on average to fully populate a publication from the database, much less an issue with hundreds of articles.
I get the sense you’re coming at OJS from a different design culture, which is fine; we have a lot to learn from other approaches. But there is a methodology here, and it’s a lot more nuanced than more columns vs. fewer columns (at the risk of oversimplifying that conversation). You might have to spend some time engaging with the details before it starts to fit together.
Regards,
Alec Smecher
Public Knowledge Project Team