JATS XML to embedded HTML article

asmecher · May 4, 2017, 4:46pm

Hi all,

With the usual apologies for the late response…

I think that PKP has plans to manage references with the aid of JATS XML. So saving reference data to the OJS database sounds like a good idea. This is data that could be useful in, for example, CrossRef services. Maybe @asmecher could comment on this as well? At the moment references are saved as a simple text field.

I don’t have a concrete plan yet for how to manage metadata when it may appear both in the database (our usual metadata set) and in the document (e.g. JATS XML), but here are some off-the-cuff thoughts: I don’t like synchronizing data, so would rather choose an “official” source and stick with it. In the case of JATS XML, I think that document would be the best “official” source of information, so we would likely add tools to the Submission DAO (etc.) that would override database-provided metadata with values fetched from the more authoritative JATS XML. (Some caching may be necessary here to avoid unnecessarily expensive XML processing.)
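A minimal sketch of the "one official source" idea, assuming a hypothetical helper class (`JatsMetadataOverride` is not part of OJS or its Submission DAO). It prefers the article title found in the JATS XML, falls back to the database value, and caches the parsed result so the XML is only processed once per document:

```php
<?php
// Hypothetical helper, not an OJS API: treat the JATS XML as the authoritative
// source for the article title and fall back to the database value.
class JatsMetadataOverride {
    /** @var array parsed titles (or null) keyed by a hash of the XML content */
    private $cache = array();

    public function getTitle($jatsXml, $databaseTitle) {
        $key = md5($jatsXml);
        if (!array_key_exists($key, $this->cache)) {
            // Parse only once per distinct document.
            $this->cache[$key] = $this->parseTitle($jatsXml);
        }
        $title = $this->cache[$key];
        return $title !== null ? $title : $databaseTitle;
    }

    private function parseTitle($jatsXml) {
        // Collect libxml errors internally instead of emitting PHP warnings.
        $previous = libxml_use_internal_errors(true);
        $xml = simplexml_load_string($jatsXml);
        libxml_use_internal_errors($previous);
        if ($xml === false) {
            return null;
        }
        $nodes = $xml->xpath('/article/front/article-meta/title-group/article-title');
        if (empty($nodes)) {
            return null;
        }
        // Note: the string cast returns only the element's direct text.
        $title = trim((string) $nodes[0]);
        return $title === '' ? null : $title;
    }
}
```

An in-memory cache like this only lasts for one request; a real implementation would probably need something persistent.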

@varshilmehta wrote:

Do you have an example of how to add data from PHP objects to the OJS database? I know (in theory) how to add data to a MySQL database with PHP, but I think OJS (Smarty) is a special case. If you can give me a hint I would test it on my local OJS installation (after the JATS parser is ready).

There are a number of ways we could approach this – can you give me a more concrete example?

Regards,
Alec Smecher
Public Knowledge Project Team

Vitaliy · May 4, 2017, 6:44pm

Hi @asmecher,
Suppose we have transferred all the data from JATS XML into tree-like structured PHP objects (the concept is similar to Java Architecture for XML Binding) and inserted them into OJS as a plugin.

The reference list will be represented by an ArrayObject ($references). Each reference is an instance of one of my custom classes inside the ArrayObject (JournalArticle, Book, BookChapter etc.). Data from the XML are simply recorded to those instances with setters and can be retrieved with getters (e.g. getTitle(), getSource() etc.). Author names will be set as another ArrayObject inside the classes that represent a single reference. So retrieving data from them is a simple iteration with foreach loops:

foreach ($references as $reference) {
  if ($reference instanceof JournalArticle) {
    // A journal article reference: read its fields with getters.
    $articleTitle = $reference->getTitle();
    // ...
  } elseif ($reference instanceof ArrayObject) {
    // A nested list of author names.
    foreach ($reference as $authorName) {
      $name = $authorName->getName();
      // ...
    }
  }
}

So what I need is to put the data into an OJS MySQL table while iterating through these loops.
Edit: As I think about this more, we will need to get the article-specific ID from the plugin, or find some other way to check which article we are processing.
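As an illustration of the object model described in this post, here is a minimal sketch that reads journal references from a JATS `<ref-list>` into PHP objects with SimpleXML. The class and function names (`ReferenceAuthor`, `JournalArticleReference`, `parseJournalReferences`) are made up for this example and are not the actual JATSParser classes. It also assumes the references are tagged with `<element-citation publication-type="journal">`:

```php
<?php
// Illustrative classes only; the real JATSParser classes may look different.
class ReferenceAuthor {
    private $name;
    public function __construct($name) { $this->name = $name; }
    public function getName() { return $this->name; }
}

class JournalArticleReference {
    private $title;
    private $source;
    private $authors;
    public function __construct() { $this->authors = new ArrayObject(); }
    public function setTitle($title) { $this->title = $title; }
    public function getTitle() { return $this->title; }
    public function setSource($source) { $this->source = $source; }
    public function getSource() { return $this->source; }
    public function getAuthors() { return $this->authors; }
}

// Returns an ArrayObject of JournalArticleReference objects (empty on bad XML).
function parseJournalReferences($jatsXml) {
    $references = new ArrayObject();
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($jatsXml);
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return $references;
    }
    $query = '//ref-list/ref/element-citation[@publication-type="journal"]';
    foreach ($xml->xpath($query) as $citation) {
        $reference = new JournalArticleReference();
        $reference->setTitle(trim((string) $citation->{'article-title'}));
        $reference->setSource(trim((string) $citation->source));
        // Author names are kept in an ArrayObject inside the reference.
        foreach ($citation->xpath('person-group/name') as $name) {
            $full = trim((string) $name->{'given-names'} . ' ' . (string) $name->surname);
            $reference->getAuthors()->append(new ReferenceAuthor($full));
        }
        $references->append($reference);
    }
    return $references;
}
```

Iterating the result is then a pair of nested foreach loops over `$references` and `$reference->getAuthors()`.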

Vitaliy · May 4, 2017, 7:19pm

My idea in creating PHP objects from XML (and not applying a simple XSLT transformation) is based on the assumption that we can use this data later: display it on the article detail page, export it in different formats, etc.

Vitaliy · May 8, 2017, 10:26am

Hmm. As far as I can see, none of the plugins interacts with the database directly. So is there a need to make custom classes in the pkp-lib classes folder, where all my objects and their DAOs will be stored? And the plugin will only retrieve data with the methods written there and display it on the front end, @asmecher, right?

If so, I need something simpler to start with. I think I will stay at a lower level for now. We just need to keep in mind that we will have all of the JATS XML article and its content data as PHP objects (like POJOs in Java).

ajnyga · May 8, 2017, 10:36am

Hi,

Making a custom DAO for a plugin is not a problem. But I do not know what would be a good way to store objects in the current database. I mean there is the submission_settings table where you can add custom values, but that is maybe not enough. I have made a few plugins which create custom database tables and interact with those, for example GitHub - ajnyga/navigation: Navigation plugin for OJS 3.x.

But I think that if this were added to the core functionality, there is the big decision of principle whether we want to store full text in any format in the database. Until now, all full text versions have been in files.

If we save a JATS XML file to the database, it will mean that we are saving a lot of metadata twice. Unless the saved version only includes the body and back parts of the file, which would not be a bad idea at all. That would include both the readable full text and the references.
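A rough sketch of the "body and back only" idea, using PHP's DOM extension. The function name `stripJatsFront` is hypothetical, and it assumes `<front>` is a direct child of the root `<article>` element:

```php
<?php
// Sketch: drop <front> (the metadata already stored in the database) so that
// only <body> and <back> remain. Returns null if the input is not valid XML.
function stripJatsFront($jatsXml) {
    if (trim($jatsXml) === '') {
        return null;
    }
    $document = new DOMDocument();
    if (!@$document->loadXML($jatsXml)) {
        return null;
    }
    $root = $document->documentElement;
    // Copy the child list first: removing nodes while iterating a live
    // DOMNodeList can skip entries.
    foreach (iterator_to_array($root->childNodes) as $child) {
        if ($child instanceof DOMElement && $child->nodeName === 'front') {
            $root->removeChild($child);
        }
    }
    // Serialize only the root element, without the XML declaration.
    return $document->saveXML($root);
}
```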

Vitaliy · May 8, 2017, 11:13am

As we already have all the metadata, I didn’t write a parser for it, only for body and back. As an example of what will be stored in the journal article reference object: https://github.com/Vitaliy-1/JATSParser/blob/master/classes/BibitemJournal.php
As we will have different classes for different types of citations, they can be displayed on the front end in any citation style.

Also, I am planning to finish the work on the parser this week. Then I need to finish my PhD dissertation for a month or two; after this I can take a look at the DAO implementation.

ajnyga · May 8, 2017, 11:19am

hah, I am actually also returning to work with my phd starting in July

I think that the parser needs a fair amount of work to fit an OJS plugin structure, but nothing impossible. I do think that saving full text this way is generally a good idea. I would like to hear what Alec thinks about this, because I bet they have discussed this before in some form.

asmecher · May 8, 2017, 7:37pm

Hi all,

On storing data in the database: for simpler data the …_settings tables are a good approach, and there are already plugins that use this. And plugins can create their own schema too – none of the OJS 3.x plugins do this yet but several of the OJS 2.x ones did, and the same facilities exist. This is a bit cumbersome to use and maintain, though, so if the …_settings tables are close to fulfilling your requirements, they’re preferable. (They should also be quite fast to use when e.g. loading data to present on the front end alongside the submission. Some settings are serialized using PHP’s serialize/unserialize tools, but of course it’s hard to interact with that data relationally.) But I’m not sure this adequately answers the question…?
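To illustrate the trade-off with serialized settings, here is a small sketch that uses plain PHP arrays as a stand-in for rows of a …_settings table. The function names are made up for this example and this is not the OJS DAO API. The `allowed_classes` option of `unserialize()` requires PHP 7.0 or later:

```php
<?php
// Sketch only: an in-memory stand-in for a ..._settings row
// (submission_id, setting_name, setting_value). Not the OJS DAO API.
function packSetting($submissionId, $name, array $value) {
    return array(
        'submission_id' => $submissionId,
        'setting_name' => $name,
        'setting_value' => serialize($value),
    );
}

function unpackSetting(array $row) {
    // allowed_classes => false: only plain arrays and scalars are restored.
    return unserialize($row['setting_value'], array('allowed_classes' => false));
}

// Finding submissions that cite a given source means unserializing every row
// in PHP; the database cannot filter on fields inside the serialized string.
function submissionsCitingSource(array $rows, $source) {
    $ids = array();
    foreach ($rows as $row) {
        foreach (unpackSetting($row) as $reference) {
            if ($reference['source'] === $source) {
                $ids[] = $row['submission_id'];
                break; // one match per submission is enough
            }
        }
    }
    return $ids;
}
```

Loading all references for one submission is a single cheap lookup, but any query across submissions has to walk every row, which is the relational limitation mentioned above.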

Regards,
Alec Smecher
Public Knowledge Project Team

Vitaliy · May 9, 2017, 8:34am

Hi @asmecher,

Nope, the structure of the ..._settings tables is not suitable for putting references there. Is there an example somewhere of how to create a new schema and put values there from a plugin? Links to such plugins for OJS 2.x? I have less than a year of experience in programming and several weeks in PHP, so any help is appreciated.

ajnyga · May 9, 2017, 8:36am

See for example the static pages plugin: ojs/schema.xml at ojs-stable-2_4_8 · pkp/ojs · GitHub and ojs/StaticPagesPlugin.inc.php at ojs-stable-2_4_8 · pkp/ojs · GitHub

Vitaliy · May 9, 2017, 12:39pm

Thanks.

But before playing around with POJOs and the database, I first want to embed the resulting HTML in the article detail page to see how it works. Here I have a variable (object) $html: https://github.com/Vitaliy-1/JATSParser/blob/master/main/main.php#L33, which represents my resulting HTML DOM. Can you explain what I need to change here: embedGalley/EmbedGalleyPlugin.inc.php at master · ajnyga/embedGalley · GitHub? As far as I can see, this function: embedGalley/EmbedGalleyPlugin.inc.php at master · ajnyga/embedGalley · GitHub is crucial, where you invoke all the transformation tasks and embed the result into articleFooter.tpl, right?

Vitaliy · May 9, 2017, 7:11pm

So, JATS Parser is ready. Now comes the part with integration with OJS.
This is what the HTML looks like: https://vitaliy-1.github.io/JATSParser/test.html
But it will need to be adapted to the OJS article detail page.

ajnyga · May 9, 2017, 7:18pm

I could try to look at this later this week. I am just very busy right now because this is my last month working with journal.fi and a lot of loose ends I have to finish.

ajnyga · May 13, 2017, 6:39am

Hi @asmecher,

When you have the time, would you mind checking GitHub - ajnyga/embedGalley: OJS3 plugin for visualizing JATS XML galleys again?

It should have all the things fixed you commented on. I will still add a check for multilingual XML files and check that it is the Article Text component that is used. I will also see if I could include this: Improve Google Scholar exposure with reference metatags!

What I have been thinking about is how does the embedGalley approach work with the OJS statistics framework? Basically “real” article views are counted based on the galley downloads/views. How would the embedGalley fit with the COUNTER rules? Maybe @ctgraham and/or @bozana could comment on this as well?

varshilmehta · May 13, 2017, 7:36am

When I used the latest version, it kind of overrode my .css file settings. I had uploaded a .css file to justify my abstracts. However, now I have to do it manually by justifying the format for each one in the abstract section of OJS. The previous version worked properly.

Vitaliy · May 13, 2017, 11:11am

Hi @ajnyga,

Just wondering, why didn’t you use the Templates::Article::Main hook for this plugin?

ajnyga · May 13, 2017, 12:25pm

I guess you could use that as well, I think it would mean that the article would be in a more narrow space (with the entry_details div to the right) but that is of course a matter of opinion.

ctgraham · May 15, 2017, 12:22pm

The intention of the COUNTER JR1 and AR1 reports is to count each distinct time a user downloads or views the article’s full text content. This download or view is recorded in the “ojs::counter” metric. Currently COUNTER JR1 and AR1 distinguish between downloads/views of PDF vs. HTML vs. Other forms of the article fulltext.

If the JATS XML galley represents the article’s fulltext, and if this plugin converts that for display in HTML for an end-user, this would count as an HTML download from a COUNTER JR1 and AR1 perspective, subject to the COUNTER exclusions of “double downloads”, bot-based traffic, etc.

ajnyga · May 15, 2017, 12:30pm

Thanks a lot @ctgraham, I was counting on hearing your comment on this!

Do you think that it would be a problem if the download count would be the same as the article abstract view count? Because embedGalley basically shows the full text HTML on the abstract page, not in a separate galley view.

I think that it would be fairly easy to trigger the download count from the plugin, so that should not be a problem.

ctgraham · May 15, 2017, 12:45pm

I don’t think COUNTER Release 4 is concerned with abstract-only views, and I don’t think that is in the proposed Release 5, either. I think Release 5 will remove the distinction between type of download (HTML, PDF, Other).

Perhaps the bigger question would be whether or how to prevent duplicate counting of viewing the fulltext in HTML and then downloading the fulltext XML. Almost certainly that should count as a COUNTER “double click” once the distinction of “type” is removed in Release 5.

Details on COUNTER Release 4 data processing are here.
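A rough sketch of how such duplicate counting could be filtered once the file type is ignored. This is not how the OJS statistics framework does it; the function name and the request array shape are invented for the example, and the 10-second default window is an assumption that should be checked against the COUNTER Code of Practice:

```php
<?php
// Sketch: collapse repeated full-text requests for the same article by the
// same user within a time window into one count, ignoring the file type
// (HTML, XML, PDF). The 10-second default is an assumption to verify.
function countFullTextRequests(array $requests, $windowSeconds = 10) {
    // Process requests in chronological order.
    usort($requests, function ($a, $b) {
        return $a['time'] - $b['time'];
    });
    $lastSeen = array(); // last request time per user and article
    $counts = array();   // counted requests per article
    foreach ($requests as $request) {
        $key = $request['user'] . '|' . $request['article'];
        $isDoubleClick = isset($lastSeen[$key])
            && $request['time'] - $lastSeen[$key] <= $windowSeconds;
        if (!$isDoubleClick) {
            if (!isset($counts[$request['article']])) {
                $counts[$request['article']] = 0;
            }
            $counts[$request['article']]++;
        }
        // Each request is compared with the previous one, so a chain of
        // rapid clicks is counted once.
        $lastSeen[$key] = $request['time'];
    }
    return $counts;
}
```

With this approach, viewing the embedded HTML and downloading the XML a few seconds later would count as a single full-text request for that user.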