Caching mechanism for the OJS front-end

Dear community,

I am writing on behalf of Milano University Press to present, and ask for comments on, a caching mechanism for OJS and OMP that we are launching with our technology partner Archimede Informatica (registered forum users: @razzi, @paolo_pellegrino, @monmarzia). We hope it will be of interest to all users and eventually become part of future releases of the applications.

The idea of working on caching stems from the difficulties our multi-journal OJS has recently faced because of its size. We experienced a decline in performance due to known issues such as inefficient database handling by the core code or by some plugins, the latter of which we have had to abandon. This is exacerbated by growing traffic generated by massive crawling by search engine bots, including crawling for the purpose of training artificial intelligence.
This was also discussed in the recent call on OJS development on December 15, and there is no shortage of forum posts on the subject, such as this one.

We are aware that

The problem to be solved:

  • Multi-journal OJS instances with a high number of journals and articles (>30 journals) have long response times when serving a page, in both the front-end and the back-end

  • Response times increase significantly when using certain features: OJS search, Issue TOC page, and some external plugins (e.g., “Recommend Articles by Author”) or system plugins (“Usage Statistics”, which is essential to satisfy DOAS recommendation 6.3: https://zenodo.org/records/15147823 )

  • The slowness is caused by the SQL queries executed on the database (queries on very large tables and/or thousands of queries to serve a single page), and cannot be definitively resolved by database query optimization alone. (Database query optimization does not conflict with our caching system; rather, it can contribute to further improvements in front-end response times and, above all, to reducing back-end response times and server load.)

Objectives:

  • Drastically reduce front-end response times in multi-journal OJS instances with a high number of journals and articles (>30 journals).

  • The solution must work from OJS 3.5 onward while keeping 3.6 in mind: at the very least, we should start assessing how far and in what way it can be adapted for OJS 3.6.

  • The solution should ideally be reusable in later OJS versions.

How to achieve this:

  • Implement a caching mechanism for the OJS front-end that drastically reduces the number of SQL queries executed every time OJS serves a front-end page.

  • Caching should operate at the HTML page level (each HTML page served by OJS in the front-end).

  • Caching should be implemented at the server level, not at the browser level (although browser caching optimizations do not conflict with our caching system; rather, they can help further improve response times and reduce server load).

  • Possible approaches:

    • based on static HTML files generated by OJS on the server, similar to the feature already present in OJS 3.3, which can be enabled in the config.inc.php file by setting `web_cache = On`. The current feature has a serious drawback: OJS does not automatically update the static HTML files when the content in OJS changes. You can only configure the cache duration, and the cache is completely reset when it expires.
    • based on the use of the database, by using caching tables to store the static HTML of the front-end pages. These pages would be served by OJS from the cache instead of being re-created every time by querying the DB.
  • Regardless of the chosen approach (static files vs database), the solution must include a backend mechanism that automatically removes cached pages whenever content related to one or more front-end pages is updated. These pages should be re-cached as soon as a new front-end request for them arrives.

  • The caching mechanism must be compatible with OJS APIs and other APIs enabled through OJS plugins (e.g., OAI-PMH, PKP-PN, etc.).

  • The caching mechanism must be compatible with the OJS access log system used by the Usage Statistics plugin to generate statistical reports.

  • We believe solutions based on caching systems external to OJS (Varnish, Redis, nginx) are not well-suited for the following reasons:

    • high configuration and maintenance complexity (installing and configuring Varnish, Redis, nginx)
    • difficulty managing cache invalidation toward external caching tools for pages whose content is modified in the OJS backend
    • incompatibility with OJS access log recording used by the Usage Statistics plugin
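
To make the invalidation and logging requirements above more concrete, here is a minimal sketch in Python (not OJS code, which is PHP). It assumes that each cached page is labelled with tags naming the content it depends on, and that a back-end edit removes every page carrying the edited tag. The class `PageCache`, the tag names such as `submission:101`, and the URL are all hypothetical and only illustrate the idea; the in-memory dictionaries stand in for either static files or caching tables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A renderer builds the page the slow way (many SQL queries in real life) and
# reports which content items the page depends on, expressed as tags.
Renderer = Callable[[str], Tuple[str, Set[str]]]


@dataclass
class PageCache:
    """Illustrative server-side HTML page cache with tag-based invalidation."""
    pages: Dict[str, str] = field(default_factory=dict)              # url -> HTML
    urls_by_tag: Dict[str, Set[str]] = field(default_factory=dict)   # tag -> urls
    access_log: List[str] = field(default_factory=list)              # stand-in for the usage log

    def serve(self, url: str, render: Renderer) -> str:
        # Record every request, hit or miss, so usage statistics stay complete.
        self.access_log.append(url)
        if url in self.pages:
            return self.pages[url]            # cache hit: no rendering, no queries
        html, tags = render(url)              # cache miss: render once and store
        self.pages[url] = html
        for tag in tags:
            self.urls_by_tag.setdefault(tag, set()).add(url)
        return html

    def invalidate(self, tag: str) -> int:
        """Drop every cached page that depends on `tag`; return how many were dropped."""
        dropped = 0
        for url in self.urls_by_tag.pop(tag, set()):
            if self.pages.pop(url, None) is not None:
                dropped += 1
        return dropped


render_calls: List[str] = []


def render_issue_toc(url: str) -> Tuple[str, Set[str]]:
    # Hypothetical renderer for an issue table of contents.
    render_calls.append(url)
    html = f"<html><body>TOC for {url}, render #{len(render_calls)}</body></html>"
    return html, {"issue:12", "submission:101", "submission:102"}


cache = PageCache()
toc_url = "/journal/issue/view/12"                 # hypothetical front-end URL
first = cache.serve(toc_url, render_issue_toc)     # miss: page is rendered
second = cache.serve(toc_url, render_issue_toc)    # hit: served from the cache
removed = cache.invalidate("submission:101")       # an editor updates article 101
third = cache.serve(toc_url, render_issue_toc)     # miss again: page is re-cached
```

In this sketch the page is re-created only on the first request after a relevant change, and the access log is written before the cache lookup, so cached hits are still counted. How tags would actually be derived from OJS content, and where the back-end hooks for invalidation would live, are open questions for the real implementation.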

Best practices to look at:

Additional information

  • The solution must take into account the fact that query optimisation can solve some performance problems, and we are available to work in that direction too. However, we believe that an efficient caching system can play a key role for large-scale installations with a large number of journals, together with database optimisation like this
  • The solution should not add a layer of complexity and must be usable by the whole community, at the very least as a feature that improves front-end performance. This is one of the reasons we are thinking of starting from the TYPO3 CMS caching mechanism. In other words, we want the OJS/OMP front-end to have the caching features of a robust website.