Hardware requirements for more than 50 Journals

aquast · May 25, 2021, 10:56am

Dear all,

as we plan to set up an OJS infrastructure to support a large number of journals, we wonder whether anyone has experience running such an infrastructure.

We would like to set up virtual servers, each with:

OS: current SLES
Apache 2.4
PHP 7.x with fcgi
4 CPU
8 GB RAM
50 GB System-Storage
NFS-mounted Storage for the journal data (DB on System-Storage)

Our questions are:

is this setting valid for our plans?
how many journals can be operated within such a setting in one instance (aka virt. server)?
is there any experience with bottlenecks we should be aware of?

Thanks in advance for your answers, Andres

marc · May 25, 2021, 4:37pm

It’s not clear to me what you are trying to do…

With a single OJS you can run multiple journals (multi-tenant), or you can install multiple OJS instances, one per journal (single-tenant).

If the HW you mention is for a single journal (OJS in single-tenant mode)… yes, this is enough (by far) for any normal journal. I mean, anything under 1 million visits a day will be handled without trouble by this infrastructure.

BUT, if you would like to save some time on upgrades, you will probably want a multi-tenant installation (aka. “single ojs for multiple journals”). In this case, we will need more info to help you set up the VM, but with multi-tenant your resources will even be better used than with separate single-tenant installations.

I’m a big defender of single-tenant configurations because they give the journals more flexibility and let you monitor better what is happening, but with multi-tenant, upgrades and configuration are easier.

This is the first, and probably the hardest, decision you need to make.

Cheers,
m.

aquast · May 25, 2021, 4:54pm

Hi Marc,

thanks for your reply. As a service provider and hoster, we want to provide an infrastructure for many journals without knowing which of the customers will require a single-tenant solution (for privacy reasons) and which may prefer a multi-tenant solution (for pricing reasons).

IMHO running around 50 or 100 journals with one vServer for each journal sounds a bit like maintenance overkill regarding updates and security issues. Therefore we would prefer running a multi-tenant installation, but … as far as I know, this is not feasible for customers who rely on their own database.

Best Andres

abadan · May 25, 2021, 10:01pm

BUT, if you like to save some time in upgrades, probably you will like to make a multi-tenant installation (single ojs for multiple journals).

Despite the time saved on maintaining OJS, the opposite happens on the users’ side: the downtime during upgrades increases. In this sense, it is a point against hosting several journals in the same OJS installation, as the upgrade process can be time-consuming.

I also advocate single-tenant, even though it means I have to deal with hundreds of OJS / OPS / OMP installations today.

It would be interesting to find alternatives that reduce this problem, such as:

aquast · May 26, 2021, 6:10am

I wonder if there is any description of what we have to do if we want to run more than one OJS instance on one virtual server. Is there existing experience with running such an architecture? It seems to me like a workaround for the privacy vs. effort issue.

Back to my initial questions:
Is there any experience with how many journals can be hosted within one single instance (multi-tenant)?
What kind of bottlenecks do we have to be aware of in terms of HW?

Thank you and cheers, Andres

marc · May 26, 2021, 10:53am

The biggest OJS I know is RACO, which holds 540 journals (and growing). But this is kind of tricky, because only 10% (max) of these journals are real journals (I mean, doing the full workflow in OJS) and most of them are just copies of external journals. In the end, RACO is more a repository than a real journal management system.

With the HW you mention, I’m quite sure you won’t have any bottlenecks.

Thinking of big journals… let’s do some rough numbers:

The backend is usually managed by a group of 50 people max (editors, section editors, copyeditors, reviewers)… and, in case your journal becomes very popular, add 150 authors to this… (but it won’t be a problem for 8GB and 4 CPUs).
The critical part with big journals would be the frontend… But OJS includes a caching system out of the box that will let you deal with 500,000 visits daily without any trouble.
About disk usage: as always, it depends on the journals. If one comes with a lot of back issues and big PDFs full of images you can reach 50GB, but that would be very unusual. My biggest journal is 30GB, and that includes galleys and all the documents submitted during the workflow (original, copyedited…). A regular journal is around 5-10GB.
Most installations run on Apache, but you can use nginx if you like. You can use any PHP flavor (php, php-cgi, php-fpm), but for OJS 3.x it needs to be at least PHP 7.3 (and I recommend starting to think about PHP 7.4).
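
To put the rough numbers above in perspective, here is a minimal Python sketch of the arithmetic. The 500,000 visits per day and the 5-10GB per regular journal come from the list above; the count of 50 journals is taken from the thread title, and spreading the visits evenly over the day is a simplifying assumption (real traffic has peaks, so treat the rate as an average only).

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def average_visits_per_second(visits_per_day: int) -> float:
    """Average request rate, ASSUMING visits are spread evenly over the day."""
    return visits_per_day / SECONDS_PER_DAY


def storage_range_gb(journals: int, low_gb: int = 5, high_gb: int = 10) -> Tuple[int, int]:
    """Total journal-data storage if every journal is a 'regular' one (5-10GB)."""
    return journals * low_gb, journals * high_gb


if __name__ == "__main__":
    rate = average_visits_per_second(500_000)
    low, high = storage_range_gb(50)  # 50 journals: assumption from the thread title
    print(f"{rate:.1f} visits/s on average")
    print(f"{low}-{high} GB of journal data for 50 regular journals")
```

Under these assumptions the average load is only a handful of visits per second, while the journal data for 50 regular journals lands well above the 50GB system storage, which is why it belongs on the separate (NFS) storage.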

If you force me to say where problems would come from (if any), I would point at NFS… but I think it won’t happen any more with OJS 3.x… I mean, old OJS versions used to be DB-intensive, but this was fixed in the new 3.x branches.

So… I think you won’t have any problem with a VM like this or, if you need to, you can scale it down a little based on the size of each journal:

For big journals: 4 CPU + 4GB ram + 50GB disk
For regular journals: 2 CPU + 2GB ram + 30GB disk
For small journals: 1 CPU + 1GB ram + 10GB disk
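
As a rough planning aid, the three tiers above can be summed over a mix of journals. This is only a sketch: the per-journal figures are the ones listed above, but the mix of 5 big, 15 regular and 30 small journals is a made-up example, and the result is a naive sum for one VM per journal (single-tenant), not a measured requirement.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tier:
    cpus: int
    ram_gb: int
    disk_gb: int


# Per-journal figures from the list above (one VM per journal).
TIERS: Dict[str, Tier] = {
    "big": Tier(cpus=4, ram_gb=4, disk_gb=50),
    "regular": Tier(cpus=2, ram_gb=2, disk_gb=30),
    "small": Tier(cpus=1, ram_gb=1, disk_gb=10),
}


def total_resources(counts: Dict[str, int]) -> Tier:
    """Naively sum CPU, RAM and disk over a mix of journals per tier."""
    return Tier(
        cpus=sum(TIERS[name].cpus * n for name, n in counts.items()),
        ram_gb=sum(TIERS[name].ram_gb * n for name, n in counts.items()),
        disk_gb=sum(TIERS[name].disk_gb * n for name, n in counts.items()),
    )


if __name__ == "__main__":
    # HYPOTHETICAL mix of 50 journals, for illustration only.
    mix = {"big": 5, "regular": 15, "small": 30}
    print(total_resources(mix))
```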

As an example, I rebuilt all our infrastructure with single-tenant OJS (based on docker) and I haven’t run any stress tests (yet), but our biggest journals don’t ask for more than 2 CPUs and don’t take more than 1GB of RAM.

About multi vs single-tenant, I like your “downtime” argument… that is also on my list, but I always forget it. I will open a new post with a wiki page to create a table with the pros/cons of both approaches. Feel free to modify it to enrich the conversation.

Cheers,
m.

marc · May 26, 2021, 1:42pm

I forgot to address this, which, by the way, is a very common and reasonable concern.

In short: Right now you are right… but there is a promising alternative.

The maintenance of 50-100 OJS installations would be overkill… and, having been the sysadmin of a 50-journal single-tenant service for almost 10 years, I’m very interested in finding a way to make this job easier.

A long time ago, “how to deal with multiple (independent) installations” was the main problem I tried to solve with “mojo”, and it worked fine for me… but then I found docker, and I think it will give us better isolation and control, so that is my bet now.

Four years ago I thought it would be an easy task, so I started creating a very basic Dockerfile for ojs and I developed “dojo” (aka. docker4ojs). Dojo was a bash script to manage those ojs containers, but I soon realized the Dockerfile was not good enough for a whole community, so the dojo part needed to wait until pkp got the right images.

“Good images” need to be optimized, lightweight and (most important) fit well with OJS releases. If we had images for every ojs release, maintenance, upgrades (and testing) would be much easier… but that won’t be enough. To manage 50-100 journals, you would also want to download/install/upgrade plugins from the command line, enable/disable them, and also configure OJS and adjust plugin settings from that same command line.

Yes, I know I’m just pointing to a “work in progress” solution (not many hands seem interested in ojs docker images for production right now), but I think we have a solid foundation to build on.

If somebody is interested in working together on this, I’m very open to collaborating.
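
As an illustration of the kind of command-line automation described above, here is a minimal Python sketch that only builds (and prints) an upgrade plan for a set of independent installations; it does not execute anything. It assumes each installation is a plain OJS checkout that ships the `tools/upgrade.php` script and that the script is run from the installation directory. The `/srv/ojs` layout and the journal names are hypothetical, and this is not how mojo or dojo work.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

Plan = List[Tuple[str, List[str]]]


def upgrade_plan(base_dir: str, instances: Iterable[str], action: str = "check") -> Plan:
    """Return (working_directory, command) pairs, one per OJS installation.

    ASSUMPTION: every installation lives in base_dir/<name> and contains
    tools/upgrade.php, which accepts "check" or "upgrade" as its argument.
    """
    if action not in ("check", "upgrade"):
        raise ValueError(f"unsupported action: {action!r}")
    plan: Plan = []
    for name in sorted(instances):  # stable order makes runs reproducible
        workdir = str(PurePosixPath(base_dir) / name)
        plan.append((workdir, ["php", "tools/upgrade.php", action]))
    return plan


if __name__ == "__main__":
    # HYPOTHETICAL installations, for illustration only.
    for workdir, command in upgrade_plan("/srv/ojs", ["journal-b", "journal-a"]):
        print(f"cd {workdir} && {' '.join(command)}")
```

To actually run a plan, each pair could be passed to `subprocess.run(command, cwd=workdir, check=True)`; that step is left out here on purpose, since upgrades should be preceded by backups of files and database.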

Cheers,
m.

aquast · May 26, 2021, 4:00pm

Dear Marc,

thank you very much for your comprehensive answers. They help me a lot to get a much better idea for our project. I will also have a look at the linked information, e.g. RACO.

For me a docker image is not the first choice, due to the dependencies associated with it. Nevertheless, it could be a solution. My preference would be setting up multiple journals within one instance without having a shared database…

Cheers, Andres

marc · July 2, 2021, 11:22am

Today I remembered I promised a wiki page to define the pros/cons of each approach.

Feel free to modify and complete it:

Single vs Multi tenant BENEFITS lists:

	single-tenant
1.	independent upgrades (faster & fewer issues)
2.	segmented tracking/monitoring
3.	independent plugins (journals with different needs)
4.	independent users (smaller lists, better for privacy/GDPR)
5.	independent tools (more resilient to failures/attacks)
6.	smaller DBs (faster)
7.	shorter “down time” per journal when upgrading
8.	more flexibility (add/remove/change/upgrade only 1 journal)
9.	easier multi-domain (easy for mailing, certificates, redirections…)

	multi-tenant
1.	“portal” feature out of the box
2.	search feature covering all journals
3.	less work when upgrading (single upgrade)
4.	most usual case (better tested, easier to find help)
5.	centralized administration (plugins and users management)

DISCLAIMER: For newcomers that arrive today to this thread, more items don’t necessarily mean better. Some items could be essential for your organization so you don’t have a choice.

abadan · July 2, 2021, 8:05pm

Nice, Marc!

I have a small doubt about just this item:

most usual case (better tested, easier to find help)

In our experience, with hundreds of journals hosted on OJS, both scenarios are very popular.

Some institutions believe that a multi-tenant OJS could bring greater visibility (eg, highlight on search engines like google).

marc · July 4, 2021, 3:11pm

Some institutions believe that a multi-tenant OJS could bring greater visibility (eg, highlight on search engines like google).

I don’t have evidence, but I think this is not necessarily true.
You can build your “portal” site on other tools with better SEO than OJS, so you will get more visibility.
I mean, Google won’t care whether they are single or multi-tenant… it will be more concerned about the domain name or the semantics of your site… but with Google, who knows?