Comparing the results of our journals' statistics tools

Hi,

I'd like to share some thoughts and experiences related to journals' statistics, to get your comments and start a debate towards a better (maybe common?) approach.

Over the last year I have been using 3 different tools (Plausible, Google Analytics and OJS) to collect our articles' statistics, and I noticed that all of them capture different information.

TL;DR:

  • Except for 3 anomalies, the graphics show similar waves BUT…
    • All statistics give different numbers (probably because they count different things?)
    • Plausible counts 4-5 times more visitors than Analytics (not sure why).
    • OJS counts 2-3 times more pageviews than Analytics (which doesn't make any sense).
    • We got 3 weird spikes in OJS (discussed in Observations).

So, let me go step by step so we can think together about why this happens and whether it all makes sense, and maybe you can help me discover whether I'm doing something wrong.

Why this post?

For newcomers, let me clarify that article visit or PDF download metrics do not measure the quality of an article/journal (they don't say much about what happened with the article after the visit/download), but they work quite well as an indicator of the visibility of the journal (and the article).

I have been playing with multiple tools to be sure I was counting correctly, and with this post I would also like to know whether I'm misunderstanding the metrics, whether something is wrong in our installations, or whether any of OJS/Plausible/Analytics is not counting properly… because I suspect there are fellows in this forum with more experience than us who would like to share their knowledge.

So, let’s start. :wink:

I took some screenshots of one random journal of our service with the 3 different tools I mentioned: Plausible, Google Analytics 3 and OJS 3.2.

I thought about extending the comparison with Matomo, goAccess, AWStats… but right now these three were the ones I had most at hand.

Results are…

Comparative 1: Unique Visitors

Plausible (Unique Visitors) VS Google Analytics (Users)

| | 15/03 | 19/03 | 24/03 | 29/03 | 03/04 | 05/04 | 12/04 |
|---|---|---|---|---|---|---|---|
| Plausible (Visitors) | 543 | 287 | 377 | 549 | 395 | 355 | 441 |
| Analytics (Users) | 120 | 57 | 84 | 126 | 91 | 84 | 101 |

Comparative 2

OJS (Summary page) VS Google Analytics (Pageviews)

| | 15/03 | 19/03 | 24/03 | 29/03 | 03/04 | 05/04 | 12/04 |
|---|---|---|---|---|---|---|---|
| | anomaly | regular | anomaly | regular | anomaly | regular | regular |
| OJS (abstracts) | 2580 | 908 | 2503 | 874 | 2427 | 625 | 926 |
| Analytics (Pageviews) | 332 | 251 | 206 | 331 | 271 | 234 | 249 |

Observations

For the reasons explained above, I'm not worried (yet) about getting exactly the same numbers… I'm more interested in being sure the tools are always consistent (counting "whatever", but always in the same way… so we can observe the trends). Except for 3 anomalies (commented below), the graphics show similar waves, so IMO that indicates consistency, BUT some numbers don't make much sense to me.

1. Three spikes in OJS only.

We got 3 weird spikes in OJS that are not shown in any other tool.
I have 2 theories about those peaks:

a) We have been crawled
I'm not sure what OJS does with spiders, but if they are not filtered, the peaks could be a crawler indexing and visiting all the articles of the journal (a query to check this is sketched below). Analytics filters out crawlers and Plausible shows unique visitors, so it would count them as one single visit.

b) Our installation has some trouble
I'm not sure how, but I can imagine a scenario where I have some kind of cron/schedule misconfiguration and data is processed multiple times.

If somebody can confirm the same peaks, I will open an FR to "extend OJS to ignore crawlers" (if somebody is maintaining a list of IPs somewhere, the implementation looks feasible).
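
To help check theory (a), a query like the following sketch could show how the spike traffic is distributed over articles. The assoc_type value and column names are assumptions based on the OJS 3.2 metrics table, so please double-check them before trusting the output:

```sql
-- Sketch: how abstract views spread over articles on the 15/03 spike day.
-- Assumes the OJS 3.2 `metrics` table and that assoc_type = 1048585 is the
-- abstract/landing page view (ASSOC_TYPE_SUBMISSION); please verify.
SELECT assoc_id, SUM(metric) AS views
FROM metrics
WHERE day = '20230315'
  AND assoc_type = 1048585
GROUP BY assoc_id
ORDER BY views DESC
LIMIT 20;
```

Many articles with only 1-2 views each would point to a crawler walking the journal; a few articles with hundreds of views would point to something else (duplicated processing, redirects…).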

2. All statistics get different results

This happens because they all count different things, in different ways.
Plausible counts UNIQUE visitors, which (I think) could be compared to Analytics' Users.
On the other hand, OJS counts visits to the article's summary page (probably @bozana can clarify this), which… could be compared to GA Pageviews?
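
To make that comparison a bit more apples-to-apples, a query like the following could split the OJS counts into abstract views and file (galley) views per day. This is only a sketch: the assoc_type values are assumptions taken from the OJS 3.x constants and should be verified first.

```sql
-- Sketch: daily split of OJS counts into abstract vs. file (galley) views.
-- The assoc_type values are assumptions (1048585 = submission/abstract,
-- 515 = submission file) taken from the OJS 3.x constants; verify them first.
SELECT day,
       SUM(CASE WHEN assoc_type = 1048585 THEN metric ELSE 0 END) AS abstract_views,
       SUM(CASE WHEN assoc_type = 515 THEN metric ELSE 0 END) AS file_views
FROM metrics
WHERE day BETWEEN '20230315' AND '20230412'
GROUP BY day
ORDER BY day;
```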

3. Plausible counts 4-5 times more visitors than Analytics (not sure why).

| | 15/03 | 19/03 | 24/03 | 29/03 | 03/04 | 05/04 | 12/04 |
|---|---|---|---|---|---|---|---|
| Plausible (Visitors) | 543 | 287 | 377 | 549 | 395 | 355 | 441 |
| Analytics (Users) | 120 | 57 | 84 | 126 | 91 | 84 | 101 |
| Ratio (P/A) | 4.53 | 5.04 | 4.49 | 4.36 | 4.34 | 4.23 | 4.37 |

Honestly, I have no clue why this happens.
I confirmed that both scripts (Plausible and GA) are loaded on all OJS pages (home, articles, PDFs, announcements…) and they are fine.

Any help clarifying this is welcome.
I was thinking of running some tests in a controlled environment (only visited by us), or installing one or two more tools (Matomo, goAccess…) to compare them all and discover who is lying.

4. OJS counts 2-3 times more pageviews than Analytics

This case is even weirder… because OJS only counts summary pages while GA counts every pageview, so GA's numbers should be bigger than OJS's, not the opposite.

| | 15/03 | 19/03 | 24/03 | 29/03 | 03/04 | 05/04 | 12/04 |
|---|---|---|---|---|---|---|---|
| | anomaly | regular | anomaly | regular | anomaly | regular | regular |
| OJS (abstracts) | 2580 | 908 | 2503 | 874 | 2427 | 625 | 926 |
| Analytics (Pageviews) | 332 | 251 | 206 | 331 | 271 | 234 | 249 |
| Ratio (O/A) | 7.77 | 3.62 | 12.15 | 2.64 | 8.96 | 2.67 | 3.72 |

If we ignore the spikes (commented in point 1), OJS is counting 2-3 times more pageviews than Analytics.

The only theory I have here is that users are visiting this journal with anti-tracking tools, which would explain why OJS (and even Plausible) are registering more visits than Analytics.

Questions

So, final questions are:

  • Why is OJS counting more than Analytics?
  • Why is Plausible counting more than Analytics?
  • Why are we getting those spikes?
  • Could we define a method to "standardize" the way we take statistics from our journals?
  • Which tool is more reliable?

I look forward to your comments.

Thanks for your time,
m.

@marc,

I'd say the main difference is that OJS counts server requests, so even though we have a list of bots/agents to ignore, we're more likely to be counting "invalid" requests than 3rd-party tools, which are based on client-side requests. Here I'm assuming that very few bots are making requests through a headless browser (e.g. https://pptr.dev) :grin:

Best,
Jonas Raoni


Thanks Jonas!

If PKP is interested in this, I can dig into the logs… although it will probably be easier to install AWStats or goAccess (which are server-side log analyzers) to discover why I'm getting those peaks.

But you made me recall that Bozana explained that OJS could be counting double because of the aliases. For instance, this redirection:

@bozana am I right?

Anyway, it could explain double counting… but not three times.

I'm sure I'm not the first with these concerns.

@ctgraham, @ajnyga do you have any experience to share?

Hi @marc, and others :slight_smile:

Oh, always such challenging things to solve/figure out :sweat_smile:
I do not know 100% how Plausible and Analytics count, so let me take a deeper look at that later, and let's analyze/take a look at your OJS now.
Generally:
We do consider a bot list (see lib/pkp/lib/counterBots/generated/COUNTER_Robots_list.txt).
We differentiate between abstract page and file views.
We count according to COUNTER rules – in that OJS version this means we do take double clicks into account.
Thus, I would say the OJS numbers should not be that much higher in general.

The first thing that comes to my mind for this specific case is that you maybe use/have used both internal and Apache log files – I remember one issue from you with the Apache log files (the aliases above), where we realized that you actually also have the internal OJS log files. Only one of them, either the OJS or the Apache log files, should be used.
Thus, maybe first double-check your DB table metrics: what do you see as load_id there, e.g. with `SELECT DISTINCT load_id FROM metrics`?
And let's take a look at those specific dates from above: what do you get as the result of this query: `SELECT DISTINCT load_id FROM metrics WHERE day = '20230315'`?
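
If you like, a grouped variant (just a sketch, assuming the standard columns of the metrics table) would also show whether any single log file contributed a suspiciously large part of that day:

```sql
-- Sketch: rows and total metric per log file (load_id) on the spike day,
-- to spot a file that was maybe processed more than once.
SELECT load_id, COUNT(*) AS row_count, SUM(metric) AS total_views
FROM metrics
WHERE day = '20230315'
GROUP BY load_id
ORDER BY total_views DESC;
```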

OK, let's first see these – maybe they will already tell us something more – before we think about/try other things…

Thanks!
Bozana


> Oh, always such challenging things to solve/figure out.

But at the same time, very interesting, don't you think? :wink:

> We do consider a bot list (see lib/pkp/lib/counterBots/generated/COUNTER_Robots_list.txt).
> We differentiate between abstract page and file views.
> We count according to COUNTER rules – in that OJS version this means we do take double clicks into account.

I didn't know we were filtering bots. Thanks for clarifying.

> The first thing that comes to my mind for this specific case is that you maybe use/have used both internal and Apache log files – I remember one issue from you with the Apache log files (the aliases above), where we realized that you actually also have the internal OJS log files. Only one of them, either the OJS or the Apache log files, should be used.

Yep. Probably in this thread, right?

> Thus, maybe first double-check your DB table metrics:
> What do you see as load_id there, e.g. with `SELECT DISTINCT load_id FROM metrics`?

Of course. It returns all usage_events IDs since 2015 (a total of 2,766 rows), plus two extra files with counter and ojsViews data from 2.4.2.

In case it’s useful, I’m passing you the full list in a pastebin: https://pastebin.com/vRaddGUa
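
By the way, to rule out the legacy data, I could also group by metric_type to confirm that the old counter/ojsViews rows stay in their old date range and cannot be behind the 2023 spikes. Just a sketch, assuming the standard columns of the metrics table:

```sql
-- Sketch: date range and totals per metric_type, to confirm that the legacy
-- counter/ojsViews rows stay in the old period and are not behind the spikes.
SELECT metric_type, MIN(day) AS first_day, MAX(day) AS last_day, SUM(metric) AS total
FROM metrics
GROUP BY metric_type;
```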

> And let's take a look at those specific dates from above: what do you get as the result of this query: `SELECT DISTINCT load_id FROM metrics WHERE day = '20230315'`?

This query only returns the log for 15/03/2023:

[screenshot: query result for 15/03/2023]

> OK, let's first see these – maybe they will already tell us something more – before we think about/try other things…

Great. Thanks for everything, Bozana.
(But no hurry with this, please… now the priority is 3.4 and I suspect you have your hands full).

Take care,
m.

This weekend I got some time and decided to try Matomo (which is great but very heavy), using the "log analyzer" version (no more JS on the client). For the record, it took two days to analyze 3 GB of logs.

I didn't have time for a detailed comparison, but let's take a quick look:

Matomo vs Plausible (Visitors)


[chart: Plausible vs Matomo unique visitors]

Although the numbers are not exactly the same, the ranges are quite close (which makes sense, because log-based statistics will always be more accurate – and higher – than a JS counter)… so it looks like Plausible is counting surprisingly well for being a JS counter.

Matomo vs OJS (PageViews)


[chart: OJS vs Matomo pageviews]

Since Matomo also reports pageviews, it is useful to compare it with OJS, and once again the "skyline" is similar. The difference here is in the range, but that also makes sense to me because OJS only counts pageviews of the article's summary page while Matomo counts everything under the "/articles" path.

I feel everybody is counting as expected (except for Google Analytics).

Taking advantage of the fact that Matomo allows us to see the data in detail, I visited the table for day 15/03 (one of the 3 peaks) and saw the following:

So it looks like Matomo is counting downloads and also every URL under article (like /article/view/v48-lozares/pdf-es), and that explains the bigger numbers.

At this point (without the need for a detailed comparison… although I would appreciate finding the time to do it), I feel confident enough to say that OJS is counting at least as well as Matomo is.

What happens in the three peaks?

Going deeper with Matomo I found the following:

That shows an issue with a slash, so my first theory now about these peaks (until Bozana suggests a different one) is that I probably set wrong rewrite rules, and the logs contain extra redirections that generate fake visits.

In other words… the peaks happen because of a misconfiguration of the mod_rewrite rules.
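
To check this theory, a quick comparison of a spike day against a regular day might help: if roughly the same number of articles simply got several times more hits (instead of many more distinct articles being visited), that would fit duplicated/redirected requests better than a crawler. A sketch, again assuming the standard metrics columns:

```sql
-- Sketch: compare a spike day (15/03) with a regular day (19/03).
-- A similar number of articles but a much higher views-per-article ratio on
-- the spike day would point to duplicated/redirected requests, not a crawler.
SELECT day,
       COUNT(DISTINCT assoc_id) AS articles_hit,
       SUM(metric) AS total_views,
       ROUND(SUM(metric) / COUNT(DISTINCT assoc_id), 2) AS views_per_article
FROM metrics
WHERE day IN ('20230315', '20230319')
GROUP BY day;
```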

Does it make sense?

Hi @marc,

OK, it seems only OJS internal log files are considered in DB table metrics, which is good :slight_smile:
So maybe next we should look at the log file from 15.03.2023 and see everything that is in there…

Best,
Bozana
