Describe the issue or problem
Calls to the v1 submission API generates thousands of database queries, with 40-90% of queries being duplicates. Adding memcached does not change the situation.
Steps I took leading up to the issue
- Go to into the databoard of a journal, being a URL like https://ojs.example.com/ABCD/submissions. Particularly look at requests like https://ojs.example.com/ABCD/api/v1/_submissions?…
What application are you using?
OJS 3.3.0.11
Additional information
Sample
For some journals, it take tens of seconds for the submissions page to load as up to 15 000 queries are sent to the database. Typically only 40-50% of the database queries are unique. In extreme cases, only 10% of database queries are unique.
To illustrate here are the number of database queries generated by different calls to v1 submissions API. They are all requests for 30 submissions at most. They are for different journals, status and offsets.
+---------------+----------------+
| Total queries | Unique queries |
+---------------+----------------+
| 15229 | 1512 |
| 14795 | 1557 |
| 13767 | 1359 |
| 13017 | 1404 |
| 7261 | 1628 |
| 7204 | 1170 |
| 6814 | 1330 |
| 1808 | 1126 |
| 902 | 569 |
+---------------+----------------+
Correlation
The total number of submissions doesn’t seem to matter.
The number of queries and the amount of duplication seems to be related to how many authors there are. When there are few to no authors on the submissions, the request completes faster with fewer database queries. It is requests where the submission has many authors that are taking a long time to complete and generate excessive database queries. The last two examples above are from a test journal, whose submissions rarely have any authors at all.
Detail of the top example
Taking the top example from above, I can see that most of the queries are for user groups settings, controlled vocabulary and stage assignments.
+---------------------------------+---------------+----------------+
| Table | Total queries | Unique queries |
+---------------------------------+---------------+----------------+
| user_group_settings | 2349 | 46 |
| controlled_vocab_entries | 1950 | 150 |
| controlled_vocabs | 1950 | 150 |
| controlled_vocab_entry_settings | 1564 | 122 |
| stage_assignments | 1440 | 300 |
| author_settings | 756 | 57 |
| submissions | 753 | 63 |
| user_settings | 606 | 8 |
| categories | 390 | 30 |
| publication_galleys | 390 | 30 |
| submission_settings | 390 | 30 |
| publications | 390 | 30 |
| user_groups | 360 | 2 |
| users | 276 | 8 |
| (other) | 1665 | 486 |
+---------------------------------+---------------+----------------+
| Grand Total | 15229 | 1512 |
+---------------------------------+---------------+----------------+
Continuing with the first example, these were the top 15 queries. These are from a single request to the v1 submissions API.
+---------------------------------------------------------------+-------+
| Query | Count |
+---------------------------------------------------------------+-------+
| SELECT * FROM user_settings WHERE user_id = 10351 | 405 |
| SELECT ug.* FROM user_groups ug JOIN user_group_stage ugs […] | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 3 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 5 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 6 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 16 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 9 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 14 | 330 |
| SELECT * FROM user_group_settings WHERE user_group_id = 15 | 330 |
| SELECT * FROM users WHERE user_id = 10351 | 75 |
| SELECT * FROM users WHERE user_id = 1730 | 65 |
| SELECT * FROM user_settings WHERE user_id = 1730 | 65 |
| SELECT * FROM users WHERE user_id = 19850 | 55 |
| SELECT * FROM user_settings WHERE user_id = 19850 | 55 |
+---------------------------------------------------------------+-------+
Memcached
After adding memcached, the numbers didn’t change. I already had the file object cache enabled. The only thing I saw going into memcached was some basic data about what was the current issue for a journal. I didn’t see any issue or submission data going into memcached from accessing either front-end or back-end pages.