Upgrade 3.0.2.0 -> 3.2.1-1 / Duplicate entry 'XXXX' for key 'controlled_vocabs.controlled_vocab_symbolic'

i’m upgrading 3.0.2.0 -> 3.2.1-1

From a fresh copy of the database, cleared caches and pure 3.2.1-1 code, the upgrade runs for about 9 minutes before hitting a problem. Turning on the DB debugging log I’m seeing:

UPDATE controlled_vocabs AS cv, submissions AS s
SET cv.assoc_type = 1048588, cv.assoc_id = s.current_publication_id
WHERE cv.assoc_type = 1048585 AND s.submission_id = cv.assoc_id

Then the error:

PKP-Database-Logger 1596630085.9521: Query:
UPDATE controlled_vocabs AS cv,
submissions AS s
SET cv.assoc_type = 1048588, cv.assoc_id = s.current_publication_id
WHERE cv.assoc_type = 1048585 AND s.submission_id = cv.assoc_id
failed. Duplicate entry ‘submissionSubject-1048588-1510’ for key ‘contr olled_vocabs.controlled_vocab_symbolic’
PKP-Database-Logger 1596630085.9521: 1062: Duplicate entry ‘submissionSubject-1048588-1510’ for key ‘controlled_vocabs.controlled_vocab_symbolic’
ADOConnection._Execute( UPDATE controlled_vocabs AS cv, submissions AS s SET cv.assoc_type = 1048588, cv.assoc_id = s.current_publication_id …, false)% line 1241, file: /var/www/ojs/lib/pkp/lib/vendor/adodb/adodb-php/adodb.inc.php
ADOConnection.Execute( UPDATE controlled_vocabs AS cv, submissions AS s SET cv.assoc_type = 1048588, cv.assoc_id = s.current_publication_id …)% line 442, file : /var/www/ojs/lib/pkp/classes/install/Installer.inc.php
Installer.executeSQL( UPDATE controlled_vocabs AS cv, submissions AS s SET cv.assoc_type = 1048588, cv.assoc_id = s.current_publication_id …)% line 437, file : /var/www/ojs/lib/pkp/classes/install/Installer.inc.php
Installer.executeSQL(Array[30])% line 395, file: /var/www/ojs/lib/pkp/classes/install/Installer.inc.php
Installer.executeAction(Array[3])% line 263, file: /var/www/ojs/lib/pkp/classes/install/Installer.inc.php
ERROR: Upgrade failed: DB: Duplicate entry ‘submissionSubject-1048588-1510’ for key ‘controlled_vocabs.controlled_vocab_symbolic’

That error doesn’t appear to feature in any of the other bug reports.

The database has a history of character encoding misadventures, with content ingested via the web UI and commandline using different encodings. We had previously thought that these were fixed, but it is possible that there may be lingering issues. The most likely issue is two identical string values in different rows where only a single one is expected.

Any ideas?

After the following SQL queries the upgrade ran to completion and the resulting system appeared to function (not tested extensively yet).

DROP TABLE submission_galley_settings;
DELETE FROM controlled_vocabs;
DELETE FROM controlled_vocab_entries;
DELETE FROM controlled_vocab_entry_settings;

cheers
stuart

Hi @stuart.yeates,

I agree with your diagnosis of the likely cause (misadventure leading to duplicate entries). Your approach will get the upgrade to complete, but it’ll remove all metadata that exists in “controlled vocabulary” fields – keywords, subjects, etc. I would recommend reviewing the submission subject metadata entries for the submission identified by:

SELECT submission_id FROM submissions WHERE current_publication_id=1510;

As the metadata won’t be present in the upgraded installation, you’ll have to take this submission ID (once identified) and review its metadata in the pre-upgrade database.

Regards,
Alec Smecher
Public Knowledge Project Team

Thank you Alec

I ended up not taking your advice of reviewing the metadata on a case-by-case basis, because there turned out to be quite a few hits and the data was normalised to opacity.

I got the migration to complete by dropping the index that was causing the apparent issue:

diff -r ojs/lib/pkp/xml/schema/controlledVocab.xml build/ojs-3.2.1-1/lib/pkp/xml/schema/controlledVocab.xml
36a37,42

          <index name="controlled_vocab_symbolic"> 
                  <col>symbolic</col> 
                  <col>assoc_type</col> 
                  <col>assoc_id</col> 
                  <UNIQUE /> 
          </index> 

However, article metadata are not appearing in the web front end and when added they’re not saving despite a green “Metadata has been saved” popup box. This is only an issue with the ‘keyword’ style metadata such as ‘keywords’ ‘subjects’ and ‘disciplines’; ‘type’ metadata just below them on the same screen seems to save and display just fine.

There are no obvious errors in the browser console or in the httpd logs.

Hi @stuart.yeates,

I don’t recommend removing that index – it’ll get you past your current roadblock but only by sweeping some pretty large data gremlins under the rug.

I wonder if you aren’t encountering issue 6213; could you try applying that before re-running the upgrade process?

Regards,
Alec Smecher
Public Knowledge Project Team

Applying the patches for issue 6213 appear to have fixed most of the issues. In particular the migration now progresses without complaining and keywords can be assigned and saved.

However, keywords which had been assigned prior to the upgrade appear to have been lost. I’m not sure how much of a priority this is, since the keywords don’t appear to be displayed anywhere obvious. There is a possibly relavent error in the logs (it’s the only error we’re seeing in the logs):

[Mon Sep 14 14:48:01.935066 2020] [php7:warn] [pid 26405] [client 10.30.40.114:53680] PHP Warning: in_array() expects parameter 2 to be array, null given in /var/www/ojs/lib/pkp/pages/workflow/PKPWorkflowHandler.inc.php on line 373, referer: https://{{hostname}}/test/manageIssues

Any hints about this @asmecher?