Filter With Duplicated Keywords filters for images that have duplicated keywords. Its implementation checks both the flat keyword list stored in XMP (dc:subject) and the keyword list in IPTC, but my testing shows that PSU does not currently (build 2066) retrieve the IPTC keyword list correctly: the returned list always consists of only one keyword (see ticket #2837 in Mantis). Duplicated keywords in XMP seem to be identified correctly.
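For those curious about the check itself, here is a minimal sketch in Python (not the actual PSU script). It assumes the flat keywords have already been read into a plain list of strings and that two keywords count as duplicates only when they match exactly; the function names are made up for illustration.

```python
from collections import Counter


def duplicated_keywords(keywords):
    """Return the keywords that occur more than once, in first-seen order.

    Assumption: duplicates are exact string matches (case-sensitive).
    """
    counts = Counter(keywords)
    return [kw for kw in counts if counts[kw] > 1]


def has_duplicated_keywords(keywords):
    """True if at least one keyword appears more than once in the list."""
    return len(set(keywords)) < len(keywords)
```

An image would pass the filter whenever `has_duplicated_keywords` is true for its flat keyword list.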
Remove Duplicated Keywords removes all redundant (duplicated) flat keywords from XMP (dc:subject), for a selection of images (possibly filtered by the first script). Please note that:
- the list of hierarchical keywords and of ICS tags is not checked or touched in any way, since redundancy within the flat keyword list does not typically involve redundancy at the hierarchical level. (That is, keywords and labels with identical names can usually be distinguished by their hierarchical info.)
- the list of IPTC keywords is not currently checked or touched - but I am open to adding this once bug #2387 gets fixed. (Such an enhancement would be especially useful if the implemented solution for the issue of repeatedly added IPTC keywords (described here) does not somehow fix the metadata of images already affected by this bug.)
- this script implements (on demand) the optimization requested in ticket #2252, but the list of assigned labels is not checked or touched. Consequently, Hert's comments regarding #2252 still apply:
"Eliminating duplicates is a sub optimisation. [...] Duplicate catalog label names typically indicate that something should change in the catalog structure."
In particular, please note that if you run the script but keep the assignment of labels with identical names, you may still end up with duplicated keywords the next time the metadata of the affected image(s) is saved (or write-synced). If you want to eliminate the root cause of duplicated labels, you can use the filter script to identify all affected images and then modify or revoke the duplicated labels as appropriate. (I could probably write another script to automatically tinker with duplicated labels - either by renaming some labels or by revoking some label assignments - but I'm fairly skeptical of a "one policy fits all" approach there.)
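The removal step for the flat keyword list can be sketched like this in Python (again, not the actual PSU script). It assumes exact-match comparison and that the first occurrence of each keyword is the one to keep; the function name is hypothetical.

```python
def remove_duplicated_keywords(keywords):
    """Drop repeated keywords, keeping the first occurrence of each.

    Assumption: duplicates are exact string matches (case-sensitive).
    dict.fromkeys keeps insertion order, so the original order survives.
    """
    return list(dict.fromkeys(keywords))
```

Note that this only cleans the list that was handed in; if duplicated labels are still assigned, the duplicates can come back on the next save, as described above.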
I hope some of you will find one or both of the scripts helpful.