Duplicates, Similar and a NOT filter

vkfoto
Posts: 94
Joined: 19 Oct 16 3:51

Duplicates, Similar and a NOT filter

Post by vkfoto » 29 Oct 16 16:46

Thanks to the Catalog states, I have found several thousand duplicate and similar images. Have already removed hundreds of duplicates.

If I understand it correctly, from what I see, a duplicate is the exact same image stored in multiple locations. What are the specific criteria: filename, ext, size, date, contents?

As I look at all the duplicates, I'm seeing a trend. Back in the days before a photo catalog, I was cataloging, but instead of using software I was creating subfolders by subject or by use. It was wasteful of space, but it was a fast way to organize images into manageable groups. Looks like a lot of work ahead, but a great way to cull the detritus from years past.


How is similarity determined?
As I go through and validate these, some are supposed to be there. They represent RAW+JPG pairs or different size versions i.e. large and small JPGs, etc.
I would like to mark them so that they don't appear in the Duplicate and Similar catalog states any more. Is that possible?
Meanwhile I thought of applying a color label to the ones that are OK; I picked green. Looking in the Advance Filter guide, I didn't see a way to do a negative filter, i.e. NOT green. That way, all images without a color label, or with any color except green, would still show.

jstartin
Posts: 428
Joined: 23 Aug 06 13:47
Location: UK

Re: Duplicates, Similar and a NOT filter

Post by jstartin » 29 Oct 16 21:42

Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)

andrew.heard
Posts: 166
Joined: 16 Jun 10 0:36

Re: Duplicates, Similar and a NOT filter

Post by andrew.heard » 29 Oct 16 22:37

jstartin wrote:Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
This link displays a webpage with the message "You are not authorised to read this forum."

vkfoto
Posts: 94
Joined: 19 Oct 16 3:51

Re: Duplicates, Similar and a NOT filter

Post by vkfoto » 30 Oct 16 0:35

I found this from Hert http://forum.idimager.com/viewtopic.php ... te#p108753
Binary vs visual comparisons

Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?

jstartin
Posts: 428
Joined: 23 Aug 06 13:47
Location: UK

Re: Duplicates, Similar and a NOT filter

Post by jstartin » 30 Oct 16 12:50

andrew.heard wrote:
jstartin wrote:Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
This link displays a webpage with the message "You are not authorised to read this forum."
Still works for me, but the web is full of mysteries ;)
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)

snowman1
Posts: 361
Joined: 01 Jan 07 3:13
Location: UK

Re: Duplicates, Similar and a NOT filter

Post by snowman1 » 30 Oct 16 12:56

I got the same message as Andrew.
Snowman1
http://www.flickr.com/photos/snowman-1/
--------------------------------------

jstartin
Posts: 428
Joined: 23 Aug 06 13:47
Location: UK

Re: Duplicates, Similar and a NOT filter

Post by jstartin » 30 Oct 16 13:53

snowman1 wrote:I got the same message as Andrew.
The light has dawned, the penny dropped, etc. The link was to a beta tester's forum without general access.
Hert wrote:"Duplicate Files" will detect duplicates based on the binary file signature. The same signature means the same file. I can't explain how the signature is calculated, but it's done on the file and not on metadata. When you have an image and save a copy without metadata, then "Duplicate files" won't recognize them as the same as the file content is different. When you have an image and save a copy as another name then Duplicate Files will detect that as the file content is the same. Save a copy with some sharpening and "Duplicate Files" won't detect that as the file is different.
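
Editor's sketch: Hert does not say how the signature is calculated, so the following is only an illustration of the general idea, assuming a SHA-256 hash of the file bytes as a stand-in signature. The function names are hypothetical and are not part of Photo Supreme.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_signature(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 digest of the file's bytes (an assumed stand-in signature)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large RAW files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(paths):
    """Group paths whose byte content is identical; names and folders are ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_signature(Path(path))].append(Path(path))
    # Only signatures shared by two or more files count as duplicates.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Under this assumption a renamed copy hashes to the same value, while a copy with stripped metadata or added sharpening changes the bytes and therefore the hash, which matches the behaviour described in the quote.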

"Similar Images" will analyze the content of the image and determine if the image (not the file) is the same as another image. Unlike Duplicate Files, Similar Images will detect more than identical copies. Save a downsized copy and the image is detected. Save a copy while stripping (some) metadata and it is detected. Save a sharpened copy and the image is detected. Save a b/w copy and it is detected. But when you crop an image then the image content is no longer the same and it (most probably) won't be detected.
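
Editor's sketch: the thread does not say which algorithm Similar Images uses. One well-known technique that behaves roughly as described is an "average hash" (a simple perceptual hash), shown here on plain grayscale grids so it runs without an imaging library. All names and the `max_distance` threshold are assumptions for illustration only.

```python
def shrink(pixels, size=8):
    """Downsample a grayscale grid (list of rows, values 0-255) to size x size.

    Uses block averaging; the grid must be at least `size` pixels in each direction.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        out_row = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


def average_hash(pixels, size=8):
    """Return size*size bits: 1 where a shrunken cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def looks_similar(pixels_a, pixels_b, max_distance=5):
    """Treat two images as similar when their hashes differ in only a few bits."""
    return hamming(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance
```

Because the hash depends only on the coarse brightness layout, a downsized or metadata-stripped copy tends to produce the same bits, whereas a crop changes the layout and usually pushes the distance past the threshold. The threshold plays the role of the user-controlled similarity setting wished for later in this thread.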
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)

snowman1
Posts: 361
Joined: 01 Jan 07 3:13
Location: UK

Re: Duplicates, Similar and a NOT filter

Post by snowman1 » 31 Oct 16 12:10

Jim, thanks for re-posting that, that's very helpful.
Snowman1
http://www.flickr.com/photos/snowman-1/
--------------------------------------

PhilBurton
Posts: 312
Joined: 12 Sep 10 18:47
Location: CA, USA

Re: Duplicates, Similar and a NOT filter

Post by PhilBurton » 01 Nov 16 1:32

vkfoto wrote:I found this from Hert http://forum.idimager.com/viewtopic.php ... te#p108753
Binary vs visual comparisons

Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?
vkfoto,

If I understand you correctly, you would like PSu to treat a RAW+JPG pair of the same image as just one image. So if you have a file named _DSC1234.NEF and a file named _DSC1234.JPG, PSu would treat the two files as one image for display purposes, etc. That is what I would want, but with renaming included. And if there is a _DSC1234.WAV file (Nikon voice memo), that would be renamed the same as the NEF and JPG.
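
Editor's sketch: the pairing described above can be illustrated by grouping files that share a folder and a base name. This is a hypothetical helper, not a Photo Supreme feature; `group_by_stem` and `rename_plan` are made-up names.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_stem(filenames):
    """Group files in the same folder whose names differ only by extension."""
    groups = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        # Compare base names case-insensitively so _dsc1234.jpg pairs with _DSC1234.NEF.
        groups[(str(path.parent), path.stem.lower())].append(name)
    return [sorted(members) for members in groups.values() if len(members) > 1]


def rename_plan(group, new_stem):
    """Map each file in a group to a new name that keeps its own extension."""
    plan = {}
    for name in group:
        path = PurePath(name)
        plan[name] = str(path.with_name(new_stem + path.suffix))
    return plan
```

For example, `group_by_stem(["_DSC1234.NEF", "_DSC1234.JPG", "_DSC1234.WAV", "_DSC9999.JPG"])` keeps the three `_DSC1234` files together and leaves the lone JPG out, and `rename_plan` then renames the whole set in step.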

Back in March, I started a thread on this topic, http://forum.idimager.com/viewtopic.php?f=57&t=24163, which may be helpful to you.

Welcome to the forum.

Phil
Photo Supreme user
Home built i7 3930, 32 GB RAM, Win 10 Pro 64, latest version of Photo Supreme 3, Lightroom 6 and Photoshop CS 6 (perpetual licenses)

vlad
Posts: 967
Joined: 01 Sep 08 15:20

Re: Duplicates, Similar and a NOT filter

Post by vlad » 07 Nov 16 15:48

vkfoto wrote: Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?
If properly versioned, the RAW+JPG pairs should not appear inside Duplicate Files or Similar Images. (I actually had some version pairs displayed under Duplicates, but forcing a metadata resync + thumb rebuilding has solved the issue.)

vkfoto
Posts: 94
Joined: 19 Oct 16 3:51

Re: Duplicates, Similar and a NOT filter

Post by vkfoto » 07 Nov 16 16:25

Hmm, so before doing any processing, that is before any other versions are created, have PSu combine the CR2 and the OOC JPG into version sets. Then they will be ignored in the duplicate test. Sounds like a plan going forward.

So there is no way to flag other existing images to also be ignored? Once I've investigated and decided that it is a legitimate duplicate or similar image, I would like it to not be in those lists.

vlad
Posts: 967
Joined: 01 Sep 08 15:20

Re: Duplicates, Similar and a NOT filter

Post by vlad » 07 Nov 16 17:19

So, you want to maintain some non-versioned duplicates in your catalog (just curious: what's the purpose?) but exclude them from the list. Well, neither is there a standard feature to exclude true duplicates/similars, nor should it be. (That would defeat the purpose of determining duplicates and similars based on objective tests - e.g., binary comparison. I would very much welcome, however, a user-controlled threshold for the similarity test.)

Still, there are other ways to ignore certain images. For example, you could define a private label, say LegitimateDup, and assign it to any "legitimate" duplicate. Then, you could apply a label filter within Duplicate Files and invert that filter to get all "illegitimate" dups (or similars).
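
Editor's sketch: the inverted label filter amounts to keeping every image that does not carry the chosen label, including images with no label at all. The data layout below (dicts with a `labels` set) is an assumption for illustration and says nothing about how PSu stores labels.

```python
def without_label(images, label):
    """Return the images that do NOT carry `label` (an inverted label filter)."""
    return [image for image in images if label not in image["labels"]]


# Hypothetical contents of the Duplicate Files state after some manual review.
duplicates = [
    {"name": "beach_small.jpg", "labels": {"LegitimateDup"}},
    {"name": "beach_copy.jpg", "labels": set()},
    {"name": "beach_old.jpg", "labels": {"red"}},
]

# Unlabelled images and images with other labels both remain in the result.
still_to_review = without_label(duplicates, "LegitimateDup")
```

The same function covers the "NOT green" case from the first post if the color label is passed instead of `LegitimateDup`.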

vkfoto
Posts: 94
Joined: 19 Oct 16 3:51

Re: Duplicates, Similar and a NOT filter

Post by vkfoto » 07 Nov 16 19:21

Why? I chalk it up to being new to PSU and still using an older established workflow. I will often duplicate an image in order to work with it separately. For example small, medium and large JPG versions destined for different uses. I don't bother changing the names since I always stored them in individual folders. In order to select one subset in Explorer or my email, it is easier if they are already grouped physically in one place.

When I got PSu, I was attracted by the ability to catalog my images. That's all I was really looking for. So far, I've been busy applying labels to past images and discovering various features of the program. As long as that works, anything else is a bonus for me.

Not sure why applying an ignore flag would be detrimental. If after investigation I determine it's OK, then that would be one less image for the program to worry about. I've always been from the school that feels automation should bend to people and not people to automation.

vlad
Posts: 967
Joined: 01 Sep 08 15:20

Re: Duplicates, Similar and a NOT filter

Post by vlad » 07 Nov 16 22:39

vkfoto wrote:Why? I chalk it up to being new to PSU and still using an older established workflow. I will often duplicate an image in order to work with it separately. For example small, medium and large JPG versions destined for different uses. I don't bother changing the names since I always stored them in individual folders. In order to select one subset in Explorer or my email, it is easier if they are already grouped physically in one place.
Fair enough, but why are you opposed to grouping your true duplicates or similar images into version sets? If it helps, you could think of versioning as the official mechanism for ignoring duplicates/similars. Let's say the Similar Images state includes small_1.jpg, medium_1.jpg and large_1.jpg as a triplet of similar images. You could simply select those images and press Shift+V to manually version them together. (The names do not need to match.) In the future, those images should no longer appear in the Similar list. Isn't that suitable? (Note that those images are really variants of a common image, so we're talking about proper use of versioning, not just a workaround!)
vkfoto wrote:Not sure why applying an ignore flag would be detrimental.
Because a catalog state should reflect the true state of database content, rather than some altered (or filtered) variation. In addition, an ignore flag would probably trigger requests for some "un-ignore" mechanism, with all the ensuing complications. As I said: think of versioning as your ignore flag ;)
vkfoto wrote:I've always been from the school that feels automation should bend to people and not people to automation.
Fair enough, but I regard the content of catalog states as primarily a matter of accuracy and reliability rather than automation.
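
Editor's sketch: one way to picture "versioning as your ignore flag" is a similar-images list that drops any group whose members all belong to one version set. The thread only says that versioned images should not appear; the mechanism below, and the names `unresolved_groups` and `version_of`, are assumptions for illustration.

```python
def unresolved_groups(similar_groups, version_of):
    """Keep only groups that are not fully covered by a single version set.

    similar_groups: lists of image names that a similarity test matched.
    version_of: maps an image name to its version-set id; unversioned images are absent.
    """
    remaining = []
    for group in similar_groups:
        owners = set()
        for name in group:
            if name in version_of:
                owners.add(("set", version_of[name]))
            else:
                # An unversioned image counts as its own owner.
                owners.add(("file", name))
        if len(owners) > 1:
            remaining.append(group)
    return remaining
```

With `small_1.jpg`, `medium_1.jpg` and `large_1.jpg` all mapped to the same version-set id, their group disappears from the result, while a group of unversioned look-alikes stays listed for review.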

vkfoto
Posts: 94
Joined: 19 Oct 16 3:51

Re: Duplicates, Similar and a NOT filter

Post by vkfoto » 08 Nov 16 0:30

Looks like I still have to get comfortable with versioning. I guess without a solid understanding of what it is and how it works, I'm a bit reluctant to use it. Too bad there isn't a good written guide. I know there is lots of good information and advice here in the forum but I'd first have to find and put it all into a document that I could read offline while playing with PSu.
