Duplicates, Similar and a NOT filter
Thanks to the Catalog states, I have found several thousand duplicate and similar images. Have already removed hundreds of duplicates.
If I understand it correctly, from what I see, duplicate is the exact same image stored in multiple locations. What are the specific criteria: filename, ext, size, date, contents?
As I look at all the duplicates, I'm seeing a trend. Back in the days before a photo catalog, I was cataloging, but instead of using software I was creating subfolders by subject or by use. It was wasteful of space, but it was a fast way to organize images into manageable groups. Looks like a lot of work ahead, but a great way to cull the detritus from years past.
How is similarity determined?
As I go through and validate these, some are supposed to be there. They represent RAW+JPG pairs or different-size versions, e.g. large and small JPGs.
I would like to mark them so that they don't appear in the Duplicate and Similar catalog states any more. Is that possible?
Meanwhile, I thought of applying a color label to the ones that are OK; I picked green. Looking in the Advanced Filter guide, I didn't see a way to do a negative filter, i.e. NOT green. That way, everything without a color label, or with any color except green, would still show.
Re: Duplicates, Similar and a NOT filter
Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)
Re: Duplicates, Similar and a NOT filter
jstartin wrote: Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
This link displays a webpage with the message "You are not authorised to read this forum."
Re: Duplicates, Similar and a NOT filter
I found this from Hert http://forum.idimager.com/viewtopic.php ... te#p108753
Binary vs visual comparisons
Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?
Re: Duplicates, Similar and a NOT filter
jstartin wrote: Here is an old post of Hert's that may answer your questions
http://forum.idimager.com/viewtopic.php ... te#p106439
andrew.heard wrote: This link displays a webpage with the message "You are not authorised to read this forum."
Still works for me, but the web is full of mysteries.

Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)
Re: Duplicates, Similar and a NOT filter
I got the same message as Andrew.
Re: Duplicates, Similar and a NOT filter
snowman1 wrote: I got the same message as Andrew.
The light has dawned, the penny dropped, etc. The link was to a beta tester's forum without general access.
Hert wrote:"Duplicate Files" will detect duplicates based on the binary file signature. The same signature means the same file. I can't explain how the signature is calculated, but it's done on the file and not on metadata. When you have an image and save a copy without metadata, then "Duplicate files" won't recognize them as the same as the file content is different. When you have an image and save a copy as another name then Duplicate Files will detect that as the file content is the same. Save a copy with some sharpening and "Duplicate Files" won't detect that as the file is different.
"Similar Images" will analyze the content of the image and determine if the image (not the file) is the same as another image. Unlike Duplicate Files, Similar Images will detect more than identical copies. Save a downsized copy and the image is detected. Save a copy while stripping (some) metadata and it is detected. Save a sharpened copy and the image is detected. Save a b/w copy and it is detected. But when you crop an image, the image content is no longer the same and it (most probably) won't be detected.
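To make the distinction above concrete, here is a minimal Python sketch. It is only an illustration: Hert does not say how PSu computes its signature or its similarity measure, so SHA-256 and the toy "average hash" below are assumptions, and the 4x4 pixel grids are made-up data.

```python
import hashlib

def file_signature(data: bytes) -> str:
    """Signature of the raw file bytes (SHA-256 is an assumption, not PSu's method)."""
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def to_bytes(pixels) -> bytes:
    """Stand-in for 'the file on disk': the pixel values as raw bytes."""
    return bytes(p for row in pixels for p in row)

# Hypothetical 4x4 grayscale image and a uniformly brightened copy of it.
original = [[10, 200, 30, 220],
            [15, 210, 25, 230],
            [12, 205, 35, 225],
            [18, 215, 28, 235]]
brightened = [[min(255, p + 20) for p in row] for row in original]

# An untouched copy has the same bytes, so the binary signatures match.
same_bytes = file_signature(to_bytes(original)) == file_signature(to_bytes(original))
# An edited copy has different bytes, so a binary comparison no longer matches...
edited_bytes = file_signature(to_bytes(original)) == file_signature(to_bytes(brightened))
# ...but a content-based hash can still place the two images close together.
distance = hamming(average_hash(original), average_hash(brightened))
```

A binary check answers "is this the same file?", while a content-based check answers "does this look like the same picture?", which is why an edited copy can drop out of one list and still show up in the other.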
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; internal AMD Radeon™ HD7560D; 4GB DDR3 SDRAM; Win10x64)
Re: Duplicates, Similar and a NOT filter
Jim, thanks for re-posting that, that's very helpful.
Re: Duplicates, Similar and a NOT filter
vkfoto wrote: I found this from Hert http://forum.idimager.com/viewtopic.php ... te#p108753
Binary vs visual comparisons
Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?
vkfoto,
If I understand you correctly, you would like PSu to treat a RAW+JPG pair of the same image as just one image. So if you have a file named _DSC1234.NEF and a file named _DSC1234.JPG, PSu would treat the two files as one image for display purposes, etc. That is what I would want, but with renaming included. And if there is a _DSC1234.WAV file (Nikon voice memo), it would be renamed the same as the NEF and JPG.
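As a rough illustration of the pairing idea, files that share a base name can be grouped with a few lines of Python. This is only a sketch of the concept, not how PSu builds version sets, and the file names are the hypothetical ones from the paragraph above.

```python
from collections import defaultdict
from pathlib import PurePath

def group_by_stem(filenames):
    """Group file names by their base name, ignoring extension and letter case."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).stem.lower()].append(name)
    return dict(groups)

files = ["_DSC1234.NEF", "_DSC1234.JPG", "_DSC1234.WAV", "_DSC1235.JPG"]
groups = group_by_stem(files)

# Keep only base names that have companions (RAW+JPG pairs, voice memos, ...).
sets_with_companions = {stem: names for stem, names in groups.items() if len(names) > 1}
```

Renaming all members of a group together would then amount to applying the same new base name to each file while keeping its extension.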
Back in March, I started a thread on this topic, http://forum.idimager.com/viewtopic.php?f=57&t=24163, which may be helpful to you.
Welcome to the forum.
Phil
Photo Supreme user
Home built i7 3930, 32 GB RAM, Win 10 Pro 64, latest version of Photo Supreme 3, Lightroom 6 and Photoshop CS 6 (perpetual licenses)
Re: Duplicates, Similar and a NOT filter
vkfoto wrote: Now is there a way to have PSu ignore RAW+JPG pairs? That is, treat them as a single entity so that neither the Duplicate nor the Similar functions will flag them?
If properly versioned, the RAW+JPG pairs should not appear inside Duplicate Files or Similar Images. (I actually had some version pairs displayed under Duplicates, but forcing a metadata resync + thumb rebuilding has solved the issue.)
Re: Duplicates, Similar and a NOT filter
Hmm, so before doing any processing, that is before any other versions are created, have PSu combine the CR2 and the OOC JPG into version sets. Then they will be ignored in the duplicate test. Sounds like a plan going forward.
So there is no way to flag other existing images to also be ignored? Once I've investigated and decided that it is a legitimate duplicate or similar image, I would like it to not be in those lists.
Re: Duplicates, Similar and a NOT filter
So, you want to maintain some non-versioned duplicates in your catalog (just curious: what's the purpose?) but exclude them from the list. Well, there is no standard feature to exclude true duplicates/similars, nor should there be. (That would defeat the purpose of determining duplicates and similars based on objective tests, e.g. binary comparison. I would very much welcome, however, a user-controlled threshold for the similarity test.)
Still, there are other ways to ignore certain images. For example, you could define a private label, say LegitimateDup, and assign it to any "legitimate" duplicate. Then, you could apply a label filter within Duplicate Files and invert that filter to get all "illegitimate" dups (or similars).
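In plain terms, the inverted label filter keeps everything that does not carry the chosen label, including images with no label at all. A small Python sketch of that logic follows; the record layout and file names are made up for illustration and say nothing about how PSu stores labels.

```python
def without_label(images, label):
    """Return the images that do NOT carry the given label (the 'NOT' filter)."""
    return [img for img in images if label not in img["labels"]]

# Hypothetical contents of the Duplicate Files state.
duplicates = [
    {"file": "small/beach.jpg", "labels": {"LegitimateDup"}},
    {"file": "large/beach.jpg", "labels": {"LegitimateDup", "Vacation"}},
    {"file": "old_backup/beach.jpg", "labels": set()},
    {"file": "misc/sunset.jpg", "labels": {"Vacation"}},
]

# Only the duplicates that still need a decision remain.
still_to_review = [img["file"] for img in without_label(duplicates, "LegitimateDup")]
```

The same pattern would cover the color-label idea from the first post: filter on "green", then invert.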
Re: Duplicates, Similar and a NOT filter
Why? I chalk it up to being new to PSU and still using an older established workflow. I will often duplicate an image in order to work with it separately. For example small, medium and large JPG versions destined for different uses. I don't bother changing the names since I always stored them in individual folders. In order to select one subset in Explorer or my email, it is easier if they are already grouped physically in one place.
When I got PSu, I was attracted by the ability to catalog my images. That's all I was really looking for. So far, I've been busy applying labels to past images and discovering various features of the program. As long as that works, anything else is a bonus for me.
Not sure why applying an ignore flag would be detrimental. If after investigation I determine it's OK, then that would be one less image for the program to worry about. I've always been from the school that feels automation should bend to people and not people to automation.
Re: Duplicates, Similar and a NOT filter
vkfoto wrote: Why? I chalk it up to being new to PSU and still using an older established workflow. I will often duplicate an image in order to work with it separately. For example small, medium and large JPG versions destined for different uses. I don't bother changing the names since I always stored them in individual folders. In order to select one subset in Explorer or my email, it is easier if they are already grouped physically in one place.
Fair enough, but why are you opposed to grouping your true duplicates or similar images into version sets? If it helps, you could think of versioning as the official mechanism for ignoring duplicates/similars. Let's say the Similar Images state includes small_1.jpg, medium_1.jpg and large_1.jpg as a triplet of similar images. You could simply select those images and press Shift+V to manually version them together. (The names do not need to match.) In the future, those images should no longer appear in the Similar list. Isn't that suitable? (Note that those images are really variants of a common image, so we're talking about proper use of versioning, not just a workaround!)
vkfoto wrote: Not sure why applying an ignore flag would be detrimental.
Because a catalog state should reflect the true state of database content, rather than some altered (or filtered) variation. In addition, an ignore flag would probably trigger requests for some "un-ignore" mechanism, with all the ensuing complications. As I said: think of versioning as your ignore flag.
vkfoto wrote: I've always been from the school that feels automation should bend to people and not people to automation.
Fair enough, but I regard the content of catalog states as primarily a matter of accuracy and reliability rather than automation.
Re: Duplicates, Similar and a NOT filter
Looks like I still have to get comfortable with versioning. I guess without a solid understanding of what it is and how it works, I'm a bit reluctant to use it. Too bad there isn't a good written guide. I know there is lots of good information and advice here in the forum but I'd first have to find and put it all into a document that I could read offline while playing with PSu.