Similar images?

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Similar images?

Post by vkfoto » 28 Mar 21 3:51

When scanning new images, it sometimes happens that the same image comes from two or more sources and so gets scanned multiple times. When I highlight a suspect and then right click "Show more ..." and then select "Similar Images" I expect to see them all.

Having selected an image that I know has a duplicate, I expected to be shown the other one. It was not shown.
Do the duplicates have to be in the same physical folder, or can they be anywhere in the database?

After creating a copy of a known duplicate, I ran the test again and it found the new copy but the other duplicate is still not found.
Then I went over to the States tab and tried the Duplicate Files. The two above files are displayed but the third is also not shown. The third duplicate is in another folder.

Is this how Similar and Duplicates are supposed to work?

Robosoc
Posts: 38
Joined: 11 Apr 10 9:56
Location: Germany

Re: Similar images?

Post by Robosoc » 28 Mar 21 7:09

Is the third image, which is currently not shown in your tests, a copy (everything the same) or a derivative (with changes such as size or something else)?

I would have expected to see it at least in "similar images" like you, and I also had the feeling that the detection of similar and duplicates does not always show the whole truth, but I did not spend enough time on it.

Hert
Posts: 6811
Joined: 13 Sep 03 7:24

Re: Similar images?

Post by Hert » 28 Mar 21 9:58

Duplicates and Similar are unrelated to folder location. They are based on signatures, which are calculated when a thumb is built. If I remember correctly, the duplicate signature calculation was changed sometime early in V5. If you are using a catalog from before that, then to be sure all your signatures are correct you should rebuild all thumbs. Of course, for newer catalogs that’s not needed.
This is a user-to-user forum. If you need product support then please send a message

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 28 Mar 21 23:10

Just in case, I selected all images (3615 items) and ran the command "Rebuild Selected Thumbnails." After it finished, I saw two scans of the same photo in different folders. Because the display is in date order, these two happened to be side by side. Selecting one, I right clicked, chose "Show more ..." and then selected "Similar Images." The other was not presented as a similar candidate.

Since they are from two different scans, they are not exact duplicates. But what is the definition of "Similar?" How different do they have to be for the algorithm to not consider them similar? Or said another way, how close is close enough to be considered similar?

In another test, one of three images was selected and the other two were found. In this case, they are actual copies of each other. Because one is a montage of the same person at different ages, they are dated differently.

Hert
Posts: 6811
Joined: 13 Sep 03 7:24

Re: Similar images?

Post by Hert » 29 Mar 21 8:25

vkfoto wrote (28 Mar 21 23:10):
"Since they are from two different scans, they are not exact duplicates. But what is the definition of "Similar?" How different do they have to be for the algorithm to not consider them similar? Or said another way, how close is close enough to be considered similar?"

They have to be visually pretty much similar (like a downsized copy of the same image, or a b/w version of the original). Or an identical visual copy of the original.

vkfoto wrote:
"In another test, one of three images was selected and the other two were found. In this case, they are actual copies of each other. Because one is a montage of the same person at different ages, they are dated differently."

Duplicate means binary identical. If two files are identical then they are duplicates. If they are visually identical (the image appearance) but the files differ (e.g. different metadata) then they are "Similar", not "Duplicates".
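
To illustrate the "binary identical" definition above, here is a minimal Python sketch. It is not PSU's implementation (the thread does not say which signature PSU uses); SHA-256 and the function names are assumptions chosen for illustration. Files are grouped by a hash of their bytes, so folder location plays no role.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_binary_duplicates(root):
    """Group files under root (all subfolders) whose bytes are identical."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only digests shared by two or more files are duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Under this definition, changing a single metadata byte in one copy produces a different digest, so the two files no longer count as duplicates even though the picture looks the same.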

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 29 Mar 21 15:34

If "Similar" only means that the metadata is different, then that is not very useful when I need to find photos that were scanned multiple times by accident.

Did a small test to simulate a typical situation.
No metadata was applied beyond what gets automatically created by the scanner. Why are they not similar? I think by all measures they are similar.
Scanned the same original four times.
[Attachment: color targets details.JPG, the files as listed in Explorer]
In PSU, the thumbnails look like this:
[Attachment: color targets.JPG, PSU thumbnails]
Only two were recognized as similar, image 0344 and 0347.
[Attachment: color targets 2 similar.JPG, image 0344 and 0347]

Hert
Posts: 6811
Joined: 13 Sep 03 7:24

Re: Similar images?

Post by Hert » 29 Mar 21 15:46

“Similar” doesn’t look at metadata. Only the visual appearance of the image, as explained in my previous post.

Why not share your files so we can check them and confirm whether they are identified as similar or not?

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 29 Mar 21 19:33

The four "similar" files are being sent by WeTransfer. Visually, I think they are all similar, the same original just scanned four different times. My PSU sees three different files. Only two are similar.

Hert
Posts: 6811
Joined: 13 Sep 03 7:24

Re: Similar images?

Post by Hert » 29 Mar 21 20:29

Thank you for sharing the files. The files are not similar because they are too different. The rotation is different, the framing is different. Except for one, they are too different to be considered visually identical. As I wrote earlier, similar means:
"(like a downsized copy of the same image, or a b/w version of the original). Or an identical visual copy of the original."
In other words: use Similar images to find other files of the same photo.

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 29 Mar 21 21:22

Hmm, well, that solves that mystery. When I look at the 4 images, they seem pretty similar to me. I suppose that teaching the AI system to recognize what humans consider similar images is a big job.

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 02 Apr 21 1:31

I wondered whether there was a better solution than going through my images one at a time to manually find similar images, that is, those that are not binary copies but have just about the same subject matter. I came across the little program "Similar Image Finder." I ran a few tests on my image folders and it did a very good job of showing what could be similar images. Depending on how I set the parameters, it finds more or fewer, quicker or slower.
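
How a similarity setting can trade more matches for fewer can be sketched roughly as follows. This is only an illustration and not how Similar Image Finder works internally, which I don't know. It assumes each image has already been reduced to a small bit signature; the file names, signature values, and function names below are all made up.

```python
from itertools import combinations


def bit_distance(a, b):
    """Number of bit positions in which two integer signatures differ."""
    return bin(a ^ b).count("1")


def similar_pairs(signatures, max_distance):
    """Return name pairs whose signatures differ in at most max_distance bits."""
    return sorted(
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(sorted(signatures.items()), 2)
        if bit_distance(s1, s2) <= max_distance
    )


# Hypothetical 8-bit signatures for three scans of the same original.
signatures = {
    "scan_a.jpg": 0b11110000,
    "scan_b.jpg": 0b11110001,  # differs from scan_a in 1 bit
    "scan_c.jpg": 0b11000011,  # differs from scan_a in 4 bits
}

print(similar_pairs(signatures, 1))  # strict setting: fewer matches
print(similar_pairs(signatures, 4))  # loose setting: more matches
```

With the strict setting only scan_a and scan_b are paired; with the loose setting all three scans are paired with each other, at the cost of more false positives on a real collection.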

Hert
Posts: 6811
Joined: 13 Sep 03 7:24

Re: Similar images?

Post by Hert » 02 Apr 21 10:06

It's a matter of expectation. PSU's "Similar" detection is not intended to find photos that were scanned at different angles. You may get some positive hits depending on how neatly the photos were positioned on the scanner.
PSU's "similarity" detection is designed to find images that are identical but look different. For example, a downsized copy of the same photo, or a black & white version of the same photo.
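
Why a downsized copy matches while a differently oriented scan does not can be shown with a toy visual signature. The sketch below uses an "average hash", a common generic technique; it is an assumption for illustration and not PSU's actual signature. Images are plain grids of grayscale values so the example needs no imaging library, and the 90-degree rotation is an exaggerated stand-in for a skewed scan.

```python
def shrink(pixels, size=8):
    """Block-average a 2D grid of grayscale values down to size x size."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """One bit per cell: 1 if the cell is brighter than the overall mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# A 64x64 test image: dark on the left, bright on the right.
original = [[4 * x for x in range(64)] for _ in range(64)]
downsized = shrink(original, 32)                        # half-size copy
rotated = [list(row) for row in zip(*original[::-1])]   # turned 90 degrees

print(hamming(average_hash(original), average_hash(downsized)))  # 0
print(hamming(average_hash(original), average_hash(rotated)))    # 32
```

The downsized copy yields the same 64-bit signature, while the rotated one differs in half of the bits, which is about what two unrelated images would score.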

vkfoto
Posts: 214
Joined: 19 Oct 16 3:51

Re: Similar images?

Post by vkfoto » 03 Apr 21 2:49

Hert wrote:
02 Apr 21 10:06
...
PSU's "similarity" detection is designed to find images that are identical but look different. ...
Ah, that explains my confusion. I was going by the assumption that the reverse was the case: not identical but look the same. That is why I couldn't figure out why the four sample images of the color chart were not all flagged as similar.

Now that I have found the app Similar Image Finder, I can locate and remove what I consider similar images (the same original scanned at different times) and so keep my image folders as small as possible. I don't have to worry and wonder whether I already scanned the same photo from another album.
