download/import skips images

sanphotgn
Posts: 336
Joined: 26 Aug 07 17:06

Re: download/import skips images

Post by sanphotgn »

Is it possible files are being skipped, because Duplicate Handling is set to Skip file?
Photo Supreme 6.7.2.4201 (64 bits) (Windows)
fbungarz
Posts: 1829
Joined: 08 Dec 06 4:03
Location: Arizona, USA

Re: download/import skips images

Post by fbungarz »

sanphotgn wrote: Is it possible files are being skipped, because Duplicate Handling is set to Skip file?
No, I am not using that setting (never have).
Thanks,

Frank
sanphotgn
Posts: 336
Joined: 26 Aug 07 17:06

Re: download/import skips images

Post by sanphotgn »

I asked, because it is shown in your screen print:

viewtopic.php?f=57&t=27352#p120789

I downloaded and "installed" one of your import profiles (HX...noScript) and the setting is set at Skip file.
Hert
Posts: 7928
Joined: 13 Sep 03 6:24

Re: download/import skips images

Post by Hert »

sanphotgn wrote: 03 Dec 18 19:25 Is it possible files are being skipped, because Duplicate Handling is set to Skip file?
This makes sense. The Import Process uses a lot of concurrent threads and if two files are calling the script at the exact same time, they may get the same number returned from the file name script.

I'm curious if setting duplicate handling to make unique names supports this assumption.
This is a user-to-user forum. If you have suggestions, requests or need support then please send a message
vlad
Posts: 895
Joined: 01 Sep 08 14:20

Re: download/import skips images

Post by vlad »

Hert wrote: 04 Dec 18 9:41 This makes sense. The Import Process uses a lot of concurrent threads and if two files are calling the script at the exact same time, they may get the same number returned from the file name script.
Shouldn't that happen only for duplicates? Could non-duplicate files end up with identical names?
Hert
Posts: 7928
Joined: 13 Sep 03 6:24

Re: download/import skips images

Post by Hert »

vlad wrote: 04 Dec 18 11:50 Could non-duplicate files end up with identical names?
Sure, if you configure a profile to *always* return test.jpg as the target file name then, regardless of unique source names, the target file names will be duplicates.

Import IMG0001.JPG as TEST.JPG
Import IMG0002.JPG as TEST.JPG
etc

It depends on what the rename script in the profile generates. In this case the script generates a sequenced file name (afaik). So if two threads call the script at the same time then the target file name will be a duplicate.
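To illustrate the hypothesis described here, a minimal Python sketch (this is not Photo Supreme script code; `interleaved_unsafe`, `SafeCounter` and `collect_numbers` are hypothetical names, and the interleaving is written out by hand rather than observed). It replays the case where two workers read a counter before either writes it back, and then shows a lock-guarded counter for comparison.

```python
import threading


def interleaved_unsafe(start):
    """Replay two workers that both read the counter before either writes."""
    counter = start
    read_a = counter        # worker A reads the current value
    read_b = counter        # worker B reads the same value
    counter = read_a + 1    # worker A writes its new number
    counter = read_b + 1    # worker B overwrites it with the same number
    return read_a + 1, read_b + 1


class SafeCounter:
    """Counter whose read-increment-write step is guarded by a lock."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            self._value += 1
            return self._value


def collect_numbers(counter, workers=3, per_worker=100):
    """Let several threads draw numbers and return everything they got."""
    results = []
    results_lock = threading.Lock()

    def work():
        for _ in range(per_worker):
            n = counter.next()
            with results_lock:
                results.append(n)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In the unsafe replay both workers end up with the same number, which would mean two identical target file names. With the lock, every number is handed out exactly once.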
vlad
Posts: 895
Joined: 01 Sep 08 14:20

Re: download/import skips images

Post by vlad »

Hert wrote: 04 Dec 18 12:20 It depends on what the rename script in the profile generates. In this case the script generates a sequenced file name (afaik). So if two threads call the script at the same time then the target file name will be a duplicate.
Isn't that a design issue (or limitation, at least)? Could it be avoided by tweaking the script?
Hert
Posts: 7928
Joined: 13 Sep 03 6:24

Re: download/import skips images

Post by Hert »

You’re free to enter whatever you like as a rename rule. If your rename rule is to rename everything to test.jpg then that’s up to you. That it doesn’t make sense to do so is another discussion.
fbungarz
Posts: 1829
Joined: 08 Dec 06 4:03
Location: Arizona, USA

Re: download/import skips images

Post by fbungarz »

Well, it seems I never even noticed that setting and I NEVER changed it from the default either! Again looking at all my profiles, they all have that first option enabled (= skip files). It is the default option and I am not aware that I ever changed it. Why would I suddenly change an option that I was not even aware of and which previously never caused any problems?

I have used the profiles for years now and they never previously skipped any images!
Hert wrote: Sure, if you configure a profile to *always* return test.jpg as the target file name then, regardless of unique source names, the target file names will be duplicates.
But my rename rule is not configured to *always* return the same name! On the contrary, it is supposed to use a counter. And it does! The image names receive sequential numbers! It is just that some of the files are NOT downloaded; the next file that is downloaded then receives that number instead, and so the image names are no longer correctly paired as JPG+NEF...

You say that duplicate handling is configured to "skip duplicates". But there are NO duplicates among these images! The JPG+NEF pairs, although identical images, are pairs of different image formats! And why would some of those pairs be downloaded correctly, i.e., be recognized as not being duplicates, while other files are considered duplicates?
Also: if you look at my screenshot there are even entire JPG+NEF pairs being skipped! And these pairs are definitely NOT duplicates of any other images!
This makes no sense...

OK, next step...
Looking at the settings, which one do I try next? There are three options:
1) Skip File
2) Overwrite File
3) Make name unique

"Make name unique" - what would that one achieve, but interfere with the rename script by possibly appending numbers, thus messing up the counter even more?

"Overwrite File" - I definitely do not want that either, right? It would mean that files are not only not downloaded (i.e., skipped), but potentially even overwritten...

I will try both other options with some fresh files tomorrow, but really doubt it will make a difference.
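For reference, a rough Python sketch of how the three options listed above could behave. It assumes that a "duplicate" means a target file name that already exists in the destination, which is how it is treated elsewhere in this thread; `resolve_target`, the policy strings and the `(001)`-style suffix format are hypothetical and not taken from Photo Supreme itself.

```python
import os


def resolve_target(name, existing, policy):
    """Return the name to write to, or None if the file is skipped.

    existing: set of file names already present in the target folder.
    policy:   "skip", "overwrite" or "unique" (hypothetical labels).
    """
    if name not in existing:
        return name                      # no collision, nothing to handle
    if policy == "skip":
        return None                      # the incoming file is not copied
    if policy == "overwrite":
        return name                      # the existing file gets replaced
    if policy == "unique":
        stem, ext = os.path.splitext(name)
        n = 1
        while True:
            candidate = f"{stem}({n:03d}){ext}"   # assumed suffix format
            if candidate not in existing:
                return candidate
            n += 1
    raise ValueError(f"unknown policy: {policy}")
```

Under this assumption the decision depends only on the generated target name, not on the image content or its time stamp.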
Username
Posts: 310
Joined: 18 Feb 18 21:21

Re: download/import skips images

Post by Username »

Just an idea.

Could this be a race condition, where the script renaming process sometimes differs in timing or is processed by a different thread?
So that some files get skipped during cache work, or after the renaming process but before the file is written?
Or after writing but before renaming?

Any difference in which files get skipped if you import the same NEF+JPEG files from an internal SSD or an external USB thumb drive?
Or during high CPU load by other processes?
PSu Server 2024 & Postgres 15 on macOS 14
PSO 6 on Windows Server 2022

- I'm the user
Hert
Posts: 7928
Joined: 13 Sep 03 6:24

Re: download/import skips images

Post by Hert »

As I mentioned before, choose “Make names unique”.

Pick that, not to get the correct result, but to prove the assumption raised by sanphotgn

Do you then get all files downloaded? (yes yes yes with wrong names...I know)
If so then your rename script isn’t thread safe. And then we can go forward and change your file naming script so that it can handle multiple threads running concurrently.
fbungarz
Posts: 1829
Joined: 08 Dec 06 4:03
Location: Arizona, USA

Re: download/import skips images

Post by fbungarz »

Hert wrote: As I mentioned before, choose “Make names unique”.
Do you then get all files downloaded? (yes yes yes with wrong names...I know)
OK; I think we are onto something now...

I have just changed one of the old profiles to "Make names unique" and the result is this:
(1) All files are downloaded, none are skipped - good
(2) Every so often some file names receive an additional (001) appended to their name, so the JPG+NEF pairing gets messed up and unnecessary file names are generated (see screenshot):
[Screenshot: Download-results_MakeNamesUnique.jpg]
Hert wrote: If so then your rename script isn’t thread safe. And then we can go forward and change your file naming script so that it can handle multiple threads running concurrently.
I guess that confirms it: the script isn't "thread safe".
It worked before, so probably this has to do with speed improvements through more multi-threading in the new PSU version?

Thanks again for looking into getting this back to work!
Hert
Posts: 7928
Joined: 13 Sep 03 6:24

Re: download/import skips images

Post by Hert »

I've now looked at the rename script. The issue is not a thread race condition after all. As stated in the header comments of the rename script that you posted before, this script assumes that the files are processed in an exact sequence. While this works in a non-threaded system, the assumption is violated when the script is run in a multi-threaded system, simply because there's no control over which file will be copied when.
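To make the sequence assumption concrete, a small Python sketch. The original script is not shown in this thread, so the logic here is purely an assumption for illustration: the hypothetical `number_by_sequence` bumps the counter whenever the base name differs from the previous file, which only pairs JPG+NEF correctly if both files of a pair arrive back to back.

```python
import os


def number_by_sequence(sources):
    """Number files in arrival order, bumping the counter whenever the
    base name changes (assumes both files of a pair arrive back to back)."""
    numbers = {}
    counter = 0
    previous_stem = None
    for name in sources:
        stem, _ = os.path.splitext(name)
        if stem != previous_stem:
            counter += 1
            previous_stem = stem
        numbers[name] = counter
    return numbers


# Pairs arrive together: each pair shares one number.
in_order = number_by_sequence(["A.JPG", "A.NEF", "B.JPG", "B.NEF"])

# Arrival order is interleaved, as it can be with several copy threads:
# the two files of a pair now receive different numbers.
interleaved = number_by_sequence(["A.JPG", "B.JPG", "A.NEF", "B.NEF"])
```

With the interleaved order the counter keeps advancing, so A.JPG and A.NEF no longer match, which is the kind of mismatch a sequence-dependent script would produce.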

Here's a rename script that does the same as the original but does not depend on the sequence of files.

Code:

//    - This script is designed to be used with the downloader pre-script
//      "Download by Prefix", as it manages some of the registry keys used
//      by this script.
// JAG, 16-Apr-2007
// HVZ, 23-Jun-2016 (converted to Photo Supreme)
// HVZ, 06-Dec-2018 (made script independent of the sequence as with multithreading, the sequence is not predictable)

var
  highestNumber, lastNumber: Integer;
  prefix:     WideString;

  AStackName: WideString;
  AFiles: TTntStringList;
  idx: Integer;
 
begin
  prefix        := ReadFromRegistry ('dlByPrefixCurrent', 'prefix', '');
  if (prefix = '') then prefix := 'UNKNOWN_'; 
  highestNumber := ReadFromRegistry ('dlByPrefixCurrent', 'value', 0); 

  AStackName := 'DownloadByPrefixList_' + prefix;
  AFiles := TTntStringList(StrToPtrInt(Nvl(PopFromStack(AStackName), '0')));
  if AFiles = nil then
  begin
    AFiles := TTntStringList.Create;
    PushToStack(AStackName, PtrIntToStr(AFiles));
  end;
  if VarToStr(Nvl(PopFromStack(AStackName + 'session'), 0)) <> Session.GUID then
  begin
    AFiles.Clear;
    PushToStack(AStackName + 'session', Session.GUID);
  end;

  idx := AFiles.IndexOf('%FileName');
  if idx = -1 then
  begin
    lastNumber := highestNumber + 1;
    AFiles.AddObject('%FileName', lastNumber);  // maintain the number used for this filename
  end
  else
  begin
    // reuse the same number for existing files (e.g. TEST.JPG and TEST.NEF)
    lastNumber := AFiles.Objects[idx];
  end;

  if lastNumber > highestNumber then
  begin
    WriteToRegistry ('dlByPrefixLastVals', prefix, lastNumber);
    WriteToRegistry ('dlByPrefixCurrent', 'value', lastNumber);
  end;

  // return our newly constructed filename
  result := prefix + AddLeadingChars (IntToStr(lastNumber), '0', 4, False);
end;
fbungarz wrote: It worked before, so probably this has to do with speed improvements through more multi-threading in the new PSU version?
PSU was always multi threaded but due to the complexity of the import routines, in V3 the import module was restricted to use only one thread. In V4 this restriction was let go and the copy process now uses up to 3 concurrent threads. And now that multiple threads can call your rename script, the original assumption as set by the original writer of the script no longer applies.
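The idea behind the revised script (remember the number per source file name instead of relying on arrival order) can be sketched in Python roughly as follows. This is not a translation of the script: `PairNumberer`, `rename_all` and the `HX_` prefix are hypothetical, the explicit lock is an addition of this sketch, and appending the extension in the sketch is a simplification.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class PairNumberer:
    """Hand out one number per base file name, independent of call order."""

    def __init__(self, prefix, highest=0):
        self.prefix = prefix
        self.highest = highest
        self._numbers = {}              # base name -> assigned number
        self._lock = threading.Lock()   # addition of this sketch

    def target_name(self, source_name):
        stem, ext = os.path.splitext(source_name)
        with self._lock:
            if stem not in self._numbers:
                # first file of a pair: take the next free number
                self.highest += 1
                self._numbers[stem] = self.highest
            # second file of a pair reuses the stored number
            number = self._numbers[stem]
        return f"{self.prefix}{number:04d}{ext}"


def rename_all(sources, prefix="HX_", workers=3):
    """Compute target names with several worker threads."""
    numberer = PairNumberer(prefix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        targets = list(pool.map(numberer.target_name, sources))
    return dict(zip(sources, targets))
```

Which pair receives which number can vary from run to run, but both files of a pair always share the same number, whatever order the threads process them in.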

PS. When you paste the script above, it will give an exception in the preview. Don't worry about that. If you're running 1790 or higher then the script will work during import.
fbungarz
Posts: 1829
Joined: 08 Dec 06 4:03
Location: Arizona, USA

Re: download/import skips images

Post by fbungarz »

Thanks so much Hert !!!
Compiling the script I get the message:
Unknown identifier or variable is not declared: 'Session'. Source position: 28,72
I guess that is OK, since you write:
Hert wrote: [...] will give an exception in the preview. Don't worry about that. If you're running 1790 or higher then the script will work during import.
I'll try it out now and report back. :wink:

Again: thanks for looking into it.

Cheers,
Frank
fbungarz
Posts: 1829
Joined: 08 Dec 06 4:03
Location: Arizona, USA

Re: download/import skips images

Post by fbungarz »

The renaming now works, but pairs of files are STILL skipped.
:(
(I re-enabled "skip" for "duplicate handling" again - should I disable that? Like I said, since it is the default for any new profile, it previously always remained enabled; I never touched it...)

Out of 66 files in total, only 44 get downloaded.
I looked at the files, before the download they are in 6 different folders:

folder 1: 10 images, only 8 get downloaded
folder 2: 12 images, only 8 get downloaded
folder 3: 10 images, only 6 get downloaded
folder 4: 10 images, only 4 get downloaded
folder 5: 18 images, only 14 get downloaded
folder 6: 6 images, only 4 get downloaded

There is no discernible pattern in how many images are downloaded and how many are skipped.
I looked at the time stamps of the original files. It seems that some (not all) of the skipped files were taken in the same minute as other files that are downloaded, but their time stamps still differ by seconds.

To me this seems to suggest that "skip duplicates" considers images that were taken at the same minute, only a few seconds apart, as duplicates. Is that correct? This seems odd, given that cameras these days are often capable of taking images only microseconds apart...

However, there are also images with distinctly different time stamps, minutes apart, that are nevertheless still skipped. In folder no. 6, for example, all six images have different names AND different time stamps, but 2 of the six files are still skipped. Strange...

What criteria does PSu use to consider files "duplicates"?

Next step: running it with "make file names unique"...