Is increase of database after area detection comprehensible?

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany


Post by Robosoc » 02 Apr 21 7:22

I just finished automatic face recognition with the external tool TagThatPhoto (TTP), writing the area descriptions into the files (JPG), and then ran a quick folder scan in PSU so that PSU would notice the changes in more than 20,000 images. I then read all the changed images into PSU to synchronize them.

In TTP I write faces and tags to the image metadata, with the settings to add "Microsoft face regions" and "MWG face regions" enabled. I know this may be unnecessary duplicate information in the JPG files, but I wasn't sure what works best with PSU, and in a test with only 3 files it looked to me as if PSU worked well with this setting.

Before that import my database had a size of 566.233 MB; after the import, and after compacting, it shows 1,850.406 MB (roughly 0.57 GB versus 1.85 GB). More than triple!
That does not seem explainable by area information and additional tags alone, does it? (Is compacting really doing its job?)

I did the work in TTP in several steps and mass-synchronized PSU two or three times over the past days, assuming that at the end I would always have the latest state (I don't doubt that is the case). Unfortunately, I did not watch the size of the database during those steps.

I have backups of both my photos from before the TTP run and the PSU database from before importing the new information.

Is there any way I can check whether something is wrong?

I could of course build a completely new database from the images, but then I would lose folder labeling and stack information, which I would really dislike.

I am on PSU6 V3641 and performed the compacting in this version and in the version I had installed before (at least 6.0.0.3635, maybe a version in between, but I don't think so).

Hert
Posts: 6775
Joined: 13 Sep 03 7:24

Re: Is increase of database after area detection comprehensible?

Post by Hert » 03 Apr 21 13:24

I think your images were not all fully imported to the database. 500MB for a >20K images database sounds very low. Maybe this was an upgraded database from an older version?
1.8GB sounds more like what I would expect for images with average metadata content. You're good. No need to reimport everything.

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 09 Apr 21 8:45

I tried to double-check this. I am still pretty sure that all my ~24,700 images had been fully imported before, and all of them seemed to have a thumbnail (I reloaded the backed-up database, compacted it, and checked for unknown thumbs with 0 results).

I also checked whether the size of the images themselves increased dramatically after the TTP activity, and I can hardly see any difference in the size of my image folders (both roughly 87.8 GB).
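
A folder-size comparison like this can be made repeatable with a small script. The following is only a sketch in Python; the function names `folder_size_bytes` and `size_change_percent` and the example paths in the comments are made up for illustration and have nothing to do with PSU or TTP themselves.

```python
import os


def folder_size_bytes(root):
    """Return the total size in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
    return total


def size_change_percent(before_bytes, after_bytes):
    """Relative change from before to after, in percent."""
    if before_bytes == 0:
        raise ValueError("before_bytes must be non-zero")
    return (after_bytes - before_bytes) / before_bytes * 100.0


# Hypothetical usage (paths are placeholders):
# before = folder_size_bytes(r"D:\Backup\Photos_before_TTP")
# after = folder_size_bytes(r"D:\Photos")
# print(f"{size_change_percent(before, after):+.2f} %")
```

Comparing the backup from before the TTP run with the current image folder this way gives an exact byte difference instead of a rounded GB figure.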

Just to be sure I did not mess something up with my catalog, I had better build a brand-new catalog by importing all images again... but first I need to figure out a way to save my stacks so that I have less work reproducing them. Maybe I will give each stack a tag like
Stack01, Stack02, Stack03...

Hert
Posts: 6775
Joined: 13 Sep 03 7:24

Re: Is increase of database after area detection comprehensible?

Post by Hert » 09 Apr 21 9:45

Robosoc wrote: "but first I need to figure out a way to save my stacks"
Stacks are part of ICS and are written to metadata if you have ICS writing ON (the default). ICS is your extra "catalog backup".

Before importing your new images, make sure to enable ICS reading. After all images are imported, don't forget to disable ICS reading again. But keep ICS writing enabled; it's your extra backup that will save you when everything else fails. ICS reading is only needed to "restore a catalog", so it is not needed during normal operation. Therefore you should keep it switched OFF after the restore.

Again, I see no reason to rebuild your catalog. Save yourself the time and effort. But of course that's up to you :D

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 10 Apr 21 9:16

Thank you, Hert, for the hint about the ICS scheme. Just to be sure: is this the right setting (it looks correct to me)?
[Attachments: 1.jpg, 2.jpg]
I intend to use two computers with standalone PSU and plan to synchronize the database with GoodSync in automatic mode (automatic synchronization on file change with delay...) over WLAN. So I would be happy if the catalog were as small as possible; triple the size means roughly triple the sync time, which increases the chance of failures and conflicts. Therefore I really want to make sure whether 1.8 GB is the best I can get, as I still do not see what was missing before, when the database was <600 MB.

I do understand that my chances are low, but it is worth a try in my eyes.

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 11 Apr 21 7:14

Got a result... but it does not make me really happy and leaves me with less certainty rather than more :-(

1) I set up a new catalog in a completely new folder.
2) I opened it and kept all settings at their defaults, but added "Read IDimager ICS scheme (if available)".
3) I imported the one root folder that contains all my images.
4) I waited several hours until all services had finished (sync and thumbs).

The new database seems to have included all 24,668 current images.
The size of the catalog is now 1.057 GB (so a third, quite different result, in between the other two).

Many statistics do not match at all. For example, the "not Catalog labeled" info under States shows
- 5484 images in the biggest catalog (1.85 GB)
- only 4051 images in the newly built catalog (1.057 GB).

Stacked images have not been rebuilt from the ICS scheme; the new catalog does not show a single stacked image.

When I moved to PSU V6, I built my catalog from the images completely from scratch and did not use the ICS scheme at that time. In the V6 world I decided to stick with English installations right from the start, to be sure not to mix up translated root categories such as People (English) and Menschen (German).
Anyway, yesterday's new catalog (built from XMP and ICS, as I understand it) found 14,994 labeled images in "Menschen" and 10,838 in "People". "Menschen" has not existed in my catalogs since I started using V6, and all catalogs have been fully in sync (no out-of-sync images even after "Verify folder all"). That is definitely the case for the two last backed-up catalogs, from before TTP face recognition (0.566 GB) and after TTP face recognition (1.85 GB).

All this really drives me crazy, and things like that were the reason why I decided to start with a completely new catalog in V6. But it seems to me that even "Verify folder all" does not find 100% of the differences between the current catalog and the image metadata.

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 11 Apr 21 8:38

Step by step, I will now try to understand what is messed up in my images.
One issue I found in the 1.85 GB (after TTP) catalog is that there are images with areas and names linked to the area where the label of this area is not in the catalog label list:
[Attachment: Screenshot 2021-04-11 092505.jpg]
I believe that happened while working in PSU and is not related to TTP; I will focus on that a little more below.

The very same picture, in sync in both catalogs, is recognized as follows in the newly built catalog:
[Attachment: Screenshot 2021-04-11 093421.jpg]
So where does this come from, in my view? (I will have to write this later, as I have family responsibilities now... coming back later.)

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 11 Apr 21 14:22

When I had done the job in TTP, recognizing about 240 different people that I gave names to (yeah, what a job) plus thousands of automatic face detections, I brought that information into PSU with "Verify folder quick" and then read all metadata.

That worked out very well, but of course it led to a very flat list of many people who had not been in the catalog before, collected directly under the label People. All faces belonging to people I already had in my hierarchical catalog (such as People.Family.Sven, which is me) were merged automatically, though.

I then had to structure the people who were new to PSU. Some were not new, but I had them in the catalog under a slightly different name, so I had to merge some of the people. For this structuring process I decided to set PSU to travel mode, so that syncing would be processed after finishing the structuring work. I did so to avoid syncing images with more than one person twice or even more often.

The label "Ammelie", which you see in the example images of my last post, is a good example. The label Ammelie was produced only by TTP and was new to PSU at that point. When I did the work in TTP I wasn't sure about this girl's name and called her Ammelie; later I remembered her name was Emmalie, and I used this new name from then on. So while in travel mode I merged something like this: People.Ammelie into a label People.Friends.Vacationfriends.Emmalie. In processes like this one, something must have gone wrong.

The current result is as follows:

Catalog hierarchy of the 1.85 GB catalog, processed with a lot of semi-automatic TTP and manual PSU work:
[Attachment: Screenshot 2021-04-11 150959.jpg]
As already mentioned: no image out of sync, no result on a "Verify folder all" search.
So my expectation is that PSU wrote the latest information into the files (XMP and ICS).


Catalog hierarchy of the newly built 1.057 GB catalog:
[Attachment: Screenshot 2021-04-11 151254.jpg]
So, what I don't understand:

Why are there still names like Ammelie in the image metadata, which PSU V6 finds on import, when no out-of-sync state was detected beforehand in a catalog that no longer contains this name?
It would be correct if the name Ammelie did not appear in either catalog and the name "Emmalie" were linked to the face of the girl on the left. And the name Leonie should be labeled only once, from the PEOPLE branch and not from the MENSCHEN branch.

Hert
Posts: 6775
Joined: 13 Sep 03 7:24

Re: Is increase of database after area detection comprehensible?

Post by Hert » 12 Apr 21 11:21

Robosoc wrote: "So my expectation is that PSU wrote the latest information into the files (XMP and ICS)."
The out-of-sync indicator is something that PSU keeps. If metadata in the file changes afterwards then PSU has no knowledge about that.
If you want to be sure that what PSU has in its catalog is in the metadata then you must write to the file first.

Check that particular file's metadata (right click -> Run from Repository -> Metadata -> Full Exif dump) and check where it has "Ammelie" in the metadata. All I can do is guess, which won't help you at all.
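
As a complement to the Exif dump, the embedded XMP packet can also be searched for a name directly. The sketch below is an illustration in Python and not a PSU feature; the function names are made up. It assumes the name sits in a standard XMP packet inside the JPEG, so it will miss names stored only in IPTC or Exif fields and may miss data in extended XMP that is split across several segments. As far as I know, MWG face regions keep the name in `mwg-rs:Name` and Microsoft face regions in `MPReg:PersonDisplayName`, so the surrounding text of a hit should show which of the two schemas still carries the old name.

```python
def extract_xmp_packets(data):
    """Return all XMP packets found in raw file bytes, decoded as text."""
    start_tag, end_tag = b"<x:xmpmeta", b"</x:xmpmeta>"
    packets = []
    pos = 0
    while True:
        start = data.find(start_tag, pos)
        if start == -1:
            break
        end = data.find(end_tag, start)
        if end == -1:
            break
        end += len(end_tag)
        packets.append(data[start:end].decode("utf-8", errors="replace"))
        pos = end
    return packets


def find_name_in_xmp(data, name, context=60):
    """Return a text snippet around every occurrence of name in the XMP."""
    hits = []
    for packet in extract_xmp_packets(data):
        idx = packet.find(name)
        while idx != -1:
            lo = max(0, idx - context)
            hi = idx + len(name) + context
            hits.append(packet[lo:hi])
            idx = packet.find(name, idx + 1)
    return hits


# Hypothetical usage (the file name is a placeholder):
# with open("IMG_0001.jpg", "rb") as f:
#     for snippet in find_name_in_xmp(f.read(), "Ammelie"):
#         print(snippet)
```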

Robosoc
Posts: 36
Joined: 11 Apr 10 9:56
Location: Germany

Re: Is increase of database after area detection comprehensible?

Post by Robosoc » 13 Apr 21 7:30

Sorry, Hert, in trying to explain and show exactly what puzzles me, I surely gave too much information.

Let me try to narrow one of my problems down:
In an existing catalog I would expect "Verify folder all" to find any difference in metadata between what is stored in the catalog and what is stored in the metadata of the image itself. After the verify I should be able to decide whether to read the image data into the catalog or write the other way around to bring those images in sync. If I abort this decision, the image should be marked out of sync in PSU.

This is my understanding of PSU and therefore the behavior I expect.

The complete story above proves that this is not the case, at least not for that image, although PSU is able to find all the information within the image when I import it from scratch. So "Verify" does not find all differences... how can I scan already imported folders for all changes?

Hert
Posts: 6775
Joined: 13 Sep 03 7:24

Re: Is increase of database after area detection comprehensible?

Post by Hert » 13 Apr 21 8:31

Robosoc wrote: "In an existing catalog I would expect "Verify folder all" to find any difference in metadata between what is stored in the catalog and what is stored in the metadata of the image itself. After the verify I should be able to decide whether to read the image data into the catalog or write the other way around to bring those images in sync. If I abort this decision, the image should be marked out of sync in PSU."
"Verify Files All" finds the files that are changed compared to what PSU has stored in its database. It does so by comparing the binary signatures of the file with the binary signature that PSU keeps in the catalog.
If, for a changed file, you decide to read metadata from the file then PSU will read metadata to the catalog using your existing Sync-Read settings...exactly the same as a manual right click -> Metadata -> Read Metadata from File.
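
To illustrate the idea of a signature comparison, here is a minimal sketch in Python. It is not PSU's implementation: how PSU computes its binary signature is not stated in this thread, so SHA-256 is used purely as a stand-in, and `file_signature`, `find_changed_files`, and the `stored_signatures` dictionary are hypothetical names.

```python
import hashlib


def file_signature(path, chunk_size=1024 * 1024):
    """Return a hex digest of the file's bytes (SHA-256 as a stand-in)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large image files do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(stored_signatures):
    """Return the paths whose current signature differs from the stored one.

    stored_signatures maps a file path to the signature recorded the last
    time the file was synchronized.
    """
    changed = []
    for path, old_signature in stored_signatures.items():
        if file_signature(path) != old_signature:
            changed.append(path)
    return sorted(changed)
```

A check of this kind only answers whether the file's bytes changed since the signature was recorded. It does not compare the labels in the catalog with the labels in the file, so a file whose stored signature is current is not reported even if the two disagree.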

If you think this file is incorrect, then use right click -> Metadata -> Read Metadata from File to reproduce this. The result should be exactly the same. Afterwards check what is in its metadata (as explained in my previous reply) to analyze the result.
