Why does synching increase catalog file size?

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Why does synching increase catalog file size?

Post by fbungarz » 29 Nov 16 13:43

Hi,
I noticed something rather bizarre: my catalog database is getting fairly large even though I compact quite regularly. What is rather strange: I make an effort to try to keep catalog and XMP inside the images in sync. So far I thought both sets of data, the information stored in the catalog and the information stored as XMP inside the image files was independent from each other: here the catalog, a SQLite database and there the image files.

Now, somehow this does not seem to be the case! When I sync an image file (CTRL+S) the data from the catalog get written to the image as XMP, i.e., the XMP header inside the image file gets updated. I can check that with some independent tool like ExifTool or move the image to a second computer, there import the image into a different PSu database and the data is there. So, the information is actually stored as an independent data block inside the image. With more and more XMP stored inside the image file the size of the file increases, but insignificantly so.

So far so good.

But this is what I believe is rather strange: syncing the catalog information (TRL+S) into the image files the catalog actually grows !!! And the increase in catalog size can actually be fairly substantial! Say, with auto-sync off, I add one single label to 20,000 files. These 20,000 files are now shown as "out-of-sync". Adding the labels results in a slight increase of the catalog file size. That is to be expected: the new information, which files now have the new label gets written to the catalog. But then syncing the files should, in my opinion, not result in an additional increase of the catalog size! After all, syncing the images no new information is added to the images! For the catalog only one single tag is affected: the sync status. before that tag was set to "out-of-sync", now it is changed to "in sync". Why does a simple database entry for one singe field (the sync- status field) result in any file size increase of the catalog file? And the difference is quite substantial: several 100 MP for 20,000 images.

The result: my catalog keeps growing. Compacting reduces this growth a bit again, but still the end result: the catalog remains larger after syncing than before. Why?

Is there a way to somehow avoid this? For example, which procedure will result in a smaller catalog:
(1) I add one label to 20,000 images, then sync. I add a second label, then sync, I add a third label, then sync.
Or
(2) I add 3 labels, then sync.

In the both cases three labels were added, so I expect the database to grow. But in the first instance syncing was executed three times, each time increasing the catalog size. So presumable one ends up with a larger catalog then?

And: if syncing results in growth of the catalog file, will "auto-sync" generally contribute to excessive database growth?

For anyone who has doubts, you can actually watch it happen:
While syncing takes place have a look at the folder where the catalog is stored:
Apart from the default catalog "idimager.cat" a second 7-12 KB file gets created every time a sync operation is executed: "idimager.cat-journal". At the moment that this 7-12 KB file disappears, the catalog file "idimager.cat" grows...
[The "idimager.cat-journal" file is also created btw, when the catalog is being compacted, when you assign labels, add metadata, it is just where the data temporarily gets stored before it is actually written to the database...]

Thanks for clarifying this,

Frank

Hert
Posts: 6329
Joined: 13 Sep 03 7:24

Re: Why does synching increase catalog file size?

Post by Hert » 29 Nov 16 22:39

Here's explained how SQLite handles journal files;
https://www.sqlite.org/tempfiles.html
This is a User-to-User forum which means that users post questions here for other users.
Feature requests, change suggestions, or bugs can be logged in the ticketing system

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 29 Nov 16 23:34

Very interesting link, though I still don't quite understand, why the database grows as a result of simply syncing the files. Theoretically, at least once being compacted it should again be the same size as before syncing, shouldn't it (If I correctly understand what is going on the growth is just a temporary result of the incremental journal file updates, that overhead space again being "cut away" as part of the vacuuming/compacting process...)

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 30 Nov 16 10:52

Just to give you an idea, why I am concerned about this:
I used the script "Macro Command to Catalog Label" from the resource repository to generate labels from XMP for ca. 20,000 photos.
With auto-sync off my database grew to about 7.2 GB (after compacting).
I now have synced ca. 50% (ca. 10,000 files), the database now is 9 GB - before compacting. After compacting, the database is 7.8 GB. That is still a considerable increase in size.

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 07 Dec 16 8:40

My catalog database "idimager.cat.db" is now approaching a file size of 9Gb - after compacting! And I still have about 22,000 images that need re-sync after relocating a branch of my catalog tree.
I really do not understand this!

All I did is create a new label category and move one branch with keywords assigned to ca. 14,000 images into that new category. Because most of my images have at least three versions ca. 30,000 images went out-of-sync. I started syncing these files in batches of ca. 3,000 files. The catalog file size increases as a result of the sync. After compacting it is still larger then it originally was.

Why does simply relocating labels increase the catalog size?
Why does syncing files increase the catalog size?

I am regularly using a MySQL database with 117,128 specimen records. The whole database is just 354 MBs. And it is a complex database with lots of relational tables. My PSu catalog contains 106,847 image records, but its file size is now almost 9GB !
OK - this is probably comparing "apples and oranges", a completely unfair comparison. Still, I don't quite understand why simply reorganizing the catalog and syncing files results in a substantial increase in catalog file size...

[PS: I am not talking about the thumbnail database. I understand that this si large simply because it stores the thumbnails and previews, in my case 24GB]

tstoddard
Posts: 578
Joined: 07 Sep 12 12:51

Re: Why does synching increase catalog file size?

Post by tstoddard » 07 Dec 16 13:39

Frank,

I have no expertise in this area so please don't take anything I say as fact, but, I was curious about your situation so I started to poke around my catalog database with a SQLite client that I downloaded and looked at some of my tables and how many records they contained. The idSearchData table seems to be one that holds many records. Mine has almost 7 million records in it and I have just under 50,000 items in my catalog. Interestingly, I opened PSU and picked a random image that I imported a few months ago that is not in a version set and has 3 labels assigned. I did a "Save metadata to file" on that image and looked at my idSearchData table again. 6 new records had been added to that table. I repeated the exact same thing on the exact same file and each time 6 more records were added to that table.

This is something that I would want Hert to weigh in on before doing anything, but perhaps rebuilding the search data would get you some of your diskspace back. Apparently, that operation can take a long time so be warned.
Tom Stoddard

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 07 Dec 16 15:05

...perhaps rebuilding the search data would get you some of your diskspace back
That would be great. I too have occasionally looked at the database using SQLite Expert. That table indeed is quite gigantic.
I noticed that rebuilding the search data takes ages though. So; it would be great to hear from Hert, if that is actually worth the effort...

Thanks,
Frank

PS: My catalog has now grown to 9.1 GB, with still 18,000 images to be synced...

tstoddard
Posts: 578
Joined: 07 Sep 12 12:51

Re: Why does synching increase catalog file size?

Post by tstoddard » 07 Dec 16 16:54

fbungarz wrote:PS: My catalog has now grown to 9.1 GB, with still 18,000 images to be synced...
It would be interesting to know how much of the 9.1 GB the idSearchData table is occupying. I suspect it is a significant portion.
Tom Stoddard

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 07 Dec 16 20:22

Hi Tom,
I just opened the database with SQLite. My search table is indeed gigantic: 320,621,159 rows = more than 300 million records!
But that said, the search table is a "virtual table"... whatever that means :?
I have looked at https://sqlite.org/vtab.html trying to understand what virtual tables are and if they contribute to the database file size. Assuming that they simply are tables virtually replicating information that is physically stored elsewhere I am not so sure the size of that table should matter much :!: :?:
But honestly, I really do not know if that is true...

If, what you suggest is right, then re-building the search data probably would result in a much smaller database. If, however, the table does not physically exist, but is essentially only an SQL statement referring to other tables - then the size of that table should not actually matter.

It would be really nice to have Hert chime in and provide some inside if re-building the search data would have any effect in reducing the size of my database [incidentally before compacting the database is now 9.3GB large, still ca. 15,000 images that need re-sync...].

Cheers,
Frank

tstoddard
Posts: 578
Joined: 07 Sep 12 12:51

Re: Why does synching increase catalog file size?

Post by tstoddard » 07 Dec 16 23:36

Frank,

After doing some searching on virtual tables in SQLite, I believe that they do consume at least some disk space and potentially a lot of disk space. From what I read, they are tables that are created in order to take advantage of FTS (full text search) functionality. Depending on how the tables are designed, they can contain a lot of text data or they can just reference other documents. Either way, they still need to store some data in order to do that.

What I mainly question is why would additional rows be added every time a sync occurs even when nothing has changed. In the experiment I performed, I didn't change anything, I just repeatedly synced the same file and each time records were added. I probably shouldn't speculate why that happens but I will anyway. I imagine that the overhead of checking to see if something has changed or if duplicate or redundant data already exists in the idSearchData table would result in a performance hit. The whole purpose of free text searching is to improve performance.

If I end up being correct about all of this then I would say that rebuilding the search data periodically should be a regular maintenance task, just like backing up and compacting your database. If I get time tonight, I plan to experiment on my database or a copy of it. Since mine isn't as large as yours, I expect that I will be able to rebuild my search data and then compact my database in a lot less time than it would take you. I'll let you know the results.
Tom Stoddard

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 08 Dec 16 0:41

Looking forward to hearing the results of your experiment.
Thanks!
Frank

BTW - Looking at SQLite Expert I see that there is an option to empty a table. Perhaps rebuilding the search data would be a lot faster then... I guess, in theory, if your hunch is correct, one could empty the table, compact, rebuild the search data, compact again - that would result in a completely new, fresh search table AND a much smaller database !?

tstoddard
Posts: 578
Joined: 07 Sep 12 12:51

Re: Why does synching increase catalog file size?

Post by tstoddard » 08 Dec 16 13:30

Frank,

It didn't work as I had hoped it would. On the contrary, my catalog increased in size significantly. Now I'm wondering if I should use the new larger database or go back to the old one. Is there search data that the old one is missing? I was not having any issues with performance before doing this test so I think I'll just go back to my original one.

I also used the sqlite analyzer command line tool to analyze a copy of my database and it does appear to me that the search data is what uses a majority of the disk space (about 90%) that the catalog consumes. Here's an excerpt from the output of that tool:

Code: Select all

*** Page counts for all tables with their indices *****************************

IDSEARCHDATA_SEGMENTS............................. 1901031     59.3% 
IDSEARCHDATA_CONTENT.............................. 850463      26.5% 
IDIMAGEDATA....................................... 227217       7.1% 
IDSEARCHDATA_DOCSIZE.............................. 105666       3.3% 
IDCATALOGITEM..................................... 47166        1.5% 
IDCATALOGITEMDEFINITION........................... 41757        1.3% 
IDIMAGEVERSION.................................... 8550         0.27% 
IDPROP............................................ 1006         0.031% 
IDVERSIONPLACEHOLDER.............................. 916          0.029% 
IDIMAGECOLLECTIONITEM............................. 453          0.014% 
IDFILEPATH........................................ 234          0.007% 
SQLITE_MASTER..................................... 85           0.003% 
IDIMAGECOLLECTION................................. 27           0.0% 
IDPLACEHOLDER..................................... 26           0.0% 
IDCOLLECTIONINFO.................................. 25           0.0% 
IDSEARCHDATA_SEGDIR............................... 21           0.0% 
IDPROPGROUPCONTENT................................ 20           0.0% 
IDIMAGEAREA....................................... 18           0.0% 
IDIMAGESCORE...................................... 8            0.0% 
IDIMAGECOMMENT.................................... 7            0.0% 
IDIMAGECARD....................................... 6            0.0% 
IDIMAGEMODEL...................................... 6            0.0% 
IDUSERUSER........................................ 6            0.0% 
IDIMAGEGALLERY.................................... 5            0.0% 
IDMULTIUSERNOTIFICATION........................... 5            0.0% 
IDSYNCXMP......................................... 5            0.0% 
IDUSERFAVORITE.................................... 5            0.0% 
IDUSERUSAGE....................................... 5            0.0% 
IDIMAGEBOXUSAGE................................... 4            0.0% 
IDIMAGEIPTC....................................... 4            0.0% 
IDIMAGESUBSCRIPTION............................... 4            0.0% 
IDPROPGROUP....................................... 4            0.0% 
IDPROPRELATION.................................... 4            0.0% 
IDUSER............................................ 4            0.0% 
IDUSERPROP........................................ 4            0.0% 
IDMEDIUMINFO...................................... 3            0.0% 
IDPROPGROUPPATHCONTENT............................ 3            0.0% 
IDPROPGROUPPERIODCONTENT.......................... 3            0.0% 
IDSYNCTHUMB....................................... 3            0.0% 
IDTEMPFILELIST.................................... 3            0.0% 
IDTEMPLIST........................................ 3            0.0% 
IDIMAGEBOX........................................ 2            0.0% 
IDPROPCATEGORY.................................... 2            0.0% 
IDSYNCIMAGEIPTC................................... 2            0.0% 
IDSEARCHDATA_STAT................................. 1            0.0% 
IDVERSIONINFO..................................... 1            0.0%
I think I'm done experimenting for now. Hopefully, Hert will be able to enlighten us as to why this data keeps growing so significantly and whether or not there is anyway to gain control over that growth.
Tom Stoddard

fbungarz
Posts: 1574
Joined: 08 Dec 06 5:03
Location: Arizona, USA

Re: Why does synching increase catalog file size?

Post by fbungarz » 08 Dec 16 14:14

Hi Tom,
funny, I did some experimenting myself:

I made a backup and then used SQLite Expert to delete the contents of the search table, then via vacuum compacted the database.
The result: my 9.2 GB database shrunk down to a 740 MB database!
I then opened the database in PSu, no problems, everything still there. Only the search does not work. So, in PSu I invoked the re-build search data. The database grew from 740 MB to 3.7 GB. I have not done a lot of testing yet, but it looks like everything is still there and the search now also works again.

Now I wonder: should I continue to use the 3.7 database or go back to the 9.2 GB original one? Of course I do not want to loose any data ...

To me it looks like that re-building the search data does not clear the old data from the search table but instead simply indexes everything again and thus adds to that table. This would explain, why in your case your database grew quite substantially. Likewise, as you suggest, syncing the catalog data to the files apparently also adds to the search table. Compacting the database apparently does nothing to remove that overhead. Thus, over time this table grows and grows and grows. In my case: migrating from IDI to PSu I did a lot of cleaning up, re-arranging catalog labels, adding new labels, adding XMP via mapped labels. All this would explain why my database increased very substantially. The search table simply grew and grew and grew ...

But of course, unfortunately I cannot be certain that this is really the case !? The database compacts, the search works, as far as I can tell even the last additions are all there. Still I am worried. It would be good to get some confirmation that deleting and re-building the search data did not do any damage, did not inadvertently cause any data loss ...

Cheers,
Frank

tstoddard
Posts: 578
Joined: 07 Sep 12 12:51

Re: Why does synching increase catalog file size?

Post by tstoddard » 08 Dec 16 16:01

fbungarz wrote:Now I wonder: should I continue to use the 3.7 database or go back to the 9.2 GB original one? Of course I do not want to loose any data ...
I would err on the side of caution and keep using the 9.2 GB database until Hert confirms that doing what you did was safe and won't cause issues in the future. Perhaps he might even improve his rebuilding script to automate what you have done so that it can all be done from within PSU.

Maybe we're both making too much out of this database size issue but since I keep my PSU data on my main system drive, which is a 250 gig SSD, disk space does matter.
Tom Stoddard

vkfoto
Posts: 111
Joined: 19 Oct 16 3:51

Re: Why does synching increase catalog file size?

Post by vkfoto » 08 Dec 16 16:53

Watching this from the sidelines is a bit disconcerting. Wonder how it is affecting potential users? :(
1 It's by design and bad things might happen if it is changed or if we tinker in there. :?
2 It's a known issue and either
2.1 it's being worked on, 8) or
2.2 there is no easy solution, :( or
2.3 disk space is cheap so move on. :cry:
3 No one is paying attention. :shock:

Post Reply