Integrity of JPG and DNG files: Any suggestions for managing bitrot?
-
- Posts: 243
- Joined: 13 May 07 15:40
- Location: Hong Kong
Integrity of JPG and DNG files: Any suggestions for managing bitrot?
I had a file system corruption last night (*), and had to restore about 6000 files from backup.
Afterwards, I started thinking more about image file integrity. I think I don't know for sure that all the other image files on the main drive escaped damage. Equally, I don't know for sure that there is no bitrot in the backups (or indeed in the originals).
Ideally I'd like to be able to check
(i) that every image file in the catalog is still properly formed, and
(ii) through MD5 or other hash, exactly which files have changed since the last time I ran this check. (And then ideally I could compare this with Photo Supreme's list of files which are known to have changed.)
At the moment, if there were a problem with the integrity of specific images on my working drive, I might not find out for years when one day I want to open the specific file. By that time, the same error might have propagated through all the layers of my file backup system.
I had a look online for tools which can do the file structural integrity check (in Windows 10). The closest I could find was ImageVerifier ... which unfortunately seems to have been discontinued in 2016. I am looking into doing automatic MD5 checking in SyncBack, which I use for the backup system; but even assuming this works, it still would not be in any way integrated with Photo Supreme's knowledge of which files are actually expected to have changed.
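A minimal sketch of the kind of standalone hash check described in (ii), assuming Python is available and that only JPG and DNG files matter. The function names and the choice of SHA-256 are illustrative assumptions, not anything Photo Supreme or SyncBack provides.

```python
import hashlib
import json
from pathlib import Path

# Assumption: only these file types are of interest.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".dng"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each image path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def compare_manifests(old, new):
    """Return (changed, missing, added) lists of relative paths."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added


def save_manifest(manifest, manifest_file):
    """Write the manifest as JSON; call this only after reviewing changes."""
    Path(manifest_file).write_text(json.dumps(manifest, indent=2))


def load_manifest(manifest_file):
    """Read a manifest written by save_manifest (empty if none exists yet)."""
    path = Path(manifest_file)
    return json.loads(path.read_text()) if path.exists() else {}
```

A typical run would load the previous manifest, build a new one, and review the "changed" list before saving. Saving is deliberately a separate step, so that a corrupted file's new hash is not silently accepted as the baseline.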
Just curious:
(a) Does Photo Supreme have capabilities in this area? As far as I could see, it checks the integrity of the database but doesn't address the ongoing integrity of the image files themselves.
(b) What do other people do to address this concern?
David
PS (Note *) The source of the file system corruption is very annoying. I recently moved from a fast desktop to a fast notebook. So I got a large SSD and put it into a USB3 housing. I started copying files from the main drive. When using this external drive, the SSD would then freeze randomly ... which then locked up my system completely (what inane design lets Windows freeze just because one external drive has a problem??) ... which caused corruption on the drive from which I was copying files. I wasn't even writing to that drive ... And to add to the annoyance, the reason for the freezing seems to be that the disk case cannot deliver enough power for the peak demand when writing to the SSD (which I bought, with advice, in the same transaction as the case). I read in the Corsair forums that an SSD can have lower average power demand than an HDD but higher peak power demand. When I put the SSD into my external drive dock, the freezing problems disappeared.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
BTW I read that DNG has a file hash embedded, but the only ways to check it (that I could find) are with Lightroom (which I don't use) or by opening every file individually with Adobe Camera Raw.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
If I remember correctly, for any file being verified for the first time by running Verify Folder, a file signature is calculated and stored in the database. On later runs, Verify calculates the file signature again and compares it with the stored value; if they don't agree, "File Changed" is reported. Presumably if a file is good to start with and does not change, then it cannot have become corrupted, and this should give some confidence. Unfortunately there are many valid ways in which a file may change (I think the signature is based on the image file together with any sidecar files) but no easy way of checking what has changed, and it is all too easy to decide to update the stored signature without noticing that corruption has occurred (I think I fell foul of this a few months ago on a small number of images).
As Image Verifier is no longer being updated or sold it would be a real benefit if PSu could, in some way, check the integrity of the image data in a file. I suggest it is actually a legitimate and indeed important function of a Digital Asset Management system to do this.
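For illustration only, a very cheap structural sanity check can be sketched in Python. It assumes that a JPEG should start with the SOI marker and end with the EOI marker, and that a DNG, being TIFF-based, should carry a plausible TIFF header. The function names are hypothetical, and this is not how Image Verifier or Photo Supreme work. Passing the check does not prove the image data is intact; it only catches gross damage such as truncation or a zeroed header, and some valid JPEGs carry trailing bytes after EOI and would be flagged.

```python
import struct
from pathlib import Path


def looks_like_jpeg(data):
    """True if data starts with the JPEG SOI marker and ends with EOI."""
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"


def looks_like_tiff(data):
    """True if data has a plausible TIFF header (DNG is TIFF-based)."""
    if len(data) < 8:
        return False
    if data[:2] == b"II":      # little-endian byte order
        order = "<"
    elif data[:2] == b"MM":    # big-endian byte order
        order = ">"
    else:
        return False
    # Bytes 2-3 hold the magic number 42; bytes 4-7 the first IFD offset.
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    return magic == 42 and 8 <= ifd_offset < len(data)


def quick_check(path):
    """Return True/False for known types, None for anything else."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in (".jpg", ".jpeg"):
        return looks_like_jpeg(path.read_bytes())
    if suffix == ".dng":
        return looks_like_tiff(path.read_bytes())
    return None
```

A real verifier would go further and decode the image data, which is what catches corruption in the middle of a file.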
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; SSD; 16GB DDR3 SDRAM; Win10x64)
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
David Grundy wrote: 16 Jun 18 6:01
"I had a file system corruption last night (*), and had to restore about 6000 files from backup.
Afterwards, I started thinking more about image file integrity. I think I don't know for sure that all the other image files on the main drive escaped damage. Equally, I don't know for sure that there is no bitrot in the backups (or indeed in the originals)."
Firstly, sorry to hear about your problem. Such issues are distressing.
Secondly, I didn't read the whole post, but I do have the following comments. This is the way I attempt to future-proof my archive, because nobody has the time to go through all files to check for corruption.
I have not trusted "copy", "import and delete" or similar commands for many years. I'm using an app called ChronoSync (there are surely alternatives) which verifies bit-for-bit that all 'copies' are in fact clones. This should guard against bitrot into the future, as far as I know.
You can make several backups of your entire system, and an app such as ChronoSync will enable you to use a 'clean' old backup and update it with all new files (assuming that they are clean too). If you are able to determine that the bitrot started on a certain date, then you can use the appropriate backup and add the latest files. It's a chore, but it does not happen often (hopefully), and it is worth the investment to have 2 or 3 external backup drives.
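As a rough illustration of copy-then-verify (not ChronoSync's actual implementation, which is not described here), the idea can be sketched with Python's standard library. The function name `copy_verified` is hypothetical.

```python
import filecmp
import shutil


def copy_verified(src, dst):
    """Copy src to dst, then compare the two files byte for byte."""
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    # filecmp caches results keyed on size and mtime; clear the cache so
    # the comparison really re-reads both files.
    filecmp.clear_cache()
    if not filecmp.cmp(src, dst, shallow=False):
        raise IOError(f"verification failed: {dst} differs from {src}")
    return dst
```

One caveat: immediately after a copy, the operating system may serve the read-back from its cache rather than from the disk, so a check like this is more convincing when repeated later or after remounting the drive.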
Hope this helps?
Never say never change, but using Mac since 2005. Photo Supreme 3.3.0.2605. I endorse the interoperability of files between applications and systems.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
Thanks for those thoughts. I had forgotten that "verify folders" does integrity checks on the files themselves, and this appears to be what I am looking for. I ran "verify" on the main branch of the photo folder tree last night (several hours) and it found 147 changed files, of which 145 turn out to be corrupted. I now have to search back through my backups and see if I have uncorrupted copies somewhere, or whether the corruption pre-dates my backup life cycle. There is no way to know when the corruption happened, but I'm hoping it was in the same file corruption episode a few days ago, in which case it will be easy to restore them.
I used to be pretty careless about backing up all my files. (I had a system but was not careful about following it.) Following a huge scare 6 months ago when I thought my wife lost 3 years of recent files, I realised how much I value all these files and I am now much more fanatical about backups. I'm using SyncBackSE which seems pretty good, but until recently I hadn't got around to enabling features such as copy verification and keeping old versions of files. I've had to spend some money to create the backup disk capacity and connectivity to make it more robust, but since that scare it seems obviously worthwhile.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
Well, it turns out that the file verification under Photo Supreme's "Verify Folders" process may be incomplete.
I had a look in Windows Explorer at the affected folder where Photo Supreme had found 145 files "changed" which on closer inspection turned out to be corrupted. In Windows Explorer:
- Windows Explorer finds 1826 DNG files in that folder, which agrees with Photo Supreme. These are all from a trip to India in 2005.
- Windows Explorer could render a thumbnail and a preview for most of them, but not for the 145 files flagged by Photo Supreme as "changed".
- Additionally Windows Explorer showed another 3 files, not flagged as "changed" by Photo Supreme, for which it could not render thumbnail or preview.
- Adobe Camera Raw agreed with Windows Explorer that these 148 files are all corrupted and not usable.
I regret that in the course of recovering the system as below, I did not think of keeping a copy of the corrupted files for any future investigation.
Recovery process
After the above analysis, I had a look at a few recent generations of backups and found that Windows Explorer had no problem with any of these files, even in the most recent backup sets. Luckily I had SyncBackSE configured to use the "fast" method of checking that files did not change, so it only updates the backups for files which have changed name/size/date. Since the corrupted files did not change these attributes, the backups were not updated with the corrupted files. As a happy consequence, my most recent backup contains up-to-date versions of every file I've actively touched recently, and the uncorrupted versions of all old files.
This was an interesting outcome - I had been thinking recently about moving SyncBack to a hash comparison to check which files had changed. It seemed obviously better. But if I had done that, the recovery process this time would actually have been significantly harder. I wonder on balance which approach is better? Now I am thinking that rather than using hash comparison as part of the backup process, I should have a separate process which tracks file changes/corruption by hash checks.
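The difference between the two approaches can be illustrated with a small Python sketch. It assumes, as in the episode above, that the corruption leaves size and modification time untouched. All function names are hypothetical, and `simulate_bitrot` exists only to reproduce that situation on a test file.

```python
import hashlib
import os


def fast_signature(path):
    """Size and modification time: what a 'fast' comparison looks at."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def content_hash(path):
    """SHA-256 of the file contents (reads the whole file)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def simulate_bitrot(path, offset=0):
    """Flip one bit in place, then restore the original timestamps."""
    st = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)
        f.seek(offset)
        f.write(bytes([byte[0] ^ 0x01]))
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

After `simulate_bitrot`, `fast_signature` returns the same value as before while `content_hash` differs. That is why a fast-mode backup leaves the good copy alone, and why a separate hash check is what actually notices the damage.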
So now I have deleted the whole of the recently created data drive, and copied everything to it from the most recent backup. This gets me pretty much back to where I should be. Or at least, that is where I think I am up to ...
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
David Grundy wrote: 17 Jun 18 8:38
"From this I conclude that Photo Supreme's file verification had missed three files which had been corrupted and could not be opened."
It would be useful to understand why they were missed. Is it possible that these three files had been added to the catalog after your last verification of the folder? If that is the case then I believe their signatures would not have been in the catalog - I think Hert once explained, unless I misunderstood, that it is the first use of Verify Folder that does this and not the normal import process.
Jim (Photo Supreme: AMD Quad-Core A8-5500 Accelerated Processor 3.2 GHz; SSD; 16GB DDR3 SDRAM; Win10x64)
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
jstartin wrote: 17 Jun 18 13:13
"It would be useful to understand why they were missed. Is it possible that these three files had been added to the catalog after your last verification of the folder?"
Unfortunately I don't see any comfort there. These folders were originally ingested to IDImager and converted to DNG on 14 Feb 2012, and the files in that folder (1826 files in total, at least according to today's count) have not been modified since then. The same time stamp (within a few minutes) is on all the files in the folder - other than some inadvertent out-of-camera jpg files with year-2005 timestamps. Since ingesting they have not been rated nor tagged. The specific files in question were not different from the others in any obvious way and they haven't had any special attention as far as I can recall. Certainly, I have not thought about any of these files in the last few years, until the "verify" process flagged them as changed.
Actually it was a pleasant trip down memory lane, which was some compensation for the stress in trying to sort out the implications of the file corruption! I found myself thinking, Hmmm, perhaps I should do more trips like that. And then I found myself thinking that actually I'm quite happy NOT doing any more trips like that.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
I have only skimmed this thread but FWIW BeyondCompare is an excellent tool for checking all is well. Very configurable and powerful (though there's a learning curve to exploit its capabilities). Not free but IMO worth the cost in situations like this.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
I used to use ExactFile (ExactFile.com), but that doesn't really work with PSU for me. That's because I prefer to write XML to the image files, so if I change or restructure my catalogue labels, that gets written to the files and they're flagged as potentially corrupt - potentially hundreds of thousands of them at a time. It should work on an off-line archive though, and might be made to work if you wrote to sidecar files and excluded them (though can't now remember if ExactFile could do that).
I suggested it as something that would be useful a couple of years back; thread here: https://forum.idimager.com/viewtopic.ph ... 15#p110733
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
Thanks, I will take a look at BeyondCompare and ExactFile to see how they work. Also I still need to make time to look at SyncBack again (since I already have it) so that I can see whether there's a way to maintain a hash database of originals and/or backups.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
ChronoSync does not incorrectly flag files in the way Mke describes for ExactFile.
Never say never change, but using Mac since 2005. Photo Supreme 3.3.0.2605. I endorse the interoperability of files between applications and systems.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
Unfortunately (for us Windows users) ChronoSync seems to be for macOS only.
Incidentally, one thing that makes file mirroring harder for me is that all my drives (other than the system drive) are encrypted with VeraCrypt, for which VSS (Windows' Volume Shadow Copy Service) apparently does not run. As a result, backup services which rely on VSS (i.e. most of them) are unreliable on my system.
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
I have a suggestion for this issue, that could be a feature request.
DNG files have an MD5 field in their metadata that lets you check that the image data is not corrupted. I think this is very clever, because a file's metadata is often updated, and most of those modifications are legitimate. So if you rely on the MD5 of the full file, it becomes extremely difficult to know whether the MD5 has changed because you changed the metadata or because the image part of the file has been corrupted.
Moreover, every time you update the metadata of a file there is also a specific "time stamp" XMP field that records when the metadata was changed, so with these three pieces of information it becomes easy to guess the real state of a file:
- file MD5 changed
  - is image MD5 changed?
    - yes => image corrupted or modified
    - no => is the timestamp different from the backup timestamp?
      - yes => metadata modified or corrupted
      - no => metadata corrupted or modified without timestamp update
We could then "verify the image integrity" like in Lightroom
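The decision tree above can be written down as a small Python function. This is only a sketch: the function name, the three boolean inputs, and the extra "unchanged" case for a file whose MD5 has not changed are assumptions, and how the embedded image digest and the XMP timestamp would actually be read from a file is left out.

```python
def classify(file_md5_changed, image_md5_changed, timestamp_changed):
    """Guess a file's state from three comparisons against the backup copy."""
    if not file_md5_changed:
        return "unchanged"
    if image_md5_changed:
        return "image corrupted or modified"
    if timestamp_changed:
        return "metadata modified or corrupted"
    return "metadata corrupted, or modified without a timestamp update"
```

For example, a file whose whole-file MD5 differs but whose image MD5 and metadata timestamp both match the backup would fall into the last case, which is the one most worth a manual look.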
Re: Integrity of JPG and DNG files: Any suggestions for managing bitrot?
Yes, fixity checking would be good for all types.
Related discussion at viewtopic.php?f=57&t=23970&p=110733#p110733