[Feature Request] Persistent Hash Database for Long-Term Data Integrity Verification

Posts: 1
Joined: 27 May 2024

Kevin

Hello to FreeFileSync Team,

I have been a long-time user of FreeFileSync and am very satisfied with the software. Recently, I encountered an issue that I believe highlights a critical feature that would greatly enhance the utility and reliability of FreeFileSync, especially for long-term data backups.

Here's my situation: I have two hard drives that I synced in mirror mode using FreeFileSync three years ago. During this period, the hard drives were not connected to any computer.
To ensure the integrity of my files, I performed a file content comparison today, which checks the hashes of the files on both drives.

To my surprise, 23 files were found to have different hashes, despite being identical three years ago. This indicates that one of the hard drives has developed issues, resulting in corrupted files, even though the timestamps and other file attributes remain unchanged. The problem is that FreeFileSync did not store the hashes in a database during the initial comparison three years ago, making it impossible to determine which drive contains the corrupted data.
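A hash recorded at sync time works as a tie-breaker. With only two copies and no reference, a mismatch cannot show which side is damaged, but a third value can. Below is a minimal Python sketch. It assumes a SHA-256 hex digest of each file was saved when the mirror was created, which FreeFileSync does not do. The function names sha256_of() and which_copy_is_good() are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def which_copy_is_good(stored_hash: str, left: Path, right: Path) -> str:
    """Compare both mirror copies against a hash recorded at sync time.

    Returns "both", "left", "right", or "neither".
    """
    left_ok = sha256_of(left) == stored_hash
    right_ok = sha256_of(right) == stored_hash
    if left_ok and right_ok:
        return "both"
    if left_ok:
        return "left"
    if right_ok:
        return "right"
    return "neither"  # both copies differ from the recorded hash
```

If the result is "neither", the damage happened on both drives, or the stored hash was taken from an already-broken file.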

Given this, I believe it is essential to implement the following feature enhancements in FreeFileSync:

1. Persistent Hash Database Storage: When performing a file content comparison, FreeFileSync should store the file hashes in a database on the hard drive. This database would serve as a reference point for future comparisons, allowing users to identify which drive has corrupted files and which one has the correct data.

2. Incremental Hash Updates: After the initial comparison, any new files added to the synced hard drives should have their hashes added to the existing database without requiring a full re-comparison. This incremental update would save time and computational resources while ensuring the database remains up-to-date.
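The two requested behaviors can be sketched in a few lines of Python. This illustrates the idea under assumptions and is not a proposed FreeFileSync design. The database is a hypothetical JSON file that maps relative paths to SHA-256 digests, and update_db() and file_hash() are made-up names. Files already in the database are not re-read, so only new files cost hashing time.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def update_db(root: Path, db_path: Path) -> dict[str, str]:
    """Incrementally refresh a JSON hash database for all files under root.

    New files are hashed once, entries for deleted files are dropped, and
    existing entries are kept as-is without re-reading the file.
    """
    db: dict[str, str] = json.loads(db_path.read_text()) if db_path.exists() else {}
    skip = db_path.resolve()
    current = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.resolve() != skip  # never hash the database itself
    }
    for rel in current - set(db):  # new files only
        db[rel] = file_hash(root / rel)
    for rel in set(db) - current:  # files deleted since the last run
        del db[rel]
    db_path.write_text(json.dumps(db, indent=2, sort_keys=True))
    return db
```

Existing entries are never refreshed. If a file's content later changes, the database keeps the old digest, so verification reports a mismatch instead of silently accepting the change. A real tool would probably also store size and modification time, so that a deliberate edit can be told apart from silent corruption.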

This feature is crucial for users who rely on FreeFileSync for long-term data backup and synchronization. It would provide a reliable way to verify data integrity years after the initial sync, ensuring that the mirror remains accurate and any data corruption can be detected and addressed effectively.

I also noticed that a similar feature request was made in 2015, as detailed in this forum post: Forum Post. It appears that this need has been recognized by other users for some time, further underscoring its importance.

Implementing these enhancements would greatly increase the reliability and utility of FreeFileSync for all users, especially those who depend on it for safeguarding important data over long periods.

Thank you for considering this request. I look forward to seeing these features in future updates.
Posts: 2388
Joined: 22 Aug 2012

Plerry

> ... I performed a file content comparison today, which checks the hashes of the files on both drives.

FreeFileSync (FFS) does not use hashes.
A compare by content simply compares the entire content (bit-by-bit) between a left- and right-side file.
The reason why FFS does not use hashes/checksums has been explained several times in this forum,
see e.g. viewtopic.php?t=6709&p=22256#p22256 ,
viewtopic.php?t=10407#p39099 ,
viewtopic.php?t=8774#p31263 and
viewtopic.php?t=5512
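The following Python sketch shows what such a bit-by-bit comparison looks like. It illustrates the general technique and is not FreeFileSync's actual code. Nothing is hashed or stored. The two files are read side by side, and the loop stops at the first differing chunk. The function name same_content() is hypothetical. Python's standard library offers the same check as filecmp.cmp(left, right, shallow=False).

```python
from pathlib import Path


def same_content(left: Path, right: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if both files contain exactly the same bytes.

    No checksum is computed: both files are read in parallel chunks and
    the function returns as soon as one chunk differs.
    """
    if left.stat().st_size != right.stat().st_size:
        return False  # files of different size can never be identical
    with left.open("rb") as a, right.open("rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached end of file together
                return True
```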
Site Admin
Posts: 7156
Joined: 9 Dec 2007

Zenju

I already programmed exactly that kind of software for my personal use... 16 years ago! So why not share it now:
https://www.mediafire.com/file/utieywhmi1fl2q6/MD5_DB.zip

Usage:
(0. Edit MD5 DB.reg to match the path where you put the tool, then double-click)
1. You create an "md5_db" text file in which you enter the folder paths that contain the files you want to hash. See Example.md5_db.
2. Double-click this file whenever you want to either
a) "resolve issues", which means adding hashes for newly added files and deleting the hashes of files deleted since the last run, or
b) "verify" that the files still match the stored md5 hashes
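The tool ships without source, so its file format is not documented here. The Python sketch below illustrates only the "verify" step. It assumes a manifest similar to GNU md5sum output: one "<32 hex digits>  <relative path>" line per file, with paths relative to the manifest's folder. The manifest layout and the names md5_of() and verify() are assumptions, not MD5 DB's actual format.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest: Path) -> list[str]:
    """Re-hash every file listed in the manifest and report problems.

    Each non-blank line is assumed to be "<hex digest>  <relative path>"
    (two spaces); malformed lines are not handled and will raise ValueError.
    Returns "changed: ..." and "missing: ..." entries in manifest order.
    """
    base = manifest.parent
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        target = base / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif md5_of(target) != digest.lower():
            problems.append(f"changed: {rel}")
    return problems
```

An empty list means every listed file still matches its stored hash.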

Caveats:
- No Unicode support
- Executable only, no source code (not much to see here anyway, as it's one of the first programs I wrote)
- For paranoid security: somehow (?) manually check that the files really are consistent *after* the hashes have been created. Otherwise the hashes might correspond to already broken files! E.g. this could happen if you copied files from an unreliable medium like an old CD.
- All of the above is already available when you simply place your data into archive files like zip or rar, which also store checksums. This makes a tool like "MD5 DB" somewhat moot.
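In a zip archive, each member carries a CRC-32 of its uncompressed data. Python's standard zipfile module can recheck every member with ZipFile.testzip(), which returns the name of the first bad member, or None if all members pass. Below is a minimal sketch; the wrapper name zip_is_intact() is hypothetical. CRC-32 is designed to catch accidental corruption such as bit rot, not deliberate tampering.

```python
import zipfile
from pathlib import Path


def zip_is_intact(archive: Path) -> bool:
    """Re-read every member and check it against the CRC-32 stored in the archive."""
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the first member that fails its CRC check, else None
        return zf.testzip() is None
```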
Posts: 998
Joined: 8 May 2006

therube

FWIW...

voidhash
voidhash, the ramble
(Current 7-Zip now also does xxHash, in addition to the hash types it supported earlier.)
CHK Hash Tool (a pretty straightforward GUI).
All of these tools have their gotchas.

(There was another tool out there that also handled updates, deletions, that kind of thing, but the name escapes me.) -> RHash