Idea: synchronization based on pre-generated checksum files

Posts: 2 · raven 29 Jul 2018, 06:40

That is, I point the program at a pre-generated checksum file for the Destination or Source, and the program uses it to determine what to copy.
Why this is important: currently, when synchronizing over a network (FTP or Samba), all the content actually has to be transferred over the network just to compare it.
With a low data transfer speed, this takes too much time.
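
To make the proposal concrete, here is a minimal sketch of the comparison step. It is not how FFS works. It assumes each side already has a manifest in the common `sha256sum` text layout (`<hex digest>  <relative path>` per line); the function names `parse_manifest` and `files_to_copy` are hypothetical.

```python
def parse_manifest(text):
    """Parse '<hex digest>  <relative path>' lines into {path: digest}.

    Assumption: sha256sum-style text format; blank lines and '#' comments are skipped.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, path = line.split(None, 1)
        if path.startswith("*"):  # sha256sum marks binary mode with a leading '*'
            path = path[1:]
        entries[path] = digest.lower()
    return entries


def files_to_copy(source_manifest, dest_manifest):
    """Return source paths that are missing from, or differ at, the destination."""
    src = parse_manifest(source_manifest)
    dst = parse_manifest(dest_manifest)
    return sorted(path for path, digest in src.items() if dst.get(path) != digest)
```

Only the two small manifest files would have to cross the network, but the result is only as trustworthy as the manifests: a file changed after its manifest was generated would go unnoticed.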

Posts: 2615 · Plerry 30 Jul 2018, 07:08

This has been suggested and discussed before.

An FFS sync runs only on one machine, e.g. the machine of your left base location.
In said case, determining the checksums of files in the left base location (and subtree) could be quick.
However, to determine the checksums of the files (and subtree) in the right, remote base location, all data still needs to be transferred to the local machine at your left base location, as that is where FFS is running.
Then you might just as well directly compare the (full) remote data with the local data, rather than first calculating the checksums of the local and remote data and then comparing the checksums ...
If each base location had a local machine determining the checksums, that would be another story, but that would require software to run at all locations involved in the sync, not just one.
Perhaps OK if you "control" all locations involved.
But the advantage of FFS is that it only needs to run in one location, while allowing you to run syncs to, from and between locations where you have "just" file access.

Having pre-stored checksum files is generally a bad idea. A stored checksum only tells you what the checksum was at the time it was calculated. Even if the checksum were based on first writing and then reading back the just-written data (something that is hard to guarantee, as it might be read from cache rather than from disk), the stored data might have been corrupted since the checksum was calculated and stored.

Posts: 2 · Fossie 3 Aug 2020, 09:54

I'd like to add a vote for this to be implemented.

It would be a great way to allow for data scrubbing. Every time a file is updated, recalculate the checksum. Schedule periodic verification of the checksums. If a mismatch is found, we know the file has been corrupted, and the user should be given the option to replace it with the copy in the other location (after verifying that copy is good).
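
A rough sketch of such a scrub, assuming it runs on the machine that holds the data and that SHA-256 is the checksum of choice. The helper names `sha256_of`, `build_manifest` and `scrub` are hypothetical and not part of FFS.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def scrub(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A mismatch only signals corruption if the manifest really was refreshed on every legitimate update; otherwise an ordinary edit looks the same as bit rot.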

I do control both environments, and I expect most people do; if not, they're probably using a cloud service, which pretty much makes FreeFileSync obsolete in terms of supplying a backup storage solution.

Posts: 2615 · Plerry 4 Aug 2020, 08:15

Your vote is for using checksums for validation, which is something different from using checksums to determine which files need to be copied, as was proposed by TS raven.
In either case, the first part of my earlier reply still holds.

As for your (Fossie's) proposal: as FFS only runs in one location, having FFS compare the checksum to the stored data requires all data to be transferred to the machine running FFS. Then FFS might just as well directly compare the data of the left and right locations rather than calculating the checksums.

Obviously, nothing prevents you from using a dedicated data verification tool based on checksums, running at either of the sync locations (if you have control over both ends).
But (at least in my view) it does not make sense to include this functionality in FFS, just like e.g. for the frequently suggested duplicate-finder function.

Posts: 2 · Fossie 4 Aug 2020, 14:11

Thanks for your reply, Plerry.

The way I see it, adding validation functionality would only improve the quality of what FFS is doing. Rather than just copying files and making sure they appear updated in two locations, one could argue that the user's primary concern and interest is in the contents of the files. This requires some level of validation/scrubbing.

If FFS is not the way to go to this end, do you know of any data verification tools that might be suitable? Googling only seems to lead me to database cleaning products :/

Best regards

Posts: 2615 · Plerry 4 Aug 2020, 16:37

It obviously differs per platform, but for Windows see e.g. here.