Feature request: compare file contents only if date differs

Discuss new features and functions
Posts: 7
Joined: 7 Mar 2014

srcfrgr

I mainly use FreeFileSync to sync my desktop and laptop, and I use the date/size comparison option because the file contents option is too slow to be practical on a daily basis.

However, I often find myself having to resolve conflicts for files that are actually identical, simply because the timestamp is different. This happens for example when having the same version control repository on both machines: if I worked and/or pulled from the repository on both sides after the last sync, then, even if the files are identical, they will have different timestamps and result in a conflict.

To resolve this, I think it would be useful to add a new comparison option to FreeFileSync that combines the advantages of both current comparison methods:
- first compare using timestamp/size as the criterion
- for those files that have changed since last sync, and which have the same size but different timestamps, compare the file contents, and let the user decide what to do if they are identical, e.g., do nothing, update timestamp to newest. You may want to give the user the option to include those files that have different sizes as well, as small differences can occur e.g. when using different OSes.

With this procedure, only a very small number of files would actually have to be compared using their contents, which would speed up the comparison tremendously.
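To make the proposal concrete, here is a minimal sketch of the two-stage check for a single file pair. It is not how FreeFileSync works internally; the function name `classify_pair`, the result labels, and the `tolerance_s` parameter are all hypothetical and only illustrate the idea.

```python
import filecmp
import os


def classify_pair(left, right, tolerance_s=2.0):
    """Return 'equal', 'same_content', or 'different' for one file pair.

    Hypothetical helper: the names and the tolerance are assumptions
    made for illustration, not FreeFileSync behavior.
    """
    left_stat, right_stat = os.stat(left), os.stat(right)

    # Stage 1: cheap metadata checks, no file contents are read.
    if left_stat.st_size != right_stat.st_size:
        return "different"
    if abs(left_stat.st_mtime - right_stat.st_mtime) <= tolerance_s:
        return "equal"

    # Stage 2: same size but different timestamps, so read the contents.
    if filecmp.cmp(left, right, shallow=False):
        return "same_content"
    return "different"
```

Only pairs that reach stage 2 are read from disk, and a "same_content" result is where the user would choose between doing nothing and updating the timestamp.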

What do you guys think?
Posts: 85
Joined: 28 Aug 2012

blues12

Great idea. Been trying to get a grip on exactly the same kind of problem. Your proposed solution would handle it nicely.
I strongly second this suggestion.
Posts: 2
Joined: 17 Mar 2014

kalle-r

blues12 wrote:
> Great idea. Been trying to get a grip on exactly the same kind of problem. Your proposed solution would handle it nicely.
> I strongly second this suggestion.

This is a great idea!
+1
Posts: 7
Joined: 7 Mar 2014

srcfrgr

Any thoughts from the developer on this feature request?
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

Comparing two files by content is roughly just as fast as simply overwriting one with the other. So for "mirror" there's nothing to improve in FFS. For "two way" mode a conflict indeed needs manual resolution. However I'm not convinced that this problem is a valid one: If the version control tool sets the modification time of a file to "now" during checkout it should probably be fixed to preserve it instead.
Posts: 85
Joined: 28 Aug 2012

blues12

Zenju wrote:
> Comparing two files by content is roughly just as fast as simply overwriting one with the other. So for "mirror" there's nothing to improve in FFS. For "two way" mode a conflict indeed needs manual resolution. However I'm not convinced that this problem is a valid one: If the version control tool sets the modification time of a file to "now" during checkout it should probably be fixed to preserve it instead.

I cannot confirm that overwriting is just as fast as comparing content. Maybe it's due to the much slower write speed than read access on my stick, or maybe it's because of fragmentation (the stick is nearly full).

Besides, FFS is also great for just comparing (not necessarily syncing). Comparing everything by content takes a lot more time than a conditional date-diff.

Is there a way to run a content compare after a date compare? I mean on the filtered list, of course.
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

blues12 wrote:
> I cannot confirm that overwriting is just as fast as comparing content. Maybe it's due to the much slower write speed than read access on my stick, or maybe it's because of fragmentation (the stick is nearly full).
>
> Besides, FFS is also great for just comparing (not necessarily syncing). Comparing everything by content takes a lot more time than a conditional date-diff.
>
> Is there a way to run a content compare after a date compare? I mean on the filtered list, of course.

You can select the files you want to binary-compare, right-click and add to the include filter. Then change the comparison variant and compare again.
Posts: 85
Joined: 28 Aug 2012

blues12

Zenju wrote:
> You can select the files you want to binary-compare, right-click and add to the include filter. Then change the comparison variant and compare again.

I was afraid you'd say that. It requires a complete "hard-wired" filter setup for an ever changing environment.

Which brings me back to the original intent and title of this thread. I still think it would be a great idea and a time-saver to implement it.

Or any half-automated solution - such as running the second compare on the visible list in the main window.
Posts: 7
Joined: 7 Mar 2014

srcfrgr

Zenju wrote:
> Comparing two files by content is roughly just as fast as simply overwriting one with the other. So for "mirror" there's nothing to improve in FFS. For "two way" mode a conflict indeed needs manual resolution. However I'm not convinced that this problem is a valid one: If the version control tool sets the modification time of a file to "now" during checkout it should probably be fixed to preserve it instead.

Thanks for answering. I am indeed interested in the conflicts occurring in "two way" mode.
There are actually good reasons not to preserve the modification time, mainly when using automatic build tools like make or distutils, as noted in the [Mercurial FAQ][1].
I run into this kind of conflict every day (that's how much I use your software!), and it would truly be a timesaver if you could add this option.
Thanks!

[1]: http://mercurial.selenic.com/wiki/FAQ#FAQ.2FCommonProblems.Why_is_the_modification_time_of_files_not_restored_on_checkout.3F
Posts: 7
Joined: 7 Mar 2014

srcfrgr

I know this is an old post, but I feel like maybe I explained my request incorrectly.
I'm actually puzzled by the different behaviors in the "Time and Size" vs "Content" comparisons.
In "Time and Size", it looks like most pairs are skipped because they haven't changed since the last sync, so they don't need to be compared again. Is that correct?
While for the content comparison, all pairs (that are not obviously different due to different sizes) are being compared again.
Am I misunderstanding the way the database is being used?
Wouldn't it make sense in the "By Content" comparison to skip all pairs where we know neither side has changed since the last sync?
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

srcfrgr wrote:
> I know this is an old post, but I feel like maybe I explained my request incorrectly.
> I'm actually puzzled by the different behaviors in the "Time and Size" vs "Content" comparisons.
> In "Time and Size", it looks like most pairs are skipped because they haven't changed since the last sync, so they don't need to be compared again. Is that correct?
> While for the content comparison, all pairs (that are not obviously different due to different sizes) are being compared again.
> Am I misunderstanding the way the database is being used?
> Wouldn't it make sense in the "By Content" comparison to skip all pairs where we know neither side has changed since the last sync?

The database is not taken into account during comparison (but instead at a later stage, when determining sync directions). The categorization of files is therefore as advertised: files are found identical either by time and size or by content.
The only settings that influence whether files are compared are the file inclusion and exclusion filter rules. E.g., if a folder is excluded, it is not processed in any way.
Posts: 7
Joined: 7 Mar 2014

srcfrgr

I see. It sounds to me like it would make so much sense to not compare the content of files that have already been compared and shown to be the same at the last sync, and haven't changed since. That would be a huge time saving, from something like 1 hour to less than a minute.
To me, either that, or as I suggested earlier an option to limit the "by content" comparison to conflicting files, would be amazing and make my life so much easier.
Would you be willing to consider including either of these ideas in a future release?
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

> it would make so much sense to not compare the content of files that have already been compared

It's not possible to know if files have been changed since the last comparison because some applications may change a file's content without changing the metadata or there may even be a corrupted segment on the hard drive. That may be an obscure scenario, but it is precisely what compare by content is for. If such a scenario is not expected (= almost always), then comparison by time and size is the option to go for anyway.

> as I suggested earlier an option to limit the "by content" comparison to conflicting files,

The case where a compare by content is able to actually resolve conflicts by finding the files identical sounds obscure to me. The version control system should be able to restore proper modification times.
Posts: 7
Joined: 7 Mar 2014

srcfrgr

> The case where a compare by content is able to actually resolve conflicts by finding the files identical sounds obscure to me. The version control system should be able to restore proper modification times.

As I mentioned earlier, version control systems (e.g., Mercurial) explicitly advise against restoring "proper" modification times, because it would throw off software like "make" that relies on modification times to decide whether to compile something or not.
When working with other people, if you compiled an executable after the date of someone else's modification to the code, then when you pull that new code, it will have a date prior to the compiled executable, and will be silently ignored by make. This is obviously not desired. That's why version control systems use the pull date for all modified and new files.
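
To illustrate the point, here is a rough sketch of the basic staleness rule that make-style tools apply. It is a simplification written for this post, not make's actual implementation, and the function name `needs_rebuild` is hypothetical.

```python
import os


def needs_rebuild(target, sources):
    """Simplified make-style rule (illustrative assumption, not real make code):
    rebuild when the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A source counts as changed only if its mtime is newer than the target's.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

If a checkout restored the original commit time of a pulled file, and that time were earlier than the executable's build time, this rule would return False and the new code would never be compiled. Stamping the file with the checkout time avoids that.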
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

Unfortunately for "two way" this specific scenario is not covered at all currently. A solution that fits into the FreeFileSync design without bloating the GUI with new options in the standard config dialogs could be a context menu option like "binary-compare selection". It's still manual, but conflict resolution in two-way is also manual, so this could work. Maybe this is even a building block for other interesting scenarios...?
Posts: 7
Joined: 7 Mar 2014

srcfrgr

Zenju wrote:
> Unfortunately for "two way" this specific scenario is not covered at all currently. A solution that fits into the FreeFileSync design without bloating the GUI with new options in the standard config dialogs could be a context menu option like "binary-compare selection". It's still manual, but conflict resolution in two-way is also manual, so this could work. Maybe this is even a building block for other interesting scenarios...?

I think that would be a great improvement. Being able to launch an external diff software (e.g., Beyond Compare, etc.) to compare and potentially merge would also be fantastic.
Site Admin
Posts: 7052
Joined: 9 Dec 2007

Zenju

srcfrgr wrote:
> I think that would be a great improvement. Being able to launch an external diff software (e.g., Beyond Compare, etc.) to compare and potentially merge would also be fantastic.

You can integrate external tools already; see help file chapter "External Applications".
Posts: 1
Joined: 16 Apr 2007

sneezedit

Zenju wrote:
> You can integrate external tools already; see help file chapter "External Applications".

Sorry about the late post, I just read this. I read the external tools help, but that appears to be for individual files on a one-time basis (for each file you choose). I think (correct me if I am wrong) JonL would like to see an external app that FFS can call for each file to do the comparison and report back. So FFS would call the external app for each file to be compared and get a yes/no response on whether the file passes or fails the comparison. This would allow someone to do more complex comparisons.

For example, take JonL's initial request. Instead of first comparing file date/size, then content, use an external app to compare as follows.

1. Before running FFS, the user runs a program/app on each directory (outside of FFS) and creates a checksum/MD5/SHA-x database of the files in each directory,
2. Run FFS using a custom .ffs_gui/.ffs_batch config with the external comparison app,
3. If any differences appear, FFS can then proceed to sync'ing the files or adding those files to the list (depending on gui/batch).

I think this would solve JonL's dilemma. Granted, calling the external app for each file comparison takes longer than the simple file date/size check, but it would be faster than doing a full file content comparison, since reading the files for the checksum happens outside of FFS. Also, if someone needs this type of customization, they should expect some time penalty for calling an external app.
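
As a rough sketch of step 1, something like the following could build the checksum database for each directory ahead of time. FFS has no such external comparison hook as far as this thread shows, so this only covers the part that runs outside of FFS. The function names `build_manifest` and `differing_paths` are made up for illustration, and SHA-256 is just one possible choice of checksum.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def differing_paths(left_manifest, right_manifest):
    """Paths missing on one side or whose digests differ (hypothetical helper)."""
    all_paths = left_manifest.keys() | right_manifest.keys()
    return sorted(
        p for p in all_paths if left_manifest.get(p) != right_manifest.get(p)
    )
```

The per-file yes/no answer the external app gives back would then be a simple lookup in the two manifests instead of a fresh read of both files.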

If you make the configuration for this, consider making it local to specific configuration files. Also, to help with speed, allow the user to enable both the FFS comparison (with either file date/size OR file content) and the external app comparison. This gets some of both benefits: if file date/size matches, there is no need for syncing; if file date/size does not match, call the external app to do the comparison and sync if the response is a mismatch.

I hope I explained my thoughts well enough. I don't know how much work it will entail, but please at least consider it. I do enjoy using FFS.

Sorry about the long post, but thank you for considering it.
CN