FileTimeTolerance - related

Discuss new features and functions
Posts: 34
Joined: 15 Sep 2009

ozo-sf

It's related to the bug request:
https://sourceforge.net/tracker/?func=detail&aid=3043064&group_id=234430&atid=1093080

For some reason I can't add any posts there. So here is what I'd like to add:
-----------------------------

Yes, indeed, it's not so simple. But keeping the focus on the main goal here,
which is to separate only the changed files from the rest, I think it could be
improved by implementing this simple procedure:
1. Check the timestamps of the two files. If they're the same, go to the next pair.
2. If the timestamps are different:
2.1. Is the difference exactly 'N' whole hours AND at most 24h? If not, the
files are different.
2.2. If it is, they could be the same version. In that case:
2.2.1 - simply consider them the same (a simple, fast and quite effective
way). Or, if that is not acceptable:
2.2.2 - compare the file sizes. If they differ, the files are different.
If they are the same, finally compare the content (MD5?).

Why could 2.2.1 be acceptable? The probability of saving two different
versions of a file within 24h with exactly the same minutes and seconds
is very low. This assumption may eliminate a lot of cases that would
otherwise need a content comparison (which could be costly) and so speed
up the sync.
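
A minimal sketch of steps 1 to 2.2.2 in Python, just to make the proposal concrete. The names (FileInfo, classify, trust_hour_shift) are made up for illustration and are not taken from FFS; timestamps are assumed to be whole seconds.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR

@dataclass
class FileInfo:
    mtime: int  # last write time in whole seconds
    size: int   # file size in bytes

def classify(a, b, trust_hour_shift=True):
    """Return "same", "different" or "compare_content" for a file pair."""
    diff = abs(a.mtime - b.mtime)
    if diff == 0:
        return "same"                       # step 1
    if diff > DAY or diff % HOUR != 0:
        return "different"                  # step 2.1
    if trust_hour_shift:
        return "same"                       # step 2.2.1
    if a.size != b.size:
        return "different"                  # step 2.2.2, size check
    return "compare_content"                # step 2.2.2, content check needed

print(classify(FileInfo(1_000_000, 10), FileInfo(1_000_000 + HOUR, 10)))
print(classify(FileInfo(1_000_000, 10), FileInfo(1_000_000 + 1800, 10)))
```

The first call reports "same" (exactly one hour apart), the second "different" (30 minutes apart).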

Why could this be a bad idea to implement?
Site Admin
Posts: 7050
Joined: 9 Dec 2007

Zenju

This idea basically would be the generalization of the current "+-1h" setting
to full hours within a day.
A few basic issues remain:

1. A time shift occurred (either due to DST or time zone travel): the files have different dates but are marked as equal. So far so good. Now one of the two files is modified. The next sync in automatic mode will bring up a conflict, because the files were not "in sync" last time and therefore no sync direction can be deduced for the current sync.
2. Time zone shifts are not only full hours but may be as small as a quarter of an hour.
3. The probability of failing to detect different files (with the same size) increases with 2. and with the number of files processed.

Regarding 2.2.2: a full binary comparison of two files should show similar
performance characteristics to copying one file over the other, so there is
not much to win in this case.
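
To illustrate why: a content comparison has to read both files completely whenever they turn out to be equal, which is roughly the I/O of a copy. A rough Python sketch (the function name and chunk size are arbitrary choices, not FFS code):

```python
import os

def same_content(path_a, path_b, chunk_size=1 << 20):
    """Compare two files by size first, then block by block."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # cheap exit, no file data is read
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False  # early exit at the first differing block
            if not block_a:
                return True   # both files ended without a difference
```

Only differing files can exit early; identical files (the common case after a pure time shift) are always read to the end.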

Concerning problem 1, the solution is not to hide the differences but to
either avoid or solve them. The former might be handled with ideas like https://sourceforge.net/tracker/?func=detail&aid=2994784&group_id=234430&atid=1093083
But they still need more refinement.

ozo-sf

> 1. ... The next sync in automatic mode will bring up a conflict
>because the files were not "in sync" last time, therefore it
>cannot deduce a sync-direction for the current sync.
I agree with you - it cannot. Unless the program can use UTC
timestamps in both directories. In that case it should use UTC
("absolute" time stamps) and not local time. If at least one
of them doesn't use UTC, it's a tough case, resulting in potential
conflicts in automatic mode. And, I guess, here we're talking
about the latter case (at least one FS doesn't support UTC).
Only if the program can't compare UTC timestamps could the
proposal in my initial post make sense.

BTW, while I see a place for "automatic" mode, I have actually
never used it. I need a simple backup of current files to some
backup storage. "Mirror" mode does that well.


>2. Time zone shifts are not only full hours but may be as small
> as quarter of an hour
Never heard of them, but this very specific case could be covered
with a dedicated configuration option. Checking for quarter-hour
differences may slightly increase the number of cases that
slip through check #2.1 (see my post above).
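
As a rough way to put a number on that, one can count how many non-zero time differences within +/- 24h would be accepted as a "pure time shift" for a given step. This is only a counting sketch with a made-up function name; it says nothing about how often such differences occur in practice.

```python
def accepted_offsets(step_seconds, window_seconds=24 * 3600):
    """Non-zero differences (seconds) within the window that are multiples of the step."""
    steps = window_seconds // step_seconds
    return [k * step_seconds for k in range(-steps, steps + 1) if k != 0]

hourly = accepted_offsets(3600)
quarter_hourly = accepted_offsets(900)
print(len(hourly), len(quarter_hourly))  # 48 192
```

So a quarter-hour step accepts four times as many differences as a full-hour step.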


>2.2.2 a full binary comparison of two files should show similar
>performance characteristics as if one file is copied over the
>other, so there is not much to win in this case.
Good point.


>Concerning problem 1. the solution is not hiding the differences,
>but either avoiding them or solving them.
You can't dictate that users use only file systems that support UTC.
For the near future there is still a place for old file systems like
FAT, FAT32, etc. So, while we have to deal with them, let's try to
find a solution.


Finally, I have not checked the recent implementation of the
"Ignore 1-hour file time difference" option, but the last time I
checked, it let slip through files that were changed less than
1 hour after the last backup... and the time difference was
not exactly 1 hour.

Zenju

>If at least one of them don't use UTC
This is exactly the one big problem here: FAT/FAT32 don't have UTC time
information but save local time only.

> For a nearest future there is a place for old FS's like FAT
Yes, it will probably be a very long time until FAT support can finally be
dropped... USB sticks being FAT-formatted by default keep this file system
alive.

> the last time I checked it allowed to slip away files that were changed in
> less than 1 hour
Yes, this is another safety measure at the cost of convenience/ease of use:
file times that differ by less than 1 hour are marked as conflicts, because
at one of the two DST shifts per year the older file gets a newer date. For
mirror sync, however, this isn't a real problem, as conflicts can be assigned
a direction, too.
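
Roughly, the behavior described in this thread for the "Ignore 1-hour file time difference" option could be sketched like this in Python. This is a reading of the description above, not the actual FFS code; the function name and result strings are invented, and any tolerance for FAT's coarse timestamps is left out.

```python
HOUR = 3600

def compare_by_time(left_mtime, right_mtime, ignore_one_hour=True):
    """Classify a file pair by last write time (whole seconds)."""
    diff = left_mtime - right_mtime
    if diff == 0:
        return "equal"
    if ignore_one_hour:
        if abs(diff) == HOUR:
            return "equal"     # exactly one hour apart: treated as a DST shift
        if abs(diff) < HOUR:
            return "conflict"  # a DST shift could have reversed the order
    return "left newer" if diff > 0 else "right newer"

print(compare_by_time(10_000, 10_000 + HOUR))  # equal
print(compare_by_time(10_000, 10_000 + 600))   # conflict
```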

ozo-sf

>>the last time I checked it allowed to slip away files that
>>where changed in less then 1 hour
>Yes this is another safety-measure at the cost of
>convenience/ease-of-use: filetime with less than 1 hour
>are marked as conflict because at one of the two DST
>shifts per year, the older file gets a newer date.
And that's exactly the case where the approach published in
the first post here may help. Files with a time difference of
less than 1h would be considered different, as opposed to
files that are exactly 1h apart.

Zenju

>with time difference less than 1h would be considered different
Currently these files are assigned a conflict, which can have a sync-direction
configured. I don't see the advantage of introducing a new variant type
(currently "different" is only available for compare by content). For mirror
syncs all is fine already with the current approach.

ozo-sf

The reason I'm posting here is that the current approach
is not working well for me (and for others - see e.g. bug
request "FileTimeTolerance - ID: 3043058"). And I've explained
why.

There is no need to introduce a new option. Moreover, I'd even
suggest removing the current one, "Ignore 1-hour file time
difference", simply because we all use DST and we all need
that TZ/DST tolerance all the time. And if you compare files
using the technique from the 1st post, it will work
automatically. But of course it's not a big deal...

Zenju

> remove "Ignore 1-hour file time difference" simply because we all use the
> DST
In its current form this option entails categorizing files with a deviation
of less than 1 h as conflicts. As this has a medium impact on the tool's
behavior, it's something not all users will want.

> And if you compare files using the technique provided in the 1st post - it
> will work
... for mirror sync only. When using other variants like <Automatic> there are
still issues remaining.
The requirements for reworked DST handling are:
- possibly remove the DST option in global settings (optional)
- correctly execute synchronization with ALL sync variants after a DST change, even if files have been synchronized/copied externally
- also handle time zone shifts
The idea in the first post is feasible for mirror sync only, the linked one
for <Automatic> mode only. We need a more general, safe and user-friendly
solution.

ozo-sf

>>And if you compare files using the technique provided in the 1st
>>post - it will work
>... for mirror sync only. When using other variants like <Automatic>
>there are still issues remaining.
Then let's do it at least for mirror mode.

While I can see some cases where Automatic mode could be used, I
personally have never used it. From file backup software I need
"mirror" mode only. I used it with SyncbackSE for many years,
and now I'm doing the same with FFS as well.
I just want to forget that problem and never come back to it...

>- correctly execute synchronization with ALL sync variants after
>DST change even if files have been synchronized/copied externally.
I agree with that intention, but let's take that small step first
and not postpone it until a comprehensive solution is ready...

Zenju

>Then let's do it just for at least the mirror mode only.
As for mirror sync, there really is not much to do: instead of only 1h,
multiples of 1h would be allowed. So the only functional advantage would be
support for moving between different time zones. Honestly, that is not that
big a use case, so I prefer to handle it when implementing a general
FAT-filetime solution.

Zenju

>General FAT-filetime solution.
Speaking of which, I just had a (potentially) great idea. When thinking about
what the source of this problem is at its lowest level, we have FAT, which
physically saves local time, while NTFS physically saves UTC time. The binary
amount of "information" is the same; the difference is just its
interpretation. Now if Windows interpreted FAT's binary filetime
representation as UTC, we wouldn't have this problem! Actually I think this is
something that could be done quite easily by MS; even migration/backwards
compatibility isn't really an issue here (old files that were saved with local
time would now be reinterpreted as UTC, but just once, as a migration
step... so what). I believe this might be what Linux is doing. In fact, I
couldn't reproduce the DST issue on Linux even with FAT-formatted USB sticks.
Now back to reality. MS isn't going to fix this (although that's a pity,
because it's easy and it's their responsibility). But I could. I'd just
interpret FAT filetimes as UTC, and when copying files to FAT with FFS I'd
adjust the times accordingly. The only remaining issue would be files that are
copied externally. For those I'd need to distinguish: is it local time or UTC
I'm reading? A quick and dirty solution would be to hide a flag somewhere
associated with each file. Perhaps abuse some other metadata like the creation
date? I'd just need to store a single bit of information; this should be
feasible somehow...
Result: all DST/time zone issues would ultimately be solved, irrespective of
which FFS mode is active, no database file required... whatever. I think this
idea looks quite promising.

ozo-sf

>>Then let's do it just for at least the mirror mode only.
>As for the mirror sync there really is not much to do: Instead of
>only 1h, multiples of 1h would be allowed. So the only functional
>advantage would be support for moving between different time zones.
>Honestly that is not that big a usecase, so I prefer to handle it
>when implementing a general FAT-filetime solution.
Maybe there are not many folks here who change time zones frequently.
I may agree with that. But I can assure you that everyone changes
their computer time twice a year (DST) by exactly 1h. And they always
expect FFS to recognize different/same files correctly. So, while
I wish you luck with a general FAT-filetime solution, meanwhile
I hope to forget about that problem at least in mirror mode...

Regarding the idea in your second post:

First of all, let me remind you of what you already know: in FAT
the timestamps were created using local time. While the binary amount
of "information" is the same, there is no way to recover some global time
(like UTC) at all. It's not just a matter of interpretation. You may
treat all of those timestamps as random values deviating discretely (in
1h steps) within a +/- 24h range from the real time (e.g. UTC). A file on
my computer created simultaneously with a file on your computer will
have a different (local) timestamp and, because it was not stored in UTC
or as local time plus time zone plus DST, there is no way to restore the
actual time in the future. The only mitigating factor here is that both
files have the same minutes and seconds and the time difference is no
more than 24h. That's it. BTW, as you know, timestamps, even in FAT, have
a higher resolution than just seconds. You may count on that when
comparing files in order to recognize that files which differ by 'n'
hours are actually the same...

Second, I can assure you that Microsoft will do nothing about it. They
work only on products that they hope to sell in the future. They don't
work on what has already been sold. They do development for new money
only. That's the essence of this company and its business model.

Zenju

>there is no way to know some global time (like UTC) at all
I think I didn't make myself clear enough: the idea is to simply save UTC time
in the physical place where FAT expects local time when copying files via
FFS(!). This is the whole trick. Local time is a binary value that is never
touched by DST or time zone shifts on FAT. Consequently, we have preserved UTC
time information exactly the same way we have it on NTFS. Of course Windows
no longer correctly interprets what it thinks is local time; this is the
price.
All we did is change, within FFS, the interpretation of this binary data that
Windows thinks is local time. Consequently, we have fixed the DST/time zone
issue in the FFS world. The only remaining challenge is to handle transitions
between FFS and Windows: 1. copying a new file to the FAT disk, 2. changing an
existing file on the FAT disk.
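
A simplified Python model of the trick, under the assumption that a reader converts the stored local time using the UTC offset in effect at read time. The function names and the two example offsets are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def store_local(utc_time, offset_at_write):
    """Wall-clock fields as FAT records them: local time, no zone information."""
    return (utc_time + offset_at_write).replace(tzinfo=None)

def read_local_as_utc(stored, offset_at_read):
    """What a reader derives when it treats the stored fields as local time."""
    return (stored - offset_at_read).replace(tzinfo=timezone.utc)

def store_utc(utc_time):
    """The proposal: put the UTC fields into the very same slot."""
    return utc_time.replace(tzinfo=None)

def read_utc(stored):
    """Reading back needs no offset at all."""
    return stored.replace(tzinfo=timezone.utc)

WINTER = timedelta(hours=1)  # hypothetical zone, standard time
SUMMER = timedelta(hours=2)  # same zone during DST

modified = datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc)
on_fat = store_local(modified, WINTER)

print(read_local_as_utc(on_fat, WINTER) == modified)  # True
print(modified - read_local_as_utc(on_fat, SUMMER))   # 1:00:00
print(read_utc(store_utc(modified)) == modified)      # True
```

The stored bits never change; only the offset applied on reading does, which is where the one-hour jump after a DST change comes from.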

Zenju

> FAT have higher resolution then just seconds
Write time, the one we're mostly interested in, has only 2-second precision on
FAT.
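
For reference, a small Python sketch of the usual FAT directory-entry packing of the write date and time (two 16-bit fields). The seconds are stored divided by two in 5 bits, which is where the 2-second precision comes from. The function names are made up.

```python
from datetime import datetime

def fat_encode(dt):
    """Pack a datetime into FAT (date, time) 16-bit fields; odd seconds are lost."""
    date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return date, time

def fat_decode(date, time):
    """Unpack FAT (date, time) fields back into a naive datetime."""
    return datetime(
        1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )

original = datetime(2010, 8, 14, 13, 45, 31)
print(fat_decode(*fat_encode(original)))  # 2010-08-14 13:45:30
```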

ozo-sf

So, if I understand you correctly, on the backup volume (using FAT) you want
to save time stamps in UTC (and not in local time, as they normally would be).
The time is converted by FFS on the local computer using its time zone. Then
you can move that backup volume (despite the fact that it's FAT) to any other
place in the world (or simply go through a DST change) and successfully
synchronize those files with those other computers. Right?

It would work if I don't access those files from any other sync application
(which knows nothing about that approach) and don't copy those files directly
(which I can easily do now); otherwise I'd end up with files whose timestamps
FFS would not expect.

But from time to time I deal with backed-up files directly. For example, I
copy the whole root folder from the backup location to a destination when I
want to restore all files at once (it is the easiest and most natural way).
Another example: I move files on the backup volume manually all the time when
I just want to rename some folders that contain a lot of nested files /
subfolders. That's because FFS doesn't support renaming folders yet... So,
there are cases when I need to work with files manually, and there are cases
when I may want to apply a different file synchronization program too. Saving
files with recalculated UTC timestamps instead of local time may create some
problems here.

But, under the condition that I know about the special way FFS treats
timestamps on the (FAT-formatted) backup volume and I agree (and promise) not
to use any other synchronization program or deal with backed-up files
directly (only via FFS), I guess the idea could be quite feasible...

You're right about the 2 sec time precision in FAT, my memory failed me in
this case ;) It makes the matter a bit worse than I thought.

Zenju

>Right?
Correct. I fix the design flaw (which had its reasons back then) that FAT
saves local time instead of UTC by having FFS save UTC time instead. This
works as far as FFS is concerned. A few (solvable) issues remain:

1.> if: I don't access those files from any other synch
Yes, other sync apps would be totally out. They would see NTFS vs FAT files as
different while FFS sees them as equal. This may lead some users to erroneously
think FFS is broken. On the other hand this behavior is inevitable sooner or
later: we have a DST shift that visibly changes file times by 1 hour and still
want FFS to see them as equal, while other sync apps don't. Given the annoying
DST issue that this approach solves and the non-functional nature of this
restriction, I'm tempted to accept this price.

2. Files are modified on the FAT drive directly by the user, or external files are copied to FAT
These files will get their usual local time. FFS will need to distinguish
which dates are in local time and which have a converted UTC representation.
I was thinking about yet another metadata file that saves UTC times for each
file. But because of your comment about renaming files this is basically out.
There is not much space associated with each file for additional metadata. We
have write time and creation time.
One approach could be to convert local times to UTC times when scanning the
FAT drive (I'd consider this adaptation logically read-only). To note that
files are now UTC, some magic number could be written into the creation time.
This would be the migration step to deal with existing FAT files that are not
yet adapted by FFS. Also, when copying files FFS would save UTC time instead
of local time and set this magic number indicator.
This solves the issue of new files being copied externally to FAT. Files that
are edited directly on the FAT drive, however, would get an invalid write
time, because the magic number in the creation date would still be set
although the write time is now local time. Fortunately this wouldn't have a
bad effect most of the time: <automatic> and mirror mode would simply detect a
changed date. They don't care whether it's newer, older or valid, but proceed
correctly.
Perhaps this effect could be avoided entirely by also storing this magic
number within the last write time.

ozo-sf

I'd say - go for it.
But reserve an option to turn it off, just in case. Or even better, offer
an action to restore the local time on the remote FAT volume if the user asks
for it...

Zenju

I've found a further refinement that fixes all remaining problems:
1. Leave last write time untouched -> no Windows/FFS/other sync app interaction problems!
2. Abuse creation date to save all information:
I. local<->UTC time offset
II. indicator that the offset in I) is present
III. indicator that the offset in I) corresponds to a specific write time
(this could be a hash of the write time)

Resulting context:
Only the creation date is physically modified; last write time is untouched!
When the user copies a new file to FAT or edits a file on FAT, the data stored
in the creation date becomes obsolete due to II (when creating a new file,
creation time is current) and III. As soon as these files are scanned with
FFS, the creation date receives the data required to calculate UTC. I think
this is THE solution ;)
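
To make I to III concrete, here is a toy Python sketch of such a payload. The 32-bit layout, the marker byte and the XOR checksum are purely hypothetical choices for illustration and are not the format used by the prototype; the sketch also ignores that the result must still be a valid creation date, and that an ordinary creation time could carry the marker byte by chance.

```python
MAGIC = 0xF5  # hypothetical marker byte meaning "payload present" (II)

def checksum16(write_time):
    """Toy 16-bit checksum of a 32-bit write time: XOR of both halves (III)."""
    return (write_time & 0xFFFF) ^ ((write_time >> 16) & 0xFFFF)

def encode_payload(write_time, offset_minutes):
    """Pack marker, local<->UTC offset (I) and write-time checksum into 32 bits."""
    if offset_minutes % 15 != 0 or not -1920 <= offset_minutes <= 1905:
        raise ValueError("offset must be a multiple of 15 minutes within range")
    quarters = offset_minutes // 15 + 128  # biased so it fits one unsigned byte
    return (MAGIC << 24) | (quarters << 16) | checksum16(write_time)

def decode_payload(creation_value, write_time):
    """Return the stored offset in minutes, or None if absent or stale."""
    if (creation_value >> 24) & 0xFF != MAGIC:
        return None  # no payload, e.g. file was created outside FFS
    if creation_value & 0xFFFF != checksum16(write_time):
        return None  # write time changed since the payload was stored
    return (((creation_value >> 16) & 0xFF) - 128) * 15

write_time = 0x3C5A6B21
payload = encode_payload(write_time, 120)
print(decode_payload(payload, write_time))      # 120
print(decode_payload(payload, write_time + 1))  # None
```

A file edited outside FFS gets a new write time, so the checksum no longer matches and the old offset is ignored until the next scan refreshes it.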

Zenju

Finished a working prototype: https://sourceforge.net/tracker/index.php?func=detail&aid=2994784&group_id=234430&atid=1093083