Keep previous versions of files
- Posts: 24
- Joined: 25 Nov 2009
I like, and welcome, the limit revision ideas you offer. As for your
"previous" revision tree, it's nice to see in an Explorer tree view the files
that have changed on a given run of a sync. True, I could just enable logging
and refer to the log, but somehow the revision tree is easier. In the new
revision structure, I suppose I could do a search on the revision date/time
stamp and see a tree result that way (sorta) of the changed files on a given
sync pass. If having 2 different options for revision copies (like file
compare Time & Size vs. File Content) is problematic, so be it. The new
structure may well be preferred by most.
"previous" revision tree, it's nice to see in explorer tree view files that
have changed on a give run of a sync. True, I could just enable logging and
refer to the log, but somehow the revision tree is easier. In the new revision
structure, I suppose I could do a search on the revision date/time stamp and
see the a tree result that way (sorta) of changed files on a given sync pass.
If having 2 different options for revision copies (like file compare Time &
Size vs. File Content) is problematic, so be it. The new structure may well be
preferred for most.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> I could do a search on the revision date/time stamp
This will be a feasible way to solve the problem. FFS takes care to use the
same time stamp for all files revisioned within a sync session and for the log
file name as well. Generally, having a simple revisions hierarchy that matches
the user's source data seems more important than being able to quickly find
the "diff" of the last update.
> I figure if I haven't needed to get an old/deleted file in "x" days that
I don't need it after that.
This is a good reason to base the limit on the date of deletion as it is
encoded in the time stamp appended to the file name (as opposed to the last
file modification time).
I'm almost inclined to add two options, "limit revision count" and "limit to
last x days". But there is a problem with limiting by time: in order to find
all the outdated files, FFS would need to scan the entire versioning directory
at the end of every sync. If the data is stored on a slow device like a USB
stick or on a network share, the performance may be prohibitive.
For "limit revision count", on the other hand, it's easy and fast (at least it
scales with the number of deletions) to limit the number of revisions of a
file, since they are all located in the same directory.
It also seems a "limit revision count" could solve your problem of providing a
means to undo an accidental deletion? If this is true, we would only need
"limit revision count" as an option.
There is also the question of how to handle symlinks. For the old versioning
scheme, it seemed obvious that both file and directory symlinks should be
moved. I think this is also true with the new versioning in the case of file
symlinks. But directory symlinks may be a different story: currently (in the
beta) I create directories lazily when versioning, i.e. only when needed. This
simplifies the code internally, but may actually even be useful. Also the
time stamp is not applied to the directories that are revisioned. So for
directory symlinks this seems to indicate that they should simply be deleted
without revisioning!? Any outrageous complaints about this plan?
- Posts: 24
- Joined: 25 Nov 2009
> But there is a problem with limiting the time: In order to find all the
outdated files, FFS would need to scan the entire versioning directory at the
end of every sync.
I don't understand why this would be any more difficult than the revision
count. For any given file you're making a revision copy of, couldn't you just
do a wildcard search for that file, and delete any of the resulting files that
exceed the time limit? e.g.:
dir %filename-root%*.%extension%
for each file if older than x days, delete
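Something along those lines, fleshed out a bit (a rough sketch in Python
rather than a batch file; the function and parameter names are made up, and it
assumes the new naming where the time stamp is appended to the file name):
import re
from datetime import datetime, timedelta
from pathlib import Path

# revision names end in "YYYY-MM-DD HHMMSS" plus the repeated extension
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{6})")

def purge_old_revisions(revision_dir, filename_root, extension, max_age_days):
    # look only at the revisions of this one file, right where they live
    cutoff = datetime.now() - timedelta(days=max_age_days)
    for path in Path(revision_dir).glob(filename_root + "*" + extension):
        m = STAMP.search(path.name)
        if m and datetime.strptime(m.group(1), "%Y-%m-%d %H%M%S") < cutoff:
            path.unlink()  # delete the outdated revision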
Also, you mention the new versioning naming syntax:
> <revisions directory>\<subdir>\<filename>.<ext> YYYY-MM-DD HHMMSS.<ext>
I would suggest/prefer instead:
> <revisions directory>\<subdir>\<filename> YYYY-MM-DD HHMMSS.<ext>
with an optional delimiter between <filename> and date/time stamp, for the
reasons previously mentioned in my hardlinks.cmd?
Also, what are your thoughts about being able to *RESTORE* files from a
certain date - sync back? Are we left to our own devices for copying back a
revisioned file and having to handle the removal of the date/time suffix?
*THIS* actually may be a good reason to preserve (as an option/alternative)
your previous revision structure, so that if I needed to do a restore of files
from a previous backup, it's only a matter of:
xcopy "<revisions directory>\YYYY-MM-DD HHMMSS\*.*" <destination> /s
whereas with only the new proposed revision structure, restoring more than a
few files would be a tedious process or require a pretty sophisticated script
to make it happen.
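Just to illustrate the kind of script that would be needed with the new flat
naming (a rough Python sketch with made-up names, assuming the proposed
<filename>.<ext> YYYY-MM-DD HHMMSS.<ext> scheme and that all files of one sync
session share the same time stamp):
import re
import shutil
from pathlib import Path

# "<name>.<ext> YYYY-MM-DD HHMMSS.<ext>" -> capture the original name
STAMP = re.compile(r"^(?P<orig>.+) \d{4}-\d{2}-\d{2} \d{6}(\.[^.]*)?$")

def restore_session(revisions_root, destination, session_stamp):
    # copy every file revisioned in one sync session back to `destination`,
    # recreating the relative directory structure and stripping the suffix
    root, dest = Path(revisions_root), Path(destination)
    for path in root.rglob("* " + session_stamp + "*"):
        m = STAMP.match(path.name)
        if path.is_file() and m:
            target = dest / path.relative_to(root).parent / m.group("orig")
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)

# e.g. restore_session(r"D:\Revisions", r"D:\Restored", "2012-08-01 153000")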
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> I don't understand why this would be any more difficult than the revision
count
In order to clean up files older than "last x days" you need to traverse the
complete revisions directory including subdirectories. On the other hand, if
you need to ensure that only a fixed number of revisions exist per file, you
only need to check a single directory, and only do this when you add a new
revision. The former could be a performance issue if the directory tree is
huge, and it doesn't scale with the number of deletions, as the second option
does.
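As a rough sketch (Python pseudocode, not the actual FFS code; the function
and parameter names are made up), the per-directory check for "limit revision
count" assuming the new "<filename>.<ext> YYYY-MM-DD HHMMSS.<ext>" naming
would be no more than:
import re
from pathlib import Path

STAMP = re.compile(r"^.+ \d{4}-\d{2}-\d{2} \d{6}(\.[^.]*)?$")

def limit_revision_count(revision_dir, original_name, max_revisions):
    # called only when a new revision of `original_name` was just written;
    # the appended time stamp sorts lexicographically, so oldest come first
    revisions = sorted(
        p for p in Path(revision_dir).iterdir()
        if p.is_file()
        and p.name.startswith(original_name + " ")
        and STAMP.match(p.name))
    for old in revisions[:-max_revisions]:  # assumes max_revisions >= 1
        old.unlink()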
> <revisions directory>\<subdir>\<filename> YYYY-MM-DD HHMMSS.<ext>
A lot of files have the same name but a different extension, e.g. "source.cpp,
source.h". The extension is therefore used as part of the name.
> with an optional delimiter between <filename> and date/time stamp, for the
reasons previously mentioned in my hardlinks.cmd?
What reasons do you mean exactly?
> what are your thoughts about being able to *RESTORE*
That's a difficult question; the lines between synchronization and versioning
are blurry in this area. FFS should stay focused on the synchronization
aspect, but may satisfy a good deal of versioning tasks as long as it "doesn't
hurt" the design. The fact that FFS provides a user-readable revisions
structure almost implies that it will also be the user who does the restore,
just because he can, in contrast to the closed implementations that all the
other tools specialized in versioning use. I see the core benefit in being
able to restore individual files (easy with the new naming convention), rather
than restoring complete sets of files. The latter seems a less mainstream
scenario; the biggest usage is probably source code versioning. But users who
want sophisticated source code versioning may use GIT or SVN. Personally I use
FreeFileSync for versioning, and for small projects it seems the perfect
balance between ease of use and benefit. The scenario of restoring a connected
set of files for a given point in time, in my experience, has never been more
demanding than restoring the files for a specific version of FreeFileSync. So
I am having trouble seeing the demand for more sophistication. This is either
because I haven't yet had the specific problems that require restoring a
version for an arbitrary point in time, or because the framework I'm working
in is powerful enough to suit all relevant needs.
- Posts: 7
- Joined: 1 Aug 2012
I was working on this reply as Zenju posted his response a few minutes
ago, but as I already had this mostly written, here are my thoughts.
> Also it seems a "limit revision count" could also solve your problem of
providing means to undo an accidental deletion?
Well, yes in general, but it still affects the resulting files in the
revisions directory. Revision count means that if I modify a file every day,
or multiple times a day, I still can never have more than X revisions, which
means if I modify a file a lot, I might end up with only a few days' worth of
old versions, and how far back that reaches will vary depending on how often
the file is modified. And if I delete a file with revision-count purging, that
deleted file will never get removed from the revisions directory, ever.
With date-based purging, I know my often-modified file is around for a fixed
period of time regardless of how often it's modified. And I know deleted files
are still purged out after that period of time to free up space (again, if I
haven't needed it back within my specified period, I deem that I don't need
it).
> But there is a problem with limiting the time: In order to find all the
outdated files, FFS would need to scan the entire versioning directory at the
end of every sync.
Well, you make a good point. I was going to say, just handle it on a per-file
basis as you sync each file: look for revisions older than the date range and
delete them as you sync each file and copy over the new revisions for modified
files. You already know the main file name as you sync each folder, and I
don't think it will add a whole lot more time to filter and delete revisions
older than the date range right after you copy the new revisions in.
But as I reasoned through that, I see the catch: that works for files that are
being synced, but it doesn't help for files that were modified and synced at
one time but don't get synced again, like my example of the deleted file, or
even just a file that might get modified several times in a short period and
then never again. So yeah, I guess at some point you'd have to go back and
scan everything just to be safe.
I don't know what to tell you, and I don't know how they do it, but I've
looked at 3 other sync apps and they all offer both a revision count and a
number-of-days type of purging. My only other thought was that I saw other
discussion threads for FFS about whether or not to use a database for some or
all types of syncs. I can tell you that the other software I've checked out
all uses databases for all types of syncs, and maybe tracking revisions in the
database makes it easier to handle purging of old revisions with either
method. That's purely a guess though, I really have no idea.
> I don't understand why this would be any more difficult than the revision
count. For any given file you're making a revision copy, couldn't you just do
a wildcard search for that file, and delete any of the resulting files that
exceed the time limit?
bkeadle, as Zenju basically just said in the previous reply he wrote as I
was writing this, I don't think he meant that it was difficult to code; it's
the performance hit of having to scan through everything as described above,
which doesn't have to be done with the revision count scheme.
> Also, what are your thoughts about being able to *RESTORE* files from a
certain date - sync back? Are we left to our own devices for copying back a
revisioned file and having to handle the removal of the date/time suffix?
I have to defend Zenju on this one, based on his comments in his previous
reply. I'll again point to what I've seen in other sync software. Software
marketed as "sync" software, at least the ones I've seen, does not provide a
"restore" function. If you truly want the full functionality of that, you need
to look at software that's marketed specifically as actual "Backup" software.
Although, bkeadle, could you maybe reverse your script to create a linked
directory structure mimicking the old way from the new way?
Last, I'll throw this out there just for comparison. I've rechecked how the
other sync software that I've tried handles versioning. One of them does it
the way FFS did originally, with a separate directory for each sync. The other
two do it the way I suggested and that has now been implemented in the FFS
beta, by using the same original source directory structure and renaming the
old files to include the date and time of the sync operation in the file name.
And with the original FFS method, the date-based purging would actually be
easier, because you no longer have to scan through for dates; you just look at
the date revision directories and delete entire date directories (not caring
about the individual files within them) older than x days. But with the # of
revisions method it's now harder, because you're not guaranteed that a
specific file exists in all old revision date directories, so you have to
search back through every directory, count revisions as you search, and track
which one is the oldest.
BUT, as I mentioned above, all 3 of the other programs include the options to
purge either on # of revisions or on # of days, so they manage it somehow,
using either method of storing old revisions.
- Posts: 24
- Joined: 25 Nov 2009
> In order to cleanup files older than "last x days" you need to travese the
complete revisions directory including subdirectories. On the other hand if
you need to ensure that only a fixed number of revisions exist per file, you
only need to check a single directory, and only do this if you add a new
revision
Are you talking about the new structure here? In the old structure, yes, you
would need to "traverse the complete revisions directory", but in the new
structure all the revisions of a given file are in the same directory - thus
whether you want x number of revisions or revisions older than x days, they're
all there in the same directory to evaluate. Perhaps this debate is getting
mixed up with my suggestion that the old structure remain as an option. If the
old structure were selected, then yes, the version limit would cost too much
in terms of performance.
> A lot of files have the same name but a different extension, e.g.
"source.cpp, source.h". The extension is therefore used as part of the name.
Yes, I understand that. You said the new format for the filename is to be:
<filename>.<ext> YYYY-MM-DD HHMMSS.<ext>
And I'm suggesting that the middle <ext> reference be removed, and just keep
the extension at the end where it belongs:
<filename> YYYY-MM-DD HHMMSS.<ext>
> What reasons do you mean exactly?
"Having a choice of delimeter allows for flexibility to generate a file
listing and then act on the file listing through other utilities/scripts so
that <filename> (delim) <YYYY-MM-DD HHMMss> <ext> is easily parsed for
actions. Simply modify the DELIM= variable just under the :BEGIN label to a
valid delimeter of your choice. I've using "---" by default."
This would be especially useful to address the ability to do a restore using
the new naming convention, as I mentioned earlier - especially if the old
revisioning structure is not an option moving forward, since the old structure
would have made restoring files from a certain date easy. As for the
suggestion that FFS is intended as a sync utility, not a backup utility: that
doesn't make sense to me, especially in the context of keeping file revisions.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> can never have more than X revisions, which means if I modify a file a lot,
I might end up only a few days back of old versions
Good point, a limit on "revision count" cannot prevent old versions from
getting lost if a specific file is updated and synced repeatedly in a short
time frame.
But the deeper question is: what do you want to achieve? If you just want to
make sure you are able to restore a version, say 10 days back, then you just
would not set a limit at all. Maybe you want to conserve disk space? In this
case, perhaps an option "limit revision total to X bytes" may be more
suitable?
From a performance point of view it is the same whether I traverse the
revisions directory including subdirectories in order to remove outdated
files, or count the bytes and delete as many old files as are required to get
below a "bytes limit".
> filter and delete revisions older than the date range right after you copy
the new revisions in. But as I reasoned through that, I see the catch in this
is that, that works for files that are being synced
Yes. If the user sets the option "limit to x days" he naturally expects it to
be applied to all files, not just the ones that are newly synced. If the
functionality were weakened to apply only to the files touched, it might not
be useful at all anymore; but at least the performance problem would be gone
and it would scale nicely ;)
> other software I've checked out does all use databases for all types of
syncs
Most of the performance considerations concerning scanning all subdirectories
could be invalidated if there were an index file containing the location and
date of all revisioned files. I don't like this idea too much, because the
information is redundant and can theoretically become out of sync with the
real files. It is also trickier than it should be, since it's an
implementation detail leaking out for reasons not apparent to the user.
Also, the performance argument should not be overestimated: in my relatively
large revisions directory located on a non-buffered USB memory stick,
traversing the full tree structure takes less than a second. In conceivably
larger scenarios where this may become a real bottleneck, I should show a
status in FFS like "scanning revisions directory". This will make it
transparent to the user what FFS is doing and allow him to just select "limit
revision count" instead, which does not show this behavior. So he can decide
for himself whether "last x days" is worth the performance drawback or not.
> Are talking about the new structure here?
Yes. The performance may become a problem if the requirement is to "remove all
outdated files" rather than only the files that are newly revisioned.
> And I'm suggesting that the middle <ext> reference be removed
In this case sorting would not work because files that differ in extension
only would be intermixed according to the date.
> choice of delimiter
I don't think a delimiter is really needed. The time stamp is special enough
to be detectable programmatically.
A simple filter that matches a versioned file could be:
* *-*-* *
or, to be safer:
* ????-??-?? ??????
or even with a regex:
.* \d{4}-\d{2}-\d{2} \d{6}
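For example, splitting a revision name back into its parts needs nothing more
than this (a quick Python illustration only, not FFS code; the names are made
up):
import re

REVISION = re.compile(
    r"^(?P<original>.+) (?P<stamp>\d{4}-\d{2}-\d{2} \d{6})(\.[^.]*)?$")

def split_revision_name(name):
    # returns (original file name, time stamp), or None if not a revision
    m = REVISION.match(name)
    return (m.group("original"), m.group("stamp")) if m else None

# split_revision_name("source.cpp 2012-08-01 153000.cpp")
# -> ("source.cpp", "2012-08-01 153000")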
Just to make sure I don't miss something when thinking about requirements,
these are the goals a limit option (whatever it will be) should fulfill:
- limit number of revisions in order to have a clearly arranged, short, user-friendly list when viewed with a file manager
- limit disk space
- Posts: 24
- Joined: 25 Nov 2009
Ah! The light has gone on.
As for the "limit to X days" and traversing-the-directory discussion: yes, I
was only considering the files that are being modified to be checked against
the X-days limit - I hadn't considered *all files* in the REVISIONS subdir
being evaluated against it - that being the case, yes, I agree that can be a
performance concern. So perhaps a check-box option for "Apply limit to all
files" or "Apply only to modified files" (or some better verbiage).
Personally, I would only use "apply only to modified files".
A "revision total to X bytes" sounds interesting - that would be a unique
feature, wouldn't it? If it's "cheap" to implement as an option, why not -
could prove to be useful.
And about the filename convention using your 2 references of <ext> vs. my
suggestion of just one (the suffix): I now see your point and it makes good
sense.
- Posts: 74
- Joined: 17 Mar 2008
Let me cast my vote on this very interesting discussion - and I am really
excited about this change, as I have been doing a lot of clean up of older
"monthly" folders based on a folder naming convention designed to let me do
the purging more easily.
My vote: keep it simple at first and implement a version limit. If necessary,
you could add this as a limit per folder pair, so that we could have some
flexibility to manage the version limits based on how often a particular file
set may change.
I understand the drive to include other filters, like "make sure this revision
folder is never bigger than...", but then the one time you increase the number
of files or make some larger changes and unexpectedly bump one of your
revisions, this option will seem ill-advised.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> A "revision total to X bytes" sounds interesting
This would essentially be a Windows Recycle Bin reimplementation; I'm
beginning to like this idea... it would be simpler to grasp than "limit last x
days", have clearer semantics and do a similar thing.
> My vote: keep it simple at first and implement a version limit.
I agree to keep it simple. It's hard to pull features back later, and it's
difficult to decide whether complaining users genuinely miss a feature or are
just unwilling to use the alternative.
But I figured I'd first go with the "limit bytes" option, which looks like a
safe bet considering the Recycle Bin similarities.
I'm still uncertain under what scenarios a version limit is actually useful.
It doesn't guarantee that the total file size is kept in bounds, and it also
does not handle the scenario of frequent updates to a single file. I first
thought it might help to keep the revisions directory small and more
manageable. But in my tests, this isn't really a big deal. If you look for a
specific file, you enter the initial letters in Explorer and find it
instantly.
> add this as a limit per folder pair
In any case, the limit options will be available at both local and global
folder pair level. Since they are part of the sync config, the local
configuration, if present, overrides the global one.
- Posts: 24
- Joined: 25 Nov 2009
> But I figured I'd first go with the "limit bytes" option, which looks like a
safe bet considering the Recycle Bin similarities.
Boy, I sure don't know about this. "Limit bytes" per file or per directory?
That sure seems like a moving target, as one backup may be 10s of MB and
another 100s of MB (or GB as the case may be). Seems like the most practical
and common use would be a number of revisions of any given/changed file. But
in the case of multiple changes in a single day to a single file, there may
also need to be a number-of-days modifier. And as mfreedberg points out:
> but then the one time you increase the number of files or make some larger
changes and unexpectedly bump one of your revisions, this option will seem
ill-advised.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> "Limit bytes" per file or per directory?
This was about a limit on the total number of bytes of the whole revisions
directory.
- Posts: 24
- Joined: 25 Nov 2009
Ah, I guess that makes more sense. However, how would you determine that? That
sounds hairy. You would first need to scan the entire REVISIONS root to get
the current size, right? That can be costly - unless you're keeping track of
that in some index file. Then how would you determine *what* to delete in
order to make room for the next backup? Say I'm 10 MB from my limit. I have
nine 1 MB files to copy, and one 11 MB file to copy. Seems like it'd be
difficult to reconcile what to delete to make room.
Seems like the x revisions and x revision-days are simpler implementations and
less costly (or maybe my light has just gone off again. :-) )
- Posts: 7
- Joined: 1 Aug 2012
Gotta put my own big "no" vote in for limit by size, or at least not without
having it as an additional option to either (or both) by revisions or by days.
I want to know there's consistency in my retention.
And again, comparing to other sync software I've tried, they all offer by days
or by # of revisions, and none offer based on size.
If anything, I could see size being on top of # of revisions or # of days, so
that regardless of those settings it never exceeds a certain size and doesn't
fill up the drive with revisions, but still keeps # of revisions or # of days
as long as the size isn't exceeded. But I'd still rather manage size myself
and adjust my # of revisions or # of days if I start filling up my drive with
revisions. And I wouldn't use an option that is ONLY by size.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> That sounds hairy.
If that's hairy, then so is the Windows Recycle Bin; it's the same approach.
Performance will be similar (but probably much better :)
> how would you determine *what* to delete
Delete as many of the oldest files (= using the time stamp at the end of the
filename) as needed until the total is below the threshold.
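As a rough sketch (Python pseudocode with made-up names, not the actual
implementation):
import re
from pathlib import Path

STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{6})")

def enforce_size_limit(revisions_root, max_total_bytes):
    # traverse the revisions tree once, then delete the oldest revisions
    # (by the time stamp in the file name) until the total fits the limit
    files = []
    for p in Path(revisions_root).rglob("*"):
        m = STAMP.search(p.name)
        if p.is_file() and m:
            files.append((m.group(1), p.stat().st_size, p))
    files.sort()  # the stamp format sorts chronologically
    total = sum(size for _, size, _ in files)
    for _, size, path in files:  # oldest first
        if total <= max_total_bytes:
            break
        path.unlink()
        total -= size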
For all three options - "limit version count", "limit by days", "limit by
size" - performance won't be the decisive aspect. But I certainly don't want
to implement all three. As a mathematician, I'd like to find a formal argument
for one of them, or a combination of two, or maybe a third one not considered
yet. Seems this one needs deeper thinking.
- Posts: 24
- Joined: 25 Nov 2009
"As a mathematician..." - well that helps to explain why this probably makes
more sense to you. :-)
"Delete as many old files..." - from what directories? You say size is based
on the entire REVISIONS directory structure, so you would be deleting older
files from a random mix of subdirectories, the oldest in any given directory
under REVISIONS? Again, unless you're keeping an index of these files, I don't
know how you're going to efficiently do that - but, you're the mathematician
*AND* the programmer, so you clearly know better than me!
:-)
Also, if you're just deleting the oldest: say you have 10 MB free and need to
copy a 100 MB file. How do you know you won't be whacking the only backup copy
of a bunch of small files in order to make room? If your oldest file happens
to be a 200 MB file, then you're an easy one delete away from making room. But
if your oldest files happen to be a bunch of small files (like valuable
scripts!), you could run the risk of deleting some "precious" files. For that
matter, that oldest 200 MB file might be your only, last backup of that file.
It just sounds too arbitrary what might get deleted.
However, if you implement an index, then all these concerns go away, as you'll
be able to more intelligently (I think) ensure that you're not deleting your
only backup of a particular file.
Of the 3 options, limit version count and limit days seem not only easier, but
more appropriate for FFS to manage as part of the process. Limit by size seems
more of a storage management issue to be handled outside of FFS. Personally, I
can see where I would use the first 2 options, but can't conceive of when I'd
ever use the last option - limit by size.
Speaking of storage management... that might be better handled by using hard
links as posted here (hint, hint)!
:-)
- Posts: 7
- Joined: 1 Aug 2012
I agree with bkeadle; I think by-size is not useful as it's not consistent,
since you never know what's going to need to be synced and how that might
affect what files are left. I don't mean that from a programmer's standpoint,
I just mean from an end-user standpoint that I don't see how anybody would
find it useful.
- Posts: 71
- Joined: 22 May 2006
So far I have only been reading this interesting topic; personally I do not
use the versioning because I do not agree with keeping the files on one "side"
only... I have written somewhere else that it is nonsense to move the
"versioned" files from one side to the other... :-)
Anyway, I too would like to give my "NO" vote to the limit-by-size option; I
agree that only limit-by-time and limit-by-number should be implemented! :-)
Ciao, Giangi
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> You say size is based on the entire REVISIONS directory structure, so you
would be deleting older files from a random mix of subdirectories
> I don't see how anybody would find that useful.
Hm, Microsoft seems to think it's useful, and most users probably think the
Windows Recycle Bin is useful, so a "limit by total size" cannot be a plain
stupid idea, at least. Generally, the problem of accidentally deleting
required data because you delete a single large file is less of a big deal
than one might think. This is because the limit for the Windows Recycle Bin is
quite large, like 10% of the volume's total size by default.
All three options do a similar thing, but have different trade-offs. I'd like
to get rid of this redundancy and have, say, two orthogonal features.
- Posts: 71
- Joined: 22 May 2006
> Hm, Microsoft seems to think it's useful, and most users probably think
Windows Recycle Bin is useful, so a "limit by total size" cannot be a plain
stupid idea, at least
Uhm... I think you are mixing apples with pears... :-)
The Recycle Bin is just ONE single container, while with the term
Versioning you mean a place for storing different versions of the same
item.
So for the first it is correct to do delete-by-size, while for the second it
is not! ...of course this is just my opinion... :-)
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
> Recycle Bin is just ONE single container
It's actually one container per volume, each with a limit on total size.
> with the term Versioning you mean
It's not fixed what "versioning" should mean for FFS. I picked the name
because I think it sounds nice and it creates "versions", i.e. files with a
time stamp. The goal is not so much to implement perfect versioning (whatever
the exact definition may be), but rather to find the optimal functionality for
a third "deletion handling" option next to "delete permanently" and "use
recycler".
- Posts: 71
- Joined: 22 May 2006
Ok, I had omitted the "per volume"... but it's still one "flat" container,
without a directory structure... :-)
I understand what you mean, but the word versioning gives the user a
"specific" idea of what it means! ...at least it did with me (and English is
not my mother language... :-)
Adding a time stamp to the file makes it a "real" versioning system (of course
not as accurate as a CVS system can be!) :-)
Anyway I was just trying not to promote the limit-by-size logic... :-))))
- Posts: 2450
- Joined: 22 Aug 2012
I really support the new version approach!
Like Giangi I don't see the need for a limitation on size.
I like the approach of X versions and Y days.
However, I would suggest being able to choose between an AND and an OR of
these two conditions. Programmatically this seems to be straightforward.
If not, I would personally prefer the AND condition; i.e. versions only get
deleted if they are at least Y days old and if there are at least X newer
versions.
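Spelled out as pseudocode (just a little Python sketch; the names are made up,
not anything from FFS):
def may_purge(age_in_days, newer_revision_count, min_age_days, min_newer_versions):
    # AND rule: purge only if the revision is at least Y days old
    # AND at least X newer revisions of the same file still exist
    return age_in_days >= min_age_days and newer_revision_count >= min_newer_versions

# the OR variant would purge as soon as either condition holds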
Next: the nice option to select a file in Windows Explorer and then, via a
right mouse click, see (and restore?) the available previous versions ...
Plerry
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
I'm a little surprised that there is no interest in a "limit by total size",
which would be a "recycle bin" alternative for USB sticks and network shares.
After all, this is the most prominent form of versioning, used unconsciously
by all Windows users. And they seem to be okay with its limitations; I've
rarely seen requests for alternate ways to manage deleted files on Windows.
I've done some more research: the three options discussed so far ("limit
count", "limit total size", "limit by days") are the standard limits used in
versioning. I had hoped there would be some alternate, more elegant solution,
but that doesn't seem to be the case. Most tools use variants or combinations
of the three options discussed. Still, I was surprised to find one peculiar
approach:
- keep all revisions only up to a fixed age in days; then keep one version per week, one per month and one per year. In other words: the older the revision, the bigger the distance in time between two consecutive revisions.
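For the record, this peculiar approach amounts to a thinned retention schedule: keep everything that is recent, then let the gap between kept revisions grow with age. A minimal sketch of the idea (not FFS code; the 90-day/365-day boundaries and the data layout are assumptions made up for illustration):

```python
from datetime import datetime, timedelta

def thin_revisions(timestamps, now, keep_all_days=14):
    """Keep every revision from the last `keep_all_days` days, then at most one
    revision per calendar week, per month, and per year (the newest in each)."""
    keep, seen_buckets = set(), set()
    for ts in sorted(timestamps, reverse=True):              # newest first
        age = now - ts
        if age <= timedelta(days=keep_all_days):
            keep.add(ts)                                      # recent: keep everything
            continue
        if age <= timedelta(days=90):
            bucket = ('week',) + tuple(ts.isocalendar())[:2]  # ISO year + week number
        elif age <= timedelta(days=365):
            bucket = ('month', ts.year, ts.month)
        else:
            bucket = ('year', ts.year)
        if bucket not in seen_buckets:                        # first (= newest) hit per bucket
            seen_buckets.add(bucket)
            keep.add(ts)
    return keep                                               # the rest would be deleted
```

The exact boundaries don't matter; the point is only that the spacing between surviving revisions grows with their age.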
which is a "recycle bin" alternative for USB sticks and network shares. After
all this is the most prominent way of versioning unconsciously used by all
Windows users. And they seem to be okay with the limitations; I've rarely seen
requests for alternate ways to manage deleted files on Windows.
I've done some more research, the three options discussed so far "limit count,
limit total size, limit by days" are the standard limits used in versioning. I
had hoped there would be some alternate, more elegant solution, but it doesn't
seem to be the case. Most are variants or combinations of the three options
discussed. Still I was surprised to find one peculiar approach:
- keep all revisions which are a fixed number of days old only. Then keep one version per week, one per month and one per year. In other words: The older the revision the bigger the distance in time between two of them.
- Posts: 24
- Joined: 25 Nov 2009
"Recyle bin" is fine as a fail-safe to recover deleted files, but should not
be counted on as a backup "method". One should not "file" things in trash -
it's not good practice in the physical world, nor in the digital world. :-)
"trash" implies uneeded, and should it get deleted, it should be of no
concern. But using FFS as a backup tool doesn't mean what I keep in backup are
"trash", except for conditions of count or age (days).
- Posts: 24
- Joined: 25 Nov 2009
I should add... the way you put it, "Recycle bin for USB and network shares"
does sound attractive, but again, it seems an arbitrary way of keeping
backups, and it risks deleting the only backup of some (random) file.
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
I'm beginning to see your point: we're dealing with two similar but different
scenarios:
1. The user is manually managing his files and, being human, makes mistakes. So he needs a way to quickly undo a deletion via the Recycle Bin.
2. File management is handled by a synchronization tool. Conceptually this can be seen as one layer above 1: since the individual operations are guided by rules (automatic, mirror sync) and the sync is automated, there is less demand for a facility to undo an accidental deletion. Further, a sync is initiated at times when the user considers his data consistent. So the demand is more for keeping a backup of different versions than for undoing careless deletions.
A "limit by total size" prioritizes limited disk space over the implicit
knowledge that more recent versions contain more relevant data. Considering
today's big backup hard drives, this may no longer be the right tradeoff.
This leaves "limit revision count" and "limit by date". The former doesn't
limit the total size directly, but it still scales with the total size of the
user's backup data. E.g. setting a revision count of 10 limits the total size
of the revisions directory to roughly 10 x the user data size (assuming the
file sizes stay the same). Also, it will keep at least one version per file.
The overall semantics look quite useful.
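To make the scaling argument concrete, here is roughly what such a per-file cleanup might look like. This is only a sketch, not the actual FFS implementation: the naming scheme (original name with an appended time stamp) and the helper name are assumptions for illustration. The relevant property is that it only scans the siblings of the file that was just revisioned, so the cost scales with the number of deletions per sync rather than with the size of the versioning directory:

```python
import os
import re

# Assumed naming scheme, for illustration only: "<name> <YYYY-MM-DD HHMMSS><ext>"
TIMESTAMPED = re.compile(r'^(?P<stem>.+) (?P<stamp>\d{4}-\d{2}-\d{2} \d{6})(?P<ext>\.[^.]*)?$')

def limit_revision_count(version_dir, just_revisioned_name, max_versions):
    """After a file was moved into `version_dir`, delete its oldest time-stamped
    siblings so that at most `max_versions` revisions of it remain."""
    match = TIMESTAMPED.match(just_revisioned_name)
    if not match:
        return
    stem, ext = match.group('stem'), match.group('ext') or ''
    siblings = []
    for name in os.listdir(version_dir):
        m = TIMESTAMPED.match(name)
        if m and m.group('stem') == stem and (m.group('ext') or '') == ext:
            siblings.append((m.group('stamp'), name))
    siblings.sort()                                  # the stamp format sorts chronologically
    excess = len(siblings) - max_versions
    for _, name in siblings[:max(excess, 0)]:        # delete the oldest ones first
        os.remove(os.path.join(version_dir, name))
```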
For "limit by x days" there are two variants: I) Apply to each file after
sync. This will delete all revisions of a particular file if it was not
updated in the recent x days. This seems to be similar a behavior like Recycle
Bin: Ensure to be able to recover data within a limited time frame (for
recycler the time frame is implicitly defined by its size and number and size
of new deletions) From a backup perspective it's less useful as you generally
may not find any "old" versions.
II) Apply only to newly revisioned files: This ensures there will always be at
least one old version per file, similar to "limit revision count". On the
other hand there is no (implicit) limit on the total size of revisioned data.
Large collections of old revisions will not be cleaned until a new revision is
added for a particular file.
So far a "limit revision count" seems to offer the greatest advantages and the
least drawbacks.
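To contrast the two variants side by side, a toy sketch (the dict of file name to revision time stamps is an assumed data layout, not how FFS stores anything):

```python
from datetime import datetime, timedelta

def prune_by_days_variant_1(all_revisions, now, max_days):
    """Variant I: applied to every file after every sync. Revisions older than
    `max_days` are deleted, so a file that has not changed recently ends up
    with no revisions at all."""
    cutoff = now - timedelta(days=max_days)
    return {name: [s for s in stamps if s >= cutoff]
            for name, stamps in all_revisions.items()}

def prune_by_days_variant_2(all_revisions, now, max_days, just_revisioned):
    """Variant II: applied only to files that received a new revision in this
    sync; all other files keep their old revisions untouched."""
    cutoff = now - timedelta(days=max_days)
    return {name: ([s for s in stamps if s >= cutoff]
                   if name in just_revisioned else list(stamps))
            for name, stamps in all_revisions.items()}
```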
- Posts: 24
- Joined: 25 Nov 2009
Well said. On your latter point, it seems that II) is the best option.
- Posts: 2450
- Joined: 22 Aug 2012
My earlier suggestion to use an AND condition of X-versions and Y-days:
> Versions only get deleted if they are at least Y-days old and if there are
at least X newer versions.
applied to each file after sync seems to allow the benefits of both options I)
and II) above.
* Setting X to 0 would effectively give option I):
all revisions older than Y-days will get deleted.
(X=0 AND Y=0 might need to be flagged as deleting all / preventing any
revisions.)
* Setting X to any integer >0 would give option II):
at least the most recent previous version remains available (if the file was
ever changed ...).
(Y=0 would just keep the latest X versions, if any.)
A disadvantage for X>0 and Y>0 might be that the size explodes for files that
get changed frequently.
Plerry
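Plerry's combined rule boils down to a simple per-revision predicate. A minimal sketch, with the two conditions combined by AND or OR as suggested (the data layout is again made up for illustration):

```python
from datetime import datetime, timedelta

def keep_revisions(stamps, now, x_versions, y_days, mode="AND"):
    """Return the revisions of one file that survive the cleanup.
    AND: a revision is deleted only if it is older than `y_days` AND at least
         `x_versions` newer revisions of the same file exist.
    OR:  a revision is deleted if either condition holds."""
    cutoff = now - timedelta(days=y_days)
    kept = []
    for index, stamp in enumerate(sorted(stamps, reverse=True)):   # newest first
        too_old = stamp < cutoff
        enough_newer = index >= x_versions   # exactly `index` newer revisions precede this one
        delete = (too_old and enough_newer) if mode == "AND" else (too_old or enough_newer)
        if not delete:
            kept.append(stamp)
    return kept
```

In AND mode, X=0 degenerates to option I) (everything older than Y days is deleted), Y=0 keeps just the newest X versions, and X=0 with Y=0 deletes everything, matching the cases listed above.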
- Site Admin
- Posts: 7210
- Joined: 9 Dec 2007
For v5.7 I've implemented the limit on revision count, which has received the
greatest consensus. Here is the beta for testing:
[404, Invalid URL: http://freefilesync.sourceforge.net/FreeFileSync_5.7_beta_setup.exe]