Keep previous versions of files
- Posts: 24
- Joined: 25 Nov 2009
> In order to clean up files older than "last x days" you need to traverse the
complete revisions directory including subdirectories. On the other hand, if
you need to ensure that only a fixed number of revisions exist per file, you
only need to check a single directory, and only do this if you add a new
revision.
Are you talking about the new structure here? In the old structure, yes, you would
need to "traverse the complete revisions directory", but in the new structure,
all revisions of a given file end up in the same directory - thus whether you
want x number of revisions or revisions older than x days, they are all there
in the same directory to evaluate. Perhaps this debate is getting mixed up
with my suggestion that the old structure remain as an option. If the old
structure were selected, then yes, the version limit would cost too much in
terms of performance.
> A lot of files have the same name but a different extension, e.g.
"source.cpp, source.h". The extension is therefore used as part of the name.
Yes, I understand that. You said the new format for the filename is to be:
<filename>.<ext> YYYY-MM-DD HHMMSS.<ext>
And I'm suggesting that the middle <ext> reference be removed, and just keep
the extension at the end where it belongs:
<filename> YYYY-MM-DD HHMMSS.<ext>
> What reasons do you mean exactly?
"Having a choice of delimiter allows for flexibility to generate a file
listing and then act on the file listing through other utilities/scripts so
that <filename> (delim) <YYYY-MM-DD HHMMss> <ext> is easily parsed for
actions. Simply modify the DELIM= variable just under the :BEGIN label to a
valid delimiter of your choice. I'm using "---" by default."
This would be especially useful for restoring files using the new naming
convention, as I mentioned earlier - especially if the old revisioning
structure is not an option moving forward - since it would make restoring
files from a certain date easy. As for the suggestion that FFS is intended
as a sync utility, not a backup utility - that doesn't make sense to me,
especially in the context of keeping file revisions.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> can never have more than X revisions, which means if I modify a file a lot,
I might end up only a few days back of old versions
Good point: a limit on "revision count" cannot prevent old versions from getting
lost if a specific file is updated and synced repeatedly in a short time
frame.
But the deeper question is what you want to achieve. If you just want to
make sure you are able to restore a version, say 10 days back, then you just
would not set a limit at all. Maybe you want to conserve disk space? In that
case, perhaps an option "limit revision total to X bytes" may be more
suitable?
From a performance point of view it is the same whether I traverse the revisions
directory including subdirectories in order to remove outdated files, or
count the bytes and delete as many old files as are required to get below a
"bytes limit".
> filter and delete revisions older than the date range right after you copy
the new revisions in. But as I reasoned through that, I see the catch in this
is that it only works for files that are being synced
Yes. If the user sets the option "limit to x days" he naturally expects this to be
applied to all files, not just the ones that are newly synced. If the
functionality were weakened to apply only to files that were touched, it may not be
useful at all anymore; but at least the performance problem would be gone and
it would scale nicely ;)
> other software I've checked out does all use databases for all types of
syncs
Most of the performance considerations concerning scanning all sub-directories
could be invalidated if there were an index file containing the location
and date of all revisioned files. I don't like this idea too much, because the
information is redundant and can theoretically become out of sync with the
real representation. It is perhaps also trickier than it should be, since
it's an implementation detail leaking out for reasons not apparent to the
user.
Also, the performance argument should not be overestimated: in my relatively
large revisions directory located on a non-buffered USB memory stick,
traversing the full tree structure takes less than a second. In conceivably
larger scenarios where this may become a real bottleneck, I should show a
status in FFS like "scanning revisions directory". This will make it
transparent to the user what FFS is doing and allow him to just select "limit
revision count" instead, which does not show this behavior. So he can decide
for himself whether "last x days" is worth the performance drawback or not.
> Are you talking about the new structure here?
Yes. The performance may become a problem if the requirement is to "remove all
outdated files" rather than only the files that are newly revisioned.
> And I'm suggesting that the middle <ext> reference be removed
In this case sorting would not work, because files that differ only in extension
would be intermixed according to the date: e.g. "source 2012-09-01 120000.cpp" and
"source 2012-09-02 110000.h" would sort next to each other instead of grouping
all revisions of "source.cpp" together.
> choice of delimiter
I don't think a delimiter is really needed. The time stamp is special enough to
be detectable programmatically:
A simple filter that matches a versioned file could be:
* *-*-* *
or to be more safe:
* ????-??-?? ??????
or even with a regex:
.* \d{4}-\d{2}-\d{2} \d{6}
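As a quick illustration (not FFS code, just a sketch using VBScript's RegExp object with made-up sample filenames), such a pattern separates versioned files from normal ones:
' Sketch: detect the FFS-style time stamp at the end of a filename
Dim re
Set re = New RegExp
re.Pattern = "^.+ \d{4}-\d{2}-\d{2} \d{6}(\..+)?$"
WScript.Echo re.Test("source.cpp 2012-09-08 203257.cpp")  ' True  -> looks like a versioned file
WScript.Echo re.Test("source.cpp")                        ' False -> a normal file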
Just to make sure I don't miss something when thinking about requirements,
these are the goals a limit option (whatever that will be) should fulfill:
- limit number of revisions in order to have a clearly arranged, short, user-friendly list when viewed with a file manager
- limit disk space
- Posts: 24
- Joined: 25 Nov 2009
Ah! The light has gone on.
As for the "limit to X days" and directory-traversal discussion: yes, I was
only considering the files that are being modified to be checked against the X-day
limit - I hadn't considered *all files* in the REVISIONS subdir being evaluated
against x days - that being the case, yes, I agree that can be a performance
concern. So perhaps a check-box option for "Apply limit on all files" or
"Apply to only modified files" (or some better verbiage). Personally, I would
only use "apply only to modified files".
A "revision total to X bytes" sounds interesting - that would be a unique
feature, wouldn't it? If it's "cheap" to implement as an option, why not -
could prove to be useful.
And about the filename convention using your 2 references of <ext> vs. my
suggestion of just one (the suffix): I now see your point, and it makes good
sense.
- Posts: 74
- Joined: 17 Mar 2008
Let me cast my vote on this very interesting discussion - I am really
excited about this change, as I have been doing a lot of cleanup of older
"monthly" folders based on a folder naming convention designed to let me do
the purging more easily.
My vote: keep it simple at first and implement a version limit. If necessary,
you could add this as a limit per folder pair, so that we could have some
flexibility to manage the version limits based on how often a particular file
set may change.
I understand the drive to include other filters, like "make sure this revision
folder is never bigger than..." - but then the one time you increase the number
of files or make some larger changes and unexpectedly bump one of your
revisions, this option will seem ill-advised.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> A "revision total to X bytes" sounds interesting
This would essentially be a Windows Recycle Bin reimplementation; I'm
beginning to like this idea... it would be simpler to grasp than "limit last x
days", have clearer semantics and do a similar thing.
> My vote: keep it simple at first and implement a version limit.
I agree to keep it simple. It's hard to pull features back later, and it's
difficult to decide whether complaining users genuinely miss a feature or are just
unwilling to use the alternative.
But I figured I'd first go with the "limit bytes" option, which looks like a
safe bet considering the Recycle Bin similarities.
I'm still uncertain under what scenarios a version limit is actually useful.
It doesn't guarantee that the total file size is kept in bounds, and it also does
not handle the scenario of frequent updates to a single file. I first thought
it might help to keep the revisions directory small and more manageable. But in
my tests, this isn't really a big deal. If you look for a specific file, you
enter the initial letters in Explorer and find it instantly.
> add this as a limit per folder pair
In any case, the limit options will be at both local and global folder pair
level. Since they are part of the sync config, the local configuration, if
present, overrides the global one.
- Posts: 24
- Joined: 25 Nov 2009
> But I figured I'd first go with the "limit bytes" option, which looks like a
safe bet considering the Recycle Bin similarities.
Boy, I sure don't know about this. "Limit bytes" per file or per directory?
That sure seems like a moving target, as one backup may be tens of MB and
another hundreds of MB (or GB as the case may be). Seems like the most practical
and common use would be a number of revisions of any given/changed file. But in
the case of multiple changes to a single file in a single day, there may
need to be a number-of-days modifier as well. And as mfreedberg points out:
but then the one time you increase the number of files or make some larger changes and unexpectedly bump one of your revisions, this option will seem ill-advised.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> "Limit bytes" per file or per directory?
This was about a limit on the total number of bytes of the whole revisions
directory.
- Posts: 24
- Joined: 25 Nov 2009
Ah, I guess that makes more sense. However, how would you determine that? That
sounds hairy. You would first need to scan the entire REVISIONS root to get the
current size, right? That can be costly - unless you're keeping track of that
in some index file. Then how would you determine *what* to delete in order to
make room for the next backup? Say I'm 10 MB from my limit. I have nine 1 MB files
to copy, and one 11 MB file to copy. Seems like it'd be difficult to reconcile
what to delete to make room.
Seems like the x revisions and x revision-days options are a simpler
implementation and less costly (or maybe my light has just gone off again. :-)
)
- Posts: 7
- Joined: 1 Aug 2012
Gotta put my own big "no" vote in for limit by size, or at least not without
having it as an additional option to either (or both) limit by revisions or by days.
I want to know there's consistency in my retention.
And again, comparing to other sync software I've tried, they all offer limits by days
or by # of revisions, and none offer one based on size.
If anything, I could see size being a limit on top of # of revisions or # of days, so
that regardless of those settings it never exceeds a certain size and doesn't
fill up the drive with revisions, but still keeps # of revisions or # of days
as long as the size isn't exceeded. But I'd still rather manage size myself
and adjust my # of revisions or # of days if I start filling up my drive with
revisions. And an option ONLY by size wouldn't be used by me.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> That sounds hairy.
If that's hairy, then so is the Windows Recycle Bin - it's the same approach.
Performance will be similar (but probably much better :)
> how would you determine *what* to delete
Delete as many old files as needed (using the time stamp at the end of the
filename) until the total is below the threshold.
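As a minimal sketch of that rule (purely illustrative; the names TrimToSizeLimit and revFiles are invented, and FFS itself is of course not implemented in VBScript):
' Sketch: revFiles is assumed to be a collection of FileSystemObject File objects,
' already sorted oldest-first by the time stamp in the file name.
Sub TrimToSizeLimit(revFiles, ByVal totalBytes, ByVal limitBytes)
    Dim f
    For Each f In revFiles
        If totalBytes <= limitBytes Then Exit For
        totalBytes = totalBytes - f.Size   ' account for the bytes about to be freed
        f.Delete                           ' remove the oldest revision first
    Next
End Sub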
For all three options - "limit version count", "limit by days", "limit by size" -
performance won't be the decisive aspect. But I certainly don't want to
implement all three options. As a mathematician, I'd like to find a formal
argument for one of them, a combination of two, or maybe a third one not yet
considered. Seems this one needs deeper thinking.
- Posts: 24
- Joined: 25 Nov 2009
"As a mathematician..." - well that helps to explain why this probably makes
more sense to you. :-)
"Delete as many old files..." - from what directories? You say size is based
on the entire REVISIONS directory structure, so you would be deleting older
files from a random mix of subdirectories, the oldest in any given directory
under REVISIONS? Again, unless you're keeping an index of these files, I don't
know how you're going to do that efficiently - but you're the mathematician
*AND* the programmer, so you clearly know better than me!
:-)
Also, if you're just deleting the oldest: say you have 10 MB free and need to
copy a 100 MB file. How do you know you won't be whacking the only backup copy
of a bunch of small files in order to make the room? If your oldest file
happens to be a 200 MB file, then you're an easy single delete away from making
room. But if your oldest files happen to be a bunch of small files (like
valuable scripts!), you could run the risk of deleting some "precious" files.
For that matter, that oldest 200 MB file might be your only remaining backup of
that file. It just sounds too arbitrary as to what might get deleted.
However, if you implement an index, then all these concerns go away, as you'll
be able to more intelligently (I think) ensure that you're not deleting your
only backup of a particular file.
Of the 3 options, limit version count and limit days seem not only easier,
but more appropriate for FFS to manage as part of the process. Limit by size
seems more of a storage management issue handled outside of FFS.
Personally, I can see where I would use the first 2 options, but I can't
conceive of when I'd ever use the last option - limit by size.
Speaking of storage management... that might be better handled by using hard
links as posted here (hint, hint)!
:-)
- Posts: 7
- Joined: 1 Aug 2012
I agree with bkeadle: I think limiting by size is not useful, as it's not consistent,
since you never know what's going to need to be synced and how that might
affect which files are left. I don't mean that from a programmer standpoint; I
just mean from an end user standpoint that I don't see how anybody would find
that useful.
- Posts: 71
- Joined: 22 May 2006
So far I was only reading this interesting topic; personally I do not use the
versioning because I do not agree with keeping the files on one "side" only... I
have written somewhere else that it's nonsense to move the "versioned" files
from one side to the other... :-)
Anyway, I too would like to give my "NO" vote to the limit-by-size option;
I agree that only limit-by-time and limit-by-number should be implemented! :-)
Ciao, Giangi
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> You say size is based on the entire REVISIONS directory structure, so you
would be deleting older files from a random mix of subdirectories
> I don't see how anybody would find that useful.
Hm, Microsoft seems to think it's useful, and most users probably think
Windows Recycle Bin is useful, so a "limit by total size" cannot be a plain
stupid idea, at least. Generally, the problem of accidentally deleting
required data because you delete a single large file is less of a big deal than
one might think. This is because the limit for the Windows Recycler is quite
large, like 10% of the volume's total size by default.
All three options do a similar thing, but have different trade-offs. I'd like
to get rid of this redundancy and have something like two orthogonal features.
- Posts: 71
- Joined: 22 May 2006
> Hm, Microsoft seems to think it's useful, and most users probably think
Windows Recycle Bin is useful, so a "limit by total size" cannot be a plain
stupid idea, at least
Uhm... I think you are mixing apples with pears... :-)
The Recycle Bin is just ONE single container, while with the term
"versioning" you mean a place for storing different versions of the same
item.
So for the first it is correct to delete by size, while for the second it is
not! ...of course this is just my opinion... :-)
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> Recycle Bin is just ONE single container
It's actually one container per volume, each with a limit on total size.
> with the term Versioning you mean
It's not fixed what "versioning" should mean for FFS. I picked the name
because I think it sounds nice and it creates "versions", i.e. files with a
time stamp. The goal is not so much to implement perfect versioning
(whatever the exact definition may be), but rather to find the optimal
functionality for a third "deletion handling" option next to "delete
permanently" and "use recycler".
- Posts: 71
- Joined: 22 May 2006
Ok, I had omitted the "per volume"... but it's still one "flat" container,
without a directory structure... :-)
I understand what you mean, but the word versioning gives the user a
"specific" idea of what it means! ...at least it did with me (and English
is not my mother tongue... :-)
Adding a time stamp to the file makes it a "real" versioning system (of course
not as accurate as a CVS system can be!) :-)
Anyway, I was just trying not to promote the limit-by-size logic... :-))))
- Posts: 2451
- Joined: 22 Aug 2012
I really support the new versioning approach!
Like Giangi, I don't see the need for a limitation by size.
I like the approach of X-versions and Y-days.
However, I would suggest being able to choose between an AND and an OR of
these two conditions.
Programmatically this seems to be straightforward.
If not, I would personally prefer the AND condition; i.e. versions only get
deleted if they are at least Y-days old and if there are at least X newer versions.
Next: the nice option to select a file in Windows Explorer and then, via right
mouse click, see (and restore?) the available previous versions ...
Plerry
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
I'm a little surprised that there is no interest in a "limit by total size",
which is a "recycle bin" alternative for USB sticks and network shares. After
all, this is the most prominent way of versioning, unconsciously used by all
Windows users. And they seem to be okay with the limitations; I've rarely seen
requests for alternate ways to manage deleted files on Windows.
I've done some more research: the three options discussed so far - "limit count,
limit total size, limit by days" - are the standard limits used in versioning. I
had hoped there would be some alternate, more elegant solution, but that doesn't
seem to be the case. Most are variants or combinations of the three options
discussed. Still, I was surprised to find one peculiar approach:
- keep all revisions up to a fixed number of days old. Beyond that, keep one version per week, one per month and one per year. In other words: the older the revision, the bigger the distance in time between two of them.
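Purely as an illustration of that thinning idea (a crude simplification that keeps at most one revision per coarse age bucket; all names and numbers below are invented, and the VBScript posted later in this thread implements a comparable "TimeGaps" scheme):
' Sketch: keep every revision younger than keepAllDays, then at most one per age bucket
Dim ages, buckets, age, idx, lastKeptBucket
ages = Array(0.5, 2, 5, 12, 40, 200, 500)  ' revision ages in days, sorted ascending (sample data)
buckets = Array(7, 30, 365)                ' week / month / year boundaries; anything older lands in the last bucket
Const keepAllDays = 3
lastKeptBucket = -1
For Each age In ages
    If age <= keepAllDays Then
        WScript.Echo "keep (recent): " & age & " days old"
    Else
        idx = 0
        Do While idx < UBound(buckets) And age > buckets(idx)
            idx = idx + 1
        Loop
        If idx <> lastKeptBucket Then
            lastKeptBucket = idx           ' first revision seen in this bucket: keep it
            WScript.Echo "keep (bucket " & idx & "): " & age & " days old"
        Else
            WScript.Echo "could delete: " & age & " days old"
        End If
    End If
Next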
- Posts: 24
- Joined: 25 Nov 2009
A "recycle bin" is fine as a fail-safe to recover deleted files, but should not
be counted on as a backup "method". One should not "file" things in the trash -
it's not good practice in the physical world, nor in the digital world. :-)
"Trash" implies unneeded, and should it get deleted, it should be of no
concern. But using FFS as a backup tool doesn't mean what I keep in backup is
"trash", except under conditions of count or age (days).
- Posts: 24
- Joined: 25 Nov 2009
I should add... the way you put it, "Recycle bin for USB and network shares"
does sound attractive, but again, it seems an arbitrary way of keeping
backups, and it risks deleting the only backup of some (random) file.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
I'm beginning to see your point: We're dealing with two similar but different
scenarios:
1. the user is manually managing his files and as a human is making mistakes. So he needs a way to quickly undo a deletion via Recycle Bin.
2. file management is handled by a synchronization tool. Conceptually this can be seen as one layer above 1: since the individual operations are guided by rules (automatic, mirror sync) and sync is automated, there is less demand for a facility to undo an accidental deletion. Further, sync is initiated at regular times when the user considers his data consistent. So the demand is more for keeping a backup of different versions than undoing inconsiderate deletions.
A "limit by total size" values limited disk space over the implicit
knowledge that more recent versions contain the more relevant data. Considering
today's big backup hard drives, this may not be the right tradeoff anymore.
This leaves "limit revision count" and "by date". The former doesn't limit the
total size directly, but it still scales with the total size of the user's backup
data. E.g. setting a revision count of 10 limits the total size of the
revisions directory to roughly 10 x the user data size (assuming the file sizes
stay the same). Also, it will keep at least one version per file. The overall
semantics look quite useful.
For "limit by x days" there are two variants: I) Apply to each file after
sync. This will delete all revisions of a particular file if it was not
updated in the recent x days. This behavior is similar to the Recycle
Bin: ensure data can be recovered within a limited time frame (for the
recycler, the time frame is implicitly defined by its size and by the number
and size of new deletions). From a backup perspective it's less useful, as you
generally may not find any "old" versions.
II) Apply only to newly revisioned files: this ensures there will always be at
least one old version per file, similar to "limit revision count". On the
other hand, there is no (implicit) limit on the total size of revisioned data.
Large collections of old revisions will not be cleaned up until a new revision
is added for a particular file.
So far, a "limit revision count" seems to offer the greatest advantages and the
fewest drawbacks.
- Posts: 24
- Joined: 25 Nov 2009
Well said. On your latter point, it seems that II) is the best option.
- Posts: 2451
- Joined: 22 Aug 2012
My earlier suggestion to use an AND condition of X-versions and Y-days:
> Versions only get deleted if they are at least Y-days old and if there are
at least X newer versions.
applied to each file after sync seems to combine the benefits of both option I)
and option II) above.
* setting X to 0 would effectively give option I) :
all revisions older than Y-days will get deleted
(X=0 AND Y=0 might need to be flagged as deleting all / preventing any
revisions)
* setting X to any integer >0 would give option II) :
at least one most recent previous version is available (if the file was ever
changed ...)
(Y=0 would just keep the latest X versions, if any)
A disadvantage for X>0 and Y>0 might be that the size explodes for files that
get changed frequently.
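Expressed as a predicate, that rule might look like this (a minimal sketch; the function and parameter names are invented):
' Sketch of the AND rule: a revision may only be deleted when it is at least
' yDays old AND at least xNewer newer revisions of the same file exist.
Function MayDelete(ageDays, newerCount, xNewer, yDays)
    MayDelete = (ageDays >= yDays) And (newerCount >= xNewer)
End Function

WScript.Echo MayDelete(14, 0, 1, 7)   ' False -> kept: old enough, but no newer revision exists
WScript.Echo MayDelete(14, 2, 1, 7)   ' True  -> may be deleted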
Plerry
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
For v5.7 I've implemented the limit on revision count, which has received the
greatest consensus. Here is the beta for testing:
[404, Invalid URL: http://freefilesync.sourceforge.net/FreeFileSync_5.7_beta_setup.exe]
- Posts: 7
- Joined: 12 Oct 2010
Looks like I am late to the party... I have created a script that purges the
deleted-files folder based on X copies and maintains a minimum time gap between
copies. The gaps are defined by the line below:
TimeGaps=Array(1,2,3,4,8,16,32,90)
' Array that defines the time gaps for the backup copies
' Default to:
' The 1st copy is older than 1 day but younger than 2 days
' The 2nd copy is older than 2 days but younger than 3 days
' ...
' The 5th copy is older than 8 days but younger than 16 days
' For all cases, if there is more than 1 copy within the range, we keep the
oldest copy.
The unit of measurement may be days, hours, minutes, etc. Below is the code:
Option explicit
' Unit of measurement: "d" for time gap in Days, "m" for Months, "h" for hours "n" for minutes
Const stUOM="h"
' Array that defines the time gaps for the backup copies
' Default to:
' The 1st copy is older than 1 day but younger than 2 days
' The 2nd copy is older than 2 days but younger than 3 days
' ...
' The 5th copy is older than 8 days but younger than 16 days
' For all cases, if there is more than 1 copy within the range, we keep the oldest copy.
TimeGaps=Array(1,2,3,4,8,16,32,90)
' Keep a minimum number of copies, regardless of timegap
Const MinNumOfCopies=3
'Delete files older than the oldest time gap, regardless of # of copies to keep
Const DeleteExpiredCopies=TRUE
Dim f , LogFile, CSVFile
Dim FirstRun
'index starts from 0
Dim TimeGapsIndex
Dim TimeGaps
Dim NoOfCopiesToDelete
Dim CopiesDeleted
Dim bAlreadyKeptACopy
Const Simulate=FALSE
Const logCSV=TRUE
Const logLOG=FALSE
' This script complements FreeFileSync by purging the files deleted to a user-defined directory.
' Credit to [url]http://sogeeky.blogspot.com/2006/08/vbscript-using-disconnected-recordset.html[/url] where I learned about ador.recordset
' and [url]http://www.scriptinganswers.com/forum2/forum_posts.asp?TID=2099&PN=1[/url] where I learned how to delete the files.
' and [url]http://www.tek-tips.com/viewthread.cfm?qid=1472748&page=4[/url] on processing named args
'
' Parameters
'
' /Path: The user-defined directory defined within FreeFileSync
'
Dim fso, startFolder, rs
Set fso = CreateObject("Scripting.FileSystemObject")
Dim Args
'Process Argument
set Args = wScript.Arguments.Named
If Args.Exists("Path") Then
startFolder= Args.Item("Path")
else
' Default path to folder where the script is.
startFolder= fso.GetParentFolderName(Wscript.ScriptFullName)
end if
Set Args=Nothing
set rs = createobject("ador.recordset")
' Const for ador.recordset
Const adVarChar = 200
Const adWVarChar = 202
Const adDate = 7
Const adBSTR = 8
Const adDouble = 5
Const MaxCharacters = 255
Const adNumeric=131
with rs.fields
.append "FileNameFullPath",adWVarChar , MaxCharacters
.append "FileName",adWVarChar , MaxCharacters
.append "FileAge",adDouble
end with
rs.open
Const ForReading = 1, ForWriting = 2, ForAppending = 3
Const Tristate=-1 ' 0=ASCII, -1:Unicode, -2: system default
if logLOG then Set LogFile = fso.OpenTextFile(startFolder&"\DelOldFileStep.log", 8, True, Tristate)
if logCSV then Set CSVFile = fso.OpenTextFile(startFolder&"\DelOldFileStep.CSV", 8, True, Tristate)
For TimeGapsIndex=0 to UBound(TimeGaps)
if logCSV then CSVFile.Write(TimeGaps(TimeGapsIndex)&":" )
Next
if logCSV then CSVFile.Writeline()
'---------------------- Collect file records------------------
FirstRun=True 'used in GetFilesRecords
GetFilesRecords startFolder
rs.Sort = "FileName,FileAge DESC,FileNameFullPath DESC" ' DESC/ASC
'----------------------- Off load records for debug ---------------
rs.MoveFirst
Do Until rs.EOF
if logCSV then CSVFile.Writeline(rs.Fields.Item("FileName") &vbtab &rs.Fields.Item("FileAge")&vbtab & _
left(rs.Fields.Item("FileNameFullPath"),len(rs.Fields.Item("FileNameFullPath"))-len(rs.Fields.Item("FileName")) ))
rs.MoveNext
Loop
if logCSV then CSVFile.Writeline
'----------------------- Purge files---------------
f=""
rs.MoveFirst
TimeGapsIndex=UBound(TimeGaps)
if logCSV then CSVFile.WriteLine("Purge "&startFolder&" on"&vbtab&Date&vbtab&time)
Do Until rs.EOF
if NOT (rs.Fields.Item("FileName") = f) then
if DeleteExpiredCopies then 'Should we delete files that are older than the oldest gap?
NoOfCopiesToDelete=DeleteFileCount(MinNumOfCopies,TimeGaps(UBound(TimeGaps)))
else
NoOfCopiesToDelete=DeleteFileCount(MinNumOfCopies,0)'
End if
if NoOfCopiesToDelete > 0 then
f=rs.Fields.Item("FileName")
TimeGapsIndex=UBound(TimeGaps)
bAlreadyKeptACopy=FALSE
CopiesDeleted =0
'Do not move next so that the first entry is tested too
'rs.MoveNext
End if
end if
if NoOfCopiesToDelete>0 then
if rs.Fields.Item("FileAge")>=TimeGaps(TimeGapsIndex) then
if logCSV then CSVFile.Write(f &vbtab & rs.Fields.Item("FileAge") &vbtab & left(rs.Fields.Item("FileNameFullPath"),len(rs.Fields.Item("FileNameFullPath"))-len(f) ))
if bAlreadyKeptACopy then
' If we already kept a copy for this range...
if NoOfCopiesToDelete> CopiesDeleted then
if NOT Simulate then fso.DeleteFile(rs.Fields.Item("FileNameFullPath"))
CopiesDeleted =CopiesDeleted +1
if logCSV then CSVFile.Write(vbtab&"deleted "&CopiesDeleted& "/" &NoOfCopiesToDelete)
End if
else
' Delete all files older than the oldest time gap if DeleteExpiredCopies=TRUE
if ((TimeGapsIndex)=UBound(TimeGaps) AND DeleteExpiredCopies) AND NoOfCopiesToDelete> CopiesDeleted then
bAlreadyKeptACopy=FALSE
if NOT Simulate then fso.DeleteFile(rs.Fields.Item("FileNameFullPath"))
CopiesDeleted =CopiesDeleted +1
if logCSV then CSVFile.Write(vbtab&"deleted "&CopiesDeleted& "/" &NoOfCopiesToDelete)
else
bAlreadyKeptACopy=TRUE
if logCSV then CSVFile.Write(vbtab&"Keep Index "&TimeGapsIndex& " "&TimeGaps(TimeGapsIndex))
End if
end if
rs.MoveNext
if logCSV then CSVFile.Writeline()
else
if TimeGapsIndex > 0 then
TimeGapsIndex=TimeGapsIndex-1
bAlreadyKeptACopy=FALSE
end if
end if
End if
Loop
if logCSV then CSVFile.WriteLine("Purge "&startFolder&" ended on "&Date&" "&time)
if logLOG then LogFile.WriteLine("Purge "&startFolder&" ended on "&Date&" "&time)
if logLOG then LogFile.Close
if logCSV then CSVFile.Close
set fso=Nothing
set rs=Nothing
'----------------------
Function DeleteFileCount(ByVal NoOfCopies,ByVal MaxAge)
Dim BookMark
Dim iCount
Dim ExitLoop
Dim refFile
ExitLoop=False
iCount =0
Bookmark=rs.Bookmark
refFile=rs.Fields.Item("FileName")
'Assume EOF when function ends
DeleteFileCount=0
Do Until rs.EOF OR ExitLoop
if (rs.Fields.Item("FileName") = refFile) then
iCount=iCount+1
if (MaxAge> 0 AND rs.Fields.Item("FileAge") > MaxAge ) then iCount=iCount+1
rs.MoveNext
else
if iCount > NoOfCopies then
rs.Bookmark=Bookmark
ExitLoop=TRUE
DeleteFileCount=iCount-NoOfCopies
if DeleteFileCount < 0 then DeleteFileCount=0
else
iCount =0
DeleteFileCount=0
Bookmark=rs.Bookmark
refFile=rs.Fields.Item("FileName")
end if
End if
loop
End Function
'------------------------
Function GetFilesRecords(folderName)
Dim folder, file, fileCollection, folderCollection, subFolder
Dim FileRelPath
Set folder = fso.GetFolder(folderName)
Set fileCollection = folder.Files
if NOT (startFolder=folderName) then
For Each file In fileCollection
FileRelPath=right(file.Path,len(file.Path)-len(startFolder)-2)
if len(FileRelPath) > 15 then
rs.addnew
rs("FileNameFullPath").Value=CStr(file.Path)
rs("FileName").Value=CStr(right(FileRelPath,len(FileRelPath)-instr(FileRelPath,"\")+1))
rs("FileAge").Value=DateDiff(stUOM,file.DateLastModified,Now)/24
rs.update
end if
Next
end if
Set folderCollection = folder.SubFolders
For Each subFolder In folderCollection
' Add a simple check to ensure that the start folder is correct.
' FreeFileSync folder is named as yyyy-mm-dd tttttt
If FirstRun AND Not mid(subFolder.Path,len(subFolder.Path)-6,1)=" " then
wscript.echo subFolder.Path&" does not look like a folder from FreeFileSync"
wscript.quit
else
FirstRun=False
End if
GetFilesRecords subFolder.Path
' Delete empty folders
If fso.getfolder(subFolder.Path).SubFolders.Count = 0 AND fso.getfolder(subFolder.Path).Files.Count = 0 Then
fso.DeleteFolder(subFolder.Path)
End If
Next
End Function
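For reference, the script reads the versioning folder from the /Path: named argument and falls back to its own folder when the argument is omitted. A typical invocation could look like this (the script filename and the path are only placeholders):
cscript //nologo PurgeDeletedVersions.vbs /Path:"D:\Backup\Revisions"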
Hi, I have a question about the new behavior of versioning in 5.7.
If there are two configurations:
- moms-stuff: syncs /users/mom/pictures and moves older versions to /versions
- dads-stuff: syncs /users/dad/pictures and moves older versions to /versions
If I run both, FFS 5.7 will create a directory called /versions/pictures,
where mom's and dad's pictures will be mixed. It would seem more sensible to
create a directory for each sync configuration:
- /versions/moms-stuff/pictures
- /versions/dads-stuff/pictures
This is the same behavior as 5.6, just without the timestamp.
Congratulations on this program; after trying many similar ones, it's been a
very long time since I stopped searching. FFS does exactly what I need.
- Site Admin
- Posts: 7212
- Joined: 9 Dec 2007
> sensible to create a directory for each sync configuration
Likewise, the user might want to put both into the same versions directory. By
not adding the job name (which is not even available for a non-saved GUI
config) both scenarios can be accommodated.
Would it make sense to have variables like these?
/versions/%job_name% %job_timestamp%
This would add flexibility.
- Posts: 74
- Joined: 17 Mar 2008
I am seeing something odd with regard to file modified dates after migrating
to the 5.7 version and using the new versioning option. The new version is
being created in the right place, using the new (correct, as in original)
folder name with versions of the file there, but the versioned file does not
seem to have its original modified date intact; all dates on the file are now
the original created date.
Using PowerShell to get the LastWriteTime for the versioned file:
(get-item "...:\backup\old
versions\2012-September\Documents\...\the_dream_was_the_same_every_n.txt
2012-09-08 203257.txt").lastwritetime
Sunday, January 30, 2011 11:58:44 AM
The modified file that was backed up:
(get-item "...\backup\...\Documents\...\the_dream_was_the_same_every_n.txt").l
astwritetime
Saturday, September 08, 2012 7:53:54 PM
Is this by design or an issue with the file copy/versioning routine? It does
not seem correct to me that the version-backup file no longer has its
original last modified date - is this a bug or something FFS is doing by
design? Changing the last modified dates of existing files does not seem like a
good idea to me.