How to Optimize A Multi-Directory Backup

Posts: 8
Joined: 25 Mar 2002

johnb_atl

This forum entry is to pass along some interesting results I've achieved in
speeding up FreeFileSync's performance when backing up files across a slow
network connection. This has been a result of a long discussion with Zenju
regarding Feature Request ID: 3482558 (Optimized Order of Sync Actions)
viewable at https://sourceforge.net/tracker/?func=detail&aid=3482558. Also see feature
request 3444554, which is about the same issue
(https://sourceforge.net/tracker/?func=detail&aid=3444554).

These tests and the resulting strategy allow the user to achieve nearly the
same results as the feature requests, without losing the disk usage size
optimizations that FFS already performs.

Use Case: The user is a mobile or remote worker who needs to periodically back up files from their local disk to one or more network drives, across a VPN or other subjectively "slow" connection. The user wants to get as many of the smallest files done first, in case the connection gets interrupted or the FFS job has to be aborted for some reason.


Testing Scenario: Using FFS v5.0. There are 6 folder pairs to be updated, left to right. There are no overlaps between any of the directory trees regardless of whether they are sources or destinations. FFS has been set to ignore symbolic links, ignore errors, and permanently delete files. I've selected 200mb as my arbitrary boundary between "small" files and "large" files. I've allowed a 1 kb overlap to avoid accidentally excluding any files. (This size is based on a brief statistical analysis of my file sizes and connection speed. YMMV, so you might need to select a different boundary.)


Control Setup: A single job file with the 6 folder pairs listed, with no size filters in place (other than an upper limit of 64mb, an artifact of my testing environment on a VPN; all my files larger than 64mb need to wait until I'm in my office with a gigabit ethernet connection.)

Test Setup 1: A single job file with 12 pairs listed. The first 6 are the left-right pairs with a size filter that admits only the "small" files (at or below the size boundary). The last 6 are the same pairings, but with a size filter that admits only the "large" files (from just under the boundary upward, allowing for the 1 kb overlap).

Test Setup 2: 12 job files; each file contains one of the folder pairs just described. A batch file was used to start each of the jobs in the same sequence as each pair appeared in SETUP 1, with a 2 second delay between each job start, to ensure the "large file" jobs are temporarily locked by their corresponding "small file" jobs until the small files are done.
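
For anyone who wants to reproduce Setup 2, the launcher batch file looks roughly like the sketch below. The FreeFileSync install path and the job file names are illustrative placeholders rather than my real ones, and the ping trick is just one portable way to get an approximately 2 second pause:

    @echo off
    rem Launch the six "small files" jobs first, then the six "large files" jobs.
    rem START without /WAIT returns immediately, so every instance runs freely.
    rem "ping -n 3" gives a pause of roughly 2 seconds between launches.
    set FFS="C:\Program Files\FreeFileSync\FreeFileSync.exe"
    for %%S in (small large) do for %%N in (1 2 3 4 5 6) do (
        start "" %FFS% "C:\Jobs\Pair%%N_%%S.ffs_batch"
        ping -n 3 127.0.0.1 >nul
    )
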

Test Setup 3: 6 job files; each file contains two folder pair listings. The first listing is the left-right pair with the "small files" size filter. The second listing is the same pair with the "large files" size filter. A batch file was used to start each of these jobs with a 2 second delay between each. The delay is only there to ensure a fair performance comparison against TEST SETUP 2.


RESULTS

Setups #2 and #3 completed sooner than the control setup. Setup #3 was BY FAR
the fastest, taking less than half the time of the control. Setup #1 took
roughly the same time as the control, but at least has the desired "small
files first" behavior. I would need to re-run these to obtain objective time
measurements; if anyone would like to do those separately, please post your
results here!


ANALYSIS OF RESULTS

Setup #1 (single big job file) appears to use multiple threads to scan the
file pairs and set up its operations, and doesn't appear to take appreciably
longer doing this than the control does, even though it has twice as many pair
sets. According to Zenju, after the scanning, it processes the pairs in the
sequence they appear in the job file, so it handled the sub-200kb files first.
For my particular file set, it made its way through approx. 7000 small files
quite quickly, and then started handling the approx. 150 larger files. I
believe it did not achieve an overall net speed gain because each pair had to
wait for the previous one to complete, even though the sources and
destinations were all independent of each other.

Setup #2 (12 separate job files) achieved a moderate speed gain over Setup #1
and the control. Setup #2 allowed each instance of FFS to handle its pair
independently, avoiding the apparent bottleneck of Setup #1. Some of the large
file jobs (i.e. job files 7-12) were able to start while other small jobs were
still going. This reduced the overall net time of the backup, and it ensured
that a LOT of small files were handled early and quickly. But each of the
large-files jobs did not start scanning until its corresponding small-files
job had completed.

Setup #3 achieved a surprising speed gain over Setup #2, and completes in less
than half the time of the control! It also used less system RAM than Setup #2!
I have no way of knowing without looking at the source code (which I have no
time to do), but I suspect that Setup #3 removes a bottleneck in the initial
scanning. Each instance of FFS first performed all its scanning, then updated
the small files, and finally updated the large files. Since all the source and
target directories were independent of each other, each FFS instance was able
to operate fully independently. Perhaps during the scans, each instance had to
examine each specific pair of files or directories only once?


CONCLUSION

The strategy used in Setup #3 is the clear winner. Summarizing this strategy:


* Use one job for each directory to be backed up.

* Split each job into 2 folder pairs. Everything should be the same between the first and second pair EXCEPT that the size filter on the first pair should allow only "small" files and the size filter on the second pair should allow only "large" files.

* Try to structure your jobs so they are independent of each other. In other words, minimize overlaps or avoid them altogether: between different source directories, different target directories, and sources/targets taken altogether.

* Start a new copy of FFS for each of the jobs in your set so you make the best use of bandwidth. For example, use a batch file and have the task scheduler run that batch file when the system is idle (see the sketch below).
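
To make that last bullet concrete, my Setup 3 launcher looks roughly like the sketch below; the install path, job file names, and the 10 minute idle threshold are illustrative placeholders, so adjust them for your own environment:

    @echo off
    rem One FFS instance per job file; each job file holds the "small files"
    rem pair followed by the "large files" pair for the same source directory.
    rem No /WAIT and no delays: the six jobs are fully independent of each other.
    set FFS="C:\Program Files\FreeFileSync\FreeFileSync.exe"
    for %%N in (1 2 3 4 5 6) do start "" %FFS% "C:\Jobs\Pair%%N.ffs_batch"

    rem To run this launcher whenever the machine has been idle for ~10 minutes,
    rem register it once with the task scheduler, e.g.:
    rem   schtasks /Create /TN "FFS Backup" /TR "C:\Jobs\run_backup.cmd" /SC ONIDLE /I 10

The schtasks line is only one way to handle the "when the system is idle" part; setting up the same trigger in the Task Scheduler GUI works just as well.
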
Posts: 8
Joined: 25 Mar 2002

johnb_atl

Oops, sorry for the broken links!

http://sourceforge.net/tracker/?func=detail&aid=3482558&group_id=234430&atid=1093083

and

http://sourceforge.net/tracker/?func=detail&aid=3444554&group_id=234430&atid=1093083
Posts: 8
Joined: 25 Mar 2002

johnb_atl

Correction: There is a serious typo in my message. The size boundary I used was 200 _kilobytes_, not 200 mb!
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

> It also used less system RAM than Setup #2!
FreeFileSync scans each unique directory just once, where uniqueness is not
dependent on filter settings like "time span" or "file size" (but it is
dependent on name filters!). A "pair of folder pairs" with different soft
filter settings consumes roughly the same RAM as a single pair. Therefore
Setups 1 and 3 are expected to save the most RAM.

I'm surprised to see a performance difference between Setups 2 and 3, though.
FreeFileSync employs directory locking at synchronization base directory
level. So Setup 2 is effectively serialized to a pattern like Setup 3.

Maybe cache effects play a role here. But I guess I do not have to mention
that for reliable un-cached results it is required to restart all machines
between each of the test runs.
Posts: 8
Joined: 25 Mar 2002

johnb_atl

Just so I don't leave anyone wondering, yes, I took caching into account and
started with a clean slate every time.


> FreeFileSync scans each unique directory just once, where uniqueness is not
dependent on filter settings like "time span" or "file size" (but it is
dependent on name filters!)



Actually, this handily explains why Setup 3 had such good performance! Let me
break things down with a little more granularity, and you'll see. I had 6
points in the directory tree to be backed up, and 6 target directories to back
them up to. (A real example of one of these would be C:\Documents and
Settings\username\My Documents\, copied to a network drive mapped to M:) So
let's label the source directories "A" through "F", and call the targets "U"
through "Z". ("A" gets backed up to "U", B --> V, ..., F-->Z. If I just refer
to "A" you know it means the source directory, and if I say the "'A' fileset"
you know it's a shorthand for A-->U.)

Now, I've split each of these backup sets into 2 FFS pairs, one to handle
small files and one to handle big files. I'll use lowercase subscript "s" for
small, and "h" meaning huge. (Avoiding "B" or "L" because "B" is already used
and lowercase "L" is just plain confusing.) I'll leave off the target
directories here because they're redundant. So there are 12 FFS pairs: As, Ah,
Bs, Bh, Cs, Ch, Ds, Dh, Es, Eh, Fs, Fh.

Setup 2 starts 12 different copies of FFS, each one running a batch job for one of the pairs above. They're started in the following order with a 2 second delay between each: As, Bs, Cs, Ds, Es, Fs, Ah, Bh, Ch, Dh, Eh, Fh. All the "s" instances were able to start scanning immediately. Each "h" instance had to wait for its corresponding "s" instance to fully complete before it could even start scanning. As each "s" instance completed, it released the lock file and the corresponding "h" instance then picked up. (So, a lot of seriality, and each directory A, B, C etc. got scanned twice!)

Setup 3 starts only 6 different copies of FFS. Each copy runs a different FFS batch job; each batch job has two of the FFS pairs in it: (As, Ah), (Bs, Bh), ..., (Fs, Fh). According to your explanation, the performance increase fits, because the batch job only had to scan source "A" (and its target "U") once. I don't believe it's correct to say that Setup 2 serializes to the pattern in Setup 3, because Setup 3 is less serial and has greater parallelism.

This also explains the difference in RAM usage. Setup 1 runs 1 copy of FFS.
Setup 2 runs 12 copies. Setup 3 runs 6 copies. The code in the DLLs is shared,
so it's not a linear relationship, but each instance has its own working
memory. And even though Setup 2 pares down the number of running instances the
fastest, if you were to compare it side-by-side with Setup 3, it would never
catch up; Setup 3 would always have fewer running instances at any moment in
time.
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

> I don't believe it's correct to say that Setup 2 serializes to the pattern
in Setup 3
If all 12 tasks in Setup 2 are allowed to run truly asynchronously then it's
exactly the same as Setup 3. Except for "rounding errors" due to 2 sec delays
in your batch file and up to 5 sec wait time by FFS's directory locking - the
sum of all these delays is probably significant.
But this may already be the misconception. From the setup descriptions it's
not always clear which jobs run synchronously and which start truly
asynchronously. If they run async, there is not much need for a 2 sec delay.
Posts: 8
Joined: 25 Mar 2002

johnb_atl

In Setup 3, there's no need for any delays at all. In Setup 2, I could start
these with a MUCH shorter delay: the only purpose behind it is to ensure that
the FFS instances running the "s" scripts each have enough time to get started
and create their lockfiles before any of the "h" instances are started. In all
cases, each instance of the app is left free-running; i.e. I haven't used
anything like the /WAIT flag in the START command.
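
In batch-file terms, each launch looks roughly like this (the paths and file names are placeholders):

    rem Free-running: START returns immediately and the batch file moves on.
    start "" "C:\Program Files\FreeFileSync\FreeFileSync.exe" "C:\Jobs\Pair1_small.ffs_batch"

    rem What I am NOT doing: START /WAIT would block until that instance exits.
    rem start "" /WAIT "C:\Program Files\FreeFileSync\FreeFileSync.exe" "C:\Jobs\Pair1_small.ffs_batch"
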