Performance with lots of folders

Posts: 15
Joined: 25 Mar 2005

lee_jay

The performance of FFS with lots of files is quite impressive. It's less
impressive with lots of folders. I use an application that (unfortunately) can
create tens of thousands of folders, each with 1 or 2 files. This slows down
FFS by more than an order of magnitude compared with syncing the same number
of files in a few folders. My question is, is this behavior inherent to the
operation system and thus not under the control of FFS, or could it
potentially be improved?
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

Good question. A while ago I asked myself something similar: what is the best
performance FreeFileSync could possibly achieve in an ideal world? Its main
constraint is obviously the operating system API that encapsulates all file
system accesses.
Therefore I ran a huge test case, scanning the complete system hard drive
(about 200,000 files).

The result for a standard installation of FFS, using operating system
buffering(!) for file accesses:
10,781 ms

The same without any kind of filtering within FFS:
10,281 ms (that's good: it means filtering can always be left switched on)

Last and most important test: I ran the complete test case without evaluating
any data, doing nothing other than letting the OS API traverse the hard drive.
This is practically the minimum time that any file-traversing software will
require:
7,937 ms

FFS is not that far away, I would say. But back to your question: the numbers
above represent pure CPU time (buffered file accesses). File synchronization,
however, is largely I/O bound (unless, of course, you own an SSD, in which
case the CPU begins to dominate). Therefore I have to say:

> not under the control of FFS
The OS file access consumes most of the runtime (in the unbuffered case) and
is responsible for most of the wait time (it depends largely on your hard
drive, but a rough number would be "above 80%"). The smaller part where FFS
has a say is already close to optimal, as the tests so far have shown.
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

PS: I should mention that the OS API traversal used to measure the best
possible time was a small C++ program I wrote that does nothing other than
call the Windows C-API.
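
The program itself isn't posted here, but a minimal sketch of such a raw
traversal could look like the following. The recursion scheme, the C: root
and the timing are assumptions; only FindFirstFileW/FindNextFileW/FindClose
are the Windows C-API calls referred to above.

#include <windows.h>
#include <iostream>
#include <string>

// Recursively walk a directory tree, touching nothing but the OS API.
void traverse(const std::wstring& dir)
{
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE)
        return; // inaccessible directory: skip

    do
    {
        const std::wstring name = fd.cFileName;
        if (name == L"." || name == L"..")
            continue;

        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            traverse(dir + L"\\" + name); // recurse into subdirectory
        // else: a file; the benchmark deliberately ignores its data
    }
    while (FindNextFileW(h, &fd));

    FindClose(h);
}

int main()
{
    const DWORD start = GetTickCount();
    traverse(L"C:");
    std::cout << "traversal took " << GetTickCount() - start << " ms\n";
}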
Posts: 15
Joined: 25 Mar 2005

lee_jay

Thanks for the response. If you have a test case (application/exe) you'd like
me to run: I have a situation with approximately 95,000 files in about 1,200
folders, and another location with about 85,000 files in approximately 50,000
folders. This is all on XP. Just opening Windows Properties on these two
parent folders shows more than an order of magnitude difference in execution
time, so I'm not sure there's much hope.
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

Windows Properties is already quite close to the ideal traversing time.
Traversal times that are noticeably faster than that are not feasible (under
the constraint of accessing the hard drive through the OS layer only).
Posts: 15
Joined: 25 Mar 2005

lee_jay

I found an interesting thing regarding performance, not just with lots of
folders but with lots of files too.

If you have two folders to sync, each with lots of files, and you're
comparing them over a slow link (e.g. wireless), comparing them simultaneously
makes both complete faster than either one on its own, odd as that might seem.
It's not a trivial difference: it's 3x-5x faster. I don't know how that could
be incorporated into the code to increase performance, but it's interesting to
note.
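
A minimal way to check the effect is to time the two comparisons once back to
back and once in parallel. The sketch below is not FFS code; the UNC paths are
placeholders, and the C++17 <filesystem> traversal merely stands in for
whatever work the comparison actually does per entry.

#include <chrono>
#include <cstddef>
#include <filesystem>
#include <iostream>
#include <thread>

namespace fs = std::filesystem;

// Count directory entries under 'root', i.e. do one full traversal.
std::size_t countEntries(const fs::path& root)
{
    std::size_t n = 0;
    for ([[maybe_unused]] const auto& entry :
         fs::recursive_directory_iterator(root,
             fs::directory_options::skip_permission_denied))
        ++n;
    return n;
}

int main()
{
    const fs::path a = LR"(\\laptop\share\folderA)"; // placeholder paths
    const fs::path b = LR"(\\laptop\share\folderB)";

    using clock = std::chrono::steady_clock;
    using ms    = std::chrono::milliseconds;

    auto t0 = clock::now();
    countEntries(a);
    countEntries(b); // one after the other
    const auto sequential = clock::now() - t0;

    t0 = clock::now();
    std::thread ta([&] { countEntries(a); }); // both at the same time
    std::thread tb([&] { countEntries(b); });
    ta.join();
    tb.join();
    const auto parallel = clock::now() - t0;

    std::cout << "sequential: "
              << std::chrono::duration_cast<ms>(sequential).count() << " ms\n"
              << "parallel:   "
              << std::chrono::duration_cast<ms>(parallel).count() << " ms\n";
}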
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

First it would need to be understood where this effect comes from. Maybe the
file system cache is responsible for this behavior.
Posts: 15
Joined: 25 Mar 2005

lee_jay

I don't think it's the cache. I've done full-system cold starts and still
seen the effect on the first launch. Also, if I'm in the middle of a sync and
launch another one (using a different tool), the first one speeds up
dramatically as soon as the other one starts interrogating the other system.
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

That's very strange. It would be the first case where actually computing
"more" is faster than "less"...
Since the test runs over WLAN, one explanation could be that the signal
strength increases when two consumers access the network resource.
Posts: 15
Joined: 25 Mar 2005

lee_jay

Yes, I think it's odd too. However, it's not two users: there is only one
wireless link and only two computers. All I do is add another instance of the
app, and both speed up. In fact, if one instance is FFS and I add an instance
of the other app, FFS speeds up (a lot) when the other app starts accessing
the remote file system.

I can't explain it, but it's repeatable!
Site Admin
Posts: 7210
Joined: 9 Dec 2007

Zenju

I guess you're on your own on this one; I have no idea what is happening.
However, if you find an optimization technique that all users could benefit
from, I'm always happy to improve the tool.