Exponential slowdown when dealing with large numbers of small files

mike loeven
Posts: 3
Joined: 4 Nov 2018

Post by mike loeven • 14 Apr 2019, 22:21

I've been noticing some issues syncing a specific application's data folder between my desktop and laptop.

I'm running a backup with 8 threads. When the sync starts, it scans and analyzes about 15K files within a few seconds, but it then takes about 30 seconds to reach 25K and about 5 minutes to reach 30K, and the slowdown keeps growing exponentially as the number of small loose files being scanned increases. I'm not sure what is causing the exponential performance loss, but a simple file scan should in theory keep a roughly constant scan rate regardless of the number of files in a directory, especially since the comparison mode is size and date, not actual file content.
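
A rough way to check whether the scan rate really drops as the file count grows, independently of FFS, is to time a plain metadata scan of the same folder. The Python sketch below is a hypothetical helper (nothing FFS provides): it reads only size and modification time, which is all a size-and-date comparison needs, and records how long each block of files took. A per-block time that keeps climbing here too would suggest the slowdown lies below FFS (file system, antivirus, caching) rather than in FFS itself.

```python
import os
import sys
import time


def scan_rate(root, chunk=5000):
    """Collect (size, mtime) for every file under `root` and record how long
    each block of `chunk` files took to scan.

    Returns (metadata, checkpoints), where checkpoints is a list of
    (files_scanned_so_far, seconds_for_this_chunk)."""
    metadata = {}
    checkpoints = []
    last = time.perf_counter()
    stack = [root]  # iterative depth-first walk, no recursion limit issues
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Symlinks are skipped: neither followed nor recorded.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    metadata[entry.path] = (st.st_size, st.st_mtime)
                    if len(metadata) % chunk == 0:
                        now = time.perf_counter()
                        checkpoints.append((len(metadata), now - last))
                        last = now
    return metadata, checkpoints


if __name__ == "__main__" and len(sys.argv) > 1:
    files, checkpoints = scan_rate(sys.argv[1])
    for count, secs in checkpoints:
        print(f"{count:>8} files: last chunk took {secs:.2f} s")
    print(f"total: {len(files)} files")
```

Run it against the data folder (path given on the command line) and compare the per-chunk times; keep in mind that a second run will usually be faster because of the OS cache.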

Zenju
Site Admin
Posts: 4927
Joined: 9 Dec 2007

Post by Zenju • 16 Apr 2019, 17:34

Hard to tell in general. How are other tools behaving? As always you need to consider caching effects which could already explain this.

wm-sf
Posts: 73
Joined: 13 Nov 2003

Post by wm-sf • 16 Apr 2019, 18:26

I have had exactly the same problem since I installed 10.11 yesterday. Directories with many files bring it to a standstill. I noticed this most obviously on Google Chrome's \Code Cache\js directory, which contained 65K or so very small files. I trimmed that to under 30K last night, but my mirror this afternoon still hadn't completed after more than an hour (normally it takes 10 minutes or so), and it was very obviously stepping through the files in that directory one by one. I'm on Win 8.1, and no other system problems have suddenly appeared in the last day or so. It's probably best for me to go back to 10.8 (the last version I was using), which didn't show this problem (I kept the install file).

wm-sf
Posts: 73
Joined: 13 Nov 2003

Post by wm-sf • 19 Apr 2019, 16:09

I've found a temporary workaround: if I use Clear browsing data / Cached images and files (all time) before I exit Google Chrome, the \Code Cache\js directory is emptied and FFS works as before (I had to run it twice to clear out the other side of the mirror because I delete to the recycle bin).

So I'm OK for now but it may not suit the original poster as he may need all the files in his big directory.

It does prove to my satisfaction that the problem is a large number of files in a directory.

mike loeven
Posts: 3
Joined: 4 Nov 2018

Post by mike loeven • 28 Apr 2019, 04:55

@Zenju can you clarify what you mean by caching effects so I know what to look for? I doubt caching is the issue, though: the files being transferred are small text files, only a few KB each at most, and when the folder is copied manually with robocopy from the command line in mirror mode (which copies files much like a one-way sync in FFS), it takes a small fraction of the time FFS needs to run a sync.

Zenju
Site Admin
Posts: 4927
Joined: 9 Dec 2007

Post by Zenju • 28 Apr 2019, 08:28

To establish that the issue is related to FFS and not something else, you'd have to set up two test cases (one FFS, one robocopy) with identical starting conditions. If, say, FFS were tested first and robocopy next, the results would be skewed because the OS caches file I/O from the first run.
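
One way to get identical starting conditions is to generate the test data with a script, so that FFS and robocopy each get their own freshly created, identical source folder. The Python sketch below is only an illustration: the function name, file size, file count and fixed timestamp are arbitrary assumptions, and the file content is random bytes, which doesn't matter for a size-and-date comparison. The OS file cache still has to be cleared before each timed run.

```python
import os
import random
from pathlib import Path


def make_test_tree(root, n_files, size=2048, seed=0, mtime=1_500_000_000):
    """Create `n_files` files of `size` bytes directly in `root`.

    Content comes from a seeded RNG and every file gets the same fixed
    modification time, so two calls with the same arguments produce folders
    that are identical in names, sizes, dates and content."""
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    rng = random.Random(seed)
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for i in range(n_files):
        path = root / f"file_{i:06d}.dat"
        # One large random integer converted to exactly `size` bytes.
        path.write_bytes(rng.getrandbits(size * 8).to_bytes(size, "little"))
        os.utime(path, (mtime, mtime))  # fixed access and modification time
    return root
```

For example, `make_test_tree("test_ffs/source", 30_000)` and `make_test_tree("test_robocopy/source", 30_000)` (hypothetical paths) give each tool its own copy to sync into an empty target.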

wm-sf
Posts: 73
Joined: 13 Nov 2003

Post by wm-sf • 28 Apr 2019, 13:38

mike loeven wrote (28 Apr 2019, 04:55):
But I doubt caching is the issue as the files being transferred are small text files only a few KB each at most and when the folder is manually copied using robocopy from the command line in mirror mode (which makes it copy files similarly to a one way sync in FFS) it takes a very small fraction of the time FFS takes to run a sync.

I don't think it is caching either, but if you want to avoid the pitfall (and you don't have always-on systems), just restart between the two test attempts. Otherwise, flush the caches.

It might be useful if we compare some settings, Mike. I think the most relevant ones are:


    <Compare>
        <Variant>TimeAndSize</Variant>
        <Symlinks>Exclude</Symlinks>
        <IgnoreTimeShift>1</IgnoreTimeShift>
    </Compare>
    <Synchronize>
        <Variant>Mirror</Variant>
        <DetectMovedFiles>false</DetectMovedFiles>
        <DeletionPolicy>RecycleBin</DeletionPolicy>
        <VersioningFolder Style="Replace"/>
    </Synchronize>
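
To make that comparison quicker, a small Python sketch can pull just these settings out of two configuration files and list the ones that differ. It assumes both files are XML, as the excerpt above is, and that the element names match the excerpt; the script itself is hypothetical and not part of FreeFileSync. If a file contains several matching blocks, only the first one found is read.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Element paths taken from the excerpt above; the rest of the file is ignored.
SETTINGS = [
    "Compare/Variant",
    "Compare/Symlinks",
    "Compare/IgnoreTimeShift",
    "Synchronize/Variant",
    "Synchronize/DetectMovedFiles",
    "Synchronize/DeletionPolicy",
]


def read_settings(xml_data):
    """Return {path: value} for each entry in SETTINGS (None if missing).
    Searching with './/' means the root element's name does not matter."""
    root = ET.fromstring(xml_data)
    values = {}
    for path in SETTINGS:
        node = root.find(".//" + path)  # first match only
        text = node.text if node is not None else None
        values[path] = text.strip() if text else None
    return values


def diff_settings(xml_a, xml_b):
    """Return (path, value_a, value_b) for every setting that differs."""
    a, b = read_settings(xml_a), read_settings(xml_b)
    return [(p, a[p], b[p]) for p in SETTINGS if a[p] != b[p]]


if __name__ == "__main__" and len(sys.argv) == 3:
    data = [Path(arg).read_bytes() for arg in sys.argv[1:]]
    for path, va, vb in diff_settings(*data):
        print(f"{path}: {va!r} vs {vb!r}")
```

Usage would be something like `python diff_settings.py mike.ffs_batch wm.ffs_batch` (hypothetical script and file names); no output means the listed settings match.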