BUG?: star synchronisation Windows 10

Get help for specific problems
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

Hello all

I have a NAS (FreeNAS-9.10.2-U6 (561f0d7a1)) called Zeus that I use as the central location for a folder synchronised across 4 Windows 10 laptops. Each laptop has, as far as I can tell, pretty much the same configuration: a FFS job performing a two-way synchronisation with the folder on the server and a local copy thereof; the FFS job log is saved locally. Each has RTS running and which compares the local (C:) version of the folder with the server equivalent and invokes the aforementioned FFS job should a change occur.

I have seen varied results:
- It works
- It sort of works
- It works and then undoes what it has done
- It fails

At this stage, I will not give details about what I mean by the above results. I will try and keep it simple because I have just managed to experience a reasonable well-encapsulated version of the above, and a good enough starting point for reporting some of the aspects of this "bug".

To prepare for this posting, I rebooted all five machines, after which I noticed the following. Note: the last action I performed was to delete a file from the local copy of the shared folder on one of the laptops. For the record, this is an example where it sort of works: it has deleted the file everywhere, but cannot seem to clean up and finish. Worse still, it is clogging up

- I see three new FFS log files, each one minute apart
- The first says that there is nothing to synchronise
- The second results in Error Code 183
- The third is the same as the first
- The shared folder on the server has a file, sync.ffs_db that has a timestamp the same as the third log file
- Watching the shared folder on the NAS, I see two other FFS files appear and disappear - the lock file and a temp temporary sync file
- At approximately 30 minute intervals this process repeats itself


The following error log appears on one of the laptops:

______________________________________________________________________________________
|19-Jan-18 - Photography - Automatic - Communal: Synchronisation completed with errors
|
| Elements processed: 0
| Total time: 00:00:02
_____________________________________________________________________________________

[10:56:23] Info: Synchronising folder pair: [<- Two way ->]
C:\Communal\Automatic Replication
\\ZEUS\Communal\Automatic Replication
[10:56:24] Error: Cannot move file
"\\ZEUS\Communal\Automatic Replication\sync.534c.ffs_tmp" to
"\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
Error Code 183: Cannot create a file when that file already exists. [MoveFileEx]
[10:56:24] Error: Synchronisation completed with errors


A little while later, I received the following after renaming a file on the Server:

_____________________________________________________________________________________
|19-Jan-18 - Photography - Automatic - Communal: Synchronisation completed with errors
|
| Elements processed: 0
| Total time: 00:00:02
_____________________________________________________________________________________

[11:20:33] Error: Cannot open file "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
Error Code 5: Access is denied. [CreateFile]
Setting default synchronisation directions: Old files will be overwritten with newer files.
[11:20:33] Warning: The following items have unresolved conflicts and will not be synchronised:
"Flip\Cooking\Duck rendang.doc": Items differ in attributes only
<- "Duck rendang.doc"
-> "Duck Rendang.doc"
[11:20:33] Info: Synchronising folder pair: [<- Two way ->]
C:\Communal\Automatic Replication
\\ZEUS\Communal\Automatic Replication
[11:20:34] Error: Cannot delete file "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
Error Code 5: Access is denied. [DeleteFile]
[11:20:34] Error: Synchronisation completed with errors

I am sceptical of the Code 5 errors: why does it not always occur (or at least, really frequently).


Thanks in advance for the help I am about to receive - please indicate if you need more information or if I have made some basic noob error.


regards, Freda
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

sync.ffs_db is at the core of the "two way" sync variant, so if reading/writing this file does not work reliably, the "two way" sync variant cannot determine the correct sync directions.

So the question is why do these errors occur:
[10:56:24] Error: Cannot move file
"\\ZEUS\Communal\Automatic Replication\sync.534c.ffs_tmp" to
"\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
Error Code 183: Cannot create a file when that file already exists. [MoveFileEx]

[11:20:34] Error: Cannot delete file "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
Error Code 5: Access is denied. [DeleteFile]
My bet is that some other application is holding a handle open to this file. Most-likely culprit: misbehaving AV software. To verify this theory you can use a tool like Process Monitor: https://freefilesync.org/faq.php#trace
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

Hi Zenju

I have just sent you a ZIP file from Process Monitor as it makes very little sense to me.

Regards, Freda
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

Further to previous posts of error logs, here are two more. They are quite similar and may be the same problem but they provide a fuller picture. The operation I was performing was deleting two local files, file01.pdf and file02.pdf which should have resulted in a propagated delete to ZEUS.
[18:12:11]  Warning:  The recycle bin is not supported by the following folders. Deleted or overwritten files will not be able to be restored:
                      \\ZEUS\Communal\Automatic Replication
[18:12:11]  Info:  Synchronising folder pair: [<- Two way ->]
                       C:\Communal\Automatic Replication
                       \\ZEUS\Communal\Automatic Replication
[18:12:11]  Info:  Deleting file "\\ZEUS\Communal\Automatic Replication\Scanned\file01.pdf"
[18:12:11]  Info:  Deleting file "\\ZEUS\Communal\Automatic Replication\Scanned\file02.pdf"
[18:12:12]  Error:  Cannot move file
                    "\\ZEUS\Communal\Automatic Replication\sync.8053.ffs_tmp" to
                    "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
                    Error Code 183: Cannot create a file when that file already exists. [MoveFileEx]
[18:12:12]  Error:  Synchronisation completed with errors
... followed pretty soon afterwards with
[18:12:39]  Warning:  Cannot set directory lock for "\\ZEUS\Communal\Automatic Replication".
                 Cannot read file attributes of "\\ZEUS\Communal\Automatic Replication\sync.ffs_lock".
                 Error Code 2: The system cannot find the file specified. [FindFirstFile]
[18:12:41]  Info:  Synchronising folder pair: [<- Two way ->]
                       C:\Communal\Automatic Replication
                       \\ZEUS\Communal\Automatic Replication
[18:12:41]  Info:  Creating file "C:\Communal\Automatic Replication\Scanned\file01.pdf"
[18:12:41]  Info:  Creating file "C:\Communal\Automatic Replication\Scanned\file02.pdf"
[18:12:41]  Warning:  Synchronisation completed with warnings
To my untrained eye, the second looks like a race condition.


Freda
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

I'm not sure what scenario you sent me, I don't see any error reading/writing the sync.ffs_db. However I'm seeing Windows Defender accessing the file, which is a recipe for the spurious file deletion errors you saw:
sync_ffs_db.png
sync_ffs_db.png (30.24 KiB) Viewed 2607 times
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

I followed your instructions regarding disabling the AV programs. Three of the laptops run Windows Defender and the other runs Avast Free. All have been modified to ignore the shared folder on the Server (ZEUS). After this change, I rebooted the server and all laptops. No change to the errant behaviour was observed.

The following may or not be relevant:

When looking at the shared folder on the Server, and without doing anything, I see the sync and lock files being updated constantly.

Each of the laptops has over a hundred log files even though no action is being performed.

It *seems* to me that the updating of the sync file is detected by a local RTS, which then caused the sync file to be updated. Which causes another RTS to detect it and so it continues. If this is the case, what would cause the process to stop.

My most recent test involved deleting a file. I received a log (warning) file saying Recycle Bin is not supported on the Server (FREENAS) - correct. That was followed by a couple of log files saying "nothing to synchronise" - WHY? And soon after that an error log file saying it could not delete the sync file. A few more "nothing to synchronise", a few "can't delete" and then a "can't create". Are you sure that all I/O operations are being performed under a lock? Because there will be concurrent updates from four laptops.

Further to the above test: I did the same thing from another laptop. It did NOT propagate the delete. Rather, it did a number of "nothing to sync" and then restored the deleted file. I do not see why I get different behaviours depending on which laptop I use.

I presume that FFS, if there is a lock file present, needs to retry its work after waiting for some some period of time. Would it not be possible to retry the I/O request on the sync file 'n' times with a short break. The assumption here is that if you could not get exclusive access to the file, then AV is probably looking at it atm but will be finished pretty quickly

Currently, of the three FFS working files, there is only a temp sync file on the Server folder. Is that normal?

I now often run Process Monitor to try and see what is happening but the output is huge - the errors may take a minute or two to appear. I have run a few with a filter set to the shared folder on the Server and that limits the output significantly. Would these filtered logs be helpful to you. If so, and/or if you would like other filters set, just let me know.

You say that the sync files are central to the process and if they are compromised/corrupted then the whole process is compromised. Would it help if I deleted all sync files in all of the folders and re-establish syncing?

I love FFS/RTS and would therefore really like to resolve this issue. This has the feel of a software rather than an environmental error but I know that looks can deceive.
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

> No change to the errant behaviour was observed.
Can you create a new Process Monitor trace up to the point where writing the sync.ffs_db file fails?

> and without doing anything, I see the sync and lock files being updated constantly.
That's a problem.

> It *seems* to me that the updating of the sync file is detected by a local RTS
ffs_db and ffs_lock files are excluded from RTS change detection.

> That was followed by a couple of log files saying "nothing to synchronise" - WHY?
Obviously RTS is detecting some changed file. Currently it's not convenient to find out what RTS is detecting, but it's possible via "%change_path%": https://freefilesync.org/manual.php?topic=realtimesync

> and then restored the deleted file
That's the issue regaring the db file access errors above.

> Would it not be possible to retry the I/O request on the sync file 'n' times with a short break.
You mean the sync.ffs_db file. Retrying I/O automatically is generally not a good idea: https://blogs.msdn.microsoft.com/oldnewthing/20051107-20/?p=33433
But anyway, first it needs to be clarified whether what causes the error is temporary at all, e.g. when caused by AV.

> Currently, of the three FFS working files, there is only a temp sync file on the Server folder. Is that normal?
I don't understand.

>Would it help if I deleted all sync files in all of the folders and re-establish syncing?
That's generally not necessary.

> This has the feel of a software rather than an environmental error but I know that looks can deceive.
You have a complex setup. It's enough to have one weak link in the chain to get some cascading effect.
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

> It *seems* to me that the updating of the sync file is detected by a local RTS
> ffs_db and ffs_lock files are excluded from RTS change detection.
Including sync.tmp.ffs_db ?

> Currently, of the three FFS working files, there is only a temp sync file on the Server folder. Is that normal?
> I don't understand.
What I mean, is that the only FFS file in the folder was sync.tmp.ffs_db

> No change to the errant behaviour was observed.
>Can you create a new Process Monitor trace up to the point where writing the sync.ffs_db file fails?
I have just emailed it to you

Here is the accompanying FFS job log:
|21-Jan-18 - Flip_VAIO - Automatic - Communal: Synchronization completed with errors
|
|    Items processed: 1 (5 bytes)
|    Total time: 00:00:29
|___________________________________________________________________________________

[19:50:33] Warning: Database file "\\ZEUS\Communal\Automatic Replication\sync.ffs_db" is incompatible.
                    Setting default synchronization directions: Old files will be overwritten with newer files.
[19:50:33] Info: Synchronizing folder pair: [<- Two way ->]
                     C:\Communal\Automatic Replication
                     \\ZEUS\Communal\Automatic Replication
[19:50:33] Info: Creating file "C:\Communal\Automatic Replication\Scanned\file01.pdf"
[19:50:33] Error: Cannot move file
                  "\\ZEUS\Communal\Automatic Replication\sync.tmp.ffs_db" to
                  "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
                  Error Code 183: Cannot create a file when that file already exists. (MoveFileEx)
[19:50:33] Error: Synchronization completed with errors
Regards, F
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

> Including sync.tmp.ffs_db ?
Yes

> What I mean, is that the only FFS file in the folder was sync.tmp.ffs_db
No, that's not normal. But probably with similar cause, namely something was locking the file, thous prevented deletion.

> I have just emailed it to you
I have not received the email.

> sync.ffs_db" is incompatible.
That's something new. Either the ffs_db file has become corrupted (this hints at a general issue with your network, since FFS saves this file transactionally) or you're using different, incompatible FFS versions at the same time.
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

That was the first time that I have seen the "incompatibility" message. On checking, I found 1 x 8.6, 2 x 9.5 and 1 x 9.7

All are now running the same version (9.7), all machines rebooted but no discernible change to the result.

@Zenju: I have just emailed you a more complete set of logs and Process Monitor output obtained after I updated the FFS/RTS suites and rebooted.
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

I've found one RTS issue in the Process Monitor log: sync.8ea2.ffs_tmp was triggering RTS, but should not, which is a regression. This explains probably most "nothing to synchronize" logs.

Here's the FFS 9.8 beta including the fixed RTS:
http://www.mediafire.com/file/dkbka7jys4c2chr/FreeFileSync_9.8_%5BBeta%5D_Windows_Setup.exe

In Logfile04.pml I see a hint about the following error:
[10:55:55]  Error:  Cannot move file
                    "\\ZEUS\Communal\Automatic Replication\sync.7bf4.ffs_tmp" to
                    "\\ZEUS\Communal\Automatic Replication\sync.ffs_db".
                    Error Code 183: Cannot create a file when that file already exists. [MoveFileEx]
already-existing.PNG
already-existing.PNG (30.95 KiB) Viewed 2537 times
There's the NAME_COLLISION (error 183), shortly followed by NAME_NOT_FOUND, both for "sync.ffs_db" which looks like a contradiction. However it can be explained by the spurious "CloseFile" in the middle: The "sync.ffs_db" deletion should already have completed above after the "CloseFile" following "SetDispositionInformationFile", but obviously it has not, so the file is still not physically deleted. Why? The call stack for the spurious "CloseFile" seems to originate from somewhere within the OS, not the FreeFileSync process. Also I do not see a corresponding "CreateFile". Network driver issue?
call-stack.PNG
call-stack.PNG (23.42 KiB) Viewed 2537 times

As for the remaining file access errors I suggest the following approach:
First simplify your test case to only include two PCs into your setup.
Then add PCs until the first "Error Code 5: Access is denied" occurs.
Run Process Monitor on the PC added last that triggered the error.
Hopefully the log will show which processes on this PC are responsible.
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

Have downloaded and installed FFS 9.8 Beta on all machines. Everything rebooted. No discernible difference in operation. Took your advice and ran some tests using two machines only.

Just about to email you another two sets of error logs:

Test 08: Rename of a file - error experienced on same machine as rename
Test 10: Deletion of a file - error experienced on other machine

Your request raises a design question. Assume three machines M1 M2 and M3, all having files F1, F2 and F3, all successfully synchronised (2-way). Switch M3 off. Rename F2 to F2a and delete F3 on M2. M1 and M2 will synchronise successfully (after this bug is addressed). Q: When M3 is switched on, what should happen to the modifications that have been made to M1 and M2. Are they queued and then applied to M3? Or will F2 and F3 now be re-instated on M1 and M2.
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

The logs are similar to the previous ones. So far the main issue seems to be "NAME_COLLISION (error 183)". First I'd check if the most recent network drivers are installed on all machines.

> Are they queued and then applied to M3?
Yes
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

According to Windows Update, all network drivers are up to date. Ditto for Device Manager and "update Driver".

I am sceptical about a network driver being the issue because I am pretty certain that when I was testing for the error and only using two computers, I created errors with every pair of computers. For it to be a driver issue, surely every network driver would have to be at fault.

For the record, two machines are Microsoft Surface Pro 4, 1 is a Sony VAIO and one is a Dell desktop. All of those machines run the latest (ie they are all up to date with maintenance) version of Windows 10. The server is running FreeNAS 9.10, which, I think, is based on FreeBSD 10.3 - it, too, is up to date
User avatar
Site Admin
Posts: 7056
Joined: 9 Dec 2007

Zenju

For the computer where the "NAME_COLLISION (error 183)," occurs: Are the network drivers up to date as provided by the mainboard manufacturer?
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

I will try and check for you. When you say "mainboard", do you mean "motherboard" because I don't think the driver is motherboard related. And, to be clear, when you say "network driver" do you mean the driver associated with the (wireless) network card?

But ... as I have said: I have received the NAME_COLLISION (error 183) on EVERY machine at some stage. What you are implying is that every network driver is at fault. There does not seem to be any pattern to the errors - sometimes the request works and sometimes it doesn't. And there does not seem to be any pattern about which machine receives the error - sometimes no machine gets an error and sometimes multiple machines receive an error.
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

404 error :-( It is now available !??

FYI: Samba on FreeNAS is 4.5.11
Posts: 13
Joined: 19 Jan 2018

FredericaNurk

@Zenju: I only had time today to try the first of the SMB modifications (the DirectoryCacheLifetime value) sugested in the link above. It seemed to make things worse although that is not necessarily a true reflection of the situation because, once it is a little confused, it can stay that way for quite some time. Unfortunately, I am about to undertake some travel and so will not be able to test any new scenarios for a while. Thanks a lot for looking at the issue with/for me - it is obviously quite difficult to diagnose. I will probably resurrect this problem ticket when I return - unless you've managed to solve it in my absence :-)