Windows 7 x64 File Copy Corruption!

dishe
Posts: 10
Joined: 18 Mar 2016

Post by dishe • 18 Mar 2016, 14:48

OK, this is an odd bug that I can't believe I only discovered last night. Apparently, Windows 7 x64 has a major bug that affects many (but not all) computers when copying to an external USB-connected drive.

https://social.technet.microsoft.com/Forums/windows/en-US/13a7426e-1a5d-41b0-9e16-19437697f62b/windows-7-64bit-corrupting-altering-large-files-copied-to-external-ntfs-drives?forum=w7itproperf

There are literally dozens of these threads across the internet. I noticed the problem last night when a copied file's MD5 checksum didn't match that of the original source. I went back and started checking other files, and sure enough I have a whole bunch of random ones that don't match. I thought it was hardware related, but changing to a different drive and a different cable didn't fix it. Some googling revealed this is actually an OS bug that has been right under my nose for years without my being aware of it! I wonder how many people have this problem and just never noticed?

The solution for many people who have noticed is to avoid Windows' copy routine and use something like TeraCopy instead. TeraCopy has its own MD5 verification feature, but even without verifying the files, its copies don't appear to exhibit the corruption that occurs with Windows file copy.
I use FreeFileSync to incrementally back up my working files daily to a folder on an external drive. But as I understand it, FFS relies on the Windows copy function? So I've just realized that not only am I corrupting large files by copying them from the source to my external working drive, but backing them up with FFS to another drive might have been corrupting them even further!

All I can say is that I'm happy I've never had to restore from my backup folder yet. But this leaves me in an odd conundrum. I'm stuck with 7 x64 for now because of some legacy software that might not play well with W10. There is no verified "cure" for this copy error. I can use TeraCopy to do initial file copies, but I really want to use FFS to do my daily backups. But if I can't trust Windows' own file copy procedure, how can I trust FFS?

Zenju
Site Admin
Posts: 4397
Joined: 9 Dec 2007

Post by Zenju • 19 Mar 2016, 17:40

Actually there is more than "one" Windows copy routine. FFS uses CopyFileEx, Windows Explorer uses an implementation based on COM streams... The fact that copying seems to work with TeraCopy but not with Windows Explorer is not proof that there is a bug in Windows Explorer, or even that there is some fundamental OS bug. More likely it's a driver-level bug of the USB device that just doesn't manifest under all circumstances.

Post by dishe • 20 Mar 2016, 01:36

Zenju- thanks for the reply. Did you get a moment to look at that thread above? There are literally dozens of others that try to troubleshoot it, and all end up unable to verify a fix. As for it being a Windows Explorer bug- they have confirmed it also happens using the command line copy and even xcopy.
They only confirmed that applications which don't rely on the Windows copy routine are immune to it. I have far too much data to verify manually with MD5 checksums, but I'm going to need to look into this further.
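Checking a large backup by hand is impractical, but the comparison can be scripted. Below is a minimal Python sketch, not a tool mentioned in this thread: it walks a source folder, hashes each file with MD5, and reports files whose copy on the backup drive is missing or differs. The function names `md5_of` and `find_mismatches` are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root, backup_root):
    """Compare each file under source_root with the file at the same
    relative path under backup_root.

    Returns the relative paths (with forward slashes) of files that are
    missing from the backup or whose MD5 digest differs.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    bad = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file() or md5_of(src) != md5_of(dst):
            bad.append(rel.as_posix())
    return bad
```

Calling `find_mismatches(r"D:\Video", r"F:\Backup\Video")` (hypothetical paths) returns the relative paths that would need to be re-copied. One caveat: a file copied moments ago may be read back from the operating system's cache rather than from the drive itself, so a check made after reconnecting the drive is the more convincing one.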

Post by dishe • 20 Mar 2016, 02:25

Just to clarify, you're saying that if this bug happens in command line xcopy and Windows Explorer copy, it *wouldn't* happen in FFS? I saw mention of some rudimentary verification in FFS that can be enabled via the XML config. I might have to do that, although I understand it isn't MD5 or anything like that. As you can imagine, I'm very nervous that my terabytes of backup files are corrupt.

Post by Zenju • 20 Mar 2016, 11:33

I think you're trying to solve the wrong problem. Theoretically, silent errors can happen in a lot of places, e.g. RAM could become corrupted during file copy, hard drives can return corrupt data... However, all these cases of silent data corruption are generally extremely unlikely, probably too unlikely to even bother with. If this is still too much of a risk, then there is only one solution, which is to associate the files with checksums (+ don't forget to verify the files again *after* the checksum was generated to see if the checksum corresponds to valid data!). The simplest workable solution is to put the files into archives like zip or rar.

PS: Yes, FreeFileSync has the option to VerifyCopiedFiles after copy which might add value in some cases.
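The checksum idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the manifest layout (one `digest  relative/path` line per file), the function names, and the default of MD5 are choices made for this example, not something FreeFileSync or any archive format prescribes.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks with the named hashlib algorithm."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_path, algorithm="md5"):
    """Record a digest for every file under root.

    Returns the number of files recorded. The manifest itself is skipped
    in case it is stored inside root.
    """
    root, manifest_path = Path(root), Path(manifest_path)
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.resolve() != manifest_path.resolve():
            rel = p.relative_to(root).as_posix()
            lines.append(f"{file_digest(p, algorithm)}  {rel}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root, manifest_path, algorithm="md5"):
    """Re-hash the files listed in the manifest.

    Returns the relative paths that are missing or no longer match.
    """
    root = Path(root)
    failed = []
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or file_digest(target, algorithm) != expected:
            failed.append(rel)
    return failed
```

The manifest should be written while the data is known to be good, then `verify_manifest` can be run against any later copy of the same folder. An empty list means every listed file still matches its recorded digest.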

Post by dishe • 20 Mar 2016, 19:53

Zenju, thanks again for the response.
I'm afraid you may be downplaying the specific problem, however. I'm aware that there are many places data can be corrupted, but I'm not talking about random data corruption. This is specific to Windows 7 x64 and USB-connected drives, and can easily be reproduced on affected systems. When copying large files (4GB and up), there seems to be a 20-30% chance of data corruption! An error rate that high is not random. That's a pretty prominent bug. Did you get a moment to look at the Technet thread linked above? There are literally dozens of similar threads that I've found, and possibly hundreds more documenting the problem that I haven't gotten around to reading yet.

This is a documented bug that seems to occur specifically in the following scenario:
- Certain USB controllers (although nobody seems able to figure out why it affects some and not others with the same chipset)
- Windows 7 x64 (confirmed that 32-bit doesn't have it)
- Drive being written to is formatted with NTFS (although this may be because FAT32 forces smaller file sizes and therefore requires less overhead in the copy routine- I can't confirm whether exFAT has it).

In the examples given on various forums, those are the only common factors. RAM was ruled out (systematically tested each stick individually, ran overnight tests on RAM, swapped for other RAM from a working system without the bug, etc.), and the drive itself was ruled out (it happens consistently on any USB-connected NTFS drive, old or new; I ran CHKDSK on all of my drives first because I assumed physical damage). Some said disabling the virus scan fixed it, while others claimed it didn't; same with changing the USB cache and disconnect options (safe removal vs. high performance, etc.). I'm guessing it's a combination of hardware and software, and only rears its head in specific scenarios, on that specific OS.
The worst part is that I think many, many people don't even realize it is happening, as Windows does not report any errors, files are the correct size. The only way to tell something went wrong is using a checksum to verify (or in my case, when editing a video and realizing frames are missing). But knowing that it will happen to one out of 5 files over a certain size is terrifying!

Post by dishe • 22 Mar 2016, 15:28

To be clear- I'm witnessing 1 out of every 5 or 6 files transferred being corrupted (mismatched MD5). This only appears to happen when transferring files 4GB+ in size, and when the destination is an NTFS drive connected via USB. This is not me being afraid of something that *might* happen. This *IS* happening, and according to threads like the one above, it's happening to quite a number of people, and likely a lot more who aren't aware of it (since, like you, I also blindly trusted the Windows copy routines, because why would I have any reason to doubt them?). I only became aware of this when I realized after the fact that my files were corrupted, and after trying everything else to determine what happened, I now see it has been happening for years under my nose. I just copied a directory of about 6 hours' worth of unedited video (75GB). I verified it with a checksum generator, and sure enough one of the 7 mov files has a mismatched MD5.

In the meantime, TeraCopy and other tools with their own file copy routines seem to be the only ones that reliably avoid this very specific corruption bug. This is all very frustrating, as you can imagine. Do you still think I'm trying to solve the wrong problem?

Post by Zenju • 23 Mar 2016, 18:06

dishe wrote: When copying large files (4GB and up), there seems to be a 20-30% chance of data corruption
Alright, you're facing a real issue then, not some "quantum fluctuation" that can never be ruled out in theory.
dishe wrote: Do you still think I'm trying to solve the wrong problem?
Yes, I don't think this bug is on Microsoft's side. Windows does in fact do error checking, but this can only be as good as the underlying drivers. I suspect the symptoms you are seeing are hardware or driver related. In any case, even if TeraCopy seems to be able to copy the files without errors, I wouldn't trust a system with a 20% chance of data corruption to do any critical task until the root of the problem is found.

Post by dishe • 27 Mar 2016, 20:42

Zenju wrote: Alright, you're facing a real issue then, not some "quantum fluctuation" that can never be ruled out in theory.
...
Yes, I don't think this bug is on Microsoft's side. Windows does in fact do error checking, but this can only be as good as the underlying drivers. I suspect the symptoms you are seeing are hardware or driver related. In any case, even if TeraCopy seems to be able to copy the files without errors, I wouldn't trust a system with a 20% chance of data corruption to do any critical task until the root of the problem is found.
Agreed. The reason I linked to that technet thread above was to try and give some background. This is a real problem (not just quantum fluctuation), and it isn't limited to just me, that's what I've been trying to get at.
I mentioned this to some of my more tech-savvy friends in IT and software engineering, and they also dismissed it as something wrong with my setup at first. That is, until they started testing it on their machines and found that SOME OF THEIR COMPUTERS HAD IT AS WELL FOR NO APPARENT REASON. And it was always the same factors: Windows 7 x64, drive formatted NTFS and connected via USB. The fact that there are threads of people having the same issue, with no conclusion other than to blame the OS, is at least somewhat comforting: at least I know it isn't just me!

I guess one solution would be to upgrade to Win10, however I have serious reservations about that due to legacy software I use for work. Meanwhile, I enabled verification in FFS and hope that will at least alert me if/when something goes wrong now. I tried some of the other suggestions in that thread (change removal policy, etc) and I'm hoping for the best. Just to be clear- verification set to ON will verify that the final copied file has the same checksum as the original? What is it using, if I may ask? CRC? Seems too fast to be MD5. I'm under the impression that MD5 would probably be overkill for something like this, however. What do you think?

Oh, and thanks for taking the time to respond, I really do appreciate that!

Post by Zenju • 28 Mar 2016, 09:32

dishe wrote: I guess one solution would be to upgrade to Win10
I would first update all mainboard drivers, especially the USB-related ones, then do a few hardware checks e.g. checking ram, different storage media. If this issue was caused by a Windows bug, it probably would affect way more people than it seems to affect currently.
dishe wrote: Just to be clear- verification set to ON will verify that the final copied file has the same checksum as the original?
It's a bit-wise comparison, so this implies the same checksum. Again, I wouldn't rely on the checksum to keep the data safe, but first find out what's wrong with the overall system.
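For illustration only, a bit-wise comparison can be sketched in Python as below. This is not FreeFileSync's actual code, and `files_identical` is a hypothetical helper. It shows why no separate checksum is needed: if every byte matches, any checksum computed over the two files must match too.

```python
import os


def files_identical(path_a, path_b, chunk_size=1024 * 1024):
    """Return True only if both files have exactly the same bytes."""
    # Different sizes can never be identical, so skip reading the data.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached the end together
                return True
```

Python's standard library offers the same kind of check as `filecmp.cmp(path_a, path_b, shallow=False)`.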

Post by dishe • 28 Mar 2016, 23:41

Zenju wrote: I would first update all mainboard drivers, especially the USB-related ones, then do a few hardware checks e.g. checking ram, different storage media. If this issue was caused by a Windows bug, it probably would affect way more people than it seems to affect currently.
That's what I'm saying- I think it's affecting more people than realize it! I found it when searching for a specific solution to why some of my video files occasionally have missing frames, and stumbled upon many people in many forums describing this problem. In some of those forums, people who tried to help troubleshoot the cause ended up realizing it was happening TO THEM as well, and for who knows how long! It seems to only pop up under very specific circumstances, and most people wouldn't be aware that it is caused by a larger bug. Even some of my IT friends tried it and were able to reproduce it as well, much to their surprise!

I don't mean to sound like an alarmist or conspiracy theorist, but I think this may be a rather serious bug that has somehow been overlooked and then swept under the rug at MS!
The good news is that I've managed to keep it relatively under control by changing the removal policy of the USB device. This changes the write caching, as I understand it, which is likely where the root of the bug is hiding. TeraCopy doesn't use the same caching that Windows copy does, which I'm guessing is why it doesn't seem to have the problem.
dishe wrote: Just to be clear- verification set to ON will verify that the final copied file has the same checksum as the original?
Zenju wrote: It's a bit-wise comparison, so this implies the same checksum. Again, I wouldn't rely on the checksum to keep the data safe, but first find out what's wrong with the overall system.
Agreed. So far, the files seem to be fine once they are copied and verified to be a match. The corruption seems to be happening during the copy process, and a checksum comparison immediately after the copy finishes reveals if there was a problem or not. What I've been doing for now is copying and then running an MD5 calculator on both sides to verify. If they match, I move on with my day. If they don't, I re-copy the offending file and repeat until they do match. If FFS's verify is doing that for me, it would be wonderful to be able to skip that extra time-consuming step.
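The copy, verify, and re-copy routine described here could be automated roughly as follows. This is a hedged Python sketch: `copy_verified` and `md5_of` are hypothetical names, and `shutil.copyfile` merely stands in for whatever copy tool is actually used.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst, max_attempts=3):
    """Copy src to dst and compare MD5 digests afterwards.

    A mismatching copy is deleted and the copy is retried. Returns the
    number of attempts that were needed, or raises OSError if no attempt
    produced a matching file.
    """
    expected = md5_of(src)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dst)
        if md5_of(dst) == expected:
            return attempt
        os.remove(dst)  # never leave a known-bad copy behind
    raise OSError(f"{dst}: checksum mismatch after {max_attempts} attempts")
```

Hashing the source once and the destination after every attempt keeps the extra read cost to one pass per copy. A return value above 1 is worth logging, since it means at least one silent corruption occurred.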

Post by dishe • 28 Mar 2016, 23:44

However, to be fair, since performing some of the tweaks mentioned in threads like the one linked at the top here, the problem seems to have all but disappeared. I'm still verifying large copied files until I feel confident it won't pop up again. That's really all I'm looking for at this point- knowing that FFS is verifying for me. It would also be nice to know that FFS's copy procedure is immune to whatever is going wrong, the way TeraCopy's seems to be, but that might be too much to expect.

Post by dishe • 31 Mar 2016, 05:11

Zenju, will FFS alert me if the files don't pass verification? I see there's a log I can click on at the end of a sync session that will list the files copied and verifications performed. I'm not sure what an error would look like, however. Will it simply show up in the log as a process that didn't verify (and I'd need to examine each entry in the log to find it)?
Or will the program halt or alert me at the finish that it wasn't completed successfully? Will it retry the file that failed verification?

Post by Zenju • 31 Mar 2016, 11:13

Verification failure will be reported as an error (and the presumably corrupt target file will be deleted).

Post by dishe • 03 Apr 2016, 17:13

Will the error only show up in the logs, or will it be immediately obvious on screen when it happens?