Detecting [hash?] change when size of file stays the same

Posts: 76
Joined: 8 Jun 2023

Synchronizator

In FFS, the left side holds paths to folders with data that I work on almost every day, and the right side holds my mirror backups of them. I recently got the issue solved where automated updating of my backups halted when files changed size because their metadata had changed, as happens with audio files and their tags [viewtopic.php?t=10422]


But what if the size of, e.g., a FLAC file is not affected, because only one of its tag fields was adjusted and the metadata change was so small that it did not alter the number of bytes the file takes up? RTS will not detect a size-based change, because none occurs; so the file's date / time would also have to change for my backup version of it to be updated, right?

Unfortunately, I must not alter the timestamps of my audio files just because I adjusted their tags: my intricate tagging and audio editing system does not allow it [the dates of files carry important additional info for me]


So how do I tackle this conundrum?

Is it perhaps possible for RTS to detect changes of hash values and act also upon detecting them?
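
To make the scenario concrete, here is a minimal Python sketch, not anything FFS or RTS does: it overwrites a few bytes of a throwaway file in place, restores the original timestamps, and shows that size and modification time stay the same while a SHA-256 hash changes. The file name, the fake tag text and the helper names are all made up for illustration; a real tagger may write files differently.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def edit_in_place_keeping_timestamp(path, offset, new_bytes):
    """Overwrite bytes at offset, then restore the original timestamps."""
    before = os.stat(path)
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(new_bytes)
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


def demo():
    """Simulate a same-length tag edit on a throwaway file."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "track.bin")  # stand-in, not a real FLAC
        with open(path, "wb") as handle:
            handle.write(b"ARTIST=Hans Zimmer;" + b"\x00" * 1000)
        stat_before, hash_before = os.stat(path), file_sha256(path)
        # Replace "Hans" with "Hanz": same length, different content.
        edit_in_place_keeping_timestamp(path, 7, b"Hanz")
        stat_after, hash_after = os.stat(path), file_sha256(path)
        same_size = stat_before.st_size == stat_after.st_size
        same_mtime = stat_before.st_mtime_ns == stat_after.st_mtime_ns
        return same_size, same_mtime, hash_before != hash_after


if __name__ == "__main__":
    print(demo())
```

The three values returned by `demo()` are "same size", "same modification time" and "hash differs". Note that computing the hash still means reading the whole file.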
Posts: 2946
Joined: 22 Aug 2012

Plerry

If neither the file timestamp nor its size will change, comparing by file Time+Size will not work for you.

Instead, you can try comparing by Content in the FFS Compare settings (F6).
However, I am not sure if the meta-data is included in the comparison by content.
You can try it yourself, or perhaps Zenju can tell.

But note that, even if the comparison by Content includes the meta-data, comparing by Content is obviously much slower than comparing by Time+Size.
Posts: 76
Joined: 8 Jun 2023

Synchronizator

I have tested out the

Actions > Comparison settings > Select a variant > File content

option. And unfortunately it seems that for me it is just unusable


First I had to re-synchronize using FFS, which took ~5 hours. Now RTS takes ~2 hours to run through all the backups that I need to keep constantly updated. On top of that, it ruins the audio when videos are played, constantly adding popping sounds dozens of times per minute

So even if I cleaned up my already clean data, rethought my modus operandi and prioritized some data over the majority of it, I could maybe cut my ~500 000 files weighing 1.75 TB down to something like the ~125 000 most important files weighing 0.5 TB. This would probably turn 2 hours of work into 0.5 hour. And if in the foreseeable future my machine were 50% faster, that would take 15 minutes. So this would still be only a pseudo real-time backup. And my current setup is already quite new / fast, with only the fastest NVMe SSD and SATA drives I was able to find 1.5 years ago
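
As a rough sanity check of these estimates, the arithmetic can be sketched in Python. The big assumption is that scan time scales linearly with the amount of work (bytes or file count) and inversely with machine speed; real runs may not behave that way. The function name and the speedup factors are illustrative only.

```python
def scaled_minutes(base_minutes, work_ratio, speedup=1.0):
    """Scale a scan time linearly with the work ratio and inversely with speed."""
    return base_minutes * work_ratio / speedup


# Baseline from the post: ~2 hours for ~500 000 files / 1.75 TB.
by_size = scaled_minutes(120, 0.5 / 1.75)          # scaling by bytes
by_count = scaled_minutes(120, 125_000 / 500_000)  # scaling by file count
faster_50 = scaled_minutes(120, 125_000 / 500_000, speedup=1.5)
faster_100 = scaled_minutes(120, 125_000 / 500_000, speedup=2.0)

print(round(by_size, 1), by_count, faster_50, faster_100)
# prints: 34.3 30.0 20.0 15.0
```

Under these assumptions, the 0.5 hour figure matches scaling by file count (scaling by bytes gives about 34 minutes), and 15 minutes corresponds to a machine twice as fast; a machine 50% faster would land at about 20 minutes.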

As for those popping sounds: they do not go away when I raise the priority of the video process in Task Manager and lower that of FFS. I had already painstakingly experienced this issue for a long time on a different machine whenever I was using uTorrent; despite trying many tricks back then and installing drivers and updates, nothing made them go away except pausing downloads or closing uTorrent. [This is a niche but nasty issue that can render even a home recording studio worth thousands of dollars unusable after the user spends an insane amount of time and energy trying to find its root cause]


And so: is there another way of doing this with FFS / RTS?

If I were to compare hash values, would it not be much faster? But then again, my drives would fill up with additional files storing those hashes, which would also be backed up unless filtered out? But maybe I could compare hash values only for whole folders, so that the program would look into a folder's content file by file only if a mismatch were detected for the whole container? But such a program would have to start with the deepest sub-sub-folders in order not to waste time checking everything a main folder contains, right?
Posts: 1202
Joined: 8 May 2006

therube

> it destroys the audio when videos are played by constantly adding popping sounds, dozens of times per minute

You're saying this happens while a sync is ongoing?

> if in foreseeable future my machine would be 50% faster then that would take 15 minutes

I wouldn't bet on that, necessarily.

> priority of video process

So what, you're playing videos or you're doing something like encoding videos?

Lowest common denominator will likely determine outcome (which might be LAN throughput).

> maybe I could only compare hash values for whole folders

That would then flag "the entire folder" when a single file has changed.
Posts: 1202
Joined: 8 May 2006

therube

There are duplicate file finders that... that what... can check for "tag" diffs.
Actually (AllDup), that's not going to help because it would do a Content check on the data - ignoring tags, so it still is doing a hash...
Posts: 1202
Joined: 8 May 2006

therube

Maybe your tagging program can reset or set a file attribute, like the Archive attribute?
Such that if you (initially) attrib -a *.*, then if the tagger sets the archive attribute, you would then know that any files with the archive attribute set should be different, i.e., needing backup/update.

You could check only those hashes to be certain...
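
A minimal Python sketch of that idea, assuming a Windows system and assuming the Archive attribute really does end up set after the tagger writes a file (something to verify with the actual tagger): it lists only the files whose Archive attribute is set, so that only those would need hashing or copying. The function names are made up for illustration.

```python
import os
import stat


def has_archive_bit(attributes):
    """Return True if the Windows Archive attribute is set in an attribute mask."""
    return bool(attributes & stat.FILE_ATTRIBUTE_ARCHIVE)


def files_flagged_for_backup(root):
    """Yield paths under root whose Archive attribute is set (Windows only)."""
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # st_file_attributes exists only on Windows; elsewhere treat as 0.
            attributes = getattr(os.stat(path), "st_file_attributes", 0)
            if has_archive_bit(attributes):
                yield path
```

After a successful backup the attribute would have to be cleared again (for example with `attrib -a`), otherwise every file stays flagged.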
Posts: 1202
Joined: 8 May 2006

therube

Duplicate Cleaner (pay) has an option to ignore content & check tags. (I don't know how effective that might or might not be, I've not used it?)

It looks like you tell it the tags you want to check; Artist, Title, Album... & then whether it should be an exact or similar match...

Posts: 76
Joined: 8 Jun 2023

Synchronizator

> it destroys the audio when videos are played by constantly adding popping sounds, dozens of times per minute
> You're saying this happens while a sync is ongoing? therube, 26 Jun 2023, 16:33

Yes, if it is done by comparison of content

It is then useless to try to watch a video in MPC-HC or listen to audio in Winamp

> [...]
> Lowest common denominator will likely determine outcome (which might be LAN throughput). therube, 26 Jun 2023, 16:33

I do not use any kind of networking for playback

> maybe I could only compare hash values for whole folders
> That would then flag "the entire folder" when a single file has changed. therube, 26 Jun 2023, 16:33

But that is the point

Instead of always comparing the content of 500 files in, e.g., my Hans Zimmer folder, it could just compare the hash value of this folder and perform 500 content comparisons only upon detecting a mismatch. 99.99% of the time every folder would pass such an initial test, as on average only a handful of files in all of the monitored paths would have changed their content since the last comparison
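
A minimal Python sketch of such a folder-level hash, under stated assumptions: SHA-256 as the hash, and a folder digest built from the relative path and digest of every file beneath it. The function names are hypothetical and this is not an FFS feature.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def folder_digest(root):
    """Combine the digests of all files under root into one digest."""
    combined = hashlib.sha256()
    for folder, subfolders, names in os.walk(root):
        subfolders.sort()  # fixed traversal order keeps the result stable
        for name in sorted(names):
            path = os.path.join(folder, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            combined.update(relative.encode("utf-8"))
            combined.update(b"\0")
            combined.update(file_digest(path).encode("ascii"))
    return combined.hexdigest()
```

One caveat the sketch makes visible: computing the folder digest already reads every file in the folder. It only saves work if the digest of one side is stored from an earlier run, and even then the other side still has to be read in full to detect a change that leaves size and timestamp untouched.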


> Maybe your tagging program can reset or set a file attribute, like the Archive attribute?
> [...] therube, 26 Jun 2023, 16:45

Maybe

But that would only take care of my audio files. And Duplicate Cleaner would also not be good enough in that regard, because the info available at https://www.digitalvolcano.co.uk/dcfeatures.html [in the Audio Mode section] tells me that it will not detect, for example, a change in a tag field that I made up myself in any file format the program supports, or a change in any tag field of the TTA files that I also use, which the program does not support. And what if I make an "insignificant" edit to a TXT or XLSM file?

For that, Duplicate Cleaner [even in its free version] does support hash checking, so I think I should just test it out. But then again, right off the bat: would hash checking of even only 100 000 files not take a multiple of the [let's say] 5 minutes that I find acceptable? So I would need a method / tool that checks hashes of folders first and only then, if necessary, moves on to further checking [file by file]?
Posts: 76
Joined: 8 Jun 2023

Synchronizator

Coming out of this part of a discussion on another forum https://www.voidtools.com/forum/viewtopic.php?p=58638#p58638, it seems that to overcome this conundrum FFS would first have to be able to reset the Archive attribute. I wanted to start a separate topic about it, but I found this existing one: viewtopic.php?t=1509
Posts: 76
Joined: 8 Jun 2023

Synchronizator

How about using a database file?

Would it notice a hash change between the original and the mirrored / backed-up file, and thus be able to secure even tiny changes to the tag data of files?
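
Whether the FFS database file records hashes is not established in this thread. Purely as an illustration of the idea, here is a minimal Python sketch of a single manifest file that stores one SHA-256 digest per file, so that a later run can list files whose content changed even when size and timestamp did not. The JSON format and all function names are assumptions made up for this example.

```python
import hashlib
import json
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path (with forward slashes) to its digest."""
    manifest = {}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[relative] = file_digest(path)
    return manifest


def save_manifest(manifest, manifest_path):
    """Write the manifest to a single JSON file."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=1, sort_keys=True)


def load_manifest(manifest_path):
    """Read a manifest previously written by save_manifest."""
    with open(manifest_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def changed_files(old_manifest, new_manifest):
    """Return sorted paths that are new or whose digest differs."""
    return sorted(
        path for path, digest in new_manifest.items()
        if old_manifest.get(path) != digest
    )
```

Keeping the manifest in one file outside the synced folders avoids littering the drives with per-file hash files. Building a fresh manifest still requires reading every file once.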
Posts: 76
Joined: 8 Jun 2023

Synchronizator

Well?
Posts: 4866
Joined: 11 Jun 2019

xCSxXenon

You should use a scheduled task and compare by file content
Posts: 76
Joined: 8 Jun 2023

Synchronizator

But in order to detect possible differences between two sets of, let's say, 1 TB of data, I would need to perform 2 scans, each over 1 TB of data, right?

So sometimes dozens of minutes would pass before any kind of change could / would be noticed?
Posts: 76
Joined: 8 Jun 2023

Synchronizator

> You should use a scheduled task and compare by file content xCSxXenon, 14 Jan 2024, 15:14

It might be possible in the future for FFS to be a true alternative to RAID, NAS and Storage Spaces. But until FFS adds the ability to compare hash values, its option of

Synchronization Settings > Comparison (F6) > Select a variant: > File content

seems to be useless [as my renewed extensive tests have shown], because it only slows down the comparison process and gives the same end results as the

Synchronization Settings > Comparison (F6) > Select a variant: > File time and size

method

And so, only after a major update will this program [hopefully] be able to stop creating pseudo-mirrors, so that users could use a set-up like this:
● First backup drive ["B"] with instant updates using >>File content<<: it copies intentional changes to tag fields on the main drive ["A"], edits which might not always change the number of bytes and / or the timestamps. Damaged files from the source will overwrite existing good versions [on "B"]
● Second backup drive ["C"] with delayed automatic comparison by >>File time and size<< only: bit-rotted or silently corrupted files from the main drive ["A"] will not automatically overwrite their backup versions. It would need a manual or scheduled Job that verifies mirrored items with compare-by-content in order to flag mismatches, so that it presents a report to the user and thus informs them of the need for manual investigation
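
The verification Job described for drive "C" could look roughly like the following Python sketch: it compares source and backup byte by byte and only reports mismatches, never copying or overwriting anything. This is an illustration under stated assumptions, not an FFS feature; the function name and the report wording are made up.

```python
import filecmp
import os


def content_mismatches(source_root, backup_root):
    """List relative paths whose bytes differ, or that are missing in the backup."""
    # filecmp caches results keyed by size and mtime, which is exactly what
    # stays unchanged in this scenario, so start every run with a clean cache.
    filecmp.clear_cache()
    report = []
    for folder, subfolders, names in os.walk(source_root):
        subfolders.sort()  # deterministic order for a stable report
        for name in sorted(names):
            source_path = os.path.join(folder, name)
            relative = os.path.relpath(source_path, source_root)
            backup_path = os.path.join(backup_root, relative)
            if not os.path.isfile(backup_path):
                report.append((relative, "missing in backup"))
            elif not filecmp.cmp(source_path, backup_path, shallow=False):
                report.append((relative, "content differs"))
    return report
```

The report cannot tell an intentional tag edit from silent corruption; it only flags that the two copies differ, which is why the manual investigation step remains necessary.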


And on top of that there is the issue of FFS not being able to copy certain locked items, e.g. from the paths

AppData\Local\BraveSoftware\Brave-Browser\User Data\Default\Cache\Cache_Data\
AppData\Local\Google\Chrome\User Data\Default\
AppData\Local\Microsoft\WindowsApps\

because the free version does not use VSS. So currently it will throw errors if such items [or rather whole paths] are not excluded. But if those problematic paths [containing whatever items] were already created [by whichever method] in the destination area [like a drive intended to be a mirror], then those mirrored paths [and whatever is in them] will not be deleted. So unless removed outside of the FFS environment, that content will remain there forever because it is not being monitored. This is one of two reasons that FFS creates pseudo-mirrors [the other being its inability to compare hash values]
Posts: 1202
Joined: 8 May 2006

therube

> because the free version does not use VSS

Pretty sure that is incorrect.
Though you do need to start FFS "as Administrator" for it to be effective.
(And I'd assume that RTS likewise would need the same.)
Posts: 76
Joined: 8 Jun 2023

Synchronizator

I selected the option of

FreeFileSync > Options > Copy locked files

a long time ago. I have also added this REG hack to my system some time ago
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers]

;
; Permanent elevation of FreeFileSync is needed in order to avoid the disruptive pop-up blockades that will say
;
; >>
; ERROR_ELEVATION_REQUIRED: The requested operation requires elevation. [AdjustTokenPrivileges(SeSecurityPrivilege)]
; <<
;
; which errors with prompted questions may appear after pressing of the Synchronize button
;

"C:\\Program Files\\FreeFileSync\\FreeFileSync.exe"="RUNASADMIN"
;
; Permanent elevation of FreeFileSync is needed in order to avoid the disruptive pop-up blockades that will say
;
; >>
; ERROR_ELEVATION_REQUIRED: The requested operation requires elevation. [AdjustTokenPrivileges(SeSecurityPrivilege)]
; <<
;
; when an LNK file leading to FFS_REAL file is put in the system folder of
;
; C:\Users\YOUR-USER-NAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
;
; which errors with prompted questions may appear after execution of monitoring process [Job] - and which error does not show up when the very same FFS_REAL file is executed manually even from that problematic StartUp system folder
;

"C:\\Program Files\\FreeFileSync\\RealTimeSync.exe"="RUNASADMIN"

And I have just rechecked for the n-th time in Task Manager whether my RTS is indeed running with elevated rights, and it is, as always

But despite this I still get access blockades that look like Volume Shadow Copy Service is causing them