Detecting [hash?] change when the size of a file stays the same

Posts: 64
Joined: 8 Jun 2023

Synchronizator

In FFS, on my left side I have paths to folders with data that I work on almost every day, while on the right are my mirror backups of them. Recently I solved the issue of automated updating of my backup files being halted when their size changes because their metadata is changed - like in the case of audio files and their tags [viewtopic.php?t=10422]


But what if the size of, e.g., a FLAC file is not affected, because only one of its tag fields was adjusted and the metadata change was so small that it did not result in a difference in the number of bytes the file takes up? A change based on file size will not be detected by RTS, as it is not occurring; and thus the file's date / time would also have to change in order for my backup version of it to be updated, right?

Unfortunately I must not affect the timestamps of my audio files just on account of having adjusted their tags, because my intricate tagging and audio editing system does not allow for such behavior [as the dates of files carry important additional info for me]


So how do I tackle this conundrum?

Is it perhaps possible for RTS to detect changes in hash values and act upon detecting them as well?
Posts: 2290
Joined: 22 Aug 2012

Plerry

If neither the file timestamp nor its size will change, comparing by file Time+Size will not work for you.

Instead, you can try comparing by Content in the FFS Compare settings (F6).
However, I am not sure if the meta-data is included in the comparison by content.
You can try it yourself, or perhaps Zenju can tell.

But note that, even if the comparison by Content includes the meta-data, comparing by Content is obviously much slower than comparing by Time+Size.
Posts: 64
Joined: 8 Jun 2023

Synchronizator

I have tested out the

Actions > Comparison settings > Select a variant > File content

option. And unfortunately it seems that for me it is simply unusable


First I had to re-synchronize using FFS: it took me ~5 hours. Now RTS takes ~2 hours to run through all the backups I need to have constantly updated. And on top of that it destroys the audio when videos are played, constantly adding popping sounds dozens of times per minute

So even if I were to clean up my already clean data, rethink my modus operandi and prioritize some data over the majority of it - then maybe I could cut down my ~500,000 files weighing 1.75 TB to something like the ~125,000 most important files weighing 0.5 TB. This would probably replace 2 hours of work with 0.5 hour. And if in the foreseeable future my machine were 50% faster, that would take 15 minutes. So this would still be a pseudo real-time backup. And my current setup already is quite new / fast, with only the fastest NVMe SSD and SATA drives I was able to find 1.5 years ago

As for those popping sounds: they do not go away when I raise the priority of the video process in Task Manager and lower that of FFS. And I had already painstakingly experienced this issue for a long time on a different machine, whenever I was using uTorrent - and despite trying out many tricks and installing drivers and updates back then, there was nothing I could do to make them go away except pausing downloads or closing uTorrent. [This is a niche but nasty issue that can render even a home recording studio worth thousands of dollars unusable, after the user has spent an insane amount of time and energy on trying to find its root cause]


And so: is there another way of doing this with FFS / RTS?

If I were to use a comparison of hash values, would it not be much faster? But then again: I would have my drives filled up with additional files storing those hashes, which would also be backed up unless filtered out? But maybe I could compare hash values only for whole folders - thus the program would look into their content on a file-by-file basis only if a mismatch of their whole container were detected? But such a program would have to start with the deepest sub-sub-folders in order to not waste time on checking everything such a main folder has, right?
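
To illustrate the "additional files storing those hashes" idea (just a standalone sketch, not an FFS / RTS feature): instead of scattering hash files all over the drives, a single manifest per tree could map each relative path to its content hash, so a later run only flags files whose current hash differs from the stored one. The file name hash_manifest.json is made up for the example and would have to be excluded from the sync job.

```python
import hashlib
import json
import os

MANIFEST = "hash_manifest.json"  # hypothetical name; filter it out of the backup

def file_hash(path, chunk=1 << 20):
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def changed_files(root):
    """Return relative paths whose content hash differs from the stored
    manifest, then rewrite the manifest with the current hashes."""
    manifest_path = os.path.join(root, MANIFEST)
    old = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            old = json.load(f)
    new, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name == MANIFEST:
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            new[rel] = file_hash(full)
            if old.get(rel) != new[rel]:
                changed.append(rel)
    with open(manifest_path, "w") as f:
        json.dump(new, f, indent=1)
    return changed

# e.g.: for rel in changed_files(r"D:\Music"): print(rel)
```

Note that detecting a tag-only change this way still means reading every file once per run; the manifest only saves you from having to read the mirror side as well.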
Last edited by Synchronizator on 27 Jun 2023, 07:52, edited 1 time in total.
Posts: 944
Joined: 8 May 2006

therube

it destroys the audio when videos are played, constantly adding popping sounds dozens of times per minute
You're saying this happens while a sync is ongoing?
if in the foreseeable future my machine were 50% faster, that would take 15 minutes
I wouldn't bet on that, necessarily.
priority of video process
So what, you're playing videos or you're doing something like encoding videos?

Lowest common denominator will likely determine outcome (which might be LAN throughput).
maybe I could compare hash values only for whole folders
That would then flag "the entire folder" when a single file has changed.
Posts: 944
Joined: 8 May 2006

therube

There are duplicate file finders that... that what... can check for "tag" diffs.
Actually (AllDup), that's not going to help, because it would do a Content check on the data - ignoring tags - so it is still doing a hash...
Last edited by therube on 26 Jun 2023, 16:47, edited 1 time in total.
Posts: 944
Joined: 8 May 2006

therube

Maybe your tagging program can reset or set a file attribute, like the Archive attribute?
Such that if you (initially) run attrib -a *.*, then if the tagger sets the Archive attribute, you would know that any files with the Archive attribute set should be different, i.e., needing backup/update.

You could check only those hashes to be certain...
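
Not an FFS feature, but as a standalone sketch of that workflow: on Windows, Python can read the Archive bit directly, so you could list only the files the tagger has touched since you last cleared the attribute, and hash / back up just those. Paths here are made-up examples.

```python
import os
import stat

def files_with_archive_bit(root):
    """Yield files under root whose Windows Archive attribute is set,
    i.e. files written to since the bit was last cleared (attrib -a)."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            attrs = os.stat(full).st_file_attributes  # Windows-only stat field
            if attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
                yield full

# e.g. candidates for hashing / backing up:
# for f in files_with_archive_bit(r"D:\Music"):
#     print(f)
```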
Posts: 944
Joined: 8 May 2006

therube

Duplicate Cleaner (pay) has an option to ignore content & check tags. (I don't know how effective that might or might not be; I've not used it.)

It looks like you tell it the tags you want to check; Artist, Title, Album... & then whether it should be an exact or similar match...

[Image: screenshot of Duplicate Cleaner's audio tag matching options]
Posts: 64
Joined: 8 Jun 2023

Synchronizator

it destroys the audio when videos are played, constantly adding popping sounds dozens of times per minute
You're saying this happens while a sync is ongoing? therube, 26 Jun 2023, 16:33
Yes, if it is done by comparison of content

It is useless then to try to watch a video in MPC-HC or listen to audio in Winamp
[...]
Lowest common denominator will likely determine outcome (which might be LAN throughput). therube, 26 Jun 2023, 16:33
I do not use any kind of networking for playback
maybe I could compare hash values only for whole folders
That would then flag "the entire folder" when a single file has changed. therube, 26 Jun 2023, 16:33
But that is the point

Instead of always comparing the content of e.g. 500 files in my Hans Zimmer folder, it could just compare the hash value of this folder - and only upon detecting a mismatch perform the 500 content comparisons. 99.99% of the time every folder would pass such an initial test, as on average only a handful of files across all of the monitored paths would have changed their content since the last comparison
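
A minimal sketch of that folder-level idea (again standalone, not something FFS / RTS offers): compute a Merkle-style digest per folder from the hashes of its files and subfolders, compare the top-level digests of the source and the mirror first, and drill down only where they differ. The caveat is that computing a folder digest still means reading every file underneath it once, so the saving only materializes if the digests are cached between runs.

```python
import hashlib
import os

def file_digest(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def folder_digest(root):
    """Combine the hashes of all entries under root into one digest,
    so two folder trees can be compared top-down."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(root)):
        full = os.path.join(root, name)
        h.update(name.encode("utf-8"))
        if os.path.isdir(full):
            h.update(folder_digest(full).encode("ascii"))
        else:
            h.update(file_digest(full).encode("ascii"))
    return h.hexdigest()

# Example (paths are made up): descend only where the digests differ.
# if folder_digest(r"D:\Music\Hans Zimmer") != folder_digest(r"E:\Backup\Hans Zimmer"):
#     ...  # compare subfolders / files inside
```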


Maybe your tagging program can reset or set a file attribute, like the Archive attribute?
[...] therube, 26 Jun 2023, 16:45
Maybe

But that would only take care of my audio files, and even in that regard it would not be good enough. Because the info available at https://www.digitalvolcano.co.uk/dcfeatures.html [in the Audio Mode section] tells me that it will not detect, for example, a change in a tag field I made up myself in any file format supported by this program, or a change in any tag field of the TTA files I also use, which are unsupported by this program. And what if I edit a TXT or XLSM file "insignificantly"?

For that, Duplicate Cleaner [even in the free version] does support hash checking - so I think I should just test it out. But then again, right off the bat: would not hash checking of even only 100,000 files take a multiple of the [let's say] 5 minutes acceptable to me? Thus I would need a method / tool to check the hashes of folders first - and only then, if necessary, move on to further checking [file-by-file]?
Posts: 64
Joined: 8 Jun 2023

Synchronizator

Following on from this part of a discussion on another forum https://www.voidtools.com/forum/viewtopic.php?p=58638#p58638, it seems that to overcome this conundrum FFS would first have to be able to reset the Archive attribute - hence I wanted to make a separate topic about it, but I found this already existing one: viewtopic.php?t=1509
Posts: 64
Joined: 8 Jun 2023

Synchronizator

How about using a database file?

Would it take notice of a hash change between the original and the mirrored / backed-up file, and thus be able to secure even tiny changes to the tag data of files?
Posts: 64
Joined: 8 Jun 2023

Synchronizator

Well?
Posts: 3614
Joined: 11 Jun 2019

xCSxXenon

You should use a scheduled task and compare by file content
Posts: 64
Joined: 8 Jun 2023

Synchronizator

But in order to detect possible differences between two sets of, let's say, 1 TB of data, I would need to perform two scanning processes - each over 1 TB of data, right?

Thus sometimes dozens of minutes would pass before any kind of change could / would be noticed?
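
As a rough sanity check of that (the read speed is an assumption, not a measured figure):

```python
# Two sides of ~1 TB each, read at an assumed ~500 MB/s sustained:
data_bytes = 2 * 1e12
read_speed = 500e6  # bytes per second
print(data_bytes / read_speed / 60)  # ~66 minutes per full content scan
```

So yes, a full content comparison of two 1 TB sets is on the order of an hour, not seconds.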