Comparing only parts of files

Posts: 5
Joined: 27 Oct 2021

amatala

Hi,

When comparing file contents, would it be possible to have a configuration option to compare only the first x bytes of the files (value to be configured manually)? I am currently using ViceVersa to back up & sync my music collection, which consists of hundreds of thousands of FLAC files, and the differences are always in the headers (files differ when I update some TAGs, but the rest of the data stream is always identical). With ViceVersa I can compare only the headers, but ViceVersa is quite buggy and unstable... FreeFileSync would be a better alternative for me, but the lack of this option makes it many times slower than ViceVersa (my music collection is more than 10TB in size).
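A minimal Python sketch of what such an option could do. `first_bytes_equal` and its `limit` parameter are hypothetical names, not an existing FreeFileSync setting, and the 32 KiB default is an arbitrary choice. Only the first `limit` bytes of each file are read and compared; the demo also shows the blind spot, since a change past the window goes unnoticed.

```python
import os
import tempfile


def first_bytes_equal(path_a: str, path_b: str, limit: int = 32 * 1024) -> bool:
    """Return True if the first `limit` bytes of both files are identical.

    Hypothetical helper: anything after offset `limit` is never read, so it
    is only safe if every change you care about lands inside that window.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        # read() stops early at end of file, so a file shorter than the
        # window compares unequal to a longer one.
        return fa.read(limit) == fb.read(limit)


def demo() -> tuple[bool, bool]:
    """Same fake 'audio' payload, with either a tag edit or a late byte change."""
    payload = bytes(range(256)) * 400  # 100 KiB stand-in for audio frames
    variants = {
        "source.flac": b"TITLE=new" + payload,
        "backup_old_tags.flac": b"TITLE=old" + payload,
        # Same header, one byte changed at the very end (outside the window).
        "backup_late_change.flac": b"TITLE=new" + payload[:-1] + b"\x00",
    }
    with tempfile.TemporaryDirectory() as d:
        for name, data in variants.items():
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        src = os.path.join(d, "source.flac")
        tag_edit = first_bytes_equal(src, os.path.join(d, "backup_old_tags.flac"))
        late_change = first_bytes_equal(src, os.path.join(d, "backup_late_change.flac"))
    return tag_edit, late_change


if __name__ == "__main__":
    print(demo())  # (False, True): tag edit detected, late change missed
```

The trade-off is explicit: per file pair, at most 2 × `limit` bytes are read instead of both complete files, at the cost of trusting that nothing outside the window ever changes.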

Thanks!
Posts: 2451
Joined: 22 Aug 2012

Plerry

> files are different when I update some TAGs but the rest of the data stream is always identical

I suppose changing the tags in the header will also update the modified-date of your files.
So, why compare based on file content and not on file date (and size)?
That should make your comparison much faster.
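Roughly what a date-and-size comparison boils down to, as a hedged Python sketch (`looks_unchanged` and the 2-second `tolerance_s` are assumptions for illustration, not FreeFileSync's actual internals). Only metadata from `os.stat` is read, never the file contents, which is why this is so much faster on a large collection.

```python
import os
import tempfile


def looks_unchanged(src: str, dst: str, tolerance_s: float = 2.0) -> bool:
    """Cheap 'time and size' check; no file contents are read.

    tolerance_s is an assumption: some file systems (e.g. FAT) store
    modification times with only 2-second granularity.
    """
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and abs(a.st_mtime - b.st_mtime) <= tolerance_s


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "a.flac"), os.path.join(d, "b.flac")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"x" * 1000)
        # Give both files the same (atime, mtime).
        os.utime(src, (1_600_000_000, 1_600_000_000))
        os.utime(dst, (1_600_000_000, 1_600_000_000))
        same = looks_unchanged(src, dst)
        # Pretend the source was edited an hour later.
        os.utime(src, (1_600_000_000, 1_600_003_600))
        return same, looks_unchanged(src, dst)


if __name__ == "__main__":
    print(demo())  # (True, False)
```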
Posts: 1038
Joined: 8 May 2006

therube

(There are duplicate file finders out there, & probably more so for audio rather than video, that can choose to ignore tags, or to only compare tags. Something like that might (?) be more appropriate?

AllDup can "Ignore the meta data of FLAC files".)

> I suppose changing the tags in the header will also update the modified-date of your files.
Some tagging programs have the option of maintaining the original file date/time.
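A sketch of what such a "preserve file date/time" option amounts to, assuming the tagger simply restores the timestamps it read before writing (`rewrite_keeping_mtime` is a hypothetical stand-in, not any particular tagger's code). After the edit, size and modification time are unchanged while the content differs, so a date-and-size comparison cannot see the change.

```python
import os
import tempfile


def rewrite_keeping_mtime(path: str, offset: int, new_bytes: bytes) -> None:
    """Overwrite bytes in place, then restore the original timestamps.

    Hypothetical stand-in for a tagger's "preserve file date/time" option.
    """
    st = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
    # Put back the access/modification times captured before the edit.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "track.flac")
        with open(path, "wb") as f:
            f.write(b"TITLE=old" + b"\x00" * 1000)
        before = os.stat(path)
        rewrite_keeping_mtime(path, 6, b"new")  # "TITLE=old" -> "TITLE=new"
        after = os.stat(path)
        with open(path, "rb") as f:
            content_changed = f.read(9) == b"TITLE=new"
        return (after.st_size == before.st_size,
                after.st_mtime_ns == before.st_mtime_ns,
                content_changed)


if __name__ == "__main__":
    print(demo())
```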
Posts: 5
Joined: 27 Oct 2021

amatala

I always preserve the original file time when mass-updating tens or hundreds of files... otherwise this would trigger a massive rescan of my music library, which would take many hours, and also a resync of my backups... If I keep the original times unchanged, I can rescan my library and trigger backups incrementally at more convenient times. But of course then I have to compare by file contents. I am currently using ViceVersa to compare only the first 32K of the files, which is very fast, but ViceVersa is very slow at copying files and very buggy too; it keeps crashing randomly... so I was hoping to find more reliable software, but unfortunately ViceVersa seems to be the only backup software on Earth that can compare only the headers...
FreeFileSync is much faster at copying files, but much slower at comparing file contents, as it always reads the full files, which takes ages...
Also, FreeFileSync often fills up my backup drives by saving all updated files to the RecycleBin tmp folder, which needlessly takes up TBs of space, and I need to clean it up manually from time to time... This means I can't let FreeFileSync back up my files at night, because it would get stuck after a while and I would need to take manual action in the morning to unblock it...
Posts: 5
Joined: 27 Oct 2021

amatala

> (There are duplicate file finders out there, & probably more so for audio rather than video, that can choose to ignore tags, or to only compare tags. Something like that might (?) be more appropriate?
>
> AllDup can "Ignore the meta data of FLAC files".)
> therube, 28 Oct 2021, 16:02
But I don't want to ignore metadata... on the contrary, I want to back up my files even if only the metadata has changed... of course the data stream stays the same when I edit TAGs...
Posts: 1038
Joined: 8 May 2006

therube

Duplicate Cleaner 5 (pay version) can compare tags (only).

(The dropdown also has "Ignore content (match by tags or attributes)".)
Duplicate Cleaner 5 dropdown boxes not readable.png
(you're able to choose particular tags you'd wish to compare)
Duplicate Cleaner Same File Size option is grayed out in Video Mode, stemmed from Regular Mode.png

(I know these features exist, but I haven't messed with them, because I'm not really concerned with that particular feature.)


Oh, & different file formats, & different tag versions, can store their data at different places in a file.
With ID3, for example, an ID3v1 tag is stored at the end of the file, & an ID3v2 tag at (towards) the beginning.
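For FLAC specifically, the tags (Vorbis comments) live in metadata blocks right after the `fLaC` marker, and each block header records its own length, so the header size can be computed rather than guessed. A Python sketch based on that reading of the formats; `leading_tag_length` and `has_id3v1` are hypothetical helpers, and real files can contain variations (e.g. an ID3v2 tag prepended to a FLAC file) that this does not handle.

```python
import io


def leading_tag_length(f) -> int:
    """Bytes at the start of a binary stream (positioned at 0) used by tags.

    Handles two layouts; anything else returns 0:
    - ID3v2: 10-byte header with a 4-byte "syncsafe" size (7 bits per byte),
      plus a 10-byte footer if flag 0x10 is set (ID3v2.4).
    - FLAC: "fLaC" marker, then metadata blocks, each with a 4-byte header
      (1-bit last-block flag, 7-bit type, 24-bit big-endian length).
    """
    head = f.read(10)
    if len(head) == 10 and head[:3] == b"ID3":
        size = 0
        for byte in head[6:10]:
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if head[5] & 0x10 else 0
        return 10 + size + footer
    if head[:4] == b"fLaC":
        offset = 4
        f.seek(offset)
        while True:
            block_header = f.read(4)
            if len(block_header) < 4:
                return offset  # truncated file: stop at what we have
            is_last = block_header[0] & 0x80
            length = int.from_bytes(block_header[1:4], "big")
            offset += 4 + length
            if is_last:
                return offset
            f.seek(offset)
    return 0


def has_id3v1(f) -> bool:
    """ID3v1 is a fixed 128-byte block at the very end, starting with b"TAG"."""
    f.seek(0, io.SEEK_END)
    if f.tell() < 128:
        return False
    f.seek(-128, io.SEEK_END)
    return f.read(3) == b"TAG"


def demo() -> tuple[int, int, bool]:
    flac = (b"fLaC"
            + bytes([0x00]) + (34).to_bytes(3, "big") + bytes(34)  # STREAMINFO
            + bytes([0x84]) + (20).to_bytes(3, "big") + bytes(20)  # last: VORBIS_COMMENT
            + b"audio frames...")
    # "ID3", version 4.0, no flags, syncsafe size 200 -> bytes 0, 0, 1, 72
    id3v2 = b"ID3" + bytes([4, 0, 0, 0, 0, 1, 72]) + bytes(200) + b"mp3 frames"
    mp3_v1 = b"mp3 frames" + b"TAG" + bytes(125)
    return (leading_tag_length(io.BytesIO(flac)),
            leading_tag_length(io.BytesIO(id3v2)),
            has_id3v1(io.BytesIO(mp3_v1)))


if __name__ == "__main__":
    print(demo())  # (66, 210, True)
```

With a real file, `open(path, "rb")` can be passed instead of `io.BytesIO`. Note that a fixed "first N bytes" comparison would never see an ID3v1 tag, since it sits at the end.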
Posts: 5
Joined: 27 Oct 2021

amatala

But Duplicate Cleaner doesn't seem to be a backup tool... I want to back up all files whose TAGs are different, not find duplicates... its description doesn't say anything about backing up files from one drive to another...
Posts: 1038
Joined: 8 May 2006

therube

No, they are not backup tools.
But maybe they could be used in conjunction... (or maybe not).

---

Maybe you could automate something?
(As it is, FFS is not designed to do what you're looking for.)

Use the (UNIX-like) cmp command.

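So if you know the tag has to be stored in the first 4096 bytes (I have no clue), you could do something like this (Windows batch, for one pair of files):
rem cmp exit code: 0 = first 4096 bytes identical, 1 = different, 2 = trouble (e.g. backup missing)
cmp.exe -n 4096 --quiet "michaelgarrison_source.flac" "michaelgarrison_backup.flac"
if errorlevel 1 echo michaelgarrison_source.flac>> i_need_to_back_this_up.TXT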
You would run that in a loop, for each of your .flac file pairs,
& for any that fail the compare (of their first 4096 bytes),
their names are written to a file, that can then be parsed,
to copy from source to backup.
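The same loop, sketched in Python instead of batch, as one possible take (the function and parameter names are hypothetical; each source file is paired with the file at the same relative path under the backup root). As with a non-zero cmp exit code, a missing backup also lands on the list.

```python
import os
import tempfile


def files_to_back_up(src_root: str, dst_root: str, limit: int = 4096,
                     ext: str = ".flac") -> list[str]:
    """Relative paths of files under src_root that need backing up.

    A file is listed if its backup (same relative path under dst_root) is
    missing, or if their first `limit` bytes differ. Hypothetical helper.
    """
    needed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if not name.lower().endswith(ext):
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                needed.append(rel)  # no backup yet
                continue
            with open(src, "rb") as a, open(dst, "rb") as b:
                if a.read(limit) != b.read(limit):
                    needed.append(rel)
    return sorted(needed)


def demo() -> list[str]:
    audio = bytes(10_000)  # stand-in for identical audio frames
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        def put(root: str, name: str, data: bytes) -> None:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)

        put(src, "a.flac", b"TAGS-1" + audio)
        put(dst, "a.flac", b"TAGS-1" + audio)  # unchanged
        put(src, "b.flac", b"TAGS-2" + audio)
        put(dst, "b.flac", b"TAGS-1" + audio)  # tags edited in source
        put(src, "c.flac", b"TAGS-1" + audio)  # never backed up
        put(src, "notes.txt", b"not a .flac file, skipped")
        return files_to_back_up(src, dst)


if __name__ == "__main__":
    print(demo())  # ['b.flac', 'c.flac']
```

Writing the returned paths one per line to i_need_to_back_this_up.TXT then gives the same list the batch version produces.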

DiffUtils for Windows has cmp.exe.
(You'd need both the Binaries & Dependencies, ZIP [or the .exe installer].)
Posts: 5
Joined: 27 Oct 2021

amatala

Thanks for the suggestion... yes, I could go for the custom script approach, and it seems to be the only way forward. I never imagined it would be so hard to find backup software that allows partial comparisons... For me this could be a nice performance enhancement: if the files differ in the first KBs, then they are different, so the comparison can stop there... Of course this would require users to know their files well before enabling it, but it could even be a paid option.

I myself would gladly pay for something like that, as it would be a great time saver for me... and if ViceVersa were more stable, I would stick with it...

Image2.png