Non-identical characters are considered identical by FFS

Posts: 6
Joined: 22 Aug 2023

guarster

I have a file named "Ημερολόγιο.mp4" (Unicode: \u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03bf\u0301\u03b3\u03b9\u03bf\u002e\u006d\u0070\u0034). I have renamed it to "Ημερολόγιο.mp4" (Unicode: \u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03cc\u03b3\u03b9\u03bf\u002e\u006d\u0070\u0034).
But FFS considers the two names identical, so it doesn't synchronize the rename to the backup.

I have noticed that on Windows (and here too btw) they look exactly the same but if I copy the filenames in a text editor the stress is above a different letter each time.
[Attachment: Στιγμιότυπο οθόνης 2023-08-22 145708.jpg (screenshot, 4.39 KiB)]
[Attachment: Στιγμιότυπο οθόνης 2023-08-22 145728.jpg (screenshot, 3.98 KiB)]
I have many files like this, and my video player throws an error if I use the first name, so I am trying to find and fix them. My problem is that the changes I make don't sync to the backups.

I think the problem is with the character \u0301.

I'm not sure if this is considered a bug or if it works like this by design since they look the same on Windows, but I believe they should be considered different since they don't behave the same way.
Site Admin
Posts: 7505
Joined: 9 Dec 2007

Zenju

The character "ό" has two different representations when encoded via Unicode, both of which should be considered equivalent by applications: https://en.wikipedia.org/wiki/Unicode_equivalence

Precomposed form: \u03cc
Decomposed form: \u03bf\u0301
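
To make the two forms concrete, here is a minimal Python sketch using the standard `unicodedata` module. The two strings are the code point sequences from the first post. The helper name `canonically_equivalent` is made up for illustration, and this is not how FFS itself compares names.

```python
import unicodedata

# Code point sequences quoted in the first post.
decomposed = "\u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03bf\u0301\u03b3\u03b9\u03bf.mp4"
precomposed = "\u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03cc\u03b3\u03b9\u03bf.mp4"


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison sees two different strings...
print(decomposed == precomposed)                        # False
# ...while a normalization-aware comparison treats them as the same name.
print(canonically_equivalent(decomposed, precomposed))  # True
# The decomposed form carries one extra code point (U+0301).
print(len(decomposed), len(precomposed))                # 15 14
```

A tool that compares raw code points reports a difference, while a tool that normalizes first reports none.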

guarster, 22 Aug 2023, 12:20 wrote: "but if I copy the filenames in a text editor the stress is above a different letter each time."
That's a bug in the text editor.

guarster, 22 Aug 2023, 12:20 wrote: "my video player throws an error if I use the first name"
This is also a bug in the video player. There is no reason it should fail with one Unicode encoding form, but not the other. Most likely the player is messing with the Unicode Normalization forms, which it really shouldn't.
Posts: 1202
Joined: 8 May 2006

therube

(Not that I know about such things...)

The accented lowercase y (if you will) does not appear to be a valid character?
(Accented uppercase Y is OK.)
That "o", with the accent (the accent belongs to the "o") seemingly is \u0301.
https://en.wikipedia.org/wiki/Greek_diacritics
Ημερολόγιο ὁ Ύ ◌́ γ.m4a
(No issues here with the above file name in my players; mpui/mplayer & mpv.net.)
Posts: 6
Joined: 22 Aug 2023

guarster

Zenju, 22 Aug 2023, 15:38 wrote: The character "ό" has two different representations when encoded via Unicode, both of which should be considered equivalent by applications: https://en.wikipedia.org/wiki/Unicode_equivalence
I was not aware that such a thing as Unicode equivalence existed. Thank you. I have noticed though that two other programs I use and love, Everything and WinMerge, don't consider the two names to be equivalent. Windows, too, since they both can exist in the same directory.

Also, personally, I would find it more useful if such cases were treated as non-equivalent by FFS, since from a practical point of view there are cases where the difference matters, even if it's caused by other apps' bugs. Usually, a change like this by a user wouldn't have been made without a reason. I understand that the issue may be more complicated than my own specific use case, but I humbly suggest that this particular design decision be reconsidered.

Finally, in case it helps, here is when the name actually causes a problem. I have set the Windows language to Greek, and for whatever reason some applications (BS.Player and TrID come to mind) don't work if the file name contains any character outside the Windows-1253 character set (aka Greek ANSI). Those are the applications that have a problem with these file names (I guess because the character \u0301 doesn't belong to Greek ANSI).
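
The guess about Greek ANSI can be checked with a short Python sketch. It assumes Python's `cp1253` codec matches what those applications use, and `fits_codepage` is a hypothetical helper name, not part of any tool mentioned here.

```python
# Code point sequences quoted in the first post.
decomposed = "\u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03bf\u0301\u03b3\u03b9\u03bf.mp4"
precomposed = "\u0397\u03bc\u03b5\u03c1\u03bf\u03bb\u03cc\u03b3\u03b9\u03bf.mp4"


def fits_codepage(name: str, codepage: str = "cp1253") -> bool:
    """Hypothetical helper: True if every character of name exists in the code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(fits_codepage(precomposed))  # True: U+03CC is part of Windows-1253
print(fits_codepage(decomposed))   # False: U+0301 has no Windows-1253 byte
```

This is consistent with only the decomposed name failing in applications limited to the ANSI code page.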
Posts: 1202
Joined: 8 May 2006

therube

guarster wrote: "Everything and WinMerge, don't consider the two names to be equivalent"
In Everything, is there a difference when (Search Match) Diacritics is enabled, or not?
Posts: 6
Joined: 22 Aug 2023

guarster

therube, 22 Aug 2023, 17:56 wrote: "In Everything, is there a difference when (Search Match) Diacritics is enabled, or not?"
They are considered equivalent when Match Diacritics is disabled, and non-equivalent when it is enabled.
Posts: 1202
Joined: 8 May 2006

therube

Then I take it Everything is working correctly in that regard.
Site Admin
Posts: 7505
Joined: 9 Dec 2007

Zenju

It's not generally possible to synchronize differences in Unicode normalization forms, e.g. if the target is a macOS-hosted network share (which internally converts everything to decomposed) or a Samba-hosted share (which converts everything to precomposed).
Also, the motivation for doing so is rather weak: it amounts to working around bugs in other applications that are not fully Unicode compatible.
Posts: 6
Joined: 22 Aug 2023

guarster

I see. Thank you for your explanation. I'll handle these specific cases manually whenever they arise.
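
For finding such names by hand, a dry-run sketch along these lines might help. It only lists names that would change under NFC normalization and renames nothing. `nfc_rename_plan` and `scan_tree` are hypothetical names, and whether a rename is safe on a given file system or share is left as an open assumption.

```python
import os
import unicodedata
from typing import Iterable, Iterator, Tuple


def nfc_rename_plan(names: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (current, precomposed) pairs for names that are not already in NFC."""
    for name in names:
        composed = unicodedata.normalize("NFC", name)
        if composed != name:
            yield name, composed


def scan_tree(root: str) -> Iterator[Tuple[str, str]]:
    """Walk root and yield (old_path, new_path) candidates; nothing is renamed."""
    for dirpath, dirnames, filenames in os.walk(root):
        for old, new in nfc_rename_plan(dirnames + filenames):
            yield os.path.join(dirpath, old), os.path.join(dirpath, new)


# Example use (prints candidates only):
# for old_path, new_path in scan_tree("."):
#     print(repr(old_path), "->", repr(new_path))
```

Printing with `repr` shows the escaped code points, which makes the otherwise invisible difference between the two names visible.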