Bug #7587

Different SMB and ZFS behavior on emoji symbols

Added by Dmitry Glushenok over 1 year ago. Updated about 1 year ago.

Status: New
Start date: 2016-11-16
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: cifs - CIFS server and client
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

The ZFS dataset was created with "-o utf8only=on -o normalization=formD -o casesensitivity=mixed -o nbmand=on" and shared using "sharesmb=on".

Trying to create a file with an emoji in its name from a Windows client fails with the error "The file name you specified is not valid or too long".
Creating the same file directly on ZFS succeeds, but the SMB share then stops displaying some files that already existed on the share. For example:

# touch test1
# touch test2
# touch test3
# touch test4
# touch test5
# touch test3🔥

After this, the SMB share no longer displays test3, test3🔥 and test4, but still shows test1, test2 and test5 (ZFS itself shows everything).

When utf8only and normalization are not used, SMB does not forbid creating files with emoji in the name, but still does not display them. Moreover, in this case the name appears to be transformed in some way that prevents it from being displayed in a terminal.
For example, the following "test3🔥" files were created on the SMB share (18:40) and using touch on ZFS (18:37):

# /usr/bin/ls -l
<cut> Nov 16 18:40 test3??????.txt
<cut> Nov 16 18:37 test3🔥
#
# GNU-ls -l
<cut> Nov 16 18:40 'test3'$'\355\240\275\355\264\245'
<cut> Nov 16 18:37 'test3'$'\360\237\224\245'
#
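
The two byte sequences in the GNU ls output can be compared directly. The sketch below is only an observation about those bytes, not a confirmed diagnosis of the SMB server code: it assumes the Windows client sends the name as UTF-16 and checks whether encoding each 16-bit code unit separately (a hypothetical helper, `encode_unit_as_utf8`) reproduces the name created via SMB.

```python
# Byte sequences copied from the GNU ls output above (octal escapes).
smb_bytes = b"\355\240\275\355\264\245"  # name suffix created via the SMB share
zfs_bytes = b"\360\237\224\245"          # name suffix created with touch on ZFS

fire = "\U0001F525"  # U+1F525, the emoji used in the test file name

# In UTF-16, U+1F525 is a surrogate pair of two 16-bit code units.
utf16 = fire.encode("utf-16-be")
high = int.from_bytes(utf16[0:2], "big")
low = int.from_bytes(utf16[2:4], "big")


def encode_unit_as_utf8(unit):
    """Hypothetical helper: encode one 16-bit code unit (>= 0x800) as a
    3-byte UTF-8-style sequence, without rejecting surrogates."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def is_strict_utf8(data):
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


per_unit = encode_unit_as_utf8(high) + encode_unit_as_utf8(low)

print("touch name is plain UTF-8:     ", fire.encode("utf-8") == zfs_bytes)
print("SMB name is per-unit encoding: ", per_unit == smb_bytes)
print("SMB name is well-formed UTF-8: ", is_strict_utf8(smb_bytes))
```

If the comparison holds, the name written through SMB is the surrogate pair D83D DD25 with each half encoded on its own, which a strict UTF-8 decoder rejects. That would be consistent with the terminal being unable to display it.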

Is it possible that the Unicode tables used by SMB are outdated and need to be updated?

All tests on omnios-r151018.

7587.c - test case for u8_* functions (593 Bytes) Yuri Pankov, 2017-03-03 03:03 PM

History

#1 Updated by Yuri Pankov over 1 year ago

That's pretty strange, actually. SMB uses the same conversion routines as ZFS (u8_textprep_str(), u8_validate()), so the Unicode data they rely on shouldn't be the issue.

Moreover, I've done a bit of testing, and both functions treat the string with the emoji as valid.
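
A rough user-space analogue of that check, as a sketch only: it uses Python's `unicodedata` module as a stand-in and does not call the kernel's u8_textprep_str() or u8_validate(). It tests whether the name survives NFD normalization unchanged and encodes as well-formed UTF-8.

```python
import unicodedata

name = "test3\U0001F525"  # the file name from the description

# NFD is the form selected by normalization=formD on the dataset.
nfd = unicodedata.normalize("NFD", name)
unchanged = (nfd == name)

# Encoding raises UnicodeEncodeError if the string is not encodable as UTF-8.
encoded = nfd.encode("utf-8")

print("unchanged by NFD:", unchanged)
print("UTF-8 bytes:", encoded)
```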

And yes, I'm able to reproduce this with utf8only=yes and normalization=formD, though my results on the Windows side are even more embarrassing: I don't see any files at all:

H:\>dir
 Volume in drive H is data
 Volume Serial Number is 9E0D-9C2F

 Directory of H:\

03/03/2017  05:51 PM    <DIR>          .
03/03/2017  05:51 PM    <DIR>          ..
File Not Found

however:

H:\>dir test1 test2 test3 test4 test5
 Volume in drive H is data
 Volume Serial Number is 9E0D-9C2F

 Directory of H:\

03/03/2017  05:51 PM                 0 test1

 Directory of H:\

03/03/2017  05:51 PM                 0 test2

 Directory of H:\

03/03/2017  05:51 PM                 0 test3

 Directory of H:\

03/03/2017  05:51 PM                 0 test4

 Directory of H:\

03/03/2017  05:51 PM                 0 test5
               5 File(s)              0 bytes
               0 Dir(s)  2,404,231,413,760 bytes free

#2 Updated by Yuri Pankov about 1 year ago

Just rechecked this after updating both ctype definitions (#4006) and locale data (#8170), and the problem is still there.
