Bug #14616
dos2unix(1) breaks UTF-8 BOM text files; pkgsrc version is fine
Status: Open
Description
Sample file attached.
$ /usr/bin/dos2unix sample.txt
���
%
$ cat sample.txt | /opt/local/bin/dos2unix
αβγδ
Also, file(1) doesn't recognise it:
$ /usr/bin/file sample.txt
sample.txt: data
$ /opt/local/bin/file sample.txt
sample.txt: Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
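The pkgsrc file(1) verdict rests on two byte-level properties that can be checked with POSIX tools: the file starts with the UTF-8 BOM (bytes EF BB BF), and its lines end in CR LF. The following is a minimal sketch; describe_text is a hypothetical helper, not an illumos or pkgsrc command, and it inspects only those two properties rather than reproducing file(1)'s magic database.

```shell
#!/bin/sh
# Hypothetical helper (not part of illumos or pkgsrc): report whether a file
# starts with a UTF-8 BOM and how many CR bytes it contains.
describe_text() {
    f=$1
    # First three bytes as a hex string such as "efbbbf" (dd and od are POSIX).
    bom=$(dd if="$f" bs=3 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$bom" = "efbbbf" ]; then
        echo "UTF-8 BOM: yes"
    else
        echo "UTF-8 BOM: no"
    fi
    # Keep only CR (octal 015) bytes and count them; a CRLF file has one per line.
    ncr=$(tr -cd '\015' < "$f" | wc -c | tr -d ' ')
    echo "CR bytes: $ncr"
}
```

Without the attachment, a hypothetical stand-in for sample.txt (BOM, then "αβγδ", then CRLF) can be created with printf '\357\273\277\316\261\316\262\316\263\316\264\r\n' > sample.txt; describe_text sample.txt then prints "UTF-8 BOM: yes" and "CR bytes: 1".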
Updated by Toomas Soome over 1 year ago
In reply to Christoph Binner's report:
Our dos2unix (/usr/bin/dos2unix) does not convert to UTF-8; it converts to an 8-bit code page table, selected according to the keyboard mapping in use (that is how it detects the language-specific code page). On the other hand, the alternate dos2unix does not translate CRLF, so you can just use iconv.
It is probably a good idea to switch our dos2unix to produce UTF-8.
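Along the lines suggested above, UTF-8 output can be obtained without our dos2unix by stripping the BOM and the CR bytes with standard tools and leaving any character-set conversion to iconv(1). This is a minimal sketch under two assumptions: the input is UTF-8, and every CR byte belongs to a CRLF pair (bare CRs are dropped too). utf8_dos2unix is a hypothetical name, not an existing command.

```shell
#!/bin/sh
# Hypothetical workaround (not an illumos command): turn a UTF-8 file, with or
# without a BOM, and with CRLF endings into UTF-8 with LF endings. The
# character encoding itself is left untouched.
utf8_dos2unix() {
    src=$1
    bom=$(dd if="$src" bs=3 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$bom" = "efbbbf" ]; then
        # Skip the 3-byte BOM: tail -c +4 starts output at byte 4.
        tail -c +4 "$src"
    else
        cat "$src"
    fi | tr -d '\015'   # drop every CR byte, assuming all CRs come from CRLF
}
```

Usage: utf8_dos2unix sample.txt > sample.unix.txt. If an 8-bit code page is actually wanted, the result can be piped through iconv -f UTF-8 -t <codepage>, where the accepted code page names depend on the platform's iconv.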