Bug #14616

open

dos2unix(1) breaks BOM text files, pkgsrc version is fine

Added by Christoph Binner 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: cmd - userland programs
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Sample file attached.

$ /usr/bin/dos2unix sample.txt
���
%
$ cat sample.txt | /opt/local/bin/dos2unix
αβγδ

Also, file(1) doesn’t recognise it:

$ /usr/bin/file sample.txt
sample.txt: data
$ /opt/local/bin/file sample.txt
sample.txt: Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
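
For readers who want to reproduce the classification without the attachment, here is a rough sketch of the check the pkgsrc file(1) output reflects: look for the UTF-8 BOM (EF BB BF), verify the rest decodes as UTF-8, and note CRLF terminators. It is not how either file(1) is implemented. The function name `describe` is hypothetical, and the sample bytes are an assumption reconstructed from the 13-byte size and the αβγδ preview of the attachment.

```python
import codecs


def describe(data: bytes) -> str:
    """Roughly classify bytes as UTF-8 text (with or without BOM) or data."""
    has_bom = data.startswith(codecs.BOM_UTF8)  # bytes EF BB BF
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "data"
    parts = ["UTF-8 (with BOM) text" if has_bom else "UTF-8 text"]
    if b"\r\n" in body:
        parts.append("with CRLF line terminators")
    return ", ".join(parts)


# Assumed reconstruction of sample.txt: BOM (3) + four Greek letters (8) + CRLF (2).
sample = codecs.BOM_UTF8 + "αβγδ".encode("utf-8") + b"\r\n"
print(len(sample), describe(sample))
```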


Files

sample.txt (13 Bytes) αβγδ Christoph Binner, 2022-04-04 08:32 AM
#1

Updated by Toomas Soome 3 months ago

Our dos2unix does not translate to UTF-8; it translates to an 8-bit table chosen from the keyboard mapping in use (which is how it detects the language-specific code page). On the other hand, the alternate dos2unix does not translate CRLF, so you can just use iconv.

It is probably a good idea to switch our dos2unix to produce UTF-8.
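
As an illustration of line-ending conversion that leaves multibyte text intact, here is a minimal sketch that rewrites CRLF to LF on raw bytes and performs no code page translation, so UTF-8 sequences pass through unchanged. It is not the implementation of either dos2unix. `crlf_to_lf` and its `keep_bom` parameter are hypothetical names, and the sample bytes are an assumption reconstructed from the 13-byte attachment.

```python
import codecs


def crlf_to_lf(data: bytes, keep_bom: bool = True) -> bytes:
    """Convert CRLF to LF on raw bytes; optionally drop a leading UTF-8 BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # Byte-level replace: no character set translation takes place.
    body = body.replace(b"\r\n", b"\n")
    return codecs.BOM_UTF8 + body if (has_bom and keep_bom) else body


# Assumed contents of sample.txt: BOM + "αβγδ" + CRLF.
sample = codecs.BOM_UTF8 + "αβγδ".encode("utf-8") + b"\r\n"
print(crlf_to_lf(sample, keep_bom=False).decode("utf-8"), end="")
```

Working on bytes rather than decoded text means the conversion does not depend on the locale or keyboard mapping.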
