Feature #8586

Systematically convert man(5) pages to use macro requests instead of \f text decorations

Added by C Fraire about 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date: 2017-08-24
Due date:
% Done: 0%
Estimated time:
Difficulty: Hard
Tags: needs-triage

Description

The converted documents will have \fB..\fR or \fI..\fR in non-proscribed sections translated to .B, .I, .BR, .IR, .RB, or .RI as necessary with special handling for:

  • superfluous nesting that exists in illumos man pages (e.g., \fB\fB ... \fR\fR)
  • superseding nesting that exists in illumos man pages (e.g., \fB\fI ... \fR\fR)
  • spurious nesting that exists in illumos man pages (e.g., \fB ... \fI ... \fR ... \fR)
  • multi-character troff sequences for symbols or punctuation that exist in illumos man pages (e.g., \(-> or \(mi )
  • punctuation characters that have very special, non-character-like meaning when escaped (e.g., \")
  • punctuation characters that have character-like meaning when escaped (e.g., \&)
  • punctuation characters that are themselves even when escaped (e.g., \?)
  • inserting a "Non-printing, zero width character" wherever necessary in the conversion (e.g., to avoid creating invalid line-starts or to avoid undesired, sentence-ending double-spacing)
  • using \c when necessary as a last resort for odd, non-spaced neighboring of three fonts that exists in illumos man pages (e.g., \fB ... \fR\fI .... \fR...)
  • shifting odd, word-ending punctuation in non-Roman formatting (e.g., \fIdatabase,\fR)
  • man cross-reference-like fragments with a bolded section (e.g., \fBcommand(1)\fR vs the correct \fBcommand\fR(1))
  • rejustification of eligible, contiguous lines
  • clean up common git-pbchk errors related to superfluous .sp before paragraph markers
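To make a few of the cases above concrete, here is a minimal sketch of how a single whitespace-free token might be normalized and then turned into a macro request. It is not the actual conversion scripting: the function names (collapse_superfluous, fix_cross_reference, shift_trailing_punctuation, to_macro, convert_token) and the regular expressions are hypothetical, and it only covers \fB and \fI on tokens with one font span.

```python
import re

# All names below are hypothetical; this is a sketch, not the real tooling.

def collapse_superfluous(text):
    # Same font nested twice: \fB\fBx\fR\fR -> \fBx\fR
    pattern = re.compile(r'\\f([BI])\\f\1(.*?)\\fR\\fR')
    return pattern.sub(lambda m: '\\f' + m.group(1) + m.group(2) + '\\fR', text)

def fix_cross_reference(text):
    # Bolded section: \fBcommand(1)\fR -> \fBcommand\fR(1)
    pattern = re.compile(r'\\fB([^\\\s()]+)(\([0-9][A-Za-z0-9]*\))\\fR')
    return pattern.sub(lambda m: '\\fB' + m.group(1) + '\\fR' + m.group(2), text)

def shift_trailing_punctuation(text):
    # Word-ending punctuation inside the font: \fIdatabase,\fR -> \fIdatabase\fR,
    pattern = re.compile(r'\\f([BI])([^\\]*\w)([,;:.]+)\\fR')
    return pattern.sub(
        lambda m: '\\f' + m.group(1) + m.group(2) + '\\fR' + m.group(3), text)

# One font span followed by an optional Roman tail, with no whitespace.
TOKEN = re.compile(r'^\\f([BI])([^\\\s]+)\\fR(\S*)$')

def to_macro(token):
    # Returns a .B/.I/.BR/.IR request line, or None if the token is not simple.
    m = TOKEN.match(token)
    if m is None:
        return None
    font, text, tail = m.groups()
    if tail:
        return '.%sR %s %s' % (font, text, tail)
    return '.%s %s' % (font, text)

def convert_token(token):
    token = collapse_superfluous(token)
    token = fix_cross_reference(token)
    token = shift_trailing_punctuation(token)
    return to_macro(token)

print(convert_token(r'\fB\fBcommand(1)\fR\fR'))
print(convert_token(r'\fIdatabase,\fR'))
```

A real conversion would also have to handle the remaining cases in the list (superseding and spurious nesting, escaped punctuation, \& insertion, \c, and rejustification), which this sketch deliberately leaves out.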

The proscribed sections of documents are:

  • no-fill fragments (.nf ... .fi) and tables (.TS ... .TE)
  • disable-adjusting fragments (.na ... .ad) with more than one line
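The exclusion rule above could be sketched roughly as follows. This is an illustration under assumptions, not the actual scripting: the function name proscribed_lines is hypothetical, regions are assumed not to nest, and a closing request is recognized by a simple prefix match.

```python
def proscribed_lines(lines):
    """Return the set of 0-based line indexes that conversion should skip.

    Hypothetical helper: bodies of .nf/.fi and .TS/.TE are always skipped;
    a .na/.ad body is skipped only when it has more than one line.
    """
    closers = {'.nf': '.fi', '.TS': '.TE', '.na': '.ad'}
    skip = set()
    i = 0
    while i < len(lines):
        request = lines[i].split()[0] if lines[i].strip() else ''
        closer = closers.get(request)
        if closer is None:
            i += 1
            continue
        # Scan forward to the closing request (or the end of the input).
        j = i + 1
        while j < len(lines) and not lines[j].startswith(closer):
            j += 1
        body = range(i + 1, j)
        if request != '.na' or len(body) > 1:
            skip.update(body)
        i = j + 1
    return skip

page = [
    '.SH NAME',
    '.nf', r'\fBa\fR', r'\fBb\fR', '.fi',
    '.na', r'\fBc\fR', '.ad',
    '.na', 'x', 'y', '.ad',
]
print(sorted(proscribed_lines(page)))
```

In this sample the single-line .na ... .ad fragment stays eligible for conversion, while the no-fill body and the two-line .na ... .ad body are left alone.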

There will be a handful of remaining uses of \f(CW ... \fR.

The scripting will produce warnings about:

  • unexpected intra-line comments (\") that in illumos generally indicate a typo (i.e. should be \e")
  • unexpected lone \f characters that generally indicate a typo
  • unexpected lingering escaped characters that generally indicate a typo
  • ellipsis-like sequences with unexpected spacing that generally indicate a faulty conversion (e.g., [\fIat_job_id.\fR \fI\&..\fR] in at.1)
  • SEE ALSO sections with pattern-breaking, post-conversion contents that generally indicate a typo (e.g., .BR toupper(3C), in towupper.3c)
  • remaining font decorations that generally indicate typos or very non-systematic use (e.g., \fB\FB-q\fR\fR in head.1 or \fBAdditionally, \fBaccept4()\fR in accept.3socket)
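A rough sketch of how some of these warnings might be detected on a single line follows. The function name lint_line and the warning strings are hypothetical, and the set of font names treated as valid (B, I, R, P, 1 to 4, and (CW) is an assumption made for illustration; the real scripting may check more or different things.

```python
import re

# An unescaped \" that is not at the start of the line (hypothetical check).
COMMENT = re.compile(r'(?<!\\)\\"')
# \f not followed by a font name this sketch assumes to be valid.
LONE_F = re.compile(r'\\f(?![BIRP1-4]|\(CW)')
# Upper-case \F, treated here as suspicious (as in \FB-q from head.1).
UPPER_F = re.compile(r'\\F')

def lint_line(line):
    warnings = []
    m = COMMENT.search(line)
    # A comment is expected only at line start, optionally after . or '
    if m and line[:m.start()].strip(".' \t"):
        warnings.append('intra-line comment')
    if LONE_F.search(line):
        warnings.append('lone \\f')
    if UPPER_F.search(line):
        warnings.append('unexpected \\F escape')
    return warnings

for sample in (r'use \fB\FB-q\fR\fR here',
               r'print \"hello\" now',
               r'.\" a normal comment',
               r'bad \f font'):
    print(sample, '->', lint_line(sample))
```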

History

#1

Updated by Yuri Pankov about 2 years ago

I don't see the point in this. If you are touching the man page and have the will/skills to convert it to mdoc, nice. Converting the man(5) pages to be a bit nicer serves no purpose.

#2

Updated by C Fraire about 2 years ago

This conversion finds errors in the documents and produces content that is entirely searchable in OpenGrok, vs words smushed to \fB and \fI font characters having to be searched with the "fb" prefix.

This issue also acknowledges that semantic markup could not be added reliably in a systematic way, though a future case to produce a page-by-page conversion, reviewed page by page, would benefit from systematic man(5) input.

The idea of systematically converting to mandoc but not adding any semantic tagging (or worse using heuristics to do it) while also having to endure the stricter mandoc checks for no benefit seems stupid.

#3

Updated by Yuri Pankov about 2 years ago

Yes, exactly, if it's not semantic markup, I don't really care if it's man(5) or \fB\, it does NOT matter really -- it looks the same in output and doesn't help any tool to understand our pages better.

#4

Updated by Yuri Pankov about 2 years ago

And yes, I mean, please don't do this, I'm trying to save your time -- it won't help anything. If you are aware of any situations where it does help, please let me know.

#5

Updated by C Fraire about 2 years ago

Oh well, I do care. And I just gave a reason why a tool, OpenGrok, used by the community will understand the pages better.

#6

Updated by Yuri Pankov about 2 years ago

Ok, it's your time.

#7

Updated by C Fraire about 2 years ago

Most bizarre about this exchange, Yuri, is I entered this case because I told you I would in responding to your tangential demand on https://www.illumos.org/rb/r/642/ that I manually edit cross-references to "fix them to be at least proper man(5), i.e., using .BR — what we have now is largely an awful conversion from another source and doesn't resemble man(5) man pages."

So you're in favor of manual edits (and cumbersome review thereof) to "proper man(5)" but any purpose to systematic conversion is unfathomable to you.