Systematically convert man(5) pages to use macro requests instead of \f text decorations
In the converted documents, \fB..\fR or \fI..\fR in non-proscribed sections will be translated to .B, .I, .BR, .IR, .RB, or .RI as necessary, with special handling for:
- superfluous nesting that exists in illumos man pages (e.g., \fB\fB ... \fR\fR)
- superseding nesting that exists in illumos man pages (e.g., \fB\fI ... \fR\fR)
- spurious nesting that exists in illumos man pages (e.g., \fB ... \fI ... \fR ... \fR)
- multi-character troff sequences for symbols or punctuation that exist in illumos man pages (e.g., \(-> or \(mi)
- punctuation characters that have very special, non-character-like meaning when escaped (e.g., \")
- punctuation characters that have character-like meaning when escaped (e.g., \&)
- punctuation characters that are themselves even when escaped (e.g., \?)
- inserting a "Non-printing, zero width character" wherever necessary in the conversion (e.g., to avoid creating invalid line-starts or to avoid undesired, sentence-ending double-spacing)
- using \c when necessary as a last resort for odd, non-spaced neighboring of three fonts that exists in illumos man pages (e.g., \fB ... \fR\fI .... \fR...)
- shifting odd, word-ending punctuation in non-Roman formatting (e.g., \fIdatabase,\fR)
- man cross-reference-like fragments with a bolded section (e.g., \fBcommand(1)\fR vs the correct \fBcommand\fR(1))
- rejustification of eligible, contiguous lines
- cleaning up common git-pbchk errors related to a superfluous .sp before paragraph markers
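A few of the simpler cases above can be sketched in Python. This is only an illustration under stated assumptions: the function and pattern names are hypothetical, the regular expressions cover just the plainest forms of each case, and none of it is taken from the actual conversion scripting.

```python
import re

# All names below are hypothetical; this is not the actual conversion scripting.
# \fB\fB ... \fR\fR  (superfluous nesting of the same font)
SUPERFLUOUS = re.compile(r'\\f([BI])\\f\1(.*?)\\fR\\fR')
# \fBcommand(1)\fR  (section wrongly included in the bold span)
BOLD_SECTION = re.compile(r'\\fB([A-Za-z0-9_.+-]+)\((\d[A-Za-z0-9]*)\)\\fR')
# \fIdatabase,\fR  (word-ending punctuation inside the font span)
TRAILING_PUNCT = re.compile(r'(\\f[BI])([^\\]*?\w)([,.;:]+)\\fR')
# A whole line that is exactly one cross-reference, e.g. \fBname\fR(3C),
XREF_LINE = re.compile(r'^\\fB(\S+?)\\fR(\(\d[A-Za-z0-9]*\)\S*)$')


def normalize(text):
    """Apply the three clean-ups while still in \\f notation."""
    text = SUPERFLUOUS.sub(r'\\f\1\2\\fR', text)
    text = BOLD_SECTION.sub(r'\\fB\1\\fR(\2)', text)
    text = TRAILING_PUNCT.sub(r'\1\2\\fR\3', text)
    return text


def to_macro(line):
    """Turn a line holding a single cross-reference into a .BR request."""
    line = normalize(line)
    m = XREF_LINE.match(line)
    if m:
        return '.BR {} {}'.format(m.group(1), m.group(2))
    return line
```

With these assumptions, `to_macro(r'\fBtoupper(3C)\fR,')` first moves the section out of the bold span and then yields `.BR toupper (3C),`. The harder cases in the list (superseding or spurious nesting, `\c`, zero-width character insertion, rejustification) need real tokenizing rather than regular expressions and are not attempted here.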
The proscribed sections of documents are:
- no-fill fragments (.nf ... .fi) and tables (.TS ... .TE)
- disable-adjusting fragments (.na ... .ad) with more than one line
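As a rough sketch of how the proscribed regions might be marked before conversion, the following Python walks the lines and flags those that must be left alone. The function name is hypothetical, and "more than one line" is assumed here to mean more than one line between .na and .ad; the actual scripting may count differently.

```python
def mark_proscribed(lines):
    """Return a list of booleans, True where a line must be left untouched.

    Hypothetical helper: .nf/.fi and .TS/.TE blocks are always protected;
    .na/.ad blocks only when they enclose more than one line (assumption).
    """
    closers = {'.nf': '.fi', '.TS': '.TE', '.na': '.ad'}
    flags = [False] * len(lines)
    i = 0
    while i < len(lines):
        words = lines[i].split()
        closer = closers.get(words[0]) if words else None
        if closer is None:
            i += 1
            continue
        # Scan forward to the matching closing request (or end of input).
        j = i + 1
        while j < len(lines) and not lines[j].startswith(closer):
            j += 1
        body_length = j - i - 1
        if words[0] != '.na' or body_length > 1:
            for k in range(i, min(j + 1, len(lines))):
                flags[k] = True
        i = j + 1
    return flags
```

A single-line .na ... .ad fragment is left eligible for conversion under this reading, while an unterminated block is treated as protected through the end of the input.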
There will be a handful of remaining uses of \f(CW ... \fR.
The scripting will produce warnings about:
- unexpected intra-line comments (\") that in illumos generally indicate a typo (i.e., should be \e")
- unexpected lone \f characters that generally indicate a typo
- unexpected lingering escaped characters that generally indicate a typo
- ellipsis-like sequences with unexpected spacing, which generally indicate a faulty conversion (e.g., [\fIat_job_id.\fR \fI\&..\fR] in at.1)
- SEE ALSO sections with pattern-breaking, post-conversion contents that generally indicate a typo (e.g., .BR toupper(3C), in towupper.3c)
- remaining font decorations that generally indicate typos or very non-systematic use (e.g., \fB\FB-q\fR\fR in head.1 or \fBAdditionally, \fBaccept4()\fR in accept.3socket)
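Three of these warnings lend themselves to a simple per-line check. The Python below is a hedged sketch, not the actual scripting: the names are hypothetical, and the set of characters accepted after \f (B, I, R, P, a digit, "(" or "[") is an assumption about which font escapes count as expected.

```python
import re

# Hypothetical patterns; the real checks may be stricter or looser.
COMMENT = re.compile(r'(?<!\\)\\"')            # \" not preceded by a backslash
LONE_F = re.compile(r'\\f(?![BIRP0-9(\[])')    # \f without a font selector
LEFTOVER = re.compile(r'\\f[BI]')              # \fB or \fI surviving conversion


def warnings_for(line):
    """Return the warning labels that apply to one post-conversion line."""
    found = []
    # Whole-line comments (.\" or '\") are expected and not reported.
    if not line.startswith(('.\\"', "'\\\"")):
        m = COMMENT.search(line)
        if m and m.start() > 0:
            found.append('intra-line comment')
    if LONE_F.search(line):
        found.append('lone \\f')
    if LEFTOVER.search(line):
        found.append('remaining font decoration')
    return found
```

Run over the head.1 fragment quoted above, `\fB\FB-q\fR\fR`, this reports a remaining font decoration, which matches the kind of typo the warning is meant to surface.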
Updated by C Fraire over 6 years ago
This conversion finds errors in the documents and produces content that is entirely searchable in OpenGrok, whereas words smushed against \fB and \fI font escapes currently have to be searched with the "fb" prefix.
This issue also acknowledges that semantic markup could not be added reliably in a systematic way, though a future case that converts and reviews page by page would benefit from systematically formatted man(5) input.
The idea of systematically converting to mandoc but not adding any semantic tagging (or worse using heuristics to do it) while also having to endure the stricter mandoc checks for no benefit seems stupid.
Updated by C Fraire over 6 years ago
Most bizarre about this exchange, Yuri, is I entered this case because I told you I would in responding to your tangential demand on https://www.illumos.org/rb/r/642/ that I manually edit cross-references to "fix them to be at least proper man(5), i.e., using .BR — what we have now is largely an awful conversion from another source and doesn't resemble man(5) man pages."
So you're in favor of manual edits (and cumbersome review thereof) to "proper man(5)" but any purpose to systematic conversion is unfathomable to you.