7784 uts: console input should support utf-8

Review Request #329 - Created Jan. 19, 2017 and submitted

Information
Toomas Soome
illumos-gate
7784
499
c401662...
Reviewers
general

The basic idea is simple: instead of delivering 8-bit bytes from the keyboard, we deliver Unicode code points (U+XXXX). That requires corresponding updates to the keyboard maps and to the compose and dead-key maps. Once the keyboard driver has finished its processing, we translate the Unicode code points into UTF-8 byte streams so that the next layers (ldterm) can process the input.
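
For readers unfamiliar with the final translation step, here is a minimal sketch of encoding one Unicode code point as UTF-8. It is a generic, standalone illustration and not the code from this change; the function name `utf8_encode` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: utf8_encode is a hypothetical name, not a kbtrans
 * function. Encodes one Unicode code point as UTF-8 into out, which
 * must have room for 4 bytes. Returns the number of bytes written
 * (1 to 4), or 0 if cp is a surrogate or lies above U+10FFFF.
 */
static size_t
utf8_encode(uint32_t cp, uint8_t *out)
{
	if (cp < 0x80) {			/* 7 bits: 0xxxxxxx */
		out[0] = (uint8_t)cp;
		return (1);
	}
	if (cp < 0x800) {			/* 11 bits: 110xxxxx 10xxxxxx */
		out[0] = (uint8_t)(0xC0 | (cp >> 6));
		out[1] = (uint8_t)(0x80 | (cp & 0x3F));
		return (2);
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* UTF-16 surrogates are invalid */
		return (0);
	if (cp < 0x10000) {			/* 16 bits: 1110xxxx + 2 trailers */
		out[0] = (uint8_t)(0xE0 | (cp >> 12));
		out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (uint8_t)(0x80 | (cp & 0x3F));
		return (3);
	}
	if (cp <= 0x10FFFF) {			/* 21 bits: 11110xxx + 3 trailers */
		out[0] = (uint8_t)(0xF0 | (cp >> 18));
		out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (uint8_t)(0x80 | (cp & 0x3F));
		return (4);
	}
	return (0);
}
```

With this sketch, the Estonian letter õ (U+00F5) becomes the two bytes 0xC3 0xB5, which is the form ldterm would then see.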

To support this, the internal state in the keyboard driver needs to be extended. The current implementation uses 2 bytes for state: the low byte is the character and the high byte is the state bitmap. Fortunately, Unicode code points need only 21 bits, so a 32-bit data type leaves some room to spare, and we can use 32-bit ints to hold the state bits plus the character.
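
The bit budget can be sketched as follows. This assumes a layout with the code point in the low 21 bits and the state bits above it; that layout and all names here (`KE_CHAR_BITS`, `ke_pack`, and so on) are hypothetical and chosen for illustration, not taken from kbd.h.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only (not the kbd.h macros):
 * bits 0-20 hold the Unicode code point, bits 21-31 hold state bits.
 */
#define	KE_CHAR_BITS	21
#define	KE_CHAR_MASK	((UINT32_C(1) << KE_CHAR_BITS) - 1)	/* 0x001FFFFF */
#define	KE_STATE_BITS	(32 - KE_CHAR_BITS)			/* 11 bits left */
#define	KE_STATE_MASK	((UINT32_C(1) << KE_STATE_BITS) - 1)	/* 0x7FF */

/* Combine state bits and a code point into one 32-bit entry. */
static inline uint32_t
ke_pack(uint32_t state, uint32_t cp)
{
	return (((state & KE_STATE_MASK) << KE_CHAR_BITS) |
	    (cp & KE_CHAR_MASK));
}

/* Extract the code point from an entry. */
static inline uint32_t
ke_char(uint32_t entry)
{
	return (entry & KE_CHAR_MASK);
}

/* Extract the state bits from an entry. */
static inline uint32_t
ke_state(uint32_t entry)
{
	return (entry >> KE_CHAR_BITS);
}
```

Under this assumed layout, 21 bits for the character leave at most 11 bits for state in a 32-bit int, compared with the 8-bit state bitmap of the old 2-byte scheme.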

This update performs only the simplest transformation needed to support Unicode and UTF-8. Unfortunately, our keyboard driver is quite limited and will need more work to support more states; those improvements should be implemented as follow-up work.

Also, at this stage the change is done only for the x86 keyboard. The SPARC version is a bit of a problem, because I simply do not have access to a SPARC system with a keyboard and console.

Basic text input using the updated Estonian keyboard layout map, verifying dead keys.

Issues

  • 3
  • 0
  • 0
  • 3
Description From Last Updated
This change could in theory bust compatibility for someone. That said, it's exceedingly unlikely that anyone will notice this. Still, ... Garrett D'Amore Garrett D'Amore
Why bother touching files if the only change is to remove the pragma? Garrett D'Amore Garrett D'Amore
We're almost out of bits here. This is enough for Latin style keyboards I think, but I'm concerned that this ... Garrett D'Amore Garrett D'Amore
Review request changed

Status: Closed (submitted)

Garrett D'Amore

usr/src/cmd/loadkeys/loadkeys.y (Diff revision 7)

This change could in theory bust compatibility for someone. That said, it's exceedingly unlikely that anyone will notice this. Still, it probably should have been called out specifically somewhere. Do any of our SPARC users boot over NFS at all?

usr/src/cmd/loadkeys/type_6/albania (Diff revision 7)

Why bother touching files if the only change is to remove the pragma?

usr/src/uts/common/io/kbtrans/kbtrans.c (Diff revision 7)

I need to understand why the 7+ is no longer needed.

I'm almost certain we have this identical conversion code somewhere else. I suspect this is duplicated where it shouldn't be. Maybe we need a common utf8 module to handle conversions between Unicode and UTF-8. That said, I'd be willing to let this go for now.

usr/src/uts/common/sys/kbd.h (Diff revision 7)

We're almost out of bits here. This is enough for Latin style keyboards I think, but I'm concerned that this encoding is going to bite us in the arse later.

Maybe it's good enough for now... but if we ever support a true Unicode keyboard we will need a better solution.