When terminating a process due to an erroneous condition, the common trend is to use abort(). This has a number of challenges:
1. It ends up potentially calling the program's abort signal handler, which might do bad things
2. It ends up looping, performing a number of different system calls in an attempt to get things to a point where a SIGABRT can take effect
While abort(3C) should not change, in the vein of implementing the stack protector in libc per #5788, it would be valuable to have a system call that guarantees that the process aborts and exits no matter what, and that can optionally convey a message in an ELF note to someone debugging it. This is useful in a number of other places. Programs like libc (for mutex errors), fmd, varpd, and others have all come up with different internal panic mechanisms to ensure that a core dump occurs and that the program terminates with a message. Having a uniform means of doing this that doesn't have the drawbacks of abort would be great.
This change adds a new system call, upanic(2), that has the signature
void noreturn upanic(const void *msg, size_t len). It copies up to the smaller of len and a kernel-defined upper bound into an ELF note (the upper bound on size is something that we can change; it is not a guarantee of the API). In most cases msg is expected to be a string, but that is not a requirement, and a message is not required at all. If the message cannot be copied in, the program still terminates and the failure is noted in the ELF note. Similarly, if the requested length is larger than can be honored, the message is truncated and that is noted in the ELF note. Critically, the kernel does zero interpretation of this data; it merely passes it on. This is done to minimize risk, and because while we expect a string message to be the most common use, there is no reason that it can't be used for binary data.
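The copy, truncation, and error rules described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel implementation: the `sketch_` names, the flag values, the structure layout, and the 1024-byte cap are all hypothetical, and a NULL pointer stands in for an address that cannot be copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap and flag values, chosen for illustration only. */
#define	SKETCH_BUFLEN		1024
#define	SKETCH_MSG_VALID	0x1	/* message bytes were copied */
#define	SKETCH_MSG_ERROR	0x2	/* message could not be copied in */
#define	SKETCH_MSG_TRUNC	0x4	/* message was longer than the cap */

/* Hypothetical stand-in for the note payload. */
typedef struct sketch_note {
	uint32_t	sn_flags;
	size_t		sn_len;
	unsigned char	sn_data[SKETCH_BUFLEN];
} sketch_note_t;

/*
 * Model of how the note is filled in. The bytes are copied verbatim and
 * never interpreted, so binary data and unterminated strings are fine.
 */
void
sketch_fill_note(sketch_note_t *note, const void *msg, size_t len)
{
	assert(note != NULL);
	memset(note, 0, sizeof (*note));

	/* A message is optional: nothing to copy, no flags set. */
	if (len == 0)
		return;

	/* NULL models a failed copy-in; the process would still die. */
	if (msg == NULL) {
		note->sn_flags = SKETCH_MSG_ERROR;
		return;
	}

	/* Oversized requests are truncated and flagged, not rejected. */
	if (len > SKETCH_BUFLEN) {
		len = SKETCH_BUFLEN;
		note->sn_flags |= SKETCH_MSG_TRUNC;
	}

	memcpy(note->sn_data, msg, len);
	note->sn_len = len;
	note->sn_flags |= SKETCH_MSG_VALID;
}
```

With the signature given above, a caller with a string message would presumably pass the string and its length, for example upanic(msg, strlen(msg)); the model only shows what a debugger could expect to find in the note afterwards.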
In addition, this teaches truss, mdb, and elfdump about this note. Here is an example of using upanic through the stack smashing protection:
rm@beowulf:/ws/rm/ssp/usr/src$ pfexec mdb /var/cores/core.sshd.101320
Loading modules: [ libc.so.1 libuutil.so.1 libnvpair.so.1 libsmbios.so.1 libavl.so.1 libproc.so.1: ld.so.1 ]
> ::status
debugging core file of sshd (64-bit) from beowulf
file: /usr/sbin/sshd
initial argv: /usr/sbin/sshd -R
threading model: native threads
status: process panicked
upanic message: *** stack smashing detected
Updated by Robert Mustacchi 6 months ago
I tested this in the following ways:
- I wrote small programs that called upanic and verified the following different cases in elfdump:
  - Messages made it into the elf segment
  - Non-printable sequences were escaped
  - We properly noted the truncation flag and handled the lack of a '\0' in the message
  - If an illegal address was given, we still properly terminated the program
- I circled back and used the information in mdb, which utilizes the libproc interfaces that were written
- By wiring it up in #5788 and #13274 we were able to verify that this worked in multiple circumstances, including multi-threaded code.
- A parent sees the exit as equivalent to a SIGABRT where a core was dumped, in terms of the message passed to user land.
- Toomas kindly verified the SPARC build.
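The parent-side check mentioned in the list above (the exit looks like a SIGABRT with a core dumped) can be reproduced with a small wait-status harness. This is a sketch, not the test program that was actually used: the `child_exit_t`, `run_and_wait`, and `die_with_abort` names are hypothetical, and the child calls abort() so that the example runs on any POSIX system. On a system with upanic(2), the child function would call upanic instead.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the parent learns from the child's wait status. */
typedef struct child_exit {
	int	ce_signaled;	/* nonzero if a signal terminated the child */
	int	ce_signal;	/* terminating signal when ce_signaled is set */
	int	ce_core;	/* nonzero if the status reports a core dump */
} child_exit_t;

/* Stand-in for a child that panics; swap in upanic() where available. */
void
die_with_abort(void)
{
	abort();
}

/*
 * Fork, run fn in the child, and decode the wait status in the parent.
 * Returns 0 on success and -1 if fork or waitpid fails.
 */
int
run_and_wait(void (*fn)(void), child_exit_t *out)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);

	if (pid == 0) {
		fn();
		_exit(0);	/* only reached if fn returns */
	}

	/* Retry if the wait is interrupted by a signal in the parent. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return (-1);
	}

	out->ce_signaled = WIFSIGNALED(status);
	out->ce_signal = out->ce_signaled ? WTERMSIG(status) : 0;
	out->ce_core = 0;
#ifdef WCOREDUMP
	/* WCOREDUMP is a common extension, not part of POSIX proper. */
	if (out->ce_signaled && WCOREDUMP(status))
		out->ce_core = 1;
#endif
	return (0);
}
```

Whether ce_core is set also depends on the core file limits and core configuration in effect for the child, so only the signal fields are dependable across environments.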
Updated by Electric Monk 6 months ago
- Status changed from New to Closed
- % Done changed from 80 to 100
commit 350ffdd54baf880f440ddf9697666e283894ded1
Author: Robert Mustacchi <email@example.com>
Date: 2020-11-17T16:52:10.000Z

13273 want upanic(2)
Reviewed by: C Fraire <firstname.lastname@example.org>
Reviewed by: Andy Fiddaman <email@example.com>
Approved by: Gordon Ross <firstname.lastname@example.org>