Bug #16040
openauditing could be calmer about translation of nodename to IP address
Description
In general, it is expected that the system nodename (as described in uname(2) and utsname.h(3HEAD)) will resolve, through the hosts(5) database, to a valid IP address for the machine. In some cases it is possible for that IP address to be the "primary" or "best" IP address of the machine, though not all mobile or multi-homed systems will support such a concept.
If that expectation is not met, the auditing system can be very noisy (even and especially on the system console) with generally unhelpful messages about resolution failure; e.g.,
Jul 9 03:10:00 aunixsystem cron[1796]: [ID 574227 user.alert] Solaris_audit getaddrinfo(aunixsystem) failed[node name or service name not known]: Error 0
Jul 9 03:10:00 aunixsystem cron[1796]: [ID 574227 user.alert] Solaris_audit mapping aunixsystem to fe80::8:20ff:fe97:6ad0: Error 0
I believe this message comes from adt_get_hostIP(), which is called by both adt_load_ttyname() and adt_load_hostname(), themselves variously called while assembling audit records for inclusion in the logs. The two messages are generated seemingly every time we fail to resolve a name, which can be quite frequent depending on how often auditable events are generated. The second message is emitted when we fall back from a hosts-based mapping to the frankly pretty reasonable position of just using the first IP address we can find by looking at configured interfaces.
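To make the fallback behaviour concrete, here is a minimal sketch of the resolution logic described above: try to resolve the nodename through getaddrinfo(3SOCKET), and fall back when that fails. The function name `resolve_nodename` is illustrative; the real adt_get_hostIP() in libbsm differs in detail (it also walks the configured interfaces to pick a fallback address).

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical sketch: attempt to resolve the nodename via
 * getaddrinfo(3SOCKET).  On success, write the numeric address into
 * buf; on failure, return -1, at which point the real library falls
 * back to the first address found on a configured interface.
 */
static int
resolve_nodename(const char *nodename, char *buf, size_t buflen)
{
	struct addrinfo hints, *res;

	(void) memset(&hints, 0, sizeof (hints));
	hints.ai_family = AF_UNSPEC;

	if (getaddrinfo(nodename, NULL, &hints, &res) != 0) {
		/* Resolution failed; caller must fall back. */
		return (-1);
	}

	(void) getnameinfo(res->ai_addr, res->ai_addrlen, buf, buflen,
	    NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return (0);
}

int
main(void)
{
	char buf[NI_MAXHOST];

	/* "localhost" should always resolve via the hosts database. */
	if (resolve_nodename("localhost", buf, sizeof (buf)) == 0)
		(void) printf("localhost -> %s\n", buf);
	return (0);
}
```

In the failing configuration from this bug, the nodename (e.g., `jmctest`) takes the `-1` path on every auditable event, which is what triggers the pair of syslog messages each time.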
Though it has been historically expected, systems should not be required to maintain the mapping from the system nodename to a particular IP address in their hosts database, at least not by the core operating system. We should probably just stop emitting these messages in the logs altogether -- and even if we don't do that, we should definitely stop thrusting them onto the console. If operators are actually looking at their audit records, they're going to notice that the IP address field does or does not meet their expectations and investigate. If they're not looking, it doesn't matter anyway!
Updated by Joshua M. Clulow 22 days ago
Probably the best course of action is to leave these messages in place, but make them DPRINTF() calls instead of adt_write_syslog() calls. The code is already littered with debugging log statements in that style.
Updated by Joshua M. Clulow 22 days ago
- Status changed from New to In Progress
- Assignee set to Joshua M. Clulow
Updated by Gordon Ross 20 days ago
One other caution: audit event generation can occur at fairly high frequency.
Lookup of IP by hostname (or vice versa) can be horribly slow, so you'll want to avoid doing it too often.
The hostname and its IP are unlikely to change often, so a cache might be an effective solution.
Updated by Joshua M. Clulow 7 days ago
Background on Audit Files
As some background, I have written a program to poke at the raw contents of the audit files and studied their contents a little to make myself comfortable that ignoring this condition is fine. The program is up on GitHub at: https://github.com/jclulow/junk-illumos-audit
You can see that along with the noisy logging, the system is indeed falling back to including some IP address in the records; e.g., this cron job execution:
RECORD = Header32Ex {
    version: 0x2,
    evtype: Event {
        id: 0x1803,
        name: "AUE_cron_invoke",
        desc: "cron-invoke",
        classes: { "ua", },
    },
    evmod: {},
    addr: 172.20.2.160,    <---------------- here
    sec: 0x655fbe7c,
    nsec: 0x8d98db3,
    bytes: b"$\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\xcf\x17=((\x16\xea\x16\0\xac\x14\x012(\0\x0ccrontab-job\0(\0\n/bin/true\0'\0\0\0\0\0`\0\x07global\0",
}
We can also see an SSH login record. In this case I have included the parsed data, so that you can see the remote IP (the source of the SSH session) is separate from the header record, which merely attempts to identify the local machine:
RECORD = Header64Ex {
    version: 0x2,
    evtype: Event {
        id: 0x181c,
        name: "AUE_ssh",
        desc: "login - ssh",
        classes: { "lo", },
    },
    evmod: {},
    addr: 172.20.2.160,    <---------------- local machine here
    sec: 0x655fc027,
    nsec: 0x41f284b,
    bytes: b"u\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\xe6\xa9\x98[V\0\x02\x16\0\0\0\x116\xac\x14\x10\nr\0\0\0\0\0\0\0\0\0`\0\x07global\0",
}
DATA = [
    Subject {
        audit_uid: 0x0,
        uid: 0x0,
        gid: 0x0,
        ruid: 0x0,
        rgid: 0x0,
        pid: 0x27e6,
        sid: 0xa9985b56,
        major: 0x21600,
        minor: 0x1136,
        machine: 172.20.16.10,    <------------ remote machine!
    },
    Return64 {
        number: 0x0,
        value: 0x0,
    },
    ZoneName(
        "global",
    ),
]
From an architectural perspective, this field is largely useless. Many systems do not have a single stable IP address that can uniquely identify them. Even in the case of systems which have a single IP address at the time, this address appears (without any translated hostname) in the file forever and is unlikely to be useful to uniquely identify the machine in audit records stored for the long term.
When you are looking at local audit records, you know they originated on this machine: you're on the machine! When you are collecting audit records together in a central location, you must do so in a way that preserves the trail of where they came from; e.g., by naming the files for their source in some directory structure, possibly signing them with some cryptographic stamp, etc. This is even more important for long term archival of audit records for compliance reasons. This header field is not part of that story: it is easily spoofable, it depends on the current dynamic network state of the machine, etc.
We should in the future look at creating a new audit format, with more robust parsing properties (e.g., it should be possible to ignore unknown records just by looking at the stream, which should include a uniform length prefix on every frame). When we do that, we should drop this field altogether.
Updated by Joshua M. Clulow 7 days ago
Testing Notes
Before the change
Configured a system with auditing:
auditconfig -setflags ex,as,ss,lo,ps,ua
auditconfig -setnaflags ex,as,ss,lo,ps,ua
auditconfig -setpolicy +argv,zonename
audit -s
Confirming that things are well:
# auditconfig -getflags
active user default audit flags = ex,ps,as,ss,lo(0x40131000,0x40131000)
configured user default audit flags = ex,ps,as,ss,lo(0x40131000,0x40131000)
# auditconfig -getpolicy
configured audit policies = argv,cnt,zonename
active audit policies = argv,cnt,zonename
# audit -v
configuration ok
The hosts file does not contain an entry for the system nodename:
# uname -n
jmctest
# cat /etc/hosts
::1 localhost loghost
127.0.0.1 localhost loghost
# ipadm show-addr -po addr vioif0/ | sed 's,/.*,,'
172.20.2.160
# getent hosts 172.20.2.160
I have a cron job that runs every minute to generate annoying log messages:
# crontab -l | tail -1 * * * * * /bin/true
And indeed every minute I get a series of errors from audit activity generated by cron:
Nov 23 20:50:00 jmctest cron[10080]: [ID 574227 user.alert] Solaris_audit getaddrinfo(jmctest) failed[node name or service name not known]: Error 0
Nov 23 20:50:00 jmctest cron[10080]: [ID 574227 user.alert] Solaris_audit setting Audit IP address to kernel: Error 0
Also when logging in on the console:
jmctest console login: root
Nov 23 20:51:49 jmctest login: Solaris_audit getaddrinfo(jmctest) failed[node name or service name not known]: Error 0
Nov 23 20:51:49 jmctest login: Solaris_audit setting Audit IP address to kernel: Error 0
Nov 23 20:51:49 jmctest login: ROOT LOGIN /dev/console
After the change
Built and installed my change. The flood of messages has abated; on login:
Booting...
Oxide Helios Version bug-16040-0-g3b6a265561 64-bit (onu)
Hostname: jmctest

jmctest console login: root
Nov 23 21:25:56 jmctest login: ROOT LOGIN /dev/console
Last login: Thu Nov 23 21:12:07 from 172.20.16.10
The illumos Project     bug-16040-0-g3b6a265561        November 2023
You have new mail.
#
There are also no log messages due to either cron or SSH activities.
Audit records are still being generated:
# praudit -l /var/audit/*not*term* | awk -F, '$4 == "login - ssh" || $4 == "cron-invoke"' | tail header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:27:00.254 +00:00,subject,root,root,root,root,root,502,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,95,2,login - ssh,,172.20.2.160,2023-11-23 21:27:17.415 +00:00,subject,root,root,root,root,root,504,3479429602,136704 11642 vulcan.eng.oxide.computer,return,success,0,zone,global header,95,2,login - ssh,,172.20.2.160,2023-11-23 21:27:17.951 +00:00,subject,root,root,root,root,root,508,1149871557,5632 13041 vulcan.eng.oxide.computer,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:28:00.291 +00:00,subject,root,root,root,root,root,519,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:29:00.372 +00:00,subject,root,root,root,root,root,532,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:30:00.410 +00:00,subject,root,root,root,root,root,545,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:31:00.449 +00:00,subject,root,root,root,root,root,566,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:32:00.487 +00:00,subject,root,root,root,root,root,582,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:33:00.525 +00:00,subject,root,root,root,root,root,594,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global 
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:34:00.606 +00:00,subject,root,root,root,root,root,598,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
# date -uR
Thu., 23 Nov. 2023 21:34:30 +0000