Bug #16040

open

auditing could be calmer about translation of nodename to IP address

Added by Joshua M. Clulow 22 days ago. Updated 7 days ago.

Status:
In Progress
Priority:
Normal
Category:
lib - userland libraries
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

In general, it is expected that the system nodename (as described in uname(2) and utsname.h(3HEAD)) will resolve, through the hosts(5) database, to a valid IP address for the machine. In some cases it is possible for that IP address to be the "primary" or "best" IP address of the machine, though not all mobile or multi-homed systems will support such a concept.

If that expectation is not met, the auditing system can be very noisy (even and especially on the system console) with generally unhelpful messages about resolution failure; e.g.,

Jul  9 03:10:00 aunixsystem cron[1796]: [ID 574227 user.alert] Solaris_audit getaddrinfo(aunixsystem) failed[node name or service name not known]: Error 0
Jul  9 03:10:00 aunixsystem cron[1796]: [ID 574227 user.alert] Solaris_audit mapping aunixsystem to fe80::8:20ff:fe97:6ad0: Error 0

I believe this message comes from adt_get_hostIP(), which is called by both adt_load_ttyname() and adt_load_hostname(), themselves variously called while assembling audit records for inclusion in the logs. The two messages are generated seemingly every time we fail to resolve a name, which can be quite frequent depending on how often auditable events are generated. The second message is emitted when we fall back from a hosts-based mapping to the frankly pretty reasonable position of just using the first IP address we can find by looking at configured interfaces.
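To make the failure mode concrete, here is a minimal stand-alone reproduction (illustrative only; this is not the libbsm code) of the getaddrinfo(3SOCKET) call whose failure produces the messages above. The name "jmctest.invalid" stands in for a nodename with no hosts(5) entry, since the ".invalid" TLD is reserved and never resolves:

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Illustrative reproduction of the failing lookup; not the actual
 * adt_get_hostIP() source.
 */
int
resolve_nodename(const char *node)
{
	struct addrinfo hints, *res = NULL;
	int err;

	(void) memset(&hints, 0, sizeof (hints));
	hints.ai_family = AF_UNSPEC;

	err = getaddrinfo(node, NULL, &hints, &res);
	if (err != 0) {
		/*
		 * This is the point where adt_get_hostIP() logs the
		 * failure and falls back to an interface address.
		 */
		(void) fprintf(stderr, "getaddrinfo(%s) failed: %s\n",
		    node, gai_strerror(err));
		return (err);
	}
	freeaddrinfo(res);
	return (0);
}

int
main(void)
{
	/* ".invalid" is reserved (RFC 2606) and never resolves. */
	return (resolve_nodename("jmctest.invalid") != 0 ? 0 : 1);
}
```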

Though it has been historically expected, systems should not be required to maintain the mapping from the system nodename to a particular IP address in their hosts database, at least not by the core operating system. We should probably just stop emitting these messages in the logs altogether -- and even if we don't do that, we should definitely stop thrusting them onto the console. If operators are actually looking at their audit records, they're going to notice that the IP address field does or does not meet their expectations and investigate. If they're not looking, it doesn't matter anyway!

#1

Updated by Joshua M. Clulow 22 days ago

Probably the best course of action is to leave these messages in place, but make them DPRINTF() calls instead of adt_write_syslog() calls. The code is already littered with debugging log statements in that style.

#2

Updated by Joshua M. Clulow 22 days ago

  • Status changed from New to In Progress
  • Assignee set to Joshua M. Clulow
#3

Updated by Gordon Ross 20 days ago

One other caution: audit event generation can have fairly high frequency.
Lookup of IP by hostname (or vice versa) can be horribly slow, so you'll want to avoid doing it too often.
The hostname and its IP are unlikely to change often, so a cache might be an effective solution.

#4

Updated by Joshua M. Clulow 7 days ago

Background on Audit Files

As some background, I have written a program to poke at the raw contents of the audit files and studied their contents a little to make myself comfortable that ignoring this condition is fine. The program is up on GitHub at: https://github.com/jclulow/junk-illumos-audit

You can see that along with the noisy logging, the system is indeed falling back to including some IP address in the records; e.g., this cron job execution:

RECORD = Header32Ex {
    version: 0x2,
    evtype: Event {
        id: 0x1803,
        name: "AUE_cron_invoke",
        desc: "cron-invoke",
        classes: {
            "ua",
        },
    },
    evmod: {},
    addr: 172.20.2.160, <---------------- here
    sec: 0x655fbe7c,
    nsec: 0x8d98db3,
    bytes: b"$\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\xcf\x17=((\x16\xea\x16\0\xac\x14\x012(\0\x0ccrontab-job\0(\0\n/bin/true\0'\0\0\0\0\0`\0\x07global\0",
}

We can also see an SSH login record. In this case I have included the parsed data, so that you can see the remote IP (the source of the SSH session) is separate from the header record, which merely attempts to identify the local machine:

RECORD = Header64Ex {
    version: 0x2,
    evtype: Event {
        id: 0x181c,
        name: "AUE_ssh",
        desc: "login - ssh",
        classes: {
            "lo",
        },
    },
    evmod: {},
    addr: 172.20.2.160, <---------------- local machine here
    sec: 0x655fc027,
    nsec: 0x41f284b,
    bytes: b"u\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\xe6\xa9\x98[V\0\x02\x16\0\0\0\x116\xac\x14\x10\nr\0\0\0\0\0\0\0\0\0`\0\x07global\0",
}
  DATA = [
    Subject {
        audit_uid: 0x0,
        uid: 0x0,
        gid: 0x0,
        ruid: 0x0,
        rgid: 0x0,
        pid: 0x27e6,
        sid: 0xa9985b56,
        major: 0x21600,
        minor: 0x1136,
        machine: 172.20.16.10, <------------ remote machine!
    },
    Return64 {
        number: 0x0,
        value: 0x0,
    },
    ZoneName(
        "global",
    ),
]

From an architectural perspective, this field is largely useless. Many systems do not have a single stable IP address that can uniquely identify them. Even on a system that happens to have a single IP address when the record is written, that address appears (without any translated hostname) in the file forever, and is unlikely to uniquely identify the machine in audit records stored for the long term.

When you are looking at local audit records, you know they originated on this machine: you're on the machine! When you are collecting audit records together in a central location, you must do so in a way that preserves the trail of where they came from; e.g., by naming the files for their source in some directory structure, possibly signing them with some cryptographic stamp, etc. This is even more important for long term archival of audit records for compliance reasons. This header field is not part of that story: it is easily spoofable, it depends on the current dynamic network state of the machine, etc.

We should in the future look at creating a new audit format, with more robust parsing properties (e.g., it should be possible to ignore unknown records just by looking at the stream, which should include a uniform length prefix on every frame). When we do that, we should drop this field altogether.

#5

Updated by Electric Monk 7 days ago

  • Gerrit CR set to 3157
#6

Updated by Joshua M. Clulow 7 days ago

Testing Notes

Before the change

Configured a system with auditing:

auditconfig -setflags ex,as,ss,lo,ps,ua
auditconfig -setnaflags ex,as,ss,lo,ps,ua
auditconfig -setpolicy +argv,zonename
audit -s

Confirming that things are well:

# auditconfig -getflags
active user default audit flags = ex,ps,as,ss,lo(0x40131000,0x40131000)
configured user default audit flags = ex,ps,as,ss,lo(0x40131000,0x40131000)

# auditconfig -getpolicy
configured audit policies = argv,cnt,zonename
active audit policies = argv,cnt,zonename

# audit -v
configuration ok

The hosts file does not contain an entry for the system nodename:

# uname -n
jmctest

# cat /etc/hosts
::1 localhost loghost
127.0.0.1 localhost loghost

# ipadm show-addr -po addr vioif0/ | sed 's,/.*,,'
172.20.2.160

# getent hosts 172.20.2.160

I have a cron job that runs every minute to generate annoying log messages:

# crontab -l | tail -1
* * * * * /bin/true

And indeed every minute I get a series of errors from audit activity generated by cron:

Nov 23 20:50:00 jmctest cron[10080]: [ID 574227 user.alert] Solaris_audit getaddrinfo(jmctest) failed[node name or service name not known]: Error 0
Nov 23 20:50:00 jmctest cron[10080]: [ID 574227 user.alert] Solaris_audit setting Audit IP address to kernel: Error 0

Also when logging in on the console:

jmctest console login: root
Nov 23 20:51:49 jmctest login: Solaris_audit getaddrinfo(jmctest) failed[node name or service name not known]: Error 0
Nov 23 20:51:49 jmctest login: Solaris_audit setting Audit IP address to kernel: Error 0
Nov 23 20:51:49 jmctest login: ROOT LOGIN /dev/console

After the change

Built and installed my change. The flood of messages has abated; on login:

Booting...
Oxide Helios Version bug-16040-0-g3b6a265561 64-bit (onu)
Hostname: jmctest

jmctest console login: root
Nov 23 21:25:56 jmctest login: ROOT LOGIN /dev/console
Last login: Thu Nov 23 21:12:07 from 172.20.16.10
The illumos Project     bug-16040-0-g3b6a265561 November 2023
You have new mail.
#

There are also no log messages due to either cron or SSH activities.

Audit records are still being generated:

# praudit -l /var/audit/*not*term* | awk -F, '$4 == "login - ssh" || $4 == "cron-invoke"' | tail
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:27:00.254 +00:00,subject,root,root,root,root,root,502,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,95,2,login - ssh,,172.20.2.160,2023-11-23 21:27:17.415 +00:00,subject,root,root,root,root,root,504,3479429602,136704 11642 vulcan.eng.oxide.computer,return,success,0,zone,global
header,95,2,login - ssh,,172.20.2.160,2023-11-23 21:27:17.951 +00:00,subject,root,root,root,root,root,508,1149871557,5632 13041 vulcan.eng.oxide.computer,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:28:00.291 +00:00,subject,root,root,root,root,root,519,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:29:00.372 +00:00,subject,root,root,root,root,root,532,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:30:00.410 +00:00,subject,root,root,root,root,root,545,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:31:00.449 +00:00,subject,root,root,root,root,root,566,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:32:00.487 +00:00,subject,root,root,root,root,root,582,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:33:00.525 +00:00,subject,root,root,root,root,root,594,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global
header,107,2,cron-invoke,,172.20.2.160,2023-11-23 21:34:00.606 +00:00,subject,root,root,root,root,root,598,389883944,1466 136704 monolith.eng.oxide.computer,text,crontab-job,text,/bin/true,return,success,0,zone,global

# date -uR
Thu, 23 Nov 2023 21:34:30 +0000