Project

General

Profile

Actions

Bug #4299

closed

logadm should not lock log file for entire duration of copy

Added by Robert Mustacchi almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Category:
cmd - userland programs
Start date:
2013-11-07
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

We experienced an issue on us-beta today where logadm had blocked the marlin agent process for minutes by taking a mandatory lock on the SMF log file while it copied it. The log file was huge, so this took a long time, and the process was blocked this whole time here:

> ::jsstack
 libc_hwcap1.so.1`__write+0x15
 uv_fs_write+0x101
 _ZN4nodeL5WriteERKN2v89ArgumentsE+0x66b
 0xaff3ccc9 internal (Code: aff3cbe1)
 0xaff60379 <anonymous> (as fs.writeSync) (9fede4c1)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb761aca <anonymous> (as SyncWriteStream.write) (9fedb01d)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0x8cb1d996 <anonymous> (as Logger._emit) (a8366abd)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0x8cb1c070 <anonymous> (as <anon>) (a834fbc9)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0x8cbdd193 audit (9fe7c421)
 0xaff245a5 <InternalFrame>
 0xb1ae34b8 <anonymous> (as EventEmitter.emit) (9143aa59)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb71de51 <anonymous> (as Server._addRoute.route.on.self.log.warn.err) (
aa7bdfd9)
 0xb1ae35b5 <anonymous> (as EventEmitter.emit) (9143aa59)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb715cbd next (b9779bb5)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xb1a1b68c maTaskApiLive (a8351dbd)
 0xbb715c37 next (b9779bb5)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb7b4abd maTaskApiInit (9fe627f9)
 0xbb715c37 next (b9779bb5)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xb1af7060 parseQueryString (aa7664cd)
 0xbb715c37 next (b9779bb5)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0x8cb43ee5 parseAccept (aa7664a9)
 0xbb715c37 next (b9779bb5)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xb1a0cd34 run (9fec5e1d)
 0xbb71e7f6 run (b9778b45)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xb1a12433 _request (9fec6091)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb71b828 onRequest (9fe74889)
 0xb1ae35b5 <anonymous> (as EventEmitter.emit) (9143aa59)
 0xaff0db41 <ArgumentsAdaptorFrame>
 0xbb77c1ae <anonymous> (as parser.onIncoming) (a1b4b1ed)
 0xbb7d79f3 parserOnHeadersComplete (a8355515)
 0xaff21bb9 <InternalFrame>
 0xaff12b4a <EntryFrame>

_ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0_6ObjectEEEiPS5_Pb+0x
101
 v8::internal::Execution::Call+0xc9
 v8::Function::Call+0xf0
 node::Parser::on_headers_complete+0x535
 http_parser_execute+0x428
 node::Parser::Execute+0x23d
 0xfe0410bb internal (Code: fe040fc1)
 0xbb70f25f <anonymous> (as socket.ondata) (a1b4b1a5)
 0xbb7ced93 onread (a8354d4d)
 0xaff21bb9 <InternalFrame>
 0xaff12b4a <EntryFrame>

_ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0_6ObjectEEEiPS5_Pb+0x
101
 v8::internal::Execution::Call+0xc9
 v8::Function::Call+0xf0
 node::MakeCallback+0x11b
 node::MakeCallback+0x69
 node::StreamWrap::OnReadCommon+0x10c
 node::StreamWrap::OnRead+0x2e
 uv__read+0x28d
 uv__stream_io+0x14b
 ev_invoke_pending+0x63
 uv__run+0x94
 uv_run+0x17
 node::Start+0x1c7
 main+0x1b
 _start+0x83

For completeness, here was the pfiles of another process in the same state:

[root@00-25-90-94-c0-58 (us-beta-1) ~]# pfiles 23085 | head -20
23085:  /opt/smartdc/agents/lib/node_modules/marlin/build/node/bin/node /opt/s
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:532,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /devices/pseudo/mm@0:null
      offset:0
   1: S_IFREG mode:02644 dev:90,65699 ino:3096 uid:0 gid:0 size:171543077398
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
      mandatory write lock set by process 23385
      /var/svc/log/smartdc-marlin-agent:default.log
      offset:171543077398
   2: S_IFREG mode:02644 dev:90,65699 ino:3096 uid:0 gid:0 size:171543077398
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
      mandatory write lock set by process 23385
      /var/svc/log/smartdc-marlin-agent:default.log
      offset:171543077398
   3: S_IFPORT mode:0000 dev:543,0 uid:0 gid:0 size:6
   4: S_IFIFO mode:0000 dev:530,0 ino:163985 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   5: S_IFIFO mode:0000 dev:530,0 ino:163985 uid:0 gid:0 rdev:0,0

Notice the mandatory lock, held by:

[root@00-25-90-94-c0-58 (us-beta-1) ~]# pargs 23385
23385:  /usr/sbin/logadm
argv[0]: /usr/sbin/logadm

In SF we discussed a possible solution using "ptrim", which takes filenames "old" and "new" and a pid, stops the process, uses the agent lwp to replace all references to "old" with "new", and runs the process again. This would be a much faster mechanism for log rotation.

How could we configure logadm to do this? One approach would be to have svc.startd be responsible for invoking logadm at some interval with the service's logfile and the service's contract id. logadm would rotate the log file and "ptrim" each of the pids in the contract. This keeps the control logic outside logadm while still facilitating a per-service invocation of logadm that knows the service's contract id. Or instead of having svc.startd do this, it could be another "cron" entry that iterates over services.

--

Upon further discussion, we have decided that a more robust course of action is to only lock the log file when the end of the log file is to be copied – the vast majority of the log file can be safely copied without requiring and locking. Here are the diffs:

Actions #1

Updated by Robert Mustacchi almost 9 years ago

  • Status changed from New to Resolved

Resolved in d39070fc31de34cc42eb40ac42b1572eb4650f33.

Actions

Also available in: Atom PDF