Bug #14305

closed

Sync the tod chip following ntp_adjtime(MOD_FREQUENCY)

Added by Andy Fiddaman 8 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: kernel
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Bite-size
Tags:
Gerrit CR:

Description

We encountered a problem in OmniOS when using the chrony (https://chrony.tuxfamily.org/) time synchronisation daemon.

One report of this problem is tracked at https://github.com/omniosorg/omnios-build/issues/2576

chrony's author commented:

I suspect this is caused by the kernel resetting or adjusting the system clock when it drifts too much from the real time clock. chronyd has no idea something else is touching the system clock, which completely breaks its control loop.
To confirm that, please try it with the kernel parameter dosynctodr set to 0. If it works, we'll need to figure out what chronyd needs to do to suppress the kernel adjustment.

Manually setting dosynctodr to 0 did indeed resolve the problem (replicated across several systems).

It appears, therefore, that chrony and the kernel are competing for control of the software clock, with the kernel adjusting it back. We see this in the chrony logs as a sudden time offset of up to 2 seconds. Because chrony, apart from stepping, controls time only by adjusting the frequency offset, it quickly drives that offset to its maximum in one direction or the other and never recovers.
(With the configuration shipped in OmniOS, chrony never steps the time once the initial start-up period is over.)

The kernel tod_needsync variable (introduced in Solaris 2.6 afaik) was added to remove the requirement for setting the dosynctodr kernel variable to 0 whenever running time synchronisation software. tod_needsync is set to 1 for every call into adjtime() and for every call into ntp_adjtime() that changes the clock offset. This causes the hardware clock to be kept synchronised to the software clock, which is being managed by NTP software. However, it is not set for clock frequency offset adjustments, such as those made by chrony, and this seems like an oversight.
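The gap described above can be pictured with a small user-space model. This is only a sketch, not the illumos kernel code: the `SKETCH_*` mode bits and the `sketch_sets_tod_needsync()` function are hypothetical names invented for illustration, and the bit values are not taken from `<sys/timex.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mode bits for this sketch only. They stand in for the
 * "offset" and "frequency" requests a caller can pass to ntp_adjtime().
 */
enum {
	SKETCH_MOD_OFFSET = 1 << 0,	/* caller changes the clock offset */
	SKETCH_MOD_FREQUENCY = 1 << 1	/* caller changes the frequency offset */
};

static_assert(SKETCH_MOD_OFFSET != SKETCH_MOD_FREQUENCY,
    "mode bits must be distinct");

/*
 * Model of the decision "should this call flag the tod chip for resync?".
 * include_frequency == false models the behaviour described as an
 * oversight (only offset changes count); include_frequency == true models
 * the behaviour this bug asks for (frequency changes count as well).
 */
bool
sketch_sets_tod_needsync(unsigned int modes, bool include_frequency)
{
	unsigned int trigger = SKETCH_MOD_OFFSET;

	if (include_frequency)
		trigger |= SKETCH_MOD_FREQUENCY;

	return ((modes & trigger) != 0);
}
```

In this model a frequency-only request, which is how chrony is described as steering the clock, sets the flag only when `include_frequency` is true; an offset request sets it in both cases.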

I've updated the kernel on some test machines to also request this synchronisation after frequency adjustments, and chrony now appears completely stable.

Actions #1

Updated by Electric Monk 8 months ago

  • Gerrit CR set to 1868

Actions #2

Updated by Andy Fiddaman 8 months ago

I've tested this change by running ntpsec and chrony separately on a patched system and confirming that the clock remains in sync over time. This included recovery after an upstream host became unreachable (simulated with blackhole routes) and after manual clock steps made with `date` and with the `step` action in chrony.

I also monitored the timedelta and tod_needsync variables with dtrace during calls to tod_set(), whether periodic or made directly as a result of a call into ntp_adjtime(), to confirm that things were behaving as I expected and that the tod clock is being synchronised with the software clock rather than the other way around.
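The direction of synchronisation being checked here can be illustrated with a toy model. It is a simplified sketch under stated assumptions, not the kernel's periodic clock code: `struct sketch_clocks`, `sketch_periodic_sync()` and the `SKETCH_MAX_DRIFT_SEC` threshold are hypothetical, the clocks are reduced to whole seconds, and the model steps the software clock where a real kernel may correct it differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical drift tolerance, in seconds, used only by this sketch. */
#define	SKETCH_MAX_DRIFT_SEC	2

struct sketch_clocks {
	long software_sec;	/* software (system) clock */
	long hardware_sec;	/* tod chip */
	bool tod_needsync;	/* set after time software adjusts the clock */
};

/*
 * One pass of a modelled periodic check.
 * Returns true if the software clock was overwritten from the tod chip,
 * which is the outcome that disturbs a daemon steering the software clock.
 */
bool
sketch_periodic_sync(struct sketch_clocks *c)
{
	long drift;

	assert(c != NULL);

	if (c->tod_needsync) {
		/* Desired direction: the tod chip follows the software clock. */
		c->hardware_sec = c->software_sec;
		c->tod_needsync = false;
		return (false);
	}

	drift = c->hardware_sec - c->software_sec;
	if (drift < 0)
		drift = -drift;

	if (drift >= SKETCH_MAX_DRIFT_SEC) {
		/* Undesired direction: the software clock is pulled back. */
		c->software_sec = c->hardware_sec;
		return (true);
	}

	return (false);
}
```

With the flag set, a 2 second difference is resolved by rewriting the modelled tod chip; with the flag clear, the same difference is resolved by overwriting the software clock, which corresponds to the sudden offsets reported in the chrony logs.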

Actions #3

Updated by Electric Monk 8 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit c538cdc56b01e46a42335d3f8d315c44ecf91ac9

commit  c538cdc56b01e46a42335d3f8d315c44ecf91ac9
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date:   2021-12-17T21:54:50.000Z

    14305 Sync the tod chip following ntp_adjtime(MOD_FREQUENCY)
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
