Bug #9705
closedNTP client crashing
100%
Description
Hi.
I installed/enabled the NTP client on OpenIndiana. It usually starts when I start it by hand. Sometimes it will be in maintenance mode after a reboot and I can clear it to start it again. Even after running for a short while it will fail and go into maintenance mode again.
I'm using an up-to-date OpenIndiana. I did a TextInstall in a VM and only added tmux and enabled NTP. I updated everything to the latest as of 7Aug18.
I just enabled ntp using 'svcadm enable ntp' without changing any configuration.
harry@vultr-oi:~$ date 6 August 2018 at 23:57:04 AEST harry@vultr-oi:~$ ntpq -p remote refid st t when poll reach delay offset jitter ============================================================================== +ntp2.ds.network 202.46.177.18 2 u 12 64 37 44.955 -22.107 17.201 *45.76.113.31 (d 17.253.66.125 2 u 18 64 37 0.428 -20.438 15.068 +y.ns.gin.ntt.ne 249.224.99.213 2 u 18 64 37 1.156 -22.878 16.041 +ntp3.ds.network 27.124.125.250 3 u 18 64 37 44.871 -20.451 16.153 ... ... harry@vultr-oi:~$ ntpq -p ntpq: read: Connection refused harry@vultr-oi:~$ svcs -xv ntp svc:/network/ntp:default (Network Time Protocol (NTP) Version 4) State: maintenance since 7 August 2018 at 00:24:01 AEST Reason: Restarting too quickly. See: http://illumos.org/msg/SMF-8000-L5 See: man -M /usr/share/man -s 1M ntpd See: man -M /usr/share/man -s 4 ntp.conf See: man -M /usr/share/man -s 1M ntpq See: /var/svc/log/network-ntp:default.log Impact: This service is not running. $ grep ntpd /var/adm/messages* /var/adm/messages.0:Aug 6 23:52:16 vultr-oi ntpd[512]: [ID 702911 ntp.error] /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed /var/adm/messages.0:Aug 6 23:52:16 vultr-oi ntpd[512]: [ID 702911 ntp.error] exiting (due to assertion failure) /var/adm/messages.0:Aug 7 00:09:13 vultr-oi ntpd[1150]: [ID 702911 ntp.error] /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed /var/adm/messages.0:Aug 7 00:09:13 vultr-oi ntpd[1150]: [ID 702911 ntp.error] exiting (due to assertion failure) /var/adm/messages.0:Aug 7 00:13:48 vultr-oi ntpd[1201]: [ID 702911 ntp.error] /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed /var/adm/messages.0:Aug 7 00:13:48 vultr-oi ntpd[1201]: [ID 702911 ntp.error] exiting (due to assertion failure) /var/adm/messages.0:Aug 7 00:24:01 vultr-oi ntpd[1218]: [ID 702911 ntp.error] /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed /var/adm/messages.0:Aug 7 00:24:01 vultr-oi ntpd[1218]: [ID 702911 ntp.error] exiting (due to assertion failure)
Related issues
Updated by Harry Liebel about 4 years ago
Sorry, put it in the wrong section should be in the OI tracker. Can I move it?
Updated by Marcel Telka about 4 years ago
- Project changed from illumos gate to OpenIndiana Distribution
Updated by Michal Nowak about 4 years ago
I think I've seen this bug on SmartOS in Linux KVM before, usually after resuming from suspension to memory.
Anyway, we should update to 4.2.8p11, although I haven't seen anything encouraging in p11 changelog.
Filed upstream bug http://bugs.ntp.org/show_bug.cgi?id=3522.
Updated by Harry Liebel about 4 years ago
I ran ntp by itself and got a stacktrace.
root@vultr-oi:~# pstack core core 'core' of 1659: /usr/lib/inet/ntpd -n fffffd7feede6a0a _lwp_kill () + a fffffd7feed78ee0 raise (6) + 20 fffffd7feed51618 abort () + 98 0000000000442139 ???????? () 00000000004c52da isc_assertion_failed () + a 00000000004c1cb4 ???????? () 00000000004c1730 get_free_recv_buffer () 000000000043a0b3 input_handler () + 9f3 00000000004b9cd2 sigio_handler () + 52 fffffd7feeddf836 __sighndlr () + 6 fffffd7feedd24db call_user_handler (16, 0, fffffd7fffdeec40) + 1db fffffd7feedd284e sigacthandler (16, 0, fffffd7fffdeec40) + 10e --- called from signal handler with signal 22 (SIGPOLL) --- fffffd7feede68ba __sigsuspend () + a 00000000004ba1dd wait_for_signal () + 4d 0000000000443191 ntpdmain () + f41 00000000004434b9 ???????? () 000000000042fa43 _start_crt () + 83 000000000042f9a8 _start () + 18
Updated by Michal Nowak about 4 years ago
I prepared a pull request to oi-userland (https://github.com/OpenIndiana/oi-userland/pull/4363) but before that integrates to hipster, you may want to try it from a p5p archive. If you know how to install it (I don't...), you can find it in the pull request. Feedback welcome.
Updated by Michal Nowak about 4 years ago
Nope v4.2.8p11 is failing the very same way.
Updated by Geoff Weiss about 4 years ago
I applied the patches in #4363. ntp no longer goes into maintenance mode. Up until this fix, ntp would dump core after running for around 5-8 minutes. The ntp service has been up for just over three days now and the ntp svc log has no new entries.
Updated by Michal Nowak about 4 years ago
Thanks for the confirmation Geoff! Can you confirm Harry?
Updated by Harry Liebel about 4 years ago
Hi Michal.
Apologies for the late response.
I was trying to get the install from p5p working, but couldn't get the dependency versions right. Is there a trick to it?
So I went down the path to setup a local build environment to try the patch.
I setup a OI-1804 Text install into a virtualbox VM. After enabling NTP on the fresh install it failed as expected.
After updating the VM and setting up the build environment the current (8p10) NTP fails much slower for me. This caused some confusion while I was trying to ensure I could actually replicate the bug before applying your patch.
I have now built and installed the updated version (8p11) of NTP (after a bit of learning and a few false starts).
$ ntpq --help ntpq - standard NTP query program - Ver. 4.2.8p11 $ ntpdc --help ntpdc - vendor-specific NTPD control program - Ver. 4.2.8p11
I still see the failure:
Aug 15 02:18:48 oi1804-vm ntpd[1690]: [ID 702911 ntp.error] /export/home/harry/work/oi-userland/components/network/ntp/ntp-4.2.8p11/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed Aug 15 02:18:48 oi1804-vm ntpd[1690]: [ID 702911 ntp.error] exiting (due to assertion failure)
Harry
Updated by Udo Grabowski almost 4 years ago
I get the same problem when using ntpdate 4.2.8p10@1.3728-o with more than 10 arguments. Someone decided
that 10 hosts are enough (does not happen in old 4.2.7p411@1.2483-o):
ro sunth7 /var/crash/crashcore # ntpdate q timehost imksuns{11,12,13,14,15,16,17,18,19,20,21}>pptail || pptail pf->pptail) failed.
/jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE0) pf
Abort (core dumped)
ro sunth7 /var/crash/crashcore # pstack core-ntpdate-1535118141.16780
core 'core-ntpdate-1535118141.16780' of 16780: ntpdate -q timehost imksuns11 imksuns12 imksuns13 imksuns14 imksuns15
ffffdd7fefa76a0a _lwp_kill () + a
ffffdd7fefa08ee0 raise (6) + 20
ffffdd7fef9e1618 abort () + 98
000000000041b9bf ???? ()
000000000041a814 ???? ()
000000000041a290 get_free_recv_buffer ()
000000000040f33d input_handler () + 11d
000000000041062d ntpdatemain () + 129d
000000000040ed63 _start_crt () + 83
000000000040ecc8 _start () + 18
Updated by Michal Nowak almost 4 years ago
- Category set to OI-Userland
- Status changed from New to In Progress
- Assignee set to Michal Nowak
- Target version set to Hipster
This bug seems to be fixed in 4.2.8p12:
$ ntpdate -q {0,1,2,3}.{cz,ba,at,de,dk}.pool.ntp.org server 37.157.198.150, stratum 2, offset -0.001707, delay 0.03090 server 80.211.195.36, stratum 2, offset -0.001676, delay 0.03139 ... server 77.68.158.228, stratum 2, offset -0.001885, delay 0.06885 server 85.204.137.77, stratum 2, offset -0.001784, delay 0.05257 server 92.246.24.228, stratum 1, offset -0.001474, delay 0.06552 24 Aug 21:31:40 ntpdate[13265]: adjust time server 193.171.23.163 offset -0.002182 sec
I'll update the PR accordingly, would be cool if anyone can confirm it's fixed.
Updated by Geoff Weiss almost 4 years ago
I updated to p12. The ntp service has been stable for over four hours. i do use a customized ntp.conf file; I don't use the default ntp servers provided for openindiana (this may explain why p11 was stable for me).
Geoff
Updated by Harry Liebel almost 4 years ago
Hi.
I've been trying to test it out but I am stuck. On my VPS I did a pkg update to get the new ntp as it went into OI proper recently.
If I check the installed version I get version p12:
harry@vultr-oi:~$ pkg info ntp Name: service/network/ntp Summary: Network Time Protocol Daemon v4 Description: Network Time Protocol v4, NTP Daemon and Utilities Category: System/Services State: Installed Publisher: openindiana.org Version: 4.2.8.12 Branch: 2018.0.0.0 Packaging Date: August 28, 2018 at 06:02:39 AM Last Install Time: August 30, 2018 at 04:55:04 AM Size: 4.66 MB FMRI: pkg://openindiana.org/service/network/ntp@4.2.8.12-2018.0.0.0:20180828T060239Z Project URL: http://www.ntp.org/ Source URL: http://archive.ntp.org/ntp4/ntp-4.2/ntp-4.2.8p12.tar.gz
harry@vultr-oi:~$ pkg list -v ntp FMRI IFO pkg://openindiana.org/service/network/ntp@4.2.8.12-2018.0.0.0:20180828T060239Z i--
But if I run ntp I get version p10:
harry@vultr-oi:~$ ntpdate -v 30 Aug 15:05:10 ntpdate[1471]: ntpdate 4.2.8p10@1.3728-o Sun Feb 11 07:50:49 UTC 2018 (1)
Is there something wrong with the package or with my machine? I've run a pkg verify and it all checks out. I've also done an uninstall and reinstall of ntp but is does the same.
The result is that I still get the failure, but probably because my machine is failing to update. What am I doing wrong?
Thanks,
Harry
Updated by Geoff Weiss almost 4 years ago
I can echo that after a pkg update, which removes my version of p12, I also see the same result of ntpdate -v . The ntp service crashes immediately now, instead of staying up for ~5 minutes.
Sorry, I have no time to troubleshoot the contents of this package.
Updated by Michal Nowak almost 4 years ago
Harry, Geoff you have the right package, the version the binary reports is wrong.
Can you post the error, please?
Updated by Harry Liebel almost 4 years ago
Hi Michal.
The build/package looks suspect. The assert error message also claims to be p10.
harry@vultr-oi:~$ pkg list -v ntp FMRI IFO pkg://openindiana.org/service/network/ntp@4.2.8.12-2018.0.0.0:20180828T060239Z i-- harry@vultr-oi:~$ harry@vultr-oi:~$ harry@vultr-oi:~$ ntpdate -q {0,1,2,3}.{cz,ba,at,de,dk}.pool.ntp.org /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE(((void *)0) == pf->pptail || pptail == pf->pptail) failed. Abort (core dumped) harry@vultr-oi:~$
harry@vultr-oi:~$ pstack core core 'core' of 1781: ntpdate -q 0.cz.pool.ntp.org 0.ba.pool.ntp.org 0.at.pool.ntp.org 0.de. fffffd7fef026fba _lwp_kill () + a fffffd7feefb9330 raise (6) + 20 fffffd7feef91998 abort () + 98 000000000041b9bf ???????? () 000000000041a814 ???????? () 000000000041a290 get_free_recv_buffer () 000000000040f33d input_handler () + 11d 000000000041062d ntpdatemain () + 129d 000000000040ed63 _start_crt () + 83 000000000040ecc8 _start () + 18
Updated by Harry Liebel almost 4 years ago
Looking at the manifest files for ntp p10 and p12 to see what the hashes look like...
I'm going through:
http://pkg.openindiana.org/hipster/en/advanced_search.shtml?token=ntp&show=p&sav=1&rpp=50&v=&action=Advanced+Search
The hashes for these files are the same:
usr/sbin/ntpdate
usr/lib/inet/ntpd
usr/sbin/ntp-keygen
And doing a 'strings' on ntpdate does have the 4.2.8p10 version number in there so it's not pulled from a library..
But for other files are not.
Perhaps a misbehaving clean on the build machine? Or ccache..?
Updated by Harry Liebel almost 4 years ago
If i build ntp p10 and then build p12 in the same tree without doing a clean the hashes for ntpdate and the other binaries stay the same in the published manifest. If I remove the prototype directory, touch the Makefile and build again the hashes are different.. Does that mean the prototype directory is not updated properly or perhaps something that triggers after that? I dont know enough about the build system to know what I'm supposed to see..
Updated by Michal Nowak almost 4 years ago
Problem with our build system was resolved, may I Harry ask you to re-test with latest ntp
from our repos, please?
Updated by Harry Liebel almost 4 years ago
Hi Michal.
Repo version looks to be p12 now. I've updated and done the ntpdate test and that looks good. I've enabled ntp on my VPS again and looking good so far. Thanks :)
Updated by Harry Liebel almost 4 years ago
Is there something in the build we can improve to prevent the need for human intervention when this is updated again in the future?
Updated by Harry Liebel almost 4 years ago
Still working without any problems. I think we can close this one for now :) Thanks!
Updated by Michal Nowak almost 4 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
- Tags deleted (
needs-triage)
Updated by Michal Nowak almost 4 years ago
- Status changed from Closed to In Progress
- Assignee deleted (
Michal Nowak) - % Done changed from 100 to 0
It actually hasn't been fixed by the latest update. I and other see it. Cloned the original bug: http://bugs.ntp.org/show_bug.cgi?id=3552.
Updated by Michal Nowak over 3 years ago
- Related to Feature #9983: Replace ntp with ntpsec added
Updated by Geoff Weiss over 3 years ago
I resynced my ntp build with userland, and was able to replicate ntp crashing after a few minutes. I don't have a solution to prevent the assertion fail when the service starts on boot, but here is a change to oi-userland/components/network/ntp to prevent the service from going into maintenance a few minutes after being started:
--- Makefile.old 2018-12-14 13:08:03.214030850 +0000 +++ Makefile 2019-01-03 11:54:26.317608534 +0000 @@ -53,7 +53,7 @@ COMPONENT_PREP_ACTION = ( cd $(@D) && autoreconf --force --install ) -CFLAGS += -std=c99 -D_XOPEN_SOURCE=600 -D__EXTENSIONS__ +CFLAGS += -std=c99 -D__EXTENSIONS__ CONFIGURE_BINDIR.64 = $(USRSBINDIR) CONFIGURE_SBINDIR.64 = $(USRSBINDIR)
I prefer the solution of having ntpsec available, though.
Updated by Michal Nowak over 3 years ago
Thanks for the suggestion, Geoff.
Can you describe, what happens with the process (and service) after assertion fails (when compiled without _XOPEN_SOURCE=600
)?
Updated by r a over 3 years ago
I looked at replacing the default ntp with ntp-4.2.8p12 and checking the logs for ntp and used
CC=gcc CFLAGS='-m64 -O3' LDFLAGS=-m64 PKG_CONFIG_PATH=/usr/lib/pkgconfig ./configure --prefix=/usr --mandir=/usr/share/man --bindir=/usr/lib/inet --sysconfdir=/etc/inet --localstatedir=/var/ntp --enable-shared=yes --enable-static=no --with-crypto --enable-solarisprivs --enable-ipv6 --enable-dst-minutes --enable-thread-support --without-sntp --without-ntpsnmpd
4 Jan 13:22:45 ntpd1854: /jenkins/jobs/oi-userland/workspace/components/network/ntp/ntp-4.2.8p10/libntp/recvbuff.c:326: REQUIRE0) pf->pptail || pptail pf->pptail) failed
4 Jan 13:22:45 ntpd1854: exiting (due to assertion failure)
After replacing with ntp-4.2.8p12
4 Jan 14:50:00 ntpd19341: Listen and drop on 0 v6wildcard [::]:123
4 Jan 14:50:00 ntpd19341: Listen and drop on 1 v4wildcard 0.0.0.0:123
4 Jan 14:50:00 ntpd19341: Listen normally on 2 lo0 127.0.0.1:123
4 Jan 14:50:00 ntpd19341: Listen normally on 3 vboxnet0 192.168.56.1:123
4 Jan 14:50:00 ntpd19341: Listen normally on 4 e1000g1 172.16.26.110:123
4 Jan 14:50:00 ntpd19341: Listen normally on 5 lo0 [::1]:123
4 Jan 14:50:00 ntpd19341: Listen normally on 6 e1000g1 [fe80::215:17ff:fe91:7e46%3]:123
4 Jan 14:50:00 ntpd19341: Listen normally on 7 e1000g1 [2a02:8010:d00d:1:8563:3393:d477:5981]:123
4 Jan 14:50:00 ntpd19341: Listen normally on 8 e1000g1 [fd22:d18a:1556:1:6723:1023:d1cc:1355]:123
4 Jan 14:50:00 ntpd19341: Listening on routing socket on fd #29 for interface updates
4 Jan 14:50:00 ntpd19341: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
4 Jan 14:50:00 ntpd19341: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
4 Jan 14:54:26 ntpd19341: recvbuff.c:326: REQUIRE0) pf->pptail || pptail pf->pptail) failed
The ecvbuff.c:326: REQUIRE0) pf->pptail || pptail pf->pptail) failed refers to ntp-4.2.8p12/libntp/recvbuff.c line 326, column 5 which relates the following code
#ifdef NTP_DEBUG_LISTS_H
void
check_gen_fifo_consistency(void *fifo)
{
gen_fifo *pf;
gen_node *pthis;
gen_node **pptail;
pf = fifo;
REQUIRE((NULL pf->phead && NULL pf->pptail) ||
(NULL != pf->phead && NULL != pf->pptail));
pptail = &pf->phead;
for (pthis = pf->phead;
pthis != NULL;
pthis = pthis->link)
if (NULL != pthis->link)
pptail = &pthis->link;
REQUIRE(NULL pf->pptail || pptail pf->pptail);
}
#endif /* NTP_DEBUG_LISTS_H */
Not sure why this code causes a problem.
Updated by Michal Nowak over 3 years ago
Good point r a. We can disable the offending code by disabling debugging code via --disable-debugging
.
Hence: https://github.com/OpenIndiana/oi-userland/pull/4695. Whoever can please build it yourself and test before it gets to the distribution.
Geoff: I think you are right, with your suggested change ntpd
crashes but the ntp
service regenerates. Will see how the one without debugging behaves.
Updated by Geoff Weiss over 3 years ago
I'm happy with the changes in PR4695. The ntp service does not crash any more. The service is now stable, even after initial start during boot.
Updated by Michal Nowak over 3 years ago
- Status changed from In Progress to Closed
- Assignee set to Michal Nowak
- % Done changed from 0 to 100
Now we have service/network/ntp@4.2.8.12,5.11-2018.0.0.1:20190107T160118Z
, that should fix it for everyone. Let us know otherwise.