Bug #8923
tcpListenDrop counter continuously increases when we put our production webserver on OI
Status: Closed
Description
When running the following command, we see tcpListenDrop increasing at a rate of about 1000/sec:
netstat -sP tcp 1 | grep -i listendrop
tcpListenDrop = 3321 tcpListenDropQ0 = 0
tcpListenDrop = 2494 tcpListenDropQ0 = 0
tcpListenDrop = 2473 tcpListenDropQ0 = 0
tcpListenDrop = 1321 tcpListenDropQ0 = 0
tcpListenDrop = 1518 tcpListenDropQ0 = 0
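For reference, the raw counter can also be sampled once a second with kstat (a sketch; we assume the illumos tcp MIB kstat, where the field name drops the tcp prefix):
kstat -p tcp:0:tcp:listenDrop 1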
How can we fix this issue?
Updated by anil choudhary over 5 years ago
Initially, tcpListenDrop stays at 0 for five to six hours:
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
(the same all-zero line repeats)
If we reboot the webserver, it again works for five to six hours.
We tried tuning the following TCP parameters:
ndd -set /dev/tcp tcp_conn_req_max_q 8192
ndd -set /dev/tcp tcp_conn_req_max_q0 8192
ndd -set /dev/tcp tcp_xmit_hiwat 8192
ndd -set /dev/tcp tcp_recv_hiwat 8192
ndd -set /dev/tcp tcp_max_buf 1048576
ndd -set /dev/tcp tcp_smallest_anon_port 2048
ndd -set /dev/tcp tcp_ecn_permitted 1
ndd -set /dev/tcp tcp_keepalive_interval 60000
ndd -set /dev/tcp tcp_ip_abort_interval 10000
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 30000
but it didn't help.
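As a side note, these ndd settings do not survive a reboot. On current illumos the same tunables can be set persistently via ipadm as underscore-prefixed private properties (a sketch; we assume the legacy ndd names map as usual, e.g. tcp_conn_req_max_q becomes _conn_req_max_q):
ipadm set-prop -p _conn_req_max_q=8192 tcp
ipadm set-prop -p _conn_req_max_q0=8192 tcp
ipadm show-prop -p _conn_req_max_q tcp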
We would appreciate it if you could look into this, as we are running a production webserver on this system.
Updated by anil choudhary over 5 years ago
We are using the latest OI release:
pkg info kernel
Name: system/kernel
Summary: Core Solaris Kernel
Description: core kernel software for a specific instruction-set architecture
Category: System/Core
State: Installed
Publisher: openindiana.org
Version: 0.5.11
Branch: 2017.0.0.16782
Packaging Date: October 28, 2017 at 04:51:58 AM
Size: 42.37 MB
FMRI: pkg://openindiana.org/system/kernel@0.5.11-2017.0.0.16782:20171028T045158Z
The webserver works fine initially, but after 8-16 hours it starts showing latency. What we observe during this time is that it hogs one CPU, and tcpListenDrop also increases.
The output of the dtrace command below is as follows:
dtrace -n 'profile-1002 /cpu == 18/ { @calls[curthread , stack(5)] = count(); }'
-1914828104640
ip`dce_lookup_and_add_v4+0x98
ip`ip_set_destination_v4+0x2d2
ip`ip_attr_connect+0xd6
ip`conn_connect+0x11a
ip`tcp_set_destination+0x6e
6665
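To confirm that dce_lookup_and_add_v4 itself is the hot path, its call rate can be counted with an fbt probe (a sketch; the ip module name is taken from the stack above):
dtrace -n 'fbt:ip:dce_lookup_and_add_v4:entry { @ = count(); } tick-1sec { printa(@); clear(@); }'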
We would really appreciate your help in sorting this out.
Updated by anil choudhary over 5 years ago
Even after setting the following (the default is 3):
ndd -set /dev/ip ip_dce_reclaim_fraction 1
ndd -get /dev/ip ip_dce_reclaim_fraction
3
the dce_cache buffer count reported by ::kmem_cache keeps increasing at a rate of about 200/sec:
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15194244
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15198612
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15200510
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15202538
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15204540
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15206126
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15208232
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15210364
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15213588
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15220764
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15222870
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15224300
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15226900
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15229422
::kmem_cache!grep dce
fffffea39a483008 dce_cache 0000 000000 152 15231918
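Such samples can be taken automatically with a trivial shell loop (a sketch; the 10-second interval is arbitrary):
while true; do
    echo '::kmem_cache ! grep dce' | mdb -k
    sleep 10
done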
Can you please suggest how to reduce this?
Updated by Robert Bailey over 5 years ago
It would be good to know what hardware you are running this on. Also what web server software is installed. Maybe it is a problem with the NIC or the NIC driver?
Thank You
Updated by anil choudhary over 5 years ago
The hardware is an HP blade server, using the bnxe NIC driver.
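For completeness, the link-to-driver mapping can be double-checked with dladm, whose DEVICE column shows the driver instance behind each link:
dladm show-phys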
Updated by Avnindra Singh over 5 years ago
Robert Bailey wrote:
It would be good to know what hardware you are running this on. Also what web server software is installed. Maybe it is a problem with the NIC or the NIC driver?
Thank You
The webserver is a proprietary application.
Updated by anil choudhary over 5 years ago
Even after increasing ip_dce_hash_size to 2048, the problem was pushed out for a couple of days, but not resolved; it resurfaces after a couple of days.
We have followed the workaround mentioned in https://www.illumos.org/issues/3925 (Bug #3925: IP DCE does not scale).
We have configured a cron job that runs every 3 hours to put the system under memory pressure, using the following script:
#!/bin/bash
# Make DCE reclaim as aggressive as possible (1 = reclaim most; default is 3).
ndd -set /dev/ip ip_dce_reclaim_fraction 1
{
    # Save the current lotsfree value, then raise lotsfree to 4x freemem so
    # the kernel believes memory is tight and runs kmem cache reclaim
    # (which includes the DCE cache); after 2 seconds restore the old value.
    echo '%/l/lotsfree >oldlotsfree'
    echo 'lotsfree/Z ((%/l/freemem)*4)'
    sleep 2
    echo 'lotsfree/Z <oldlotsfree'
} | exec mdb -kw
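The matching crontab entry looks like this (script path hypothetical; classic Solaris cron does not understand */3, so the hours are listed explicitly):
0 0,3,6,9,12,15,18,21 * * * /path/to/dce-reclaim.sh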
But the reclaim stopped working after some time. In this state we took a live dump with savecore -L, and using the scat tool we extracted the following stack, which shows the reclaim worker thread hanging in cv_wait:
thread name: [ fffffe4228d759b0 _resume_from_idle+0x112() ]
fffffe4228d759e0 swtch+0x141()
fffffe4228d75a20 cv_wait+0x70(fffffea3900acd6a, fffffea3900acd60)
fffffe4228d75a80 tcp_ixa_cleanup_getmblk+0x93(fffffea408a6e080)
fffffe4228d75ad0 conn_ixa_cleanup+0x8d(fffffea408a6e080, 0)
fffffe4228d75b40 ipcl_walk+0xc3(fffffffff7c03140, 0, fffffea35ec5b000)
fffffe4228d75b80 ip_dce_reclaim_stack+0x91(fffffea35ec5b000)
fffffe4228d75bc0 ip_dce_reclaim+0x5c()
fffffe4228d75c20 dce_reclaim_worker+0xf0(0)
fffffe4228d75c30 thread_start+8()
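Other threads blocked at the same point can be listed straight from the dump or the live kernel with the ::stacks dcmd, filtered by the function seen in the frame above (a sketch):
echo '::stacks -c tcp_ixa_cleanup_getmblk' | mdb -k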
Updated by anil choudhary over 5 years ago
This bug can be closed; it is fixed by Bug #5123.
Updated by Marcel Telka over 5 years ago
- Is duplicate of Bug #5123: IP DCE does not scale - part deux added