illumos watch for May

illumos news for May

Flag Day from Dan McDonald

A mail titled "FLAG DAY - 4719 affects nightly, package, and poold" went out to several mailing lists. In it, Dan McDonald announces that #4719 "update gate build environment to [open]jdk7", committed to illumos-gate as 4d0eb50e691de4c20b1dd9976ad6839fede8a42d, makes it necessary to install and use JDK7. As he notes, OpenIndiana 151a9 does not have this version by default. Besides installing [open]JDK7 on OpenIndiana 151a9, you'll also need to take several other steps (quoting Dan's mail):

  • "either set JAVA_ROOT to a source of JDK7, or must have /usr/java populated with JDK7"
  • "because poold defines JAVA_ROOT in its binaries, you must set JAVA_ROOT when building poold to match the runtime java on your ONU or otherwised-packaged target"
  • "IMPORTANT --> If you are an OI 151a9 user, and wish to use poold, installing openjdk7 in instances is not sufficient. You will need to set /usr/java to point to the openjdk7 instance as well. Illumos bug 5851 tracks this"
As described by Dan, it's one of the last in a series of changes that will allow building illumos-gate on distributions other than OpenIndiana.
Right after this mail, Dan sent another one: HEADS UP -- illumos-gate can now be built on OmniOS r151014 or later. Be sure to read the mail as there are important pointers within.

For everybody facing a problem with NFS exhausting worker threads and system memory

there's a cross-post from the OmniOS mailing list by Dan McDonald.

I've seen similar behaviour while at Nexenta. Marcel Telka confirmed that the fix is: commit 38a08eded6638c97b482cd2556751142ff8b6fe5 at illumos-nexenta gate.

On the heels of this discussion, Marcel sent a mail asking for review of 3783: Flow control is needed in rpcmod when the NFS server is unable to keep up with the network. As noted by Marcel, it was reviewed internally at Nexenta about two years ago. It got a pretty enthusiastic answer from Dan McDonald:

This fix has also helped at least on OmniOS user unscrew his 
previously untenable NFS situation.

I reviewed it in Nexenta, and I reiterate now: Ship it!

Dan

With this commit from Gordon Ross:

1527 SMB server in non-global zones

the SMB service can now be run from within a non-global zone. The announcement mail from Gordon is under this link.

More from Gordon Ross on SMB:

Gordon created issue 5917: User-mode SMB server. It's a follow-up on work done at Nexenta to make development of the SMB code easier:

Development of the SMB server kernel code can be accelerated
significantly by allowing the code to run in user space, where one
can use not only the usual dtrace and mdb tools, but also full
source-level debugging etc.

This work was developed at Nexenta in preparation for major work
in the SMB server to add SMB2 support and other modernization.
Architecturally, there are just three major parts of this work[...]

You can read the whole issue description for more details.

A big thing came from Dan McDonald:

4719 enables building illumos-gate on OmniOS. The announcement mail is under the link. It's quite a thing for illumos to not be tied to a particular distribution for building. Following the commit, Dan edited the How To Build illumos wiki page to include instructions for using OmniOS as the build OS for illumos.

For all who were looking forward to vanity naming in zones, here goes 5877 from Robert Mustacchi:

5877: Want zone aware vanity naming and transient devices in dladm and flowadm 

As a reminder, vanity naming lets you give network devices arbitrary names.

Robert Mustacchi also prototyped two new mutex routines for userland, basing their behaviour on kernel mutexes.

An excerpt from his proposal mail (which you can read for yourself) is quoted further down, after the DTrace item.

Another proposed change by Robert Mustacchi is increasing the value of IOV_MAX from 16 to 1024 (mail).

Garrett d'Amore raised some questions about breaking current applications:

My question here becomes one of concern about breakage & fallout that can occur with this change.  Programs & libraries compiled with the old value may become incompatible with those built from the new.  Therefore, I see this change as creating a potential flag day.

I don’t see any work done here to mitigate this risk.  Am I missing something?

At a minimum, a transition period, with a message explaining what we are doing and the potential concerns for developers, and perhaps a way to get the old value (-DOLD_IOV_MAX ?) would probably be helpful.  (Btw, the kernel bits can always enjoy the new larger value, so I think there isn’t a problem *there*, and I’m happy with the approach not to heap allocate unless a larger number of iovs is really needed.)

(Out of curiosity, what programs have you seen that actually *use* more than 16 elements in the S/G array?  It feels like 16 *ought* to be plenty for normal sane programs.   The most S/G elements I’ve ever seen is 17, and that was for in-kernel DMA because the elements could be broken up on page boundaries, but iovs don’t need to worry about page alignment.  

Robert answered that the change actually follows the default on Linux, OS X and FreeBSD, and gave QEMU and tmux as two examples of applications that break within the lx brand with the limit of 16.

I asked Robert for an explanation on IRC and here's the answer:

16:43 <@rmustacc> madwizard: pong 
16:47 < danmcd> ping alp and Woodstock
17:12 < madwizard> rmustacc: What does this IOV_MAX do?
17:12 <@rmustacc> Are you familiar with the concept of I/O vectors?
17:13 < madwizard> Not much.
17:14 <@rmustacc> So the traditional unix interfaces take a buffer and a length.
17:15 <@rmustacc> So, the signature of read or write is to take a file descriptor, a pointer to a buffer, and a length of that buffer.
17:16 <@rmustacc> Eventually, folks wanted to be able to read data into more than one buffer or write from more than one buffer using a single system call.
17:16 <@rmustacc> So they introduced the struct iovec.
17:16 <@rmustacc> Which has two members, the pointer to a buffer and a length for that buffer.
17:17 <@rmustacc> With that, new interfaces were introduced called readv and writev.
17:17 <@rmustacc> Which take a file descriptor, a series of struct iovec and a number of these iovecs.
17:18 <@rmustacc> Now, back in the day, they wanted to limit the number of such vectors.
17:19 <@rmustacc> So a macro IOV_MAX described the maximum number of such vectors that the kernel supports.
17:20 <@rmustacc> On most other platforms today that value was 1024.
17:20 <@rmustacc> On illumos, the value has historically been set to 16 because of the System V interface test suite.
17:20 < tsoome> so basically you get like records type file api instead of plain byte stream
17:20 < tsoome> ?
17:20 <@rmustacc> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/syscall/rw.c#611
17:21 <@rmustacc> tsoome: Not quite. You always consume the entirety of one of those buffers before going onto the next.
17:21 < tsoome> ok, i see
17:22 <@rmustacc> Put differently, if you were using readv for say UDP, you couldn't say put a UDP datagram in each of them.
17:22 <@rmustacc> It's really meant for doing scatter/gather I/O.
17:22 <@rmustacc> So for example, I want to read a header into one buffer and a payload into another.
17:22 < tsoome> yea
17:23 < patdk-wk> for say like, database reads, after you already searched the index and know what rows yo uwant
17:23 <@rmustacc> madwizard: I realize that's a bit and might not make the most sense, but let me know what questions you have.
17:23 < madwizard> rmustacc: I wanted to understand significance of this change in illumos
17:24 <@rmustacc> Well, it hasn't landed yet.
17:24 < madwizard> rmustacc: I suppose it has high chance?
17:24 <@rmustacc> I hope so.
17:24 <@rmustacc> But basically all the change itself is doing is raising that limit from 16->1024.
17:25 < tsoome> old limits from time with little memory and slow disks
17:25 <@rmustacc> Which brings what we have in line with other platforms.
17:25 < madwizard> Well, seeing as you mentioned qemu and tmux as apps failing in lx brand with current limit, I presume the change is something we would like to have
17:25 <@rmustacc> Well, the limit is for a bit more specific limit.
17:25 <@rmustacc> Well, QEMU fails natively because at least the version I had didn't check IOV_MAX at all. :/
17:25 < madwizard> rmustacc: Most code today is developed for Linux, I think
17:26 < tsoome> lot
17:26 <@rmustacc> But, raising the kernel limit will be required for lx.
17:26 < madwizard> So it makes sense to at least consider their policies
17:26 <@rmustacc> Regardless of whether or not we want to raise it for user applications.
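
To make the scatter/gather idea from the log concrete, here is a minimal sketch in Python, whose os.writev and os.readv wrap the writev and readv calls Robert describes. The helper names gather_write_scatter_read and iov_max are made up for this illustration, and a pipe stands in for a real file or socket. Asking os.sysconf for "SC_IOV_MAX" assumes the platform exposes that name; the sketch returns None when it does not.

```python
import os


def gather_write_scatter_read(header: bytes, payload: bytes):
    """Send two buffers with one writev call, then fill two buffers with one readv call."""
    read_fd, write_fd = os.pipe()
    try:
        # Gather: a single system call writes both buffers, in order.
        written = os.writev(write_fd, [header, payload])

        # Scatter: a single system call fills the first buffer completely
        # before it moves on to the second one.
        header_buf = bytearray(len(header))
        payload_buf = bytearray(len(payload))
        nread = os.readv(read_fd, [header_buf, payload_buf])
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, nread, bytes(header_buf), bytes(payload_buf)


def iov_max():
    """Return the per-call iovec limit reported by the running system, or None."""
    try:
        return os.sysconf("SC_IOV_MAX")
    except (ValueError, OSError):
        # The name is unknown or unsupported on this platform.
        return None
```

Calling gather_write_scatter_read(b"HDR:", b"hello") sends the header and the payload in one call and gets them back in separate buffers, which is the "header into one buffer, payload into another" case from the log. A program that passes more buffers than iov_max() reports has to split them across several calls, and that is the check QEMU was said to be missing.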

Another proposed change from Joyent relayed to mailing list by Robert Mustacchi:

5886 and 5887 provide a read-only bootfs filesystem that makes it possible to hand arbitrary objects to the kernel at boot time and later on. Read the Redmine ticket (5886 want ability to provide additional objects at boot) for example uses. Sounds very interesting.

DTrace

From the dtrace discuss mailing list: Nan Xiao wanted to pick up implementing DTrace probes for Python (mail here), but found only old blog entries from John Levon. Chris Ridd (mail) and Thijs Metsch (mail) pointed him to Metsch's work on getting DTrace probes upstreamed into Python itself: https://www.jcea.es/artic/python_dtrace.htm

Returning to Robert Mustacchi's userland mutex proposal, this is how he describes it in his mail:

What I've prototyped and have found quite useful and successful in the
varpd daemon are adding two new locking routines to the illumos lock
interfaces in libc -- mutex_enter() and mutex_exit() that have the same
semantics as in the kernel, except they abort a process as opposed to
panic the system. They require that the lock be of type LOCK_ERRORCHECK
and the interfaces don't attempt to do anything for robust mutexes,
where you need to be paying attention to all the error codes anyways

So far Garrett d'Amore and Marcel Telka have voiced their support for the idea.

On OpenIndiana Discuss, Apostolos Syropoulos posted an interesting thread about the Java plugin being removed from the Solaris version of Oracle's JDK 8. You can read the thread here.

dmake integration into illumos gate - proposed by Richard Lowe

Argument in favor from his announce mail. As you are most probably aware, ASLR is a security technique that helps to prevent buffer overflow attacks. Working support in illumos would indeed be great.

Alex Wilson and -fstack-protector patch

Issue 5922: Want support for building with -fstack-protector is the first in a series of proposed changes to help against stack smashing attacks. I caught it through this tweet. You can read the whole announcement to illumos-developer here. A short introduction from Alex's announce mail:

Short version: this flag tells the compiler to pick a certain subset of
functions (explanation in comments) and add a "stack guard" or "canary"
-- a magic value placed below the %rbp and return pointer, which is
written at function entry and then checked again at function exit. If
the value doesn't match, we do not return and instead jump straight into
panic().

In this way, when a stack smashing attack is in progress, we avoid
giving an attacker control of %rip and instead immediately crash in a
safe manner.

As explained in the issue itself, this is a GCC flag that the Linux kernel build process is going to enable by default.

Toomas Soome filed an RFC for an alternate boot loader.

Looking for an alternate way to boot illumos, Toomas Soome started checking out the FreeBSD loader. As we know, the FreeBSD loader can boot not only single-disk and mirrored ZFS pools but also RAID-Z configurations. As Toomas wrote himself:

Old story, but from another angle. While grub2 is still one possible way to go, it has its cons (it has its own development goals which may or may not fit illumos needs, and possible licensing issues), so I have boxed it for time being and checked out about freebsd loader.

While work on loader is nowhere near being done, I have managed to boot both 64 and 32 bit illumos with it (vmware environment), and got few ideas about integration. The current code can boot just from GPT labeled disk (pmbr + gptzfsboot bootblocks), adding up MBR+SMI is easy to add and just question of time.

Seems like something to check out. It definitely got noticed by others and got quite an enthusiastic welcome from Garrett D'Amore:

Just to inject a few pennies here — this work is absolutely *awesome*.  I’ve been wanting to look at doing this myself for *years* now.  I’m so glad that you’re doing it Toomas.  I think also the fbsd folks have figured out things like proper ZFS support in their boot loader, which is probably a lot cleaner/better than what its in Grub, even without the licensing “issues”.

New issues in May: 72, of which 26 are already closed.

5969 update illumos-gate to use python2.7
5968 libilb'ilb_close() misses mutex_unlock in some places
5967 Invalid multiline string in zfs-tests/tests/functional/cli_root/zpool_create/zpool_create_features_004_neg
5966 zfs-tests/tests/functional/acl/nontrivial/zfs_acl_chmod_rwx_002_pos incorrectly uses typeset builtin
5965 native svc.configd doesn't need audit support
5964 mkostemp() and mkostemps() need prototypes in stdlib.h
5963 rmis should be removed from arcstat manpage
5962 clean target fails in usr/src/common/avs
5961 Fix stack overflow in zfs_create_fs
5960 zfs recv should prefetch indirect blocks
5959 clean up per-dataset feature count code
5958 rpc/xdr.h forces stdio.h on you
5957 File locking should be read-preferring
5956 orientate is not a word
5955 pat_sync is clever enough to check for X86FSET_PAT
5954 ffs(3c) manpage is a little confusing
5953 zfs sharenfs property can result in incorrect sharetab
5952 dd should be able to suppress printing status information
5950 assertion failed for thread 0x7fffff152a40, thread-id 1: object_count usedobjs (0x447 0x435), file ../zdb.c, line 1922
5948 drmach_init should support 12 cpus (not a power of 2)
5947 NULL is not a character in asm10k.c
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
5944 Related to 5943 (please merge)
5943 grub fails to boot when too many boot environments are present
5941 The rfscallnames_v4 array should contain less entries
5940 Missing release_lockowner NFSv4 client stats
5939 NFS_ACL version 4 leftovers should be removed
5938 tmpfs locks processes in paging loop when there's no swap partition
5937 sockaddr_in6 synopsis incorrect
5934 localedef needs a native build
5932 pkg fails with an Abort signal when trying to set a publisher
5931 after svccfg import is killed, subsequent import fails
5930 fasttrap_pid_enable() panics when prfind() fails in forking process
5929 stale v_path slows vfs lookups
5928 htable_walk strays into the VA hole
5926 insert buffer between zfs send's read from disk and write to fd
5925 zfs receive -o origin=
5924 System deadlocked in htable_steal_active
5923 Panic on ht->ht_lock_cnt 0 assertion in htable_release
5921 Want a netbios_enable parameter in sharectl_smb
5920 mpt_sas panic when a SAS cable is disconnected
5919 svccfg import returns before service can be used by svcadm
5918 Memory leak when zfs_destroy_snaps_nvl fails
5915 msgpullup(9f) behaves as if len is always -1
5914 pipe generates inodes with ino > 4G
5913 audit_syslog does not correctly parse exec events and spams the log file as a result.
5912 full stream can not be force-received into a dataset if it has a snapshot
5911 ZFS "hangs" while deleting file
5910 libnisdb won't build with modern GCC
5909 zfs_ioc_promote does not check new name length for shared snapshots
5908 asy should not adjust line control or speed on attach
5907 xdrmblk_getpos() is unreliable
5906 beadm destroy fails when running from live image
5905 wrong dn_maxblkid on dnode received to large_block dataset
5904 unable to save/modify files on CIFS share from Windows 7
5903 SMB server breaks an oplock on after an attribute (only) open
5902 Unable to take ownership, view permissions, or delete files...
5901 SMB clients get errors trying to delete non-empty directories
5900 read-raw, write-raw are dead code
5899 panic on bad mutex in smb_event_wait()
5898 SO_RCVTIMEO broken for AF_UNIX
5896 svccfg import returns before service can be used by svcadm
5895 mdb_alloc() succeeds for 4294967295 bytes in 32-bit process
5893 igmp timout logic induces panic
5892 CPU caps can throttle performance more than intended
5883 race in netstack_unregister and zone creation
5882 Temporary pool names
5881 taskq.c: maxall vs. maxalloc in comments
5879 Additional check_rtime exceptions needed for non-OI illumos-gate compilation
5878 Additional lint fixes for sunstudio12.1 and more modern OpenSSL
5876 intel sys/regset.h pollutes name space

Issues closed in May: 58

5964 mkostemp() and mkostemps() need prototypes in stdlib.h
5961 Fix stack overflow in zfs_create_fs
5955 pat_sync is clever enough to check for X86FSET_PAT
5954 ffs(3c) manpage is a little confusing
5947 NULL is not a character in asm10k.c
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
5944 Related to 5943 (please merge)
5937 sockaddr_in6 synopsis incorrect
5931 after svccfg import is killed, subsequent import fails
5930 fasttrap_pid_enable() panics when prfind() fails in forking process
5928 htable_walk strays into the VA hole
5926 insert buffer between zfs send's read from disk and write to fd
5912 full stream can not be force-received into a dataset if it has a snapshot
5909 zfs_ioc_promote does not check new name length for shared snapshots
5904 unable to save/modify files on CIFS share from Windows 7
5903 SMB server breaks an oplock on after an attribute (only) open
5902 Unable to take ownership, view permissions, or delete files...
5901 SMB clients get errors trying to delete non-empty directories
5900 read-raw, write-raw are dead code
5899 panic on bad mutex in smb_event_wait()
5898 SO_RCVTIMEO broken for AF_UNIX
5896 svccfg import returns before service can be used by svcadm
5893 igmp timout logic induces panic
5883 race in netstack_unregister and zone creation
5879 Additional check_rtime exceptions needed for non-OI illumos-gate compilation
5878 Additional lint fixes for sunstudio12.1 and more modern OpenSSL
5870 dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
5852 remove SFW* from the gate's Makefiles
5840 cmd/ypcmd build failed without usr/lib/libsocket.so on build host
5818 zfs {ref}compressratio is incorrect with 4k sector size
5816 mdb '::status' truncates assertion messages
5815 libzpool's panic function doesn't set global panicstr, ::status not as useful
5763 missing va_end() on return from __fini_daemon_priv()
5753 libnsl: set_up_connection() over TCP does not adhere the specified timeout
5669 altroot not set in zpool create when specified with -o
5651 Missing header file sys/fs/s5ino.h referenced in /usr/src/cmd/acct/diskusg.c
5624 sed(1) should not refer to re_format(7)
5623 auditconfig(1M) man page incomplete
5612 lpadmin dumps core in getlist
5596 tar doesn't properly wait for its children
5515 dataset user hold doesn't reject empty tags
5438 zfs_blkptr_verify should continue after zfs_panic_recover
5269 zpool import slow
4744 remove traces of /var/sadm/system/admin/default_java
4743 Fix deprecated /usr/j2se usage in slp and remove from filesystem(5) manpage
4742 update manifests for javadoc7
4567 Long RPC messages could confuse kernel and cause panic
4492 zone hung in down state, stuck in ilb_stack_fini
4426 mci_rw_lock white-hot under load
4418 svc.startd crash with a long instance name and more than one dependency
4348 Assertion failed: rn->rn_nozpool B_FALSE, file ../common/libzfs_import.c, line 1080, function zpool_open_func
3914 ill_frag_hash_tbl not allocated for loopback interfaces
3783 Flow control is needed in rpcmod when the NFS server is unable to keep up with the network
3599 dtrace_dynvar tail calls can blow stack
2740 zfs_rmnode() fails to clean up znodes
1778 Assertion failed: rn->rn_nozpool == B_FALSE, file ../common/libzfs_import.c, line 1077, function zpool_open_func
1663 Non-empty directories can be marked delete on close in CIFS, but not actually deleted