Bug #15591

closed

idmap becomes non-functional after system memory shortage

Added by Toomas Soome about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: smb - SMB server and client
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR: 2783
External Bug: racktop:BSR-11451

Description

I arranged for a "disk full" condition by mounting a small, fixed-size tmpfs file system on /var/run/idmap. Once idmap was in operation, I filled up /var/run/idmap by creating a "space_hog" file there with dd.

This revealed a couple of problems with idmap.

1: Once the disk is full, idmap may be in the middle of a multi-statement SQLite operation that uses a transaction. If any step fails (e.g. with SQLITE_FULL), the only safe recovery is to end the transaction with "ROLLBACK TRANSACTION". Most places in idmap do this correctly, but the error handling in idmap_get_mapped_ids_1_svc lacked the rollback. Once an error leaves that transaction open, every later attempt to start a transaction fails with the error "cannot start a transaction while a transaction is open". That function has been corrected so that it always closes its transaction.

2: Once operations on an SQLite database handle return certain errors (not sure exactly which, but apparently SQLITE_NOMEM and SQLITE_FULL), all subsequent operations on the same handle appear to fail as well. The idmap code caches SQLite database handles in per-thread data and generally keeps those handles open for as long as the thread lives. These threads are mostly RPC service threads, which typically have fairly long lifetimes. To avoid these cascading failures, I've changed the design to discard the cached SQLite handle after "database" errors, which usually mean that the handle won't work anymore. The new functions kill_cache_handle and kill_db_handle (companions to get_cache_handle and get_db_handle) are called to "kill" the cached database handle when a database error has been seen. The next operation in that thread then reopens the database (if it can), which allows idmap to recover from a full disk without a restart.
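The two fixes above can be sketched together as a small, self-contained C model. This is a hypothetical illustration, not the actual idmapd code: db_handle, db_begin, db_insert, db_commit, db_rollback, store_mappings, is_db_error and the DB_* return codes are invented stand-ins (no real SQLite calls are made), and get_db_handle/kill_db_handle only borrow their names from the description, with assumed signatures. A C11 _Thread_local pointer stands in for idmap's per-thread data, and treating only NOMEM and FULL as "database errors" is an assumption that mirrors the uncertainty stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an SQLite-style handle; NOT the real sqlite
 * API. The codes only mimic SQLITE_OK, SQLITE_ERROR, SQLITE_NOMEM and
 * SQLITE_FULL.
 */
typedef enum { DB_OK = 0, DB_ERROR, DB_NOMEM, DB_FULL } db_rc;

typedef struct {
	bool in_txn;	/* a transaction is open on this handle */
	bool broken;	/* handle stays unusable after NOMEM/FULL */
} db_handle;

static db_rc inject_rc = DB_OK;	/* test hook: fail the next insert */
static int opens;		/* times the database has been opened */

static db_handle *db_open(void) {
	opens++;
	return calloc(1, sizeof (db_handle));
}

static void db_close(db_handle *h) { free(h); }

static db_rc db_begin(db_handle *h) {
	/* Refuse a nested BEGIN, like "transaction already open". */
	if (h->broken || h->in_txn)
		return DB_ERROR;
	h->in_txn = true;
	return DB_OK;
}

static db_rc db_commit(db_handle *h) { h->in_txn = false; return DB_OK; }
static db_rc db_rollback(db_handle *h) { h->in_txn = false; return DB_OK; }

static db_rc db_insert(db_handle *h) {
	assert(h->in_txn);	/* writes happen inside a transaction */
	if (h->broken)
		return DB_ERROR;
	db_rc rc = inject_rc;
	inject_rc = DB_OK;
	if (rc == DB_NOMEM || rc == DB_FULL)
		h->broken = true;	/* model the cascading failures */
	return rc;
}

/* Per-thread cached handle (hypothetical model of idmap's cache). */
static _Thread_local db_handle *cached_db;

static db_handle *get_db_handle(void) {
	if (cached_db == NULL)
		cached_db = db_open();	/* (re)open on first use or after a kill */
	return cached_db;
}

static void kill_db_handle(void) {
	db_close(cached_db);	/* free(NULL) is a no-op */
	cached_db = NULL;
}

/* Assumption: which codes count as "database errors". */
static bool is_db_error(db_rc rc) {
	return rc == DB_NOMEM || rc == DB_FULL;
}

/* Hypothetical multi-statement operation, shaped like the fixed code. */
static db_rc store_mappings(int n) {
	db_handle *h = get_db_handle();
	if (h == NULL)
		return DB_NOMEM;
	db_rc rc = db_begin(h);
	if (rc != DB_OK)
		goto out;
	for (int i = 0; i < n; i++) {
		if ((rc = db_insert(h)) != DB_OK) {
			(void) db_rollback(h);	/* fix 1: never leave it open */
			goto out;
		}
	}
	rc = db_commit(h);
out:
	if (is_db_error(rc))
		kill_db_handle();	/* fix 2: next call reopens */
	return rc;
}
```

In this model an ordinary DB_ERROR keeps the cached handle but still rolls back, so the next BEGIN succeeds on the same handle. A DB_FULL marks the handle as broken, so it is discarded and the following call opens a fresh one.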


Related issues

Related to illumos gate - Bug #15590: smbd becomes unresponsive on memory allocation failures (Closed, Toomas Soome)

#1

Updated by Toomas Soome about 1 year ago

  • Description updated
#2

Updated by Electric Monk about 1 year ago

  • Gerrit CR set to 2783
#3

Updated by Toomas Soome about 1 year ago

  • Related to Bug #15590: smbd becomes unresponsive on memory allocation failures added
#4

Updated by Toomas Soome about 1 year ago

  • Description updated
#5

Updated by Toomas Soome about 1 year ago

  • Description updated
#6

Updated by Toomas Soome about 1 year ago

  • Status changed from In Progress to Pending RTI
#7

Updated by Electric Monk about 1 year ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 90 to 100

commit  e3d9e7f34632cca352dca2ca62a69c94804cbc4f
Author: Gordon Ross <gwr@racktopsystems.com>
Date:   2023-05-22T16:43:31.000Z

    15591 idmap becomes non-functional after system memory shortage
    Reviewed by: Jerry Jelinek <gjelinek@racktopsystems.com>
    Reviewed by: Dan Cross <cross@oxidecomputer.com>
    Approved by: Dan McDonald <danmcd@mnx.io>
