Feature #11488

add exclusion list functionality to test runner

Added by Kody Kantor 12 months ago.

Status: New
Priority: Normal
Assignee: -
Category: tests
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Background

The current system test suite runs all of the tests listed in a runfile.
Every test is executed, its result (pass/fail/skip) is noted as it finishes,
and a summary is given upon completion of the test suite.
This sounds great at first, but a couple of rough edges have developed over the
years. In any test system it is important that a developer can rely on the
test results to indicate the (in)correctness of their code. Tests that return
inconsistent results between runs or report failures caused by unrelated bugs
allow relevant test failures to be lost in the noise and ignored.

One current problem of ours is that some of the illumos system tests pass
only on a subset of illumos distributions. Usually this is unacceptable and
indicates a bug in the distribution. However, due to the differing goals and
designs of illumos distributions, it can also be acceptable for some tests to
fail. For example, SmartOS uses a mostly read-only filesystem in the global
zone, which can cause tests that attempt to modify certain files to fail.

This was recently discussed in #11305, both in the illumos ticket and in the
associated illumos-developer mailing list thread.

There are a few ways we tackle this problem today:

Rewrite the test so it passes on all distributions

This seems good in theory, and should be done where possible without resorting
to distribution-specific switch statements. However, it may require a
relatively large lift and may lead to the tests becoming overly complicated. It
could also introduce distribution-specific logic into the test framework, which
would ideally be distribution-agnostic, and may lead to confusing results.

The danger is that we introduce a switch that changes test behavior between
distributions in order to retain passing test status. Take the following
pseudocode for example:

if (smartos) {
    # Short-circuit: log a trivial command so the framework records a
    # "pass" without exercising the code under test at all.
    log_must echo done
    cleanup
    exit 0
} else if (omnios || openindiana) {
    # Only these distributions actually run the real test body.
    log_must do_work
}

Have the downstream distribution carry platform-specific test changes

This is essentially the same as the previous approach, except that the changes
are maintained in the distribution's repository rather than in illumos-gate.
This is a good option for trivial changes that are irrelevant to other
distributions, and is done today to some extent. An example is when
distributions ship binaries in different locations within the filesystem.

Diverging on tests may lead to bugs being introduced that are only caught by
another distribution's version of the tests. This also complicates merges from
upstream.

Have the downstream distribution maintain a runfile in illumos-gate

The distribution could simply omit the failing test and never run it. This is
done today in the ZFS test suite. With this system it is difficult to know
which distribution runs which tests.
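
For reference, the runfiles consumed by the test runner are INI-style files.
The abridged sketch below shows their general shape; the defaults and section
contents are illustrative rather than copied from any particular
distribution's runfile:

[DEFAULT]
pre = setup
post = cleanup
user = root
timeout = 600
outputdir = /var/tmp/test_results

# Each section names a test directory; 'tests' lists the scripts to run.
[/opt/zfs-tests/tests/functional/acl/cifs]
tests = ['cifs_attr_001_pos', 'cifs_attr_002_pos', 'cifs_attr_003_pos']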

Have the downstream distribution maintain a runfile in its own repository

With this approach the downstream distribution must be aware of any test
additions upstream and take the initiative to add those new tests to its
distro-specific runfile. If the tests are not added downstream, any new tests
go unrun and new functionality is left untested.

General problems

With the current whitelist-based runfile system it is difficult to know which
tests are being excluded and for what reason. Answering that question usually
involves searching through version control history or attempting to reproduce
and diagnose the test failure locally.

During integration review (RTI) it's difficult to know which tests are failing,
which tests always fail, and which tests were excluded from the test run.

Proposal

I propose that we move to a two-file system. One file is a list of tests
available to include in the test run for each test suite (zfs-tests, os-tests,
etc.). There may be more than one of these per test suite. For example,
long-running tests designed to identify performance regressions live in a
separate file from quicker-to-execute integration tests. These 'inclusion
lists' are maintained in illumos-gate. They are the runfiles that we have
today, but expanded to include any tests that have been removed over the years.

There will be a second file that contains a list of tests to exclude for
each test suite. These 'exclusion lists' are maintained by each distribution.
There may also be a template exclusion list maintained in illumos-gate that
contains tests that should not be run on any distribution. Ideally this list
would not exist, or would exist only for a short time while a bug fix is being
worked on. However, some tests that exist today are broken on all platforms
with no fix in progress.

A user must specify an inclusion list file when running tests and may
optionally specify an exclusion list file. The tests in the inclusion list
file are run, less the tests in the exclusion list file. This maintains
compatibility with the existing system. When new tests are added, they are
added only to the inclusion list file.
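
As a sketch of the intended interface, suppose the existing run command grew a
hypothetical -x option for the exclusion list (the flag name and the runfile
names below are illustrative, not part of any current interface):

# Run everything in the inclusion list, as today.
/opt/test-runner/bin/run -c zfs-tests.run

# Run the same inclusion list, less the tests named in the
# distribution's exclusion list (hypothetical -x flag).
/opt/test-runner/bin/run -c zfs-tests.run -x smartos-exclude.run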

A list of failing, skipped, killed, and excluded tests will be printed in the
test summary. When the test suite has completed, the number of excluded tests
will also be printed alongside the pass, fail, and skip counts.
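
Something along these lines is what the summary might look like (the layout
extends the runner's current counts with an exclusion line; the numbers are
made up):

Results Summary
PASS       670
FAIL         2
SKIP         1
EXCLUDED     4

Running Time:   01:32:02
Percent passed: 99.6%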

Plan

The first target for the new exclude-list functionality will be the ZFS test
suite. A few of the ZFS tests are known to fail on all platforms, so we will
add an exclusion list to illumos-gate.

The exclusion list should include comments explaining why the excluded tests
are failing (no deep technical details necessary; only the symptoms, steps to
reproduce, or a pointer to an illumos bug).
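
As an illustration, an exclusion list entry might look like the following,
assuming the exclusion file mirrors the existing runfile syntax (the test
path, test name, and bug number are placeholders):

# Excluded: attempts to modify a file under /etc, which fails on SmartOS
# because the global zone root filesystem is mostly read-only.
# Tracked as illumos #NNNNN (placeholder bug number).
[/opt/zfs-tests/tests/functional/example_feature]
tests = ['example_feature_001_pos']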
