Test Metadata

Directory Layout

Metadata files must be stored under the metadata directory passed to the test runner. The directory layout follows that of web-platform-tests with each test source path having a corresponding metadata file. Because the metadata path is based on the source file path, files that generate multiple URLs e.g. tests with multiple variants, or multi-global tests generated from an any.js input file, share the same metadata file for all their corresponding tests. The metadata path under the metadata directory is the same as the source path under the tests directory, with an additional .ini suffix.

For example a test with URL:

/spec/section/file.html?query=param

generated from a source file with path:

<tests root>/spec/section.file.html

would have a metadata file

<metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results (i.e. PASS or OK), and which don’t have any other associated metadata, don’t require a corresponding metadata file.

Directory Metadata

In addition to per-test metadata, default metadata can be applied to all the tests in a given source location, using a __dir__.ini metadata file. For example to apply metadata to all tests under <tests root>/spec/ add the metadata in <tests root>/spec/__dir__.ini.

Metadata Format

The format of the metadata files is based on the ini format. Files are divided into sections, each (apart from the root section) having a heading enclosed in square braces. Within each section are key-value pairs. There are several notable differences from standard .ini files, however:

  • Sections may be hierarchically nested, with significant whitespace indicating nesting depth.

  • Only : is valid as a key/value separator

A simple example of a metadata file is:

root_key: root_value

[section]
  section_key: section_value

  [subsection]
     subsection_key: subsection_value

[another_section]
  another_key: [list, value]

Conditional Values

In order to support values that depend on some external data, the right hand side of a key/value pair can take a set of conditionals rather than a plain value. These values are placed on a new line following the key, with significant indentation. Conditional values are prefixed with if and terminated with a colon, for example:

key:
  if cond1: value1
  if cond2: value2
  value3

In this example, the value associated with key is determined by first evaluating cond1 against external data. If that is true, key is assigned the value value1, otherwise cond2 is evaluated in the same way. If both cond1 and cond2 are false, the unconditional value3 is used.

Conditions themselves use a Python-like expression syntax. Operands can either be variables, corresponding to data passed in, numbers (integer or floating point; exponential notation is not supported) or quote-delimited strings. Equality is tested using == and inequality by !=. The operators and, or and not are used in the expected way. Parentheses can also be used for grouping. For example:

key:
  if (a == 2 or a == 3) and b == "abc": value1
  if a == 1 or b != "abc": value2
  value3

Here a and b are variables, the value of which will be supplied when the metadata is used.

Web-Platform-Tests Metadata

When used for expectation data, metadata files have the following format:

  • A section per test URL provided by the corresponding source file, with the section heading being the part of the test URL following the last / in the path (this allows multiple tests in a single metadata file with the same path part of the URL, but different query parts). This may be omitted if there’s no non-default metadata for the test.

  • A subsection per subtest, with the heading being the title of the subtest. This may be omitted if there’s no non-default metadata for the subtest.

  • The following known keys:

    expected:

    The expectation value or values of each (sub)test. In the case this value is a list, the first value represents the typical expected test outcome, and subsequent values indicate known intermittent outcomes e.g. expected: [PASS, ERROR] would indicate a test that usually passes but has a known-flaky ERROR outcome.

    disabled:

    Any values apart from the special value @False indicates that the (sub)test is disabled and should either not be run (for tests) or that its results should be ignored (subtests).

    restart-after:

    Any value apart from the special value @False indicates that the runner should restart the browser after running this test (e.g. to clear out unwanted state).

    fuzzy:

    Used for reftests. This is interpreted as a list containing entries like <meta name=fuzzy> content value, which consists of an optional reference identifier followed by a colon, then a range indicating the maximum permitted pixel difference per channel, then semicolon, then a range indicating the maximum permitted total number of differing pixels. The reference identifier is either a single relative URL, resolved against the base test URL, in which case the fuzziness applies to any comparison with that URL, or takes the form lhs URL, comparison, rhs URL, in which case the fuzziness only applies for any comparison involving that specific pair of URLs. Some illustrative examples are given below.

    implementation-status:

    One of the values implementing, not-implementing or default. This is used in conjunction with the --skip-implementation-status command line argument to wptrunner to ignore certain features where running the test is low value.

    tags:

    A list of labels associated with a given test that can be used in conjunction with the --tag command line argument to wptrunner for test selection.

    In addition there are extra arguments which are currently tied to specific implementations. For example Gecko-based browsers support min-asserts, max-asserts, prefs, lsan-disabled, lsan-allowed, lsan-max-stack-depth, leak-allowed, and leak-threshold properties.

  • Variables taken from the RunInfo data which describe the configuration of the test run. Common properties include:

    product:

    A string giving the name of the browser under test

    browser_channel:

    A string giving the release channel of the browser under test

    debug:

    A Boolean indicating whether the build is a debug build

    os:

    A string the operating system

    version:

    A string indicating the particular version of that operating system

    processor:

    A string indicating the processor architecture.

    This information is typically provided by mozinfo, but different environments may add additional information, and not all the properties above are guaranteed to be present in all environments. The definitive list of available properties for a specific run may be determined by looking at the run_info key in the wptreport.json output for the run.

  • Top level keys are taken as defaults for the whole file. So, for example, a top level key with expected: FAIL would indicate that all tests and subtests in the file are expected to fail, unless they have an expected key of their own.

An simple example metadata file might look like:

[test.html?variant=basic]
  type: testharness

  [Test something unsupported]
     expected: FAIL

  [Test with intermittent statuses]
     expected: [PASS, TIMEOUT]

[test.html?variant=broken]
  expected: ERROR

[test.html?variant=unstable]
  disabled: http://test.bugs.example.org/bugs/12345

A more complex metadata file with conditional properties might be:

[canvas_test.html]
  expected:
    if os == "mac": FAIL
    if os == "windows" and version == "XP": FAIL
    PASS

Note that PASS in the above works, but is unnecessary since it’s the default expected result.

A metadata file with fuzzy reftest values might be:

[reftest.html]
  fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would be to require a maximum difference per channel of less than or equal to 10 and less than or equal to 200 total pixels different. For any comparison involving ref1.html on the right hand side, the limits would instead be a difference per channel not more than 20 and a total difference count of not less than 200 and not more than 300. For the specific comparison subtest1.html == ref2.html (both resolved against the test URL) these limits would instead be 10 to 15 and 0 to 20, respectively.

Generating Expectation Files

wpt provides the tool wpt update-expectations command to generate expectation files from the results of a set of test runs. The basic syntax for this is:

./wpt update-expectations [options] [logfile]...

Each logfile is a wptreport log file from a previous run. These can be generated from wptrunner using the --log-wptreport option e.g. --log-wptreport=wptreport.json.

update-expectations takes several options:

--full

Overwrite all the expectation data for any tests that have a result in the passed log files, not just data for the same run configuration.

--disable-intermittent

When updating test results, disable tests that have inconsistent results across many runs. This can precede a message providing a reason why that test is disable. If no message is provided, unstable is the default text.

--update-intermittent

When this option is used, the expected key stores expected intermittent statuses in addition to the primary expected status. If there is more than one status, it appears as a list. The default behaviour of this option is to retain any existing intermittent statuses in the list unless --remove-intermittent is specified.

--remove-intermittent

This option is used in conjunction with --update-intermittent. When the expected statuses are updated, any obsolete intermittent statuses that did not occur in the specified log files are removed from the list.

Property Configuration

In cases where the expectation depends on the run configuration wpt update-expectations is able to generate conditional values. Because the relevant variables depend on the range of configurations that need to be covered, it’s necessary to specify the list of configuration variables that should be used. This is done using a json format file that can be specified with the --properties-file command line argument to wpt update-expectations. When this isn’t supplied the defaults from <metadata root>/update_properties.json are used, if present.

Properties File Format

The file is JSON formatted with two top-level keys:

properties:

A list of property names to consider for conditionals e.g ["product", "os"].

dependents:

An optional dictionary containing properties that should only be used as “tie-breakers” when differentiating based on a specific top-level property has failed. This is useful when the dependent property is always more specific than the top-level property, but less understandable when used directly. For example the version property covering different OS versions is typically unique amongst different operating systems, but using it when the os property would do instead is likely to produce metadata that’s too specific to the current configuration and more difficult to read. But where there are multiple versions of the same operating system with different results, it can be necessary. So specifying {"os": ["version"]} as a dependent property means that the version property will only be used if the condition already contains the os property and further conditions are required to separate the observed results.

So an example update-properties.json file might look like:

{
  "properties": ["product", "os"],
  "dependents": {"product": ["browser_channel"], "os": ["version"]}
}

Examples

Update all the expectations from a set of cross-platform test runs:

wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be platform-independent:

wpt update-expectations tests.log

Why a Custom Format?

Introduction

Given the use of the metadata files in CI systems, it was desirable to have something with the following properties:

  • Human readable

  • Human editable

  • Machine readable / writable

  • Capable of storing key-value pairs

  • Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having multiple expectation files for each platform, or having a way to express conditional values within a certain file. The former would be rather cumbersome for humans updating the expectation files, so the latter approach has been adopted, leading to the requirement:

  • Capable of storing result values that are conditional on the platform.

There are few extant formats that clearly meet these requirements. In particular although conditional properties could be expressed in many existing formats, the representation would likely be cumbersome and error-prone for hand authoring. Therefore it was decided that a custom format offered the best tradeoffs given the requirements.