# Making a Testing Plan

When contributing to a project as large and open-ended as WPT, it's easy to get
lost in the details. It can be helpful to start by making a rough list of tests
you intend to write. That plan will let you anticipate how much work will be
involved, and it will help you stay focused once you begin.

Many people come to WPT with a general testing goal in mind:

- specification authors often want to test for new spec text
- browser maintainers often want to test new features or fixes to existing
  features
- web developers often want to test discrepancies between browsers on their web
  applications

(If you don't have any particular goal, we can help you get started. Check out
[the issues labeled with `type:missing-coverage` on
GitHub.com](https://github.com/web-platform-tests/wpt/labels/type%3Amissing-coverage).
Leave a comment if you'd like to get started with one, and don't hesitate to
ask clarifying questions!)

This guide will help you write a testing plan by:

1. showing you how to use the specifications to learn what kinds of tests will
   be most helpful
2. developing your sense for what *doesn't* need to be tested
3. demonstrating methods for figuring out which tests (if any) have already
   been written for WPT

The level of detail in useful testing plans can vary widely. From [a list of
specific
cases](https://github.com/web-platform-tests/wpt/issues/6980#issue-252255894),
to [an outline of important coverage
areas](https://github.com/web-platform-tests/wpt/issues/18549#issuecomment-522631537),
to [an annotated version of the specification under
test](https://rwaldron.github.io/webrtc-pc/), the appropriate fidelity depends
on your needs, so you can be as precise as you feel is helpful.

## Understanding the "testing surface"

Web platform specifications are instructions about how a feature should work.
They're critical for implementers to "build the right thing," but they are also
important for anyone writing tests. We can use the same instructions to infer
what kinds of tests would be likely to detect mistakes. Here are a few common
patterns in specification text and the kind of tests they suggest.

### Input sources

Algorithms may accept input from many sources. Modifying the input is the most
direct way we can influence the browser's behavior and verify that it matches
the specifications. That's why it's helpful to be able to recognize different
sources of input.

```eval_rst
================ ==============================================================
Type of feature  Potential input sources
================ ==============================================================
JavaScript       parameters, `context object <https://dom.spec.whatwg.org/#context-object>`_
HTML             element content, attributes, attribute values
CSS              selector strings, property values, markup
================ ==============================================================
```

Determine which input sources are relevant for your chosen feature, and build a
list of values which seem worthwhile to test (keep reading for advice on
identifying worthwhile values). For features that accept multiple sources of
input, remember that the interaction between values can often produce
interesting results. Every value you identify should go into your testing plan.

*Example:* This is the first step of the `Notification` constructor from [the
Notifications standard](https://notifications.spec.whatwg.org/#constructors):

> The Notification(title, options) constructor, when invoked, must run these steps:
>
> 1. If the [current global
>    object](https://html.spec.whatwg.org/multipage/webappapis.html#current-global-object)
>    is a
>    [ServiceWorkerGlobalScope](https://w3c.github.io/ServiceWorker/#serviceworkerglobalscope)
>    object, then [throw](https://webidl.spec.whatwg.org/#dfn-throw) a
>    `TypeError` exception.
> 2. Let *notification* be the result of [creating a
>    notification](https://notifications.spec.whatwg.org/#create-a-notification)
>    given *title* and *options*. Rethrow any exceptions.
>
> [...]

A thorough test suite for this constructor will include tests for the behavior
of many different values of the *title* parameter and the *options* parameter.
Choosing those values can be a challenge unto itself--see [Avoid Excessive
Breadth](#avoid-excessive-breadth) for advice.

### Browser state

The state of the browser may also influence algorithm behavior. Examples
include the current document, the dimensions of the viewport, and the entries
in the browsing history. Just like with direct input, a thorough set of tests
will likely need to control these values. Browser state is often more expensive
to manipulate (whether in terms of code, execution time, or system resources),
and you may want to design your tests to mitigate these costs (e.g. by writing
many subtests from the same state).

You may not be able to control all relevant aspects of the browser's state.
[The `type:untestable`
label](https://github.com/web-platform-tests/wpt/issues?q=is%3Aopen+is%3Aissue+label%3Atype%3Auntestable)
includes issues for web platform features which cannot be controlled in a
cross-browser way. You should include tests like these in your plan both to
communicate your intention and to remind you when/if testing solutions become
available.

*Example:* In [the `Notification` constructor referenced
above](https://notifications.spec.whatwg.org/#constructors), the type of "the
current global object" is also a form of input. The test suite should include
tests which execute with different types of global objects.

### Branches

When an algorithm branches based on some condition, that's an indication of an
interesting behavior that might be missed. Your testing plan should have at
least one test that verifies the behavior when the branch is taken and at least
one more test that verifies the behavior when the branch is *not* taken.

*Example:* The following algorithm from [the HTML
standard](https://html.spec.whatwg.org/) describes how the
`localStorage.getItem` method works:

> The `getItem`(*key*) method must return the current value associated with the
> given *key*. If the given *key* does not exist in the list associated with
> the object then this method must return null.

This algorithm exhibits different behavior depending on whether or not an item
exists at the provided key. To test this thoroughly, we would write two tests:
one test would verify that `null` is returned when there is no item at the
provided key, and the other test would verify that an item we previously stored
was correctly retrieved when we called the method with its name.

### Sequence

Even without branching, the interplay between sequential algorithm steps can
suggest interesting test cases. If two steps have observable side-effects, then
it can be useful to verify they happen in the correct order.

Most of the time, step sequence is implicit in the nature of the
algorithm--each step operates on the result of the step that precedes it, so
verifying the end result implicitly verifies the sequence of the steps. But
sometimes, the order of two steps isn't particularly relevant to the result of
the overall algorithm. This makes it easier for implementations to diverge.

There are many common patterns where step sequence is observable but not
necessarily inherent to the correctness of the algorithm:

- input validation (when an algorithm verifies that two or more input values
  satisfy some criteria)
- event dispatch (when an algorithm
  [fires](https://dom.spec.whatwg.org/#concept-event-fire) two or more events)
- object property access (when an algorithm retrieves two or more property
  values from an object provided as input)

*Example:* The following text is an abbreviated excerpt of the algorithm that
runs during drag operations (from [the HTML
specification](https://html.spec.whatwg.org/multipage/dnd.html#dnd)):

> [...]
> 4. Otherwise, if the user ended the drag-and-drop operation (e.g. by
>    releasing the mouse button in a mouse-driven drag-and-drop interface), or
>    if the `drag` event was canceled, then this will be the last iteration.
>    Run the following steps, then stop the drag-and-drop operation:
>    1. If the [current drag
>       operation](https://html.spec.whatwg.org/multipage/dnd.html#current-drag-operation)
>       is "`none`" (no drag operation) [...] Otherwise, the drag operation
>       might be a success; run these substeps:
>       1. Let *dropped* be true.
>       2. If the [current target
>          element](https://html.spec.whatwg.org/multipage/dnd.html#current-target-element)
>          is a DOM element, [fire a DND
>          event](https://html.spec.whatwg.org/multipage/dnd.html#fire-a-dnd-event)
>          named `drop` at it; otherwise, use platform-specific conventions for
>          indicating a drop.
>       3. [...]
>    2. [Fire a DND
>       event](https://html.spec.whatwg.org/multipage/dnd.html#fire-a-dnd-event)
>       named `dragend` at the [source
>       node](https://html.spec.whatwg.org/multipage/dnd.html#source-node).
>    3. [...]

A thorough test suite will verify that the `drop` event is fired as specified,
and it will also verify that the `dragend` event is fired as specified. An even
better test suite will also verify that the `drop` event is fired *before* the
`dragend` event.

In September of 2019, [Chromium accidentally changed the ordering of the `drop`
and `dragend`
events](https://bugs.chromium.org/p/chromium/issues/detail?id=1005747), and as
a result, real web applications stopped functioning. If there had been a test
for the sequence of these events, then this confusion would have been avoided.

When making your testing plan, be sure to look carefully for event dispatch and
the other patterns listed above. They won't always be as clear as the "drag"
example!

### Optional behavior

Specifications occasionally allow browsers discretion in how they implement
certain features. These are described using [RFC
2119](https://tools.ietf.org/html/rfc2119) terms like "MAY" and "OPTIONAL".
Although browsers should not be penalized for deciding not to implement such
behavior, WPT offers tests that verify the correctness of the browsers which
do. Be sure to [label the test as optional according to WPT's
conventions](file-names) so that people reviewing test results know how to
interpret failures.

*Example:* The algorithm underpinning
[`document.getElementsByTagName`](https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByTagName)
includes the following paragraph:

> When invoked with the same argument, and as long as *root*'s [node
> document](https://dom.spec.whatwg.org/#concept-node-document)'s
> [type](https://dom.spec.whatwg.org/#concept-document-type) has not changed,
> the same [HTMLCollection](https://dom.spec.whatwg.org/#htmlcollection) object
> may be returned as returned by an earlier call.

That statement uses the word "may," so even though it modifies the behavior of
the preceding algorithm, it is strictly optional. The test we write for this
should be designated accordingly.

It's important to read these sections carefully because the distinction between
"mandatory" behavior and "optional" behavior can be nuanced. In this case, the
optional behavior is never allowed if the document's type has changed. That
makes for a mandatory test, one that verifies browsers don't return the same
result when the document's type changes.

## Exercising Restraint

When writing conformance tests, choosing what *not* to test is sometimes just
as hard as finding what needs testing.

### Don't dive too deep

Algorithms are composed of many other algorithms which themselves are defined
in terms of still more algorithms. It can be intimidating to consider
exhaustively testing one of those "nested" algorithms, especially when they are
shared by many different APIs.

In general, you should plan to write "surface tests" for the nested algorithms.
That means only verifying that they exhibit the basic behavior you are
expecting.

It's definitely important to test exhaustively, but it's just as important to
do so in a structured way. Reach out to the test suite's maintainers to learn
if and how they have already tested those algorithms. In many cases, it's
acceptable to test them in just one place (and maybe through a different API
entirely), and rely only on surface-level testing everywhere else. While it's
always possible for more tests to uncover new bugs, the chances may be slim.
The time we spend writing tests is highly valuable, so we have to be efficient!

*Example:* The following algorithm from [the DOM
standard](https://dom.spec.whatwg.org/) powers
[`document.querySelector`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector):

> To **scope-match a selectors string** *selectors* against a *node*, run these
> steps:
>
> 1. Let *s* be the result of [parse a
>    selector](https://drafts.csswg.org/selectors-4/#parse-a-selector)
>    *selectors*.
> 2. If *s* is failure, then
>    [throw](https://webidl.spec.whatwg.org/#dfn-throw) a
>    "[`SyntaxError`](https://webidl.spec.whatwg.org/#syntaxerror)"
>    [DOMException](https://webidl.spec.whatwg.org/#idl-DOMException).
> 3. Return the result of [match a selector against a
>    tree](https://drafts.csswg.org/selectors-4/#match-a-selector-against-a-tree)
>    with *s* and *node*'s
>    [root](https://dom.spec.whatwg.org/#concept-tree-root) using [scoping
>    root](https://drafts.csswg.org/selectors-4/#scoping-root) *node*.

As described earlier in this guide, we'd certainly want to test the branch
regarding the parsing failure. However, there are many ways a string might fail
to parse--should we verify them all in the tests for `document.querySelector`?
What about `document.querySelectorAll`? Should we test them all there, too?

The answers depend on the current state of the test suite: whether or not tests
for selector parsing exist and where they are located. That's why it's best to
confer with the people who are maintaining the tests.

### Avoid excessive breadth

When the set of input values is finite, it can be tempting to test them all
exhaustively. When the set is very large, test authors can reduce repetition by
defining tests programmatically in loops.

Using advanced control flow techniques to dynamically generate tests can
actually *reduce* test quality. It may obscure the intent of the tests since
readers have to mentally "unwind" the iteration to determine what is actually
being verified. The practice is more susceptible to bugs. These bugs may not be
obvious--they may not cause failures, and they may exercise fewer cases than
intended. Finally, tests authored using this approach often take a relatively
long time to complete, and that puts a burden on people who collect test
results in large numbers.

The severity of these drawbacks varies with the complexity of the generation
logic. For example, it would be pronounced in a test which conditionally made
different assertions within many nested loops. Conversely, the severity would
be low in a test which only iterated over a list of values in order to make the
same assertions about each. Recognizing when the benefits outweigh the risks
requires discretion, so once you understand them, you should use your best
judgement.

*Example:* We can see this consideration in the very first step of the
`Response` constructor from [the Fetch
standard](https://fetch.spec.whatwg.org/)

> The `Response`(*body*, *init*) constructor, when invoked, must run these
> steps:
>
> 1. If *init*["`status`"] is not in the range `200` to `599`, inclusive, then
>    [throw](https://webidl.spec.whatwg.org/#dfn-throw) a `RangeError`.
>
> [...]

This function accepts exactly 400 values for the "status." With [WPT's
testharness.js](./testharness), it's easy to dynamically create one test for
each value. Unless we have reason to believe that a browser may exhibit
drastically different behavior for any of those values (e.g. correctly
accepting `546` but incorrectly rejecting `547`), then the complexity of
testing those cases probably isn't warranted.

Instead, focus on writing declarative tests for specific values which are novel
in the context of the algorithm. For ranges like in this example, testing the
boundaries is a good idea. `200` and `599` should not produce an error while
`199` and `600` should produce an error. Feel free to use what you know about
the feature to choose additional values. In this case, HTTP response status
codes are classified by the "hundred" order of magnitude, so we might also want
to test a "3xx" value and a "4xx" value.

## Assessing coverage

It's very likely that WPT already has some tests for the feature (or at least
the specification) that you're interesting in testing. In that case, you'll
have to learn what's already been done before starting to write new tests.
Understanding the design of existing tests will let you avoid duplicating
effort, and it will also help you integrate your work more logically.

Even if the feature you're testing does *not* have any tests, you should still
keep these guidelines in mind. Sooner or later, someone else will want to
extend your work, so you ought to give them a good starting point!

### File names

The names of existing files and folders in the repository can help you find
tests that are relevant to your work. [This page on the design of
WPT](../test-suite-design) goes into detail about how files are generally laid
out in the repository.

Generally speaking, every conformance tests is stored in a subdirectory
dedicated to the specification it verifies. The structure of these
subdirectories vary. Some organize tests in directories related to algorithms
or behaviors. Others have a more "flat" layout, where all tests are listed
together.

Whatever the case, test authors try to choose names that communicate the
behavior under test, so you can use them to make an educated guess about where
your tests should go.

*Example:* Imagine you wanted to write a test to verify that headers were made
immutable by the `Request.error` method defined in [the Fetch
standard](https://fetch.spec.whatwg.org). Here's the algorithm:

> The static error() method, when invoked, must run these steps:
>
> 1. Let *r* be a new [Response](https://fetch.spec.whatwg.org/#response)
>    object, whose
>    [response](https://fetch.spec.whatwg.org/#concept-response-response) is a
>    new [network error](https://fetch.spec.whatwg.org/#concept-network-error).
> 2. Set *r*'s [headers](https://fetch.spec.whatwg.org/#response-headers) to a
>    new [Headers](https://fetch.spec.whatwg.org/#headers) object whose
>    [guard](https://fetch.spec.whatwg.org/#concept-headers-guard) is
>    "`immutable`".
> 3. Return *r*.

In order to figure out where to write the test (and whether it's needed at
all), you can review the contents of the `fetch/` directory in WPT. Here's how
that looks on a UNIX-like command line:

    $ ls fetch
    api/                           DIR_METADATA  OWNERS
    connection-pool/               h1-parsing/   private-network-access/
    content-encoding/              http-cache/   range/
    content-length/                images/       README.md
    content-type/                  metadata/     redirect-navigate/
    corb/                          META.yml      redirects/
    cross-origin-resource-policy/  nosniff/      security/
    data-urls/                     origin/       stale-while-revalidate/

This test is for a behavior directly exposed through the API, so we should look
in the `api/` directory:

    $ ls fetch/api
    abort/  cors/         headers/           policies/  request/    response/
    basic/  credentials/  idlharness.any.js  redirect/  resources/

And since this is a static method on the `Response` constructor, we would
expect the test to belong in the `response/` directory:

    $ ls fetch/api/response
    multi-globals/                   response-static-error.html
    response-cancel-stream.html      response-static-redirect.html
    response-clone.html              response-stream-disturbed-1.html
    response-consume-empty.html      response-stream-disturbed-2.html
    response-consume.html            response-stream-disturbed-3.html
    response-consume-stream.html     response-stream-disturbed-4.html
    response-error-from-stream.html  response-stream-disturbed-5.html
    response-error.html              response-stream-disturbed-6.html
    response-from-stream.any.js      response-stream-with-broken-then.any.js
    response-init-001.html           response-trailer.html
    response-init-002.html

There seems to be a test file for the `error` method:
`response-static-error.html`. We can open that to decide if the behavior is
already covered. If not, then we know where to [write the
test](https://github.com/web-platform-tests/wpt/pull/19601)!

### Failures on wpt.fyi

There are many behaviors that are difficult to describe in a succinct file
name. That's commonly the case with low-level rendering details of CSS
specifications. Test authors may resort to generic number-based naming schemes
for their files, e.g. `feature-001.html`, `feature-002.html`, etc. This makes
it difficult to determine if a test case exists judging only by the names of
files.

If the behavior you want to test is demonstrated by some browsers but not by
others, you may be able to use the *results* of the tests to locate the
relevant test.

[wpt.fyi](https://wpt.fyi) is a website which publishes results of WPT in
various browsers. Because most browsers pass most tests, the pass/fail
characteristics of the behavior you're testing can help you filter through a
large number of highly similar tests.

*Example:* Imagine you've found a bug in the way Safari renders the top CSS
border of HTML tables. By searching through directory names and file names,
you've determined the probable location for the test: the `css/CSS2/borders/`
directory. However, there are *three hundred* files that begin with
`border-top-`! None of the names mention the `<table>` element, so any one of
the files may already be testing the case you found.

Luckily, you also know that Firefox and Chrome do not exhibit this bug. You
could find such tests by visual inspection of the [wpt.fyi](https://wpt.fyi)
results overview, but [the website's "search" feature includes operators that
let you query for this information
directly](https://github.com/web-platform-tests/wpt.fyi/blob/master/api/query/README.md).
To find the tests which begin with `border-top-`, pass in Chrome, pass in
Firefox, and fail in Safari, you could write [`border-top- chrome:pass
firefox:pass
safari:fail](https://wpt.fyi/results/?label=master&label=experimental&aligned&q=border-top-%20safari%3Afail%20firefox%3Apass%20chrome%3Apass).
The results show only three such tests exist:

- `border-top-applies-to-005.xht`
- `border-top-color-applies-to-005.xht`
- `border-top-width-applies-to-005.xht`

These may not describe the behavior you're interested in testing; the only way
to know for sure is to review their contents. However, this is a much more
manageable set to work with!

### Querying file contents

Some web platform features are enabled with a predictable pattern. For example,
HTML attributes follow a fairly consistent format. If you're interested in
testing a feature like this, you may be able to learn where your tests belong
by querying the contents of the files in WPT.

You may be able to perform such a search on the web. WPT is hosted on
GitHub.com, and [GitHub offers some basic functionality for querying
code](https://help.github.com/en/articles/about-searching-on-github). If your
search criteria are short and distinctive (e.g. all files containing
"querySelectorAll"), then this interface may be sufficient. However, more
complicated criteria may require [regular
expressions](https://www.regular-expressions.info/). For that, you can
[download the WPT
repository](https://web-platform-tests.org/writing-tests/github-intro.html) and
use [git](https://git-scm.com) to perform more powerful searches.

The following table lists some common search criteria and examples of how they
can be expressed using regular expressions:

<div class="table-container">

```eval_rst
================================= ================== ==========================
Criteria                          Example match      Example regular expression
================================= ================== ==========================
JavaScript identifier references  ``obj.foo()``      ``\bfoo\b``
JavaScript string literals        ``x = "foo";``     ``(["'])foo\1``
HTML tag names                    ``<foo attr>``     ``<foo(\s|>|$)``
HTML attributes                   ``<div foo=3>``    ``<[a-zA-Z][^>]*\sfoo(\s|>|=|$)``
CSS property name                 ``style="foo: 4"`` ``([{;=\"']|\s|^)foo\s+:``
================================= ================== ==========================
```

</div>

Bear in mind that searches like this are not necessarily exhaustive. Depending
on the feature, it may be difficult (or even impossible) to write a query that
correctly identifies all relevant tests. This strategy can give a helpful
guide, but the results may not be conclusive.

*Example:* Imagine you're interested in testing how the `src` attribute of the
`iframe` element works with `javascript:` URLs. Judging only from the names of
directories, you've found a lot of potential locations for such a test. You
also know many tests use `javascript:` URLs without describing that in their
name. How can you find where to contribute new tests?

You can design a regular expression that matches many cases where a
`javascript:` URL is assigned to the `src` property in HTML. You can use the
`git grep` command to query the contents of the `html/` directory:

    $ git grep -lE "src\s*=\s*[\"']?javascript:" html
    html/browsers/browsing-the-web/navigating-across-documents/javascript-url-query-fragment-components.html
    html/browsers/browsing-the-web/navigating-across-documents/javascript-url-return-value-handling.html
    html/dom/documents/dom-tree-accessors/Document.currentScript.html
    html/dom/self-origin.sub.html
    html/editing/dnd/target-origin/114-manual.html
    html/semantics/embedded-content/media-elements/track/track-element/cloneNode.html
    html/semantics/scripting-1/the-script-element/execution-timing/040.html
    html/semantics/scripting-1/the-script-element/execution-timing/080.html
    html/semantics/scripting-1/the-script-element/execution-timing/108.html
    html/semantics/scripting-1/the-script-element/execution-timing/109.html
    html/webappapis/dynamic-markup-insertion/opening-the-input-stream/document-open-cancels-javascript-url-navigation.html

You will still have to review the contents to know which are relevant for your
purposes (if any), but compared to the 5,000 files in the `html/` directory,
this list is far more approachable!

## Writing the Tests

With a complete testing plan in hand, you now have a good idea of the scope of
your work. It's finally time to write the tests! There's a lot to say about how
this is done technically. To learn more, check out [the WPT "reftest"
tutorial](./reftest-tutorial) and [the testharness.js
tutorial](./testharness-tutorial).