google / tachometer
Statistically rigorous benchmark runner for the web
License: BSD 3-Clause "New" or "Revised" License
The Travis build for the tachometer repo should include an end-to-end test that actually launches a browser.
cc @bicknellr
Travis now supports macOS, and I was able to get it to launch Safari, but it seemed flaky.
Since we display an NxN matrix of benchmark results, the table can get too large to fit in a normal terminal window. There are also cases where we want to draw attention to only a subset of the results (e.g. for CI integration we usually only care about the results that compare the current PR to master and released, but not the other way around).
Add the ability to filter rows and/or columns of the NxN matrix.
It's extremely slow compared to other browsers on Windows, and to Chrome on other operating systems.
I filed https://bugs.chromium.org/p/chromedriver/issues/detail?id=2963. The responder has reproduced the slowdown, but doesn't seem to consider it significant enough to warrant further investigation.
This will need some more work to demonstrate the significance of the problem.
Right now, if you want to stop sampling before the timeout, you can Ctrl-C, but that will kill the process without showing any results at all. And if you want to keep going after the results are in (e.g. because you want a tighter confidence interval), you have to start over entirely, which means throwing away the samples you've already collected.
Ideas:
1. If the user sends SIGINT (Ctrl-C) at any time, then we should show the results we have so far, gracefully shut down, and exit (see the sketch after this list).
2. After the initial results come in, we should pause rather than exit, so that the user has the option to keep sampling.
3. If the user presses [p]ause while sampling, we should show the results so far, and pause for more input.
4. If the user presses [r]esume while paused, we should continue sampling (maybe for 60 seconds each time? maybe you can type how many minutes to go?)
5. When you're running a non-headless browser, it might be nice to pause for a few seconds every minute or so, because if the browser is stealing focus, it can be hard/impossible to send any input to the runner terminal.
6. (This should probably be another issue for another time, but the entire UI could also be reworked to fully control the terminal, and update a single table continuously using ANSI clear and cursor control sequences, so you'd just watch the intervals tighten up, and press [p] or quit when you want. My concern would be that it might make it tempting to mentally extrapolate a false convergence and draw a conclusion too early.)
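A minimal sketch of idea 1, assuming hypothetical stopSampling() and reportResults() helpers standing in for tachometer internals:

// Sketch: handle Ctrl-C by reporting partial results before exiting.
process.on('SIGINT', async () => {
  console.log('\nInterrupted; showing results collected so far...');
  await stopSampling();   // hypothetical: stop launching new samples
  reportResults();        // hypothetical: print the table for existing samples
  process.exit(0);
});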
cc @justinfagnani @sorvell for comment
Currently the bench.js library is an ES module, and does a fetch call when bench.stop() is called, to send the timing data back to the server. Users can also send this response themselves if they know the right JSON format and address (though this isn't documented).
There are a few problems with this:
So, we can add a new reporting API (or replace bench.js?), which is simply that the benchmark puts the timing data onto window somewhere (e.g. window.tachometer = 123). The runner would then just poll using WebDriver until it finds the data. We already find first contentful paint in a similar way, by polling the performance.getEntriesByName API.
Another option might be to have the user add a performance entry, but then the runner would need to know which entry to look for (so it would need to be an additional config parameter). It would also not be possible to put arbitrary numbers in there, since the performance API doesn't support that, AFAIK. The global approach seems like the simplest solution that covers all the use cases.
A follow-on, related to #34, could be to support sending back key/value pairs (e.g. window.tachometer = {foo: 123, bar: 456}), so that multiple timing numbers could be reported from the same benchmark.
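For example, a sketch of both sides of this API (runBenchmarkWork and pollForResult are illustrative names; driver is a selenium-webdriver WebDriver instance):

// In the benchmark page: report the measurement by assigning a global.
const start = performance.now();
runBenchmarkWork();  // hypothetical benchmark body
window.tachometer = performance.now() - start;

// In the runner: poll over WebDriver until the global appears.
async function pollForResult(driver) {
  while (true) {
    const result = await driver.executeScript('return window.tachometer;');
    if (result != null) {
      return result;
    }
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
}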
To support lit/lit#920, we'll need the ability to specify custom labels to use when reporting GitHub check data.
Error: Unknown response type undefined for /favicon.ico
at Server.cache (/usr/local/google/home/bicknellr/Code/test/ce-polyfill-perf-test/node_modules/tachometer/src/server.ts:194:13)
cc @bicknellr
When the browser never gets an HTTP response (e.g. a server that is down), we still get an FCP measurement, presumably from the built-in error page. We'll need to detect these cases somehow to make sure we aren't giving very misleading results.
cc @sorvell
We should also have a smaller default window size, and also maybe try to tile out the different browsers, because being visible on the desktop can affect whether the browser considers itself in the foreground or not (which can affect timer throttling etc.)
One would think that the more samples we take, the more stable the result. However, taking more samples also means there is a higher chance of interference from the system (some daemon doing expensive work, flushing of caches, etc.).
This feature request is about having a statistically rigorous way of dropping outliers before computing the confidence interval, so that one or two crazy measurements don't cause an "unsure" result, and adding more samples guarantees getting a more stable result.
This should be optional, not hard-coded, because outliers are not always independent from the page being tested (e.g. if a page has a 1% chance of hitting an expensive GC).
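One common approach (not necessarily the one we'd pick) is Tukey's fences: drop samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] before computing the interval. A sketch:

// Sketch: drop outliers using Tukey's fences. The quantile calculation is
// simplified (nearest rank); a real implementation would interpolate.
function dropOutliers(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const q = (p) => sorted[Math.floor(p * (sorted.length - 1))];
  const iqr = q(0.75) - q(0.25);
  const lo = q(0.25) - 1.5 * iqr;
  const hi = q(0.75) + 1.5 * iqr;
  return sorted.filter((x) => x >= lo && x <= hi);
}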
I always forget to note when I started running a benchmark with a really long timeout, so I don't know how long I'll be waiting. It would be nice if the time remaining until the timeout expires was printed during auto-sampling.
Happens when node_modules is a direct child of the root directory, and package versions are in use. Possibly happens in other scenarios too. Probably just some error in path construction somewhere.
cc @bicknellr
Since IE doesn't support modules, fetch, or first-contentful-paint, there are currently no measurements that could be done from IE even if we added launching support. So, this is blocked by #60 to add a new global-based API that IE benchmarks could use.
Maybe 50% of the time, the end-to-end test that checks whether we can control Chrome's window size fails in Travis:
68 passing (1m)
1 failing
1) e2e
chrome-headless
window size:
AssertionError: expected 50000 to equal 100000
+ expected - actual
-50000
Error: Browser firefox does not support the first contentful paint (FCP) measurement
NPM's github: protocol for specifying package versions only works when the top level of the git repository maps to the NPM package (see npm/npm#2974, which is won't-fix). In the case of monorepos, such as the one for Material Web Components, packages are organized into sub-directories. This means there's no way to use tachometer's package-version feature for benchmarking commits of repos like this.
The simplest solution is to write a bash script that makes a few clones of the repo, and then sets the package version to the path of the local clone (plus the package directory). We should at least document this pattern if that's the answer.
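A sketch of that script, assuming the MWC monorepo (the repo URL, refs, and dependency path are examples, not a tested recipe):

#!/bin/bash
# Sketch: clone the monorepo once per ref to benchmark.
REPO=https://github.com/material-components/material-components-web-components.git
for REF in master my-feature-branch; do
  git clone "$REPO" "mwc-$REF" && git -C "mwc-$REF" checkout "$REF"
done
# Then point each benchmark's dependency at the local package directory, e.g.
#   "@material/mwc-button": "file:./mwc-master/packages/button"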
We could also think about building some kind of support into tachometer. It would basically do the same thing as above, with temp directories. Not convinced yet it's worth the complexity, maybe we should start with documenting the manual pattern.
cc @e111077
At least for Firefox and Chrome
It's not obvious how to use the generic "expand" feature in the JSON config file. People expect to just put the browser/measurement type at the top-level of the config file. We should support this, and maybe just remove the "expand" feature in the interest of simplicity. It's always possible to explicitly enumerate any more complex sets of benchmarks you might want.
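For example, people tend to expect something like this hypothetical top-level shape to work (it currently doesn't):

{
  "browser": "chrome-headless",
  "measurement": "global",
  "benchmarks": [
    {"name": "v1", "url": "http://localhost:8001/"},
    {"name": "v2", "url": "http://localhost:8002/"}
  ]
}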
Currently, when you use the package version feature, a deterministic temporary directory is used per label, and it is always re-used. We should definitely invalidate this cache if the package.json dependencies change. Possibly we should always re-install (and we could then auto-delete the temp directory), but we need to think about whether that's required for real scenarios (since installing can be slow).
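One way to invalidate, sketched below, is to derive the temp directory name from a hash of the dependencies, so that any change to package.json selects a fresh directory (installDirFor is an illustrative name, not existing code):

// Sketch: a dependencies hash in the directory name invalidates the cache.
const crypto = require('crypto');
const os = require('os');
const path = require('path');

function installDirFor(label, dependencies) {
  const hash = crypto.createHash('sha256')
      .update(JSON.stringify(dependencies))
      .digest('hex')
      .slice(0, 12);
  return path.join(os.tmpdir(), `tachometer-${label}-${hash}`);
}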
cc @bicknellr
Errors like this get logged to the terminal sometimes:
Error: write EPIPE
at WriteWrap.afterWrite (net.js:836:14)
Error: read ECONNRESET
at TCP.onread (net.js:660:25)
It doesn't crash the runner, and doesn't seem to affect the result, just logs. The reported case was a large site, with many files, some of which were large.
Hi, I have a question about using this tool for our purposes at Vaadin.
We would like to set up a sample app similar to shack for use in benchmark tests.
But what we need is not to test against an older commit, but against an older version.
So we'd like to test a LitElement-based app against a Polymer-based app.
Both apps would be implemented using exactly the same components.
Is this doable with tachometer at all (using another app as the baseline)?
@aomarks I would appreciate any advice on this.
Since refactoring the internal representation of a browser from a string to an object, I forgot to update the result table output to show the name instead of trying to just print the object, which displays as [object Object].
$ tach --config tach.json
Running benchmarks
[==========================================================] done
TypeError: Reduce of empty array with no initial value
at Array.reduce (<anonymous>)
at sumOf (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:142:15)
at Object.summaryStats (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:51:15)
at /Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:635:44
at Array.map (<anonymous>)
at makeResults (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:633:31)
at automaticMode (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:673:27)
at processTicksAndRejections (internal/process/task_queues.js:89:5)
at Object.main (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:390:14)
Config:
{
  "$schema": "https://raw.githubusercontent.com/Polymer/tachometer/master/config.schema.json",
  "resolveBareModules": false,
  "sampleSize": 10,
  "timeout": 0,
  "benchmarks": [
    {
      "browser": "chrome-headless",
      "measurement": "global",
      "expand": [
        {
          "name": "v1 ytcp-video-row",
          "url": "http://localhost:8001/?tag=ytcp-video-row"
        },
        {
          "name": "v2 ytcp-video-row",
          "url": "http://localhost:8002/?tag=ytcp-video-row"
        },
        {
          "name": "v1 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8001/?tag=ytcp-video-metadata-editor-section"
        },
        {
          "name": "v2 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8002/?tag=ytcp-video-metadata-editor-section"
        }
      ]
    }
  ]
}
Maybe Firefox too. Probably just setting flags wrong.
Currently it only includes the raw data. I think it should actually not include the raw data, and only include the statistics.
cc @bicknellr
Might be something koa-node-resolve does.
cc @bicknellr
Now that we do bare module rewriting, we're spending time parsing every HTML and JS request. This could distort first-contentful-paint measurements, and increases time waiting for benchmark results in all cases.
The best solution might be to add an in-memory cache to the Koa server for all requests, and to always do a single warmup run for each benchmark configuration before starting measurements.
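A sketch of such a cache, written as Koa middleware installed ahead of the rewriting middleware so that the rewritten output is what gets cached (app is the Koa instance; where it would slot in is an assumption, not existing code):

// Sketch: cache fully-rewritten responses in memory, keyed by URL.
const cache = new Map();
app.use(async (ctx, next) => {
  const cached = cache.get(ctx.url);
  if (cached !== undefined) {
    ctx.type = cached.type;
    ctx.body = cached.body;
    return;
  }
  await next();
  if (ctx.status === 200 && typeof ctx.body === 'string') {
    cache.set(ctx.url, {type: ctx.type, body: ctx.body});
  }
});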
Currently the default measurement is bench.start/stop when using the built-in server, and first-contentful-paint when using an external URL.
FCP is a particularly bad default, because if you implement either of the other two measurement styles in your code, FCP will still work and return some number, and it's easy to think you are measuring what you implemented instead of FCP.
Let's make the window.tachometerResult global the default in all cases, since it's the most universal (it works in all browsers and with both internal/external URLs).
Also, the CLI flag is called "measure", but the JSON config file property is "measurement". These should be the same.
cc @sorvell
We should be able to launch tachometer from Windows, and control Edge.
Note IE is tracked separately at #61 because it has an additional blocker.
Currently the result for a global measurement is read from window.tachometerResult.
It would be convenient to be able to specify a custom, arbitrary expression to poll for the result, which opens up flexibility for measuring pages in production without modifying the code.
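For example, with a hypothetical config property (the name measurementExpression is illustrative, not an existing option):

{
  "benchmarks": [
    {
      "url": "https://example.com/",
      "browser": "chrome-headless",
      "measurement": "global",
      "measurementExpression": "window.myApp.renderTimeMs"
    }
  ]
}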
It appears that HTML files are sometimes truncated when served from the built-in static server. Could be related to koa-node-resolve. Needs more investigation.
When you use the built-in server with query params, the output table always displays the query params, even if you gave the benchmark a concise label. This is not the case for external URLs.
If you set the flag --resolve-bare-modules with no value, then the flag will be set to the empty string, which is falsy (actually we check === true anyway) and turns off bare module resolution. No value should mean on, and anything other than no value, "true", or "false" should be an error.
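A sketch of the intended parsing (parseBooleanFlag is an illustrative name):

// Sketch: no value means true, explicit "true"/"false" are honored,
// and anything else is an error.
function parseBooleanFlag(name, value) {
  if (value === undefined || value === '' || value === 'true') {
    return true;
  }
  if (value === 'false') {
    return false;
  }
  throw new Error(`Invalid value "${value}" for boolean flag --${name}`);
}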
cc discoverer @e111077
I think we'd like to return multiple labeled measurements from a benchmark, so that we can see metrics like FCP, TTFI, idle time, GC time, memory, etc.
To view multiple measures we probably want a table that shows results relative to a chosen baseline, rather than the NxN matrix.
We often put tachometer results into a spreadsheet for sharing. A CSV output format would make this easier.
Once https://github.com/Polymer/koa-node-resolve is released, we should integrate it into tachometer, possibly with a flag that can disable it.
We should support a JSON configuration file as an alternative to using command-line flags.
We now have an end-to-end integration test running on Linux with Travis. We should also have one for Windows. Travis now has Windows support, but it is immature according to their own docs. I was not able to get it to launch Edge (I'm not even sure the images have Edge). We should probably use AppVeyor instead for now.
Similar to CPU throttling in Chrome DevTools
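For Chrome, this could likely be driven through the DevTools protocol. A sketch using selenium-webdriver's Chrome driver, inside an async function (sendDevToolsCommand is available on chrome.Driver in recent selenium-webdriver versions):

// Sketch: emulate a 4x slower CPU before sampling begins.
await driver.sendDevToolsCommand('Emulation.setCPUThrottlingRate', {rate: 4});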
Hi,
I am currently focusing on performance testing on Google Chrome.
In order to improve build times, it would help me to move the browser driver dependencies to peer dependencies. For me, there is no need to test performance on any browser other than Chrome.
It would be useful to be able to launch a browser on a remote machine. In particular, we would like to be able to launch Edge or IE on a remote Windows testing machine (on the same network), and point it at a tachometer server running on the local dev machine.
I'm using the main function from Node to run the tests, and the size in bytes keeps coming up as zero. Here is an example of what I'm running:
await main([
  '$button:test-basic=test/benchmark/bench-runner.html?bench=test-basic&package=button',
  '--measure=fcp',
  '--browser=chrome@http://localhost:4444/wd/hub',
  '--sample-size=5'
])
It results in the following:
┌─────────────┬───────────────────────────────┐
│ Benchmark │ button:test-basic │
├─────────────┼───────────────────────────────┤
│ Version │ <none> │
├─────────────┼───────────────────────────────┤
│ Browser │ chrome │
│ │ @http://localhost:4444/wd/hub │
├─────────────┼───────────────────────────────┤
│ Sample size │ 5 │
├─────────────┼───────────────────────────────┤
│ Bytes │ 0.00 KiB │
└─────────────┴───────────────────────────────┘
It is important to note that I'm dynamically loading the tests. Here is an example of what I'm doing in bench-runner.html:
<html>
  <head>
    <link rel="shortcut icon" href="data:image/x-icon;," type="image/x-icon">
  </head>
  <body>
    <script type="module">
      const params = new URLSearchParams(window.location.search);
      const pack = params.get('package');
      const bench = params.get('bench');
      import(`./${pack}/${bench}.js`);
    </script>
  </body>
</html>
This then loads a self-running test.