google / tachometer
Statistically rigorous benchmark runner for the web
License: BSD 3-Clause "New" or "Revised" License
The Travis build for the tachometer repo should include an end-to-end test that actually launches a browser.
cc @bicknellr
Travis now supports macOS, and I was able to get it to launch Safari, but it seemed flaky.
Since we display an NxN matrix of benchmark results, the table can get too large to fit in a normal terminal window. There are also cases where we want to draw attention to only a subset of the results (e.g. for CI integration we usually only care about the results that compare the current PR to master and released, but not the other way around).
Add the ability to filter rows and/or columns of the NxN matrix.
It's extremely slow compared to other browsers on Windows, and to Chrome on other operating systems.
I filed https://bugs.chromium.org/p/chromedriver/issues/detail?id=2963. The responder has reproduced the slowdown, but doesn't seem to consider it significant enough to warrant further investigation.
This will need some more work to demonstrate the significance of the problem.
Right now, if you want to stop sampling before the timeout, you can Ctrl-C, but that will kill the process without showing any results at all. And if you want to keep going after the results are in (e.g. because you want a tighter confidence interval), you have to start over entirely, which means throwing away the samples you've already collected.
Ideas:
1. If the user sends SIGINT (Ctrl-C) at any time, then we should show the results we have so far, gracefully shut down, and exit (see the sketch after this list).
2. After the initial results come in, we should pause rather than exit, so that the user has the option to keep sampling.
3. If the user presses [p]ause while sampling, we should show the results so far, and pause for more input.
4. If the user presses [r]esume while paused, we should continue sampling (maybe for 60 seconds each time? maybe you can type how many minutes to go?)
5. When you're running a non-headless browser, it might be nice to pause for a few seconds every minute or so, because if the browser is stealing focus, it can be hard/impossible to send any input to the runner terminal.
6. (This should probably be another issue for another time, but the entire UI could also be reworked to fully control the terminal, and update a single table continuously using ANSI clear and cursor control sequences, so you'd just watch the intervals tighten up, and press [p] or quit when you want. My concern would be that it might make it tempting to mentally extrapolate a false convergence and draw a conclusion too early.)
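A minimal sketch of idea 1, assuming hypothetical stopSampling() and reportResults() helpers standing in for tachometer internals:

// Sketch: handle Ctrl-C by reporting partial results before exiting.
process.on('SIGINT', async () => {
  console.log('\nInterrupted; showing results collected so far...');
  await stopSampling();   // hypothetical: stop launching new samples
  reportResults();        // hypothetical: print the table for existing samples
  process.exit(0);
});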
cc @justinfagnani @sorvell for comment
Currently the bench.js library is an ES module, and does a fetch call when bench.stop() is called, to send the timing data back to the server. Users can also send this response themselves if they know the right JSON format and address (though this isn't documented).
There are a few problems with this:
So, we can add a new reporting API (or replace bench.js?), which is simply that the benchmark puts the timing data onto window somewhere (e.g. window.tachometer = 123). The runner would then just poll using WebDriver until it finds the data. We already find first contentful paint in a similar way, by polling the performance.getEntriesByName API.
Another option might be to have the user add a performance entry, but then the runner would need to know which entry to look for (so it would need to be an additional config parameter). It would also not be possible to put arbitrary numbers in there, since the performance API doesn't support that, AFAIK. The global approach seems like the simplest solution that covers all the use cases.
A follow-on, related to #34, could be to support sending back key/value pairs (e.g. window.tachometer = {foo: 123, bar: 456}), so that multiple timing numbers could be reported from the same benchmark.
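For example, a sketch of both sides of this API (runBenchmarkWork and pollForResult are illustrative names; driver is a selenium-webdriver WebDriver instance):

// In the benchmark page: report the measurement by assigning a global.
const start = performance.now();
runBenchmarkWork();  // hypothetical benchmark body
window.tachometer = performance.now() - start;

// In the runner: poll over WebDriver until the global appears.
async function pollForResult(driver) {
  while (true) {
    const result = await driver.executeScript('return window.tachometer;');
    if (result != null) {
      return result;
    }
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
}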
To support lit/lit#920, we'll need the ability to specify custom labels to use when reporting GitHub check data.
Error: Unknown response type undefined for /favicon.ico
at Server.cache (/usr/local/google/home/bicknellr/Code/test/ce-polyfill-perf-test/node_modules/tachometer/src/server.ts:194:13)
cc @bicknellr
When the browser never gets an HTTP response (e.g. a server that is down), we still get an FCP measurement, presumably from the built-in error page. We'll need to detect these cases somehow to make sure we aren't giving very misleading results.
cc @sorvell
We should also have a smaller default window size, and also maybe try to tile out the different browsers, because being visible on the desktop can affect whether the browser considers itself in the foreground or not (which can affect timer throttling etc.)
One would think that the more samples we take, the more stable the result. However, taking more samples also means there is a higher chance of interference from the system (some daemon doing expensive work, flushing of caches, etc.).
This feature request is about having a statistically rigorous way of dropping outliers before computing the confidence interval, so that one or two crazy measurements don't cause an "unsure" result, and adding more samples guarantees getting a more stable result.
This should be optional, not hard-coded, because outliers are not always independent from the page being tested (e.g. if a page has a 1% chance of hitting an expensive GC).
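One common approach (not necessarily the one we'd pick) is Tukey's fences: drop samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] before computing the interval. A sketch:

// Sketch: drop outliers using Tukey's fences. The quantile calculation is
// simplified (nearest rank); a real implementation would interpolate.
function dropOutliers(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const q = (p) => sorted[Math.floor(p * (sorted.length - 1))];
  const iqr = q(0.75) - q(0.25);
  const lo = q(0.25) - 1.5 * iqr;
  const hi = q(0.75) + 1.5 * iqr;
  return sorted.filter((x) => x >= lo && x <= hi);
}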
I always forget to note when I started running a benchmark with a really long timeout, so I don't know how long I'll be waiting. It would be nice if the time remaining until the timeout expires was printed during auto-sampling.
Happens when node_modules is a direct child of the root directory, and package versions are in use. Possibly happens in other scenarios too. Probably just some error in path construction somewhere.
cc @bicknellr
Since IE doesn't support modules, fetch, or first-contentful-paint, there are currently no measurements that could be done from IE even if we added launching support. So, this is blocked by #60 to add a new global-based API that IE benchmarks could use.
Maybe 50% of the time, the end-to-end test that checks whether we can control Chrome's window size fails in Travis:
68 passing (1m)
1 failing
1) e2e
chrome-headless
window size:
AssertionError: expected 50000 to equal 100000
+ expected - actual
-50000
Error: Browser firefox does not support the first contentful paint (FCP) measurement
NPM's github: protocol for specifying package versions only works when the top level of the git repository maps to the NPM package (see npm/npm#2974, which is won't-fix). In the case of monorepos, such as the one for Material Web Components, packages are organized into sub-directories. This means there's no way to use tachometer's package-version feature for benchmarking commits of repos like this.
The simplest solution is to write a bash script that makes a few clones of the repo, and then sets the package version to the path of the local clone (plus the package directory). We should at least document this pattern if that's the answer.
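A sketch of that script, assuming the MWC monorepo (the repo URL, refs, and dependency path are examples, not a tested recipe):

#!/bin/bash
# Sketch: clone the monorepo once per ref to benchmark.
REPO=https://github.com/material-components/material-components-web-components.git
for REF in master my-feature-branch; do
  git clone "$REPO" "mwc-$REF" && git -C "mwc-$REF" checkout "$REF"
done
# Then point each benchmark's dependency at the local package directory, e.g.
#   "@material/mwc-button": "file:./mwc-master/packages/button"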
We could also think about building some kind of support into tachometer. It would basically do the same thing as above, with temp directories. Not convinced yet it's worth the complexity, maybe we should start with documenting the manual pattern.
cc @e111077
At least for Firefox and Chrome
It's not obvious how to use the generic "expand" feature in the JSON config file. People expect to just put the browser/measurement type at the top-level of the config file. We should support this, and maybe just remove the "expand" feature in the interest of simplicity. It's always possible to explicitly enumerate any more complex sets of benchmarks you might want.
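For example, people tend to expect something like this hypothetical top-level shape to work (it currently doesn't):

{
  "browser": "chrome-headless",
  "measurement": "global",
  "benchmarks": [
    {"name": "v1", "url": "http://localhost:8001/"},
    {"name": "v2", "url": "http://localhost:8002/"}
  ]
}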
Currently, when you use the package version feature, a deterministic temporary directory is used per label, and it is always re-used. We should definitely invalidate this cache if the package.json dependencies change. Possibly we should always re-install (and we could then auto-delete the temp directory), but we need to think about whether that's required for real scenarios (since installing can be slow).
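One way to invalidate, sketched below, is to derive the temp directory name from a hash of the dependencies, so that any change to package.json selects a fresh directory (installDirFor is an illustrative name, not existing code):

// Sketch: a dependencies hash in the directory name invalidates the cache.
const crypto = require('crypto');
const os = require('os');
const path = require('path');

function installDirFor(label, dependencies) {
  const hash = crypto.createHash('sha256')
      .update(JSON.stringify(dependencies))
      .digest('hex')
      .slice(0, 12);
  return path.join(os.tmpdir(), `tachometer-${label}-${hash}`);
}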
cc @bicknellr
Errors like this get logged to the terminal sometimes:
Error: write EPIPE
at WriteWrap.afterWrite (net.js:836:14)
Error: read ECONNRESET
at TCP.onread (net.js:660:25)
It doesn't crash the runner, and doesn't seem to affect the result, just logs. The reported case was a large site, with many files, some of which were large.
Hi, I have a question about using this tool for our purposes at Vaadin.
We would like to set up a sample app similar to shack for use in benchmark tests.
But what we need is not to test against an older commit, but against an older version.
So we'd like to test a LitElement-based app against a Polymer-based app.
Both apps would be implemented using exactly the same components.
Is this doable with tachometer at all (using another app as the baseline)?
@aomarks I would appreciate any advice on this.
Since refactoring the internal representation of a browser from a string to an object, I forgot to update the result table output to show the name instead of trying to just print the object, which displays as [object Object].
$ tach --config tach.json
Running benchmarks
[==========================================================] done
TypeError: Reduce of empty array with no initial value
at Array.reduce (<anonymous>)
at sumOf (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:142:15)
at Object.summaryStats (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:51:15)
at /Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:635:44
at Array.map (<anonymous>)
at makeResults (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:633:31)
at automaticMode (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:673:27)
at processTicksAndRejections (internal/process/task_queues.js:89:5)
at Object.main (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:390:14)
Config:
{
  "$schema": "https://raw.githubusercontent.com/Polymer/tachometer/master/config.schema.json",
  "resolveBareModules": false,
  "sampleSize": 10,
  "timeout": 0,
  "benchmarks": [
    {
      "browser": "chrome-headless",
      "measurement": "global",
      "expand": [
        {
          "name": "v1 ytcp-video-row",
          "url": "http://localhost:8001/?tag=ytcp-video-row"
        },
        {
          "name": "v2 ytcp-video-row",
          "url": "http://localhost:8002/?tag=ytcp-video-row"
        },
        {
          "name": "v1 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8001/?tag=ytcp-video-metadata-editor-section"
        },
        {
          "name": "v2 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8002/?tag=ytcp-video-metadata-editor-section"
        }
      ]
    }
  ]
}
Maybe Firefox too. Probably just setting flags wrong.
Currently it only includes the raw data. I think it should actually not include the raw data, and only include the statistics.
cc @bicknellr
Might be something koa-node-resolve does.
cc @bicknellr
Now that we do bare module rewriting, we're spending time parsing every HTML and JS request. This could distort first-contentful-paint measurements, and increases time waiting for benchmark results in all cases.
The best solution might be to add an in-memory cache to the Koa server for all requests, and to always do a single warmup run for each benchmark configuration before starting measurements.
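A sketch of such a cache, written as Koa middleware installed ahead of the rewriting middleware so that the rewritten output is what gets cached (app is the Koa instance; where it would slot in is an assumption, not existing code):

// Sketch: cache fully-rewritten responses in memory, keyed by URL.
const cache = new Map();
app.use(async (ctx, next) => {
  const cached = cache.get(ctx.url);
  if (cached !== undefined) {
    ctx.type = cached.type;
    ctx.body = cached.body;
    return;
  }
  await next();
  if (ctx.status === 200 && typeof ctx.body === 'string') {
    cache.set(ctx.url, {type: ctx.type, body: ctx.body});
  }
});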
Currently the default measurement is bench.start/stop when using the built-in server, and first-contentful-paint when using an external URL.
FCP is a particularly bad default, because if you implement either of the other two measurement styles in your code, FCP will still work and return some number, and it's easy to think you are measuring what you implemented instead of FCP.
Let's make the window.tachometerResult global the default in all cases, since it's the most universal (it works in all browsers and with both internal/external URLs).
Also, the CLI flag is called "measure", but the JSON config file property is "measurement". These should be the same.
cc @sorvell
We should be able to launch tachometer from Windows, and control Edge.
Note IE is tracked separately at #61 because it has an additional blocker.
Currently the result for a global measurement is read from window.tachometerResult.
It would be convenient to be able to specify a custom, arbitrary expression to poll for the result, which opens up flexibility for measuring pages in production without modifying the code.
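For example, with a hypothetical config property (the name measurementExpression is illustrative, not an existing option):

{
  "benchmarks": [
    {
      "url": "https://example.com/",
      "browser": "chrome-headless",
      "measurement": "global",
      "measurementExpression": "window.myApp.renderTimeMs"
    }
  ]
}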
It appears that HTML files are sometimes truncated when served from the built-in static server. Could be related to koa-node-resolve. Needs more investigation.
When you use the built-in server with query params, the output table always displays the query params, even if you gave the benchmark a concise label. This is not the case for external URLs.
If you set the flag --resolve-bare-modules with no value, then the flag will be set to the empty string, which is falsy (actually we check === true anyway) and turns off bare module resolution. No value should mean on, and anything other than no value, "true", or "false" should be an error.
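A sketch of the intended parsing (parseBooleanFlag is an illustrative name):

// Sketch: no value means true, explicit "true"/"false" are honored,
// and anything else is an error.
function parseBooleanFlag(name, value) {
  if (value === undefined || value === '' || value === 'true') {
    return true;
  }
  if (value === 'false') {
    return false;
  }
  throw new Error(`Invalid value "${value}" for boolean flag --${name}`);
}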
cc discoverer @e111077
I think we'd like to return multiple labeled measurements from a benchmark, so that we can see metrics like FCP, TTFI, idle time, GC time, memory, etc.
To view multiple measures we probably want a table that shows results relative to a chosen baseline, rather than the NxN matrix.
We often put tachometer results into a spreadsheet for sharing. A CSV output format would make this easier.
Once https://github.com/Polymer/koa-node-resolve is released, we should integrate it into tachometer, possibly with a flag that can disable it.
We should support a JSON configuration file as an alternative to using command-line flags.
We now have an end-to-end integration test running on Linux with Travis. We should also have one for Windows. Travis now has Windows support, but it is immature according to their own docs. I was not able to get it to launch Edge (I'm not even sure the images have Edge). We should probably use AppVeyor instead for now.
Similar to CPU throttling in Chrome DevTools
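For Chrome, this could likely be driven through the DevTools protocol. A sketch using selenium-webdriver's Chrome driver, inside an async function (sendDevToolsCommand is available on chrome.Driver in recent selenium-webdriver versions):

// Sketch: emulate a 4x slower CPU before sampling begins.
await driver.sendDevToolsCommand('Emulation.setCPUThrottlingRate', {rate: 4});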
Hi,
I am currently focusing on performance testing on Google Chrome.
In order to improve build times, it would help me to move the browser driver dependencies to peer dependencies. For me, there is no need to test performance on any browser other than Chrome.
It would be useful to be able to launch a browser on a remote machine. In particular, we would like to be able to launch Edge or IE on a remote Windows testing machine (on the same network), and point it at a tachometer server running on the local dev machine.
I'm using the main function from Node to run the tests, and the size in bytes keeps coming up as zero. Here is an example of what I'm running:
await main([
  '$button:test-basic=test/benchmark/bench-runner.html?bench=test-basic&package=button',
  '--measure=fcp',
  '--browser=chrome@http://localhost:4444/wd/hub',
  '--sample-size=5'
])
It results in the following:
┌─────────────┬───────────────────────────────┐
│ Benchmark │ button:test-basic │
├─────────────┼───────────────────────────────┤
│ Version │ <none> │
├─────────────┼───────────────────────────────┤
│ Browser │ chrome │
│ │ @http://localhost:4444/wd/hub │
├─────────────┼───────────────────────────────┤
│ Sample size │ 5 │
├─────────────┼───────────────────────────────┤
│ Bytes │ 0.00 KiB │
└─────────────┴───────────────────────────────┘
It is important to note that I'm dynamically loading the tests. Here is an example of what I'm doing in bench-runner.html:
<html>
  <head>
    <link rel="shortcut icon" href="data:image/x-icon;," type="image/x-icon">
  </head>
  <body>
    <script type="module">
      const params = new URLSearchParams(window.location.search);
      const pack = params.get('package');
      const bench = params.get('bench');
      import(`./${pack}/${bench}.js`);
    </script>
  </body>
</html>
This then loads a self-running test.