
tachometer's Issues

Tachometer integration test

The Travis build for the tachometer repo should include an end-to-end test that actually launches a browser.

macOS integration test

Travis now supports macOS, and I was able to get it to launch Safari, but it seemed flaky.

Ability to filter rows and columns

Since we display an NxN matrix of benchmark results, the table can get too large to fit in a normal terminal window. There are also cases where we want to draw attention to only a subset of the results (e.g. for CI integration we usually only care about the results that compare the current PR to master and released, but not the other way around).

Add the ability to filter rows and/or columns of the NxN matrix.
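Purely as a hypothetical sketch of what the configuration could look like (no such options exist today; the property names are illustrative):

{
  "resultTable": {
    "rows": ["this-change"],
    "columns": ["master", "released"]
  }
}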

Allow user to pause/resume sampling

Right now, if you want to stop sampling before the timeout, you can Ctrl-C, but that kills the process without showing any results at all. And if you want to keep going after the results are in (e.g. because you want a tighter confidence interval), you have to start over entirely, throwing away the samples you've already collected.

Ideas:

  1. If the user sends SIGINT (Ctrl-C) at any time, then we should show the results we have so far, gracefully shut down, and exit (see the sketch after this list).

  2. After the initial results come in, we should pause rather than exit, so that the user has the option to keep sampling.

  3. If the user presses [p]ause while sampling, we should show the results so far, and pause for more input.

  4. If the user presses [r]esume while paused, we should continue sampling (maybe for 60 seconds each time? maybe you can type how many minutes to go?)

  5. When you're running a non-headless browser, it might be nice to pause for a few seconds every minute or so, because if the browser is stealing focus, it can be hard/impossible to send any input to the runner terminal.

(6. This should probably be another issue for another time, but the entire UI could also be re-worked to fully control the terminal, and update a single table continuously using ANSI clear and cursor control sequences, so you'd just watch the intervals tighten up, and press [p] or quit when you want. My concern would be that it might make it tempting to mentally extrapolate a false convergence and draw a conclusion too early).
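A minimal sketch of idea 1, assuming hypothetical helpers for pieces the runner already has (formatResultTable, shutDownBrowsers, samplesSoFar are all stand-in names):

process.on('SIGINT', async () => {
  // Show whatever samples we have instead of dying silently.
  console.log(formatResultTable(samplesSoFar));  // hypothetical helper
  await shutDownBrowsers();                      // hypothetical helper
  process.exit(0);
});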

cc @justinfagnani @sorvell for comment

Add global reporting API

Currently the bench.js library is an ES module, and it makes a fetch call when bench.stop() is called to send the timing data back to the server. Users can also make this request themselves if they know the right JSON format and address (though this isn't documented).

There are a few problems with this:

  • If you aren't using the built-in server, then you don't know what host/port to send your results to.
  • IE doesn't support ES modules or fetch.

So, we can add a new reporting API (or replace bench.js?), which is simply that the benchmark puts the timing data onto window somewhere (e.g. window.tachometer = 123). The runner would then just poll using WebDriver until it finds the data. We already find first contentful paint in a similar way, by polling the performance.getEntriesByName API.
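On the runner side, the polling could look roughly like this (a sketch assuming selenium-webdriver; the 50ms interval is arbitrary):

async function pollForResult(driver) {
  while (true) {
    // WebDriver serializes an undefined browser value to null.
    const result = await driver.executeScript('return window.tachometer;');
    if (result !== null && result !== undefined) {
      return result;
    }
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
}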

Another option might be to have the user add a performance entry, but then the runner would need to know which entry to look for (so it would need to be an additional config parameter). It would also not be possible to put arbitrary numbers in there, since the performance API doesn't support that AFAIK. The global approach seems like the simplest solution that covers all the use-cases.

A follow-on, related to #34, could be to support sending back key/value pairs (e.g. window.tachometer = {foo: 123, bar: 456}), so that multiple timing numbers could be reported from the same benchmark.

cc @sorvell @frankiefu

FCP gets measured even with no HTTP response

When the browser never gets an HTTP response (e.g. from a server that is down), we still get an FCP measurement, presumably from the built-in error page. We'll need to detect these cases somehow to make sure we aren't giving very misleading results.

Ability to control browser window size and placement

cc @sorvell

We should also have a smaller default window size, and maybe try to tile the different browsers, because being visible on the desktop can affect whether the browser considers itself in the foreground (which can affect timer throttling etc.).
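WebDriver already exposes what we'd need here; with a recent selenium-webdriver it looks roughly like this (sizes and offsets are illustrative):

// Size and place a browser window so multiple browsers tile instead of overlapping.
await driver.manage().window().setRect({x: 0, y: 0, width: 1024, height: 768});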

Support for dropping outliers

One would think that the more samples we take, the more stable the result. However, taking more samples also means a higher chance of picking up interference from the system (some daemon doing expensive work, flushing of caches, etc.).

This feature request is about having a statistically rigorous way of dropping outliers before computing the confidence interval, so that one or two crazy measurements don't cause an "unsure" result, and adding more samples guarantees getting a more stable result.

This should be optional, not hard-coded, because outliers are not always independent from the page being tested (e.g. if a page has a 1% chance of hitting an expensive GC).
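One standard technique, named here purely as an illustration of what "statistically rigorous" could mean: drop samples outside Tukey's fences (more than 1.5 * IQR beyond the quartiles) before computing the confidence interval.

// Sketch: filter out samples outside Tukey's fences.
function dropOutliers(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const q = (p) => sorted[Math.floor((sorted.length - 1) * p)];  // crude quantile
  const iqr = q(0.75) - q(0.25);
  const lo = q(0.25) - 1.5 * iqr;
  const hi = q(0.75) + 1.5 * iqr;
  return samples.filter((x) => x >= lo && x <= hi);
}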

Support Internet Explorer

Since IE doesn't support modules, fetch, or first-contentful-paint, there are currently no measurements that could be done from IE even if we added launching support. So, this is blocked by #60 to add a new global-based API that IE benchmarks could use.

chrome window size e2e test is flaky in Travis

Maybe 50% of the time, the end-to-end test that checks whether we can control Chrome's window size fails in Travis:

  68 passing (1m)
  1 failing
  1) e2e
       chrome-headless
         window size:
      AssertionError: expected 50000 to equal 100000
      + expected - actual
      -50000

Support GitHub package versions with monorepos

NPM's github: protocol for specifying package versions only works when the top level of the git repository maps to the NPM package (see npm/npm#2974, closed as won't-fix). In the case of monorepos, such as the one for Material Web Components, packages are organized into subdirectories. This means there's no way to use tachometer's package-version feature to benchmark commits of repos like this.

The simplest solution is to write a bash script that makes a few clones of the repo, and then sets the package version to the path of the local clone (plus the package directory). We should at least document this pattern if that's the answer.

We could also think about building some kind of support into tachometer. It would basically do the same thing as above, with temp directories. I'm not convinced yet that it's worth the complexity; maybe we should start by documenting the manual pattern.
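For reference, after making those clones, the manual pattern could bottom out in a config along these lines (a sketch; the package name and clone paths are illustrative, and this relies on npm's support for local-path dependencies):

{
  "benchmarks": [
    {
      "url": "bench.html",
      "packageVersions": {
        "label": "master",
        "dependencies": {"@material/mwc-button": "/tmp/mwc-master/packages/button"}
      }
    },
    {
      "url": "bench.html",
      "packageVersions": {
        "label": "my-branch",
        "dependencies": {"@material/mwc-button": "/tmp/mwc-branch/packages/button"}
      }
    }
  ]
}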

cc @e111077

Simpler top-level JSON configuration style for browser and measurement

It's not obvious how to use the generic "expand" feature in the JSON config file. People expect to just put the browser/measurement type at the top-level of the config file. We should support this, and maybe just remove the "expand" feature in the interest of simplicity. It's always possible to explicitly enumerate any more complex sets of benchmarks you might want.
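Concretely, people expect to be able to write something like this, with the browser and measurement applied to every benchmark (this top-level form is the proposal, not current behavior):

{
  "browser": "firefox",
  "measurement": "global",
  "benchmarks": [
    {"name": "a", "url": "http://localhost:8080/a.html"},
    {"name": "b", "url": "http://localhost:8080/b.html"}
  ]
}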

package version install dirs should cache less aggressively

Currently, when you use the package version feature, a deterministic per-label temporary directory is used and always re-used. We should definitely invalidate this cache if the package.json dependencies change. Possibly we should always re-install (and we could then auto-delete the temp directory), but we need to think about whether that's required for real scenarios (since installing can be slow).
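One way to get the invalidation for free would be to derive the directory name from the dependencies themselves (a sketch; installDirFor is a hypothetical helper):

const crypto = require('crypto');
const os = require('os');
const path = require('path');

// Key the install dir on a hash of the dependencies, so the cache
// self-invalidates whenever the package.json dependencies change.
function installDirFor(label, dependencies) {
  const hash = crypto.createHash('sha256')
      .update(JSON.stringify(dependencies))
      .digest('hex')
      .slice(0, 12);
  return path.join(os.tmpdir(), `tachometer-${label}-${hash}`);
}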

cc @bicknellr

EPIPE and ECONNRESET errors

Errors like this get logged to the terminal sometimes:

Error: write EPIPE
      at WriteWrap.afterWrite (net.js:836:14)

Error: read ECONNRESET
      at TCP.onread (net.js:660:25)

These errors don't crash the runner and don't seem to affect the results; they just get logged. The reported case was a large site with many files, some of which were large.

Use case: Polymer vs LitElement based components benchmarks

Hi, I have a question about using this tool for our purpose at Vaadin.

We would like to set up a sample app similar to shack for use in benchmark tests.
But what we need is not to test against an older commit, but against an older version.

So we'd like to test a LitElement-based app against a Polymer-based app.
Both apps would be implemented using exactly the same components.

Is this doable with tachometer at all (using another app as baseline)?

@aomarks I would appreciate any advice on this.

browser shows as [object Object] in result table

Since refactoring the internal representation of a browser from a string to an object, I forgot to update the result table output to show the name instead of trying to just print the object, which displays as [object Object].

Crash on reporting results

$ tach --config tach.json 
Running benchmarks

[==========================================================] done
TypeError: Reduce of empty array with no initial value
    at Array.reduce (<anonymous>)
    at sumOf (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:142:15)
    at Object.summaryStats (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:51:15)
    at /Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:635:44
    at Array.map (<anonymous>)
    at makeResults (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:633:31)
    at automaticMode (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:673:27)
    at processTicksAndRejections (internal/process/task_queues.js:89:5)
    at Object.main (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:390:14)

Config:

{
  "$schema": "https://raw.githubusercontent.com/Polymer/tachometer/master/config.schema.json",
  "resolveBareModules": false,
  "sampleSize": 10,
  "timeout": 0,
  "benchmarks": [
    {
      "browser": "chrome-headless",
      "measurement": "global",
      "expand": [
        {
          "name": "v1 ytcp-video-row",
          "url": "http://localhost:8001/?tag=ytcp-video-row"
        },
        {
          "name": "v2 ytcp-video-row",
          "url": "http://localhost:8002/?tag=ytcp-video-row"
        },
        {
          "name": "v1 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8001/?tag=ytcp-video-metadata-editor-section"
        },
        {
          "name": "v2 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8002/?tag=ytcp-video-metadata-editor-section"
        }
      ]
    }
  ]
}

Cache and warmup

Now that we do bare module rewriting, we're spending time parsing every HTML and JS request. This could distort first-contentful-paint measurements, and it increases the time spent waiting for benchmark results in all cases.

The best solution might be to add an in-memory cache to the Koa server for all requests, and to always do a single warmup run for each benchmark configuration before starting measurements.
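The cache half could be a small Koa middleware along these lines (a sketch; it assumes string response bodies and that benchmark files don't change during a run):

// In-memory response cache, keyed by URL.
const cache = new Map();
app.use(async (ctx, next) => {
  const hit = cache.get(ctx.url);
  if (hit) {
    ctx.type = hit.type;
    ctx.body = hit.body;
    return;
  }
  await next();
  if (ctx.status === 200 && typeof ctx.body === 'string') {
    cache.set(ctx.url, {type: ctx.type, body: ctx.body});
  }
});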

Default measurement should always be global

Currently the default measurement is bench.start/stop when using the built-in server, and first-contentful-paint when using an external URL.

FCP is a particularly bad default, because if you implement either of the other two measurement styles in your code, FCP will still work and return some number, and it's easy to think you're measuring what you implemented when you're actually measuring FCP.

Let's make the window.tachometerResult global the default in all cases, since it's the most universal (works in all browsers and with both internal/external URLs).
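With the global measurement, a benchmark page reports its result by assigning to that global, along these lines (runBenchmarkWork is a hypothetical stand-in for the code under test):

const start = performance.now();
runBenchmarkWork();  // hypothetical: whatever the page is measuring
window.tachometerResult = performance.now() - start;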

Also, the CLI flag is called "measure", but the JSON config file property is "measurement". These should be the same.

Support Windows and Edge

We should be able to launch tachometer from Windows, and control Edge.

Note IE is tracked separately at #61 because it has an additional blocker.

Support a custom global measurement expression

Currently the result for a global measurement is read from window.tachometerResult.

It would be convenient to be able to specify a custom, arbitrary expression to poll for the result, which would open up flexibility for measuring pages in production without modifying their code.
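A hypothetical config shape for this (the measurementExpression property is illustrative of the proposal, not an existing option):

{
  "benchmarks": [{
    "url": "https://example.com/",
    "measurement": "global",
    "measurementExpression": "window.myApp.renderTimeMs"
  }]
}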

HTML files sometimes truncated

It appears that HTML files are sometimes truncated when served from the built-in static server. Could be related to koa-node-resolve. Needs more investigation.

--resolve-bare-modules flag can turn off bare module resolution

If you pass the bare flag --resolve-bare-modules (with no value), its value is set to the empty string, which is falsy (and in any case we check === true), so it turns bare module resolution off. No value should mean on, and anything other than no value, "true", or "false" should be an error.
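The intended semantics, as a sketch (parseResolveBareModules is a hypothetical helper; value is the string given on the command line, '' when the flag is passed bare):

function parseResolveBareModules(value) {
  // A bare flag yields the empty string, which should mean on.
  if (value === '' || value === 'true') {
    return true;
  }
  if (value === 'false') {
    return false;
  }
  throw new Error(`Invalid --resolve-bare-modules value: ${JSON.stringify(value)}`);
}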

cc discoverer @e111077

Support multiple measurements per benchmark

I think we'd like to return multiple labeled measurements from a benchmark, so that we can see metrics like FCP, TTFI, idle time, GC time, memory, etc.

To view multiple measures we probably want a table that shows results relative to a chosen baseline, rather than the NxN matrix.

CSV output

We often put tachometer results into a spreadsheet for sharing. A CSV output format would make this easier.

Support JSON configuration file

We should support a JSON configuration file as an alternative to using command-line flags.

  • Lets you check in and share benchmark suite configurations.
  • Will allow more complex and readable configurations, e.g. where with flags alone it is unclear whether a flag like --package-version will do a cross product, apply to just one benchmark, etc.

Windows integration test

We now have an end-to-end integration test running on Linux with Travis. We should also have one for Windows. Travis now has Windows support, but it is immature according to their own docs, and I was not able to get it to launch Edge (I'm not even sure the images have Edge). We should probably use AppVeyor instead for now.

Move browser drivers to peer dependencies

Hi,

I am currently focusing on performance testing on Google Chrome.

In order to improve build times, it would help me to move the browser driver dependencies to peer dependencies. I'm not going to need to test performance on any browser other than Chrome.

Remote benchmarking

It would be useful to be able to launch a browser on a remote machine. In particular, we would like to be able to launch Edge or IE on a remote Windows testing machine (on the same network), and point it at a tachometer server running on the local dev machine.

Bytes keep coming up as 0.00 KiB on remote server

I'm using the Node main function to run the tests, and the size in bytes keeps coming up as zero. Here is an example of what I'm running:

await main([
  '$button:test-basic=test/benchmark/bench-runner.html?bench=test-basic&package=button',
  '--measure=fcp',
  '--browser=chrome@http://localhost:4444/wd/hub',
  '--sample-size=5'
])

It results in the following:

┌─────────────┬───────────────────────────────┐
│   Benchmark │ button:test-basic             │
├─────────────┼───────────────────────────────┤
│     Version │ <none>                        │
├─────────────┼───────────────────────────────┤
│     Browser │ chrome                        │
│             │ @http://localhost:4444/wd/hub │
├─────────────┼───────────────────────────────┤
│ Sample size │ 5                             │
├─────────────┼───────────────────────────────┤
│       Bytes │ 0.00 KiB                      │
└─────────────┴───────────────────────────────┘

It is important to note that I'm dynamically loading the tests. Here is an example of what I'm doing in bench-runner.html

<html>
<head>
  <link rel="shortcut icon" href="data:image/x-icon;," type="image/x-icon">
</head>
<body>
  <script type="module">
    const params = new URLSearchParams(window.location.search);
    const pack = params.get('package');
    const bench = params.get('bench');
    import(`./${pack}/${bench}.js`);
  </script>
</body>
</html>

This then loads a self-running test.
