
hyperfine's Issues

hyperfine consistently measures short outlier that I cannot reproduce in the wild

I am trying to measure an incremental compilation benchmark (to reproduce something coming up on perf.rust-lang.org). Hyperfine consistently reports "Range (min … max): 11.0 ms … 765.1 ms" and "Time (mean ± σ): 590.1 ms ± 304.8 ms", which is effectively a useless result (and it doesn't even print a warning). However, when I time the same command myself (cargo prints the time, and I also tried time), I never see anything below 700 ms.

So my suspicion is that this 11 ms result hyperfine is seeing is somehow caused by the benchmark not being invoked correctly, but I do not know how that could happen either.

To reproduce, clone https://github.com/rust-lang-nursery/rustc-perf/ and go to collector/benchmarks/coercions. Do an initial cargo +nightly build to fill the incremental cache. Now run touch src/main.rs && cargo +nightly build many times; for me it is pretty stable between 730ms and 770ms.

Now run

hyperfine -w 2 -p "touch src/main.rs" "cargo +nightly build"

This shows a range from 10ms to 790ms. Something is clearly odd -- but it's not -p, because

hyperfine -w 2 "touch src/main.rs && cargo +nightly build"

has all the same problems.

Allow override of shell

hyperfine can't be used to benchmark shell-specific functions without launching a new instance of that shell within hyperfine's own shell, breaking some of the underlying assumptions. As most shells support the -c argument, it would be useful if a --shell SHELL option could be passed to override hyperfine's default.

e.g. I'm using hyperfine to reassess some assumptions made in the development of fish shell, and would like to be able to benchmark one version of a shell builtin against another, or to benchmark the time a completion script takes to execute (which uses fish-specific syntax and would therefore fail under sh).

This would be a straightforward replacement of sh with whatever the user provides, but some might even find it useful to evaluate the performance of command1 executed under shell foo against command2 executed under shell bar (without losing the benefit of the shell-startup timing analysis that hyperfine provides).
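
A usage sketch of the proposed option (the --shell flag and the fish invocation below illustrate the proposal, not an existing interface):

hyperfine --shell fish 'my_fish_function foo bar'
hyperfine --shell bash 'some_bash_function'

Internally, hyperfine would then execute fish -c '...' (or bash -c '...') instead of sh -c '...', and the shell-startup calibration would need to be measured against the chosen shell as well.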

Unresolved import `libc::rusage`

λ cargo install hyperfine
    Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading hyperfine v0.4.0
  Installing hyperfine v0.4.0
 Downloading colored v1.6.0
 Downloading [...]
 Downloading rustc-serialize v0.3.24
   Compiling strsim v0.6.0
   Compiling [...]
   Compiling hyperfine v0.4.0
error[E0432]: unresolved import `libc::getrusage`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:12
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |            ^^^^^^^^^ no `getrusage` in the root

error[E0432]: unresolved import `libc::rusage`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:23
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |                       ^^^^^^ no `rusage` in the root

error[E0432]: unresolved import `libc::RUSAGE_CHILDREN`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:31
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |                               ^^^^^^^^^^^^^^^ no `RUSAGE_CHILDREN` in the root

error: aborting due to 3 previous errors

error: failed to compile `hyperfine v0.4.0`, intermediate artifacts can be found at `C:\Users\dkter\AppData\Local\Temp\cargo-install.2sl6wvAhi3wL`

Caused by:
  Could not compile `hyperfine`.

To learn more, run the command again with --verbose.

Running with --verbose gives me this information:

   Compiling hyperfine v0.4.0
     Running `rustc --crate-name hyperfine .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\main.rs --crate-type bin --emit=dep-info,link -C opt-level=3 -C metadata=98f67aef2c923775 -C extra-filename=-98f67aef2c923775 --out-dir C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps -L dependency=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps --extern indicatif=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libindicatif-59764fc82c6811ce.rlib --extern libc=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\liblibc-f66ba3832bd58510.rlib --extern statistical=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libstatistical-50c68fb634eb9a96.rlib --extern colored=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libcolored-9098ef94b466db7a.rlib --extern clap=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libclap-04c95c98d9faa158.rlib --extern atty=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libatty-a371b164006500d6.rlib --cap-lints allow`
error[E0432]: unresolved import `libc::getrusage`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:12
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |            ^^^^^^^^^ no `getrusage` in the root

error[E0432]: unresolved import `libc::rusage`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:23
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |                       ^^^^^^ no `rusage` in the root

error[E0432]: unresolved import `libc::RUSAGE_CHILDREN`
 --> .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\hyperfine\cputime.rs:1:31
  |
1 | use libc::{getrusage, rusage, RUSAGE_CHILDREN};
  |                               ^^^^^^^^^^^^^^^ no `RUSAGE_CHILDREN` in the root

error: aborting due to 3 previous errors

error: failed to compile `hyperfine v0.4.0`, intermediate artifacts can be found at `C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi`

Caused by:
  Could not compile `hyperfine`.

Caused by:
  process didn't exit successfully: `rustc --crate-name hyperfine .cargo\registry\src\github.com-1ecc6299db9ec823\hyperfine-0.4.0\src\main.rs --crate-type bin --emit=dep-info,link -C opt-level=3 -C metadata=98f67aef2c923775 -C extra-filename=-98f67aef2c923775 --out-dir C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps -L dependency=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps --extern indicatif=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libindicatif-59764fc82c6811ce.rlib --extern libc=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\liblibc-f66ba3832bd58510.rlib --extern statistical=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libstatistical-50c68fb634eb9a96.rlib --extern colored=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libcolored-9098ef94b466db7a.rlib --extern clap=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libclap-04c95c98d9faa158.rlib --extern atty=C:\Users\dkter\AppData\Local\Temp\cargo-install.tjcNwOyjKYFi\release\deps\libatty-a371b164006500d6.rlib --cap-lints allow` (exit code: 101)

Rust version is 1.20.0 and I'm on Windows 10.
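
For what it's worth, the getrusage-based CPU-time code could probably be gated on the target platform; a minimal sketch (the function name and the Windows stub are assumptions, not hyperfine's actual code):

    // cputime.rs (sketch): only compile the rusage-based path on Unix-like targets.
    #[cfg(unix)]
    pub fn cpu_times() -> (f64, f64) {
        use libc::{getrusage, rusage, RUSAGE_CHILDREN};
        use std::mem;

        // getrusage fills the struct in place; zero-initialize it first.
        let usage: rusage = unsafe {
            let mut usage: rusage = mem::zeroed();
            getrusage(RUSAGE_CHILDREN, &mut usage);
            usage
        };

        let user = usage.ru_utime.tv_sec as f64 + usage.ru_utime.tv_usec as f64 * 1e-6;
        let system = usage.ru_stime.tv_sec as f64 + usage.ru_stime.tv_usec as f64 * 1e-6;
        (user, system)
    }

    // There is no rusage on Windows; report zeros until a proper implementation exists.
    #[cfg(windows)]
    pub fn cpu_times() -> (f64, f64) {
        (0.0, 0.0)
    }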

Parameters don't get expanded in preparation commands

It seems like parameters are only substituted for the actual command, not the preparation command:

$ hyperfine -P X 1 10 -p 'python -c "print 2**{X}"' --show-output -- ls
Benchmark #1: ls

Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'X' is not defined
Error: The preparation command terminated with a non-zero exit code. Append ' || true' to the command if you are sure that this can be ignored.
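
To illustrate the expected behavior: for X = 3, the preparation command should be executed as

python -c "print 2**3"

with {X} expanded, exactly as it is in the benchmarked command itself.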

Sequential vs Concurrent execution

I looked over the documentation, but, unless I'm missing something, I don't see an option for running commands sequentially versus concurrently (i.e. for load testing). If there is no option for specifying this, I'm assuming that commands run sequentially; is this correct?

Thanks.

Add possibility to ignore setup phase

I have a script that generates some cryptographic keys in an initial phase and only then starts the actual benchmark. The more keys I generate, the slower that initial phase will obviously be, but I only want to benchmark the second phase. Would it be possible to ignore the time spent until a specific string is printed to STDOUT?

Provide finer-grained statistics

It'd be nice to be able to show other statistics such as the 95th percentile or median runtime. HdrHistogram can record this with relatively little overhead, and there's a pretty good official Rust implementation here (I'm one of the maintainers).
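
As a rough sketch of how recorded run times could be fed into that crate to get median and tail percentiles (the run_times_us values below are placeholder data, not hyperfine internals):

    use hdrhistogram::Histogram;

    fn main() {
        // Placeholder data: individual run times in microseconds.
        let run_times_us: Vec<u64> = vec![731_000, 742_000, 758_000, 760_000, 771_000];

        // Three significant figures is plenty of resolution for reporting.
        let mut hist = Histogram::<u64>::new(3).expect("failed to create histogram");
        for &t in &run_times_us {
            hist.record(t).expect("value out of range");
        }

        println!("median: {} µs", hist.value_at_quantile(0.50));
        println!("p95:    {} µs", hist.value_at_quantile(0.95));
        println!("p99:    {} µs", hist.value_at_quantile(0.99));
    }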

Output number of runs

It would be nice, by default, to know how many runs hyperfine did for a benchmark.

Detect if first timing-run was significantly slower

Automatically detect if the first timing-run was significantly slower than the remaining runs and suggest usage of --warmup.

Possible implementation (see the sketch after this list):

  • Let [t_1, t_2, ..., t_n] be the benchmarking results.
  • Let t_mean and t_stddev be the mean and standard-deviation for [t_2, ..., t_n]
  • Show the warning if t_1 > t_mean + 5 * t_stddev (for example)
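
A minimal sketch of that check in Rust (the function name and the factor of 5 are placeholders taken from the example above):

    /// Returns true if the first timing result lies far outside the
    /// distribution of the remaining results.
    fn first_run_is_outlier(times: &[f64]) -> bool {
        if times.len() < 3 {
            return false; // not enough data for a meaningful estimate
        }
        let rest = &times[1..];
        let mean = rest.iter().sum::<f64>() / rest.len() as f64;
        let variance = rest.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / rest.len() as f64;
        let stddev = variance.sqrt();
        times[0] > mean + 5.0 * stddev
    }

If the check fires, hyperfine could print a hint suggesting --warmup.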

Parametrized benchmarks

It could be interesting to run benchmarks where a (numerical) parameter is systematically changed, for example:

> hyperfine 'make -j{}' --parameter-range 1..8

(modulo syntax)
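
One concrete shape for this could be a named placeholder plus a range option, consistent with the -P / {X} usage shown in the preparation-command issue above (a sketch, not a committed syntax):

hyperfine --parameter-scan threads 1 8 'make -j{threads}'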

Add option to choose the units used for CLI & results output

(a follow up to #71)

Currently the CLI automatically selects the units (seconds, milliseconds) based on the size of the mean value, but the results export is always in seconds.

Choosing the units would force the CLI and results-export units to always match, and would allow users to specify the units they are most familiar with or that best integrate with their reporting systems.

The option could be -u --units <Seconds|Milliseconds>, and maybe extended to also include minutes (and hours)?


The units option value would be passed through to both format::format_duration_units for the CLI and ExportManager::write_results for the results export.
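
A usage sketch with the flag as proposed (the value spelling follows the proposal above and is not an existing interface):

hyperfine --units Milliseconds 'sleep 0.3' --export-csv results.csv
hyperfine -u Seconds 'sleep 5' --export-markdown results.md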

Hyperfine only runs 1 benchmark, irrespective of --min-runs

After 5GB and 2 hours' worth of dependency-chasing, I finally managed to install hyperfine via cargo. After then manually adding the cargo install folder to the Windows PATH environment variable, I'm finally able to run hyperfine in Cygwin.

However, when I do the following as a quick preliminary test:

hyperfine -m 10 --show-output --export-csv hyperfinetest.csv 'sleep 1'

I get the following output:

Benchmark #1: sleep 1

Time (mean ± σ): 1.036 s ± 0.006 s [User: 4.1 ms, System: 17.4 ms]

Range (min … max): 1.021 s … 1.040 s

I presume this means that only one benchmark was run, and checking hyperfinetest.csv seems to confirm this. What's going on here?

Option to interleave benchmarks for multiple commands

CyberShadow on HN:

This would be useful when comparing two similar commands, as interleaving them makes it less likely that e.g. a load spike will unfavorably affect only one of them, or that e.g. thermal throttling will negatively affect only the last command.

Option to output the percentiles

It would be useful to output the time percentiles of the run. For instance with a flag like --percentiles '50, 90, 95, 99, 99.9'.

Proper Windows support

  • Properly spawn shell commands (this doesn't seem to work at all at the moment)
  • Detect when a process fails with exit code != 0 (I'm guessing that cmd.exe hides the real exit code from us)
  • Compute user/system time (or hide the message)
  • Colors and even progress bars actually work fine in PowerShell. We should not set --style basic by default.

Cygwin package?

Is there any chance of porting the utility as a Cygwin package? I was really looking forward to using the program to benchmark a few commands on Cygwin running on top of Windows, but it's unfortunately not yet in the Cygwin package repository, which means installing it the usual way requires a whole bunch of other, heavy dependencies rather than what would usually be a much simpler apt-cyg install hyperfine.

Thanks in advance.

Compliant time units in CLI report and results export

When passing --export-markdown, the resulting Markdown file reports Mean and Min…Max in ms even when the original result report printed to stdout is presented in seconds. I think it would be better if the produced report used the same units, since very long ms numbers are hard to read.

Steps to reproduce:

hyperfine 'sleep 5' --export-markdown results.md

Command line output:

kbobyrev@kbobyrev ~/d/m/profile> hyperfine 'sleep 5' --export-markdown results.md
Benchmark #1: sleep 5
  Time (mean ± σ):      5.002 s ±  0.000 s    [User: 1.1 ms, System: 2.0 ms]
  Range (min … max):    5.002 s …  5.003 s

results.md:

| Command | Mean [ms]    | Min…Max [ms]  |
|---------|--------------|---------------|
| sleep 5 | 5002.3 ± 0.2 | 5002.1…5002.7 |

Export to Markdown

Add a --export-markdown option in analogy to #41 for CSV and #42 for JSON.

The output could look like this:

| Benchmark | Mean [ms]   | Min. [ms] | Max. [ms] |
|-----------|-------------|-----------|-----------|
| command 1 | 205.1 ± 1.5 | 201.1     | 207.6     |
| command 2 | 403.5 ± 2.4 | 400.3     | 407.4     |


Add export options

It would be great if we could export benchmark results in different formats:

  • Markdown for easy integration into README files
  • JSON for further processing
  • CSV for simple plotting (gnuplot) -- header + one line per benchmark with columns mean, stddev, min, max

Each of those could be new command line options (hyperfine --export-csv my-benchmark.csv).

Show summary / comparison

If there are multiple benchmarks, show a short summary of the results, like command1 is 1.5 times faster than command2.
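
For example, the summary line could look something like this (a sketch of one possible format, not a fixed wording):

'command1' ran 1.50 ± 0.05 times faster than 'command2'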

Increase unit test coverage

I was running some coverage using cargo-kcov and noted that currently coverage on the application is somewhat low. It may be good to identify points that should be placed under test and ensure that they are. For instance, with more options being added, extracting the processing of Clap's matches to a testable function may be prudent to ensure future changes do not alter the current correct state.

(screenshot of the cargo-kcov coverage report, 2018-03-24)

Compact output format, finer control over the output

I am running hyperfine with many different commands at once. In order to see not only the factor between the different commands at the end, but also (some of) the means, I suggest one, some, or all of these changes to the output (switchable by a command-line option):

  • Do not print empty lines between mean and min-max and the headlines and so on. I can currently get quite close to what I want when I run hyperfine ... | sed '/^ *$/d'.
  • The above-mentioned hyperfine ... | sed '/^ *$/d' is without color, and when I run hyperfine -s full ... | sed '/^ *$/d' instead, I also get the interactive graph, which doesn't play nicely with the pipe and sometimes messes up a line. So why not add a color style to -s which has color but is not interactive?
  • All of these could be additional values for -s, and -s could be a comma-separated list so that things like this would make sense: hyperfine -s compact,color, hyperfine -s interactive,compact,nocolor, hyperfine -s full,compact, hyperfine -s basic,compact...

Export to JSON

This covers exporting results to JSON as outlined in #38 and extends on the work done on #41.

The same --export-xxx format should be used (in this case --export-json), with a filename parameter following. It should coexist peacefully with simultaneous usage of --export-csv and multiple instances of both.
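
One possible shape for the exported JSON (a sketch only; this issue does not prescribe exact field names):

    {
      "results": [
        {
          "command": "sleep 1",
          "mean": 1.0036,
          "stddev": 0.0061,
          "min": 1.0021,
          "max": 1.0402,
          "times": [1.0021, 1.0038, 1.0402]
        }
      ]
    }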

Support for benchmarking memory usage, etc.

This is definitely out of scope if hyperfine's goal is to be just a replacement for time, but the thing I most wish for when using time to benchmark programs is that it could measure other relevant program execution statistics, like memory usage. After all, optimization (which is what benchmarking drives) is always about tradeoffs in some way. But it's kinda hard to know if you're making the right tradeoffs for your project if you're measuring only one thing (here, execution speed).

Would measuring memory usage, etc. be in scope for hyperfine as a benchmarking tool?
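
On Unix-like systems, one low-overhead way to get at peak memory would be the same getrusage call hyperfine already uses for CPU time; a sketch (the function name is an assumption):

    use libc::{getrusage, rusage, RUSAGE_CHILDREN};
    use std::mem;

    /// Peak resident set size of terminated child processes, as reported by
    /// getrusage (kilobytes on Linux, bytes on macOS).
    fn peak_child_memory() -> libc::c_long {
        let usage: rusage = unsafe {
            let mut usage: rusage = mem::zeroed();
            getrusage(RUSAGE_CHILDREN, &mut usage);
            usage
        };
        usage.ru_maxrss
    }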

Checklist for hyperfine 1.0

A few things that I would like to do before releasing version 1.0:

  • Merge #53
  • Merge #51
  • Add a chapter about the export options in the "Usage" section in the README. Possibly show a Markdown table example. Possibly refer to the plot_benchmark_results.py script.
  • Add a chapter about parameterized benchmarks.

Usual release checklist

  • Update README (features, usage, ..).
  • Optional: update dependencies with cargo update.
  • rustup default nightly and cargo clippy
  • cargo fmt
  • cargo test
  • cargo install -f
  • Update version in Cargo.toml. Run cargo build to update Cargo.lock
  • cargo publish --dry-run --allow-dirty.
  • check if Travis & AppVeyor succeed
  • git tag vX.Y.Z; git push --tags
  • write GitHub release notes
  • check binaries (that were uploaded via Travis/AppVeyor)
  • publish to crates.io by cloning a fresh repo and calling cargo publish.
  • Flag hyperfine package as "out of date": https://aur.archlinux.org/packages/hyperfine/
  • Optional: updated demo video
