
kubernetes / perf-tests


Performance tests and benchmarks

License: Apache License 2.0

Makefile 1.11% Shell 2.24% Go 87.59% Python 7.22% CSS 0.08% HTML 0.27% JavaScript 0.86% Dockerfile 0.63%

perf-tests's Introduction

Kubernetes perf-tests


This repo is dedicated to storing various Kubernetes-related performance testing tools. If you want to add your own load test, benchmark, framework, or other tool, please contact one of the owners.

Because the tools are generally independent and have their own ways of being configured and run, each subdirectory needs a separate README.md describing its contents.

Repository setup

To run all verify* scripts before pushing to a remote branch (useful for catching problems early), execute:

cp _hook/pre-push .git/hooks/pre-push

perf-tests's People

Contributors

aleksandra-malinowska, aojea, argh4k, bowei, coltonstapper, dependabot[bot], dlapcevic, gmarek, ingvagabund, jkaniuk, jprzychodzen, jupblb, k8s-ci-robot, krzysied, lyzs90, marseel, mborsz, mm4tt, mucahitkurt, oprinmarius, oxddr, p0lyn0mial, prameshj, shyamjvs, tosi3k, vamossagar12, weiling61, wojtek-t, yanglu1031, yuanchen8911


perf-tests's Issues

DNS perf crashing on simple test

Hi, I'm currently trying to benchmark our internal Kubernetes DNS with the perf tool and play with the dnsmasq settings. A simple run crashes instantly, and I'm not sure whether I misread the documentation or whether it's a bug. Here is a sample of the output:

โฏ ./run --params params/default.yaml --use-cluster-dns --out-dir out
INFO 04-07 12:39:49 runner.py:60] Using cluster DNS for tests
INFO 04-07 12:39:49 runner.py:64] DNS service IP is 10.0.0.10
INFO 04-07 12:39:50 runner.py:217] Client node is ip-10-2-11-248.eu-central-1.compute.internal
INFO 04-07 12:39:50 runner.py:331] Created rundir out/run-1491561590
INFO 04-07 12:39:50 runner.py:337] Updated symlink out/latest
INFO 04-07 12:39:50 runner.py:277] Starting client teardown
INFO 04-07 12:39:50 runner.py:287] Client teardown complete
INFO 04-07 12:39:50 runner.py:140] Create Pod/kube-dns-perf-client ok
INFO 04-07 12:39:51 runner.py:254] Client pod to started on ip-10-2-11-248.eu-central-1.compute.internal
INFO 04-07 12:39:53 runner.py:262] Client pod ready for execution
INFO 04-07 12:39:53 runner.py:266] Copying query files to client
INFO 04-07 12:39:53 runner.py:233] Starting server teardown
INFO 04-07 12:39:54 runner.py:300] Waiting for server to be deleted (0 pods active)
INFO 04-07 12:39:54 runner.py:240] Server teardown ok
INFO 04-07 12:39:54 runner.py:277] Starting client teardown
INFO 04-07 12:39:55 runner.py:285] Waiting for client pod to terminate
INFO 04-07 12:39:56 runner.py:285] Waiting for client pod to terminate
INFO 04-07 12:39:58 runner.py:285] Waiting for client pod to terminate
INFO 04-07 12:39:59 runner.py:285] Waiting for client pod to terminate
INFO 04-07 12:39:59 runner.py:287] Client teardown complete
Traceback (most recent call last):
  File "py/run_perf.py", line 87, in <module>
    sys.exit(runner.go())
  File "/Users/jonas/Github/perf-tests/dns/py/runner.py", line 79, in go
    self._reset_client()
  File "/Users/jonas/Github/perf-tests/dns/py/runner.py", line 263, in _reset_client
    self._copy_query_files()
  File "/Users/jonas/Github/perf-tests/dns/py/runner.py", line 268, in _copy_query_files
    ['/bin/tar', '-czf', '-', self.args.query_dir])
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 212, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I see the kube-dns-perf-client pod spinning up but it shuts down after 2 seconds.

dns perf test throws error

I'm running DNS perf test against existing kubedns installation. Client pod was created successfully.

DEBUG 02-15 11:03:32 runner.py:121] kubectl ['/Users/zyu/bin/kbe', 'exec', 'kube-dns-perf-client', '--', '/dnsperf', '-s', '10.0.0.10', '-l', '60', '-Q', '500', '-d', '/queries/nx-domain.txt']
DEBUG 02-15 11:03:32 runner.py:128] kubectl ret=127
DEBUG 02-15 11:03:32 runner.py:129] kubectl stdout
out |
DEBUG 02-15 11:03:32 runner.py:130] kubectl stderr
err | Error loading shared library libcrypto.so.1.0.0: No such file or directory (needed by /dnsperf)
err |
INFO 02-15 11:03:32 runner.py:102] Exception caught during run, cleaning up

Also, the path to the tar binary is hardcoded to /bin/tar; on Mac it is /usr/bin/tar. It would be good to add some flexibility there.
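A minimal sketch of how the runner could locate tar instead of hardcoding /bin/tar; `find_tar` is a hypothetical helper, not part of the current runner code:

```python
import os
import shutil

def find_tar():
    """Locate a usable tar binary: common absolute paths first, then PATH."""
    for candidate in ("/bin/tar", "/usr/bin/tar"):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # fall back to whatever is on PATH
    return shutil.which("tar")
```

The runner would then call `find_tar()` once and fail with a clear message if it returns `None`, rather than raising a bare `OSError` from `Popen`.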

Setup mungers for this repo

We need a couple of mungers running in this repo:

  • CLA bot
  • build/gofmt/golint bot
  • unit test bot (we'll probably have some unit tests for frameworks here)

cc @kubernetes/sig-scalability @apelisse

Update compare tool to read metrics from JSON files

With recent changes to our CI test framework, we now log all metrics data into individual JSON artifacts.
In PR #57 we updated the benchmark tool to read metrics from JSON, which eliminated almost all of the vendored code (-63 MB in size).
We should do the same for the compare tool as well.

Make perf-tests small again!

cc @wojtek-t @gmarek
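What reading metrics from JSON artifacts could look like, as a rough sketch; `load_metric_artifacts` and the flat `*.json` layout are assumptions for illustration, not the actual compare-tool implementation:

```python
import glob
import json
import os

def load_metric_artifacts(artifacts_dir):
    """Read every *.json metric artifact in a run's directory into a dict
    keyed by file name, replacing the need for vendored client code."""
    metrics = {}
    for path in glob.glob(os.path.join(artifacts_dir, "*.json")):
        with open(path) as f:
            metrics[os.path.basename(path)] = json.load(f)
    return metrics
```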

Update DNS perf tests to use RBAC

Kubernetes 1.6 enables RBAC by default, which causes kube-dns instances outside kube-system to stop working. The test needs to create RBAC authorization for itself.

Netperf. Gather CPU usage for each job.

When running network performance tests, one bottleneck for bandwidth can be the CPU. Therefore it would be helpful to gather CPU usage for each job to identify related issues.

ClusterLoader: HA cluster support

Currently clusterloader supports only single-master clusters. This should be changed: clusterloader should handle multiple masters.

Plot scheduler throughput and latencies on perf-dash

With some recent changes, we've now started capturing scheduler metrics in our scalability tests. We need to plot those values over time.

cc @kubernetes/sig-scalability-misc @kubernetes/sig-scheduling-misc

/assign @krzysied
Krzysiek - Would you be able to take this up?

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

ClusterLoader: PodStartupLatency is flaky

In the cluster loader density test, pod startup latency (99th percentile) varies from 4s to 8s, while the same metric in the original tests is consistently ~4s.
This should be fixed so that cluster loader's PodStartupLatency matches the original test's results.

Basic needs for this repo to be useful

At least:

  • Which sig owns this repository
  • An architecture discussion
  • An agreed-upon automation framework
  • Modularized to support also running on distributions of Kubernetes
  • Modularized so there is no dependency on any cloud/infra
  • Some understanding of how the merge queue works...don't want delays.
  • Discussion around what tests to move from e2e
  • CI of the tests within the repo
  • Policy of unit tests for the tests themselves
  • CI of running the tests themselves
  • Does this repository depend on kubernetes e2e tests
    • Can this repository import useful framework helpers/functions from e2e

ClusterLoader: Using shared podStore

Creating multiple podStores at the same time results in some podStores not being created. Measurements should use a shared podStore instead of creating multiple podStores to avoid this problem.
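The idea can be sketched as a cache that hands out one store per (namespace, selector) pair; the names and the `factory` argument are illustrative, and the real implementation (in Go, with concurrent callers) would also need a mutex around the lookup:

```python
_shared_stores = {}

def get_pod_store(namespace, label_selector, factory):
    """Return a shared pod store for the given key, creating it only once.
    A real concurrent implementation would guard this with a lock."""
    key = (namespace, label_selector)
    if key not in _shared_stores:
        _shared_stores[key] = factory(namespace, label_selector)
    return _shared_stores[key]
```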

Network test: cannot resolve host ftp.netperf.org

Hello, in the Dockerfile of the network benchmark there is a link to download a tar.gz with the netperf code; this address no longer seems to exist.

$ host ftp.netperf.org
Host ftp.netperf.org not found: 3(NXDOMAIN)

The Dockerfile should be updated with a new location.

Perf-dash graphs for test-phase times blank

Taking a look at perf-dash, I'm seeing that we're not generating any graphs for the TestPhaseTimer metric even though we have the data available as JSON files in the job's artifacts.

For example, consider the graph gce-100Nodes-master-DensityTestPhaseTimer, which is empty, even though from a run of the job I can find the data available:

/assign @krzysied
Krzysiek - Could you PTAL at this when you find time? This metric is sometimes useful for spotting performance regressions.

cc @kubernetes/sig-scalability-bugs @wojtek-t

Travis is unhappy

It seems Travis always fails due to the go get ./... and go test ./... commands run against the repo.
This is because clusterloader uses glide as its dependency manager instead of godep, which works differently.
We need to fix this by doing one of the following:

  • Use glide's get and test equivalents for clusterloader, and plain go for all the other directories
  • Move clusterloader to godep, so we have uniformity across the whole repo

cc @kubernetes/sig-scalability-misc @gmarek @spxtr

Distribute pods for network-tests based on nodelabels

For our test cases we want to distribute the pods based on node labels, e.g. failure-domain=rack. This allows us to test rack-specific network performance as well as cross-rack network performance (or other combinations).

There are two possible ways to achieve this:

  1. Use the PodAntiAffinity feature of Kubernetes
  2. Implement a simple NodelabelSelector routine (this keeps the program's logic)

Approach 2 should be easier to integrate into the current code.
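Approach 2 could be as simple as filtering the node list by a label before placing pods. A sketch, where the dict-based node shape is purely illustrative:

```python
def select_nodes(nodes, label, value):
    """Return the nodes whose `label` equals `value`,
    e.g. failure-domain=rack1."""
    return [n for n in nodes if n.get("labels", {}).get(label) == value]
```

The test harness would then schedule one group of pods onto `select_nodes(nodes, "failure-domain", "rack1")` and another onto a different rack to measure cross-rack bandwidth.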

netperf Makefile builds wrong binaries under mac

Problem:

docker run girishkalele/netperf-latest
standard_init_linux.go:190: exec user process caused "exec format error"

To make this work, nptest must be cross-compiled for Linux (e.g. with GOOS=linux) and launch.go must be compiled for Mac.

ClusterLoader: need to support getting etcd metrics with a cert file

I have a v1.11.2 k8s cluster with an etcd cluster that enables auth by cert file.
When I run clusterloader2, the EtcdMetrics measurement fails to get etcd metrics over HTTP:

func getEtcdMetrics(provider, host string) ([]*model.Sample, error) {
	...
	cmd := "curl http://localhost:2379/metrics"
	...
}

So I think we may need to support getting etcd metrics with a cert file.
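To illustrate the request (the actual measurement is Go, and the cert paths below are placeholders, not real file locations): when etcd enables client-cert auth, the scrape command would need the cert material and an https URL, roughly:

```python
def etcd_metrics_cmd(use_tls=False,
                     cert="/path/to/etcd-client.crt",
                     key="/path/to/etcd-client.key",
                     ca="/path/to/etcd-ca.crt"):
    """Build the curl command used to scrape etcd metrics; with TLS the
    client cert, key, and CA are passed and the scheme switches to https."""
    if not use_tls:
        return ["curl", "http://localhost:2379/metrics"]
    return ["curl", "--cert", cert, "--key", key, "--cacert", ca,
            "https://localhost:2379/metrics"]
```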

Clusterloader2 is not compatible with go v1.11

I'm trying to upgrade the Go version to 1.11, but the Travis CI fails. The merge of pull request #229 created this problem.

The error output of ./verify/verify-gofmt.sh

!!! 'gofmt -s' needs to be run on the following files: 
./clusterloader2/pkg/measurement/common/resource_usage.go
./clusterloader2/pkg/measurement/util/gatherers/resource_gather_worker.go
The command "./verify/verify-gofmt.sh" exited with 1

These two files have indentation problems:

  • resource_usage.go at lines 83 and 84
  • resource_gather_worker.go at line 56

Need "key metrics" for Perf dashboard

Once #287 is resolved, the other thing we need for the Perf dashboard to be useful to the Release Team and the broad population of contributors would be a few "key metrics" that are things to check immediately after a merge for a snapshot of performance effects. These would consist of a test-metric-detail combination.

Currently, there are around 900 different combinations that can produce a graph. It's just not possible for contributors who aren't full time on scalability to know which ones to look at.

Take perf-dash to v1.0

We've made several improvements to perf-dash in the past few months (thanks a lot @krzysied for the work!). I think it's time to make some final improvements and cut the first major release (1.0). Mainly, let's make it easier to use, even for non-scalability folks - e.g. the release team has been asking for it. Some work items that come to mind:

  • Split the graphs based on job name first (i.e. gce-100, kubemark-500, etc.) and then metric (i.e. ApiserverResponsiveness, SchedulerMetrics, etc.) so that navigation is easier
  • Add units to the axes in the graphs (otherwise the graphs aren't understandable)
  • Load individual graphs on demand rather than pre-loading everything at start (so we don't wait tens of seconds before we can see anything)
  • Anything else?

/assign @krzysied
cc @kubernetes/sig-scalability-misc @wojtek-t

ClusterLoader: Investigate kube-controller-manager cpu usage

Kube-controller-manager uses more CPU than it should. The limit is 0.8 cores, but it sometimes uses ~0.9.

When this issue is solved, the clusterloader density test's kube-controller-manager constraint should be updated, or at least a comment explaining the reason should be added.

Expose more information in clusterloader2 logs

There are a couple things that we definitely need:

  • more state about pods from a given controlling object (number of pending, waiting, checking if something was deleted, etc.), mostly copying this logic:
    https://github.com/kubernetes/kubernetes/blob/master/test/utils/runners.go#L803
  • pod-startup-time latency should output something similar to what we currently do (for debugging purposes)
  • show more clearly where a given test finished:
W1112 13:18:48.029] I1112 13:18:48.029322    9960 clusterloader.go:127] Test testing/density/config.yaml ran successfully!"

is not very visible in those logs

  • We are currently printing information about nodes that is extremely helpful for debugging (this is currently part of density). It would be useful to add that too (it should probably be part of cluster loader initialization)
  • We need to audit logs in measurements - a bunch of glog calls should actually be real failures and fail the test at the end (though not immediately). I can imagine this as something like kubernetes/kubernetes#66239 (comment), but also as a special measurement that collects errors internally (logging them when they happen) and at the end fails if any were reported (should be simpler than a separate flakes.txt file).

I guess there may be more, but let's start with those.
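The error-collecting measurement idea could be sketched like this; Python pseudocode for what would be a Go measurement, with hypothetical names:

```python
class ErrorCollectingMeasurement:
    """Log errors the moment they happen, but only fail the test at the end,
    once all errors have been gathered."""
    def __init__(self):
        self._errors = []

    def record(self, err):
        print("ERROR: %s" % err)   # surfaced immediately in the logs
        self._errors.append(err)

    def dispose(self):
        # called at the end of the test; fail if anything was reported
        if self._errors:
            raise RuntimeError("%d errors were collected during the run"
                               % len(self._errors))
```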

/assign @krzysied

Outline incubation of a synthetic API component generator.

Scheduler_perf as well as this repo use generic, opinionated, template-based approaches to making pods.

  • In scheduler_perf, we've come up with an initial struct idea that can be mutated in a pipeline.

  • In this repo, templates are used and modified/spread around the cluster.

A software library that generates arbitrary, fully configurable cluster setups with pods and nodes, and maybe other objects (services), might be a cool project we could commonly utilize.

Thoughts @gmarek @sjug @jeremyeder @ravisantoshgudimetla

Network README needs Usage

There's no clear usage section for the Network perf test. Unless you know how to write Go, it's pretty hard to understand.

ClusterLoader: adding overrides to test configs

Cluster loader test configs should support overrides:

  • There should be a way to pass a value to a variable through the test call.
  • If a value is not provided, the default (specified in the config) should be used.
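The override semantics amount to a simple merge where explicitly passed values win; `resolve_config_vars` is a hypothetical name for illustration:

```python
def resolve_config_vars(defaults, overrides):
    """Values passed on the test call win; anything not overridden falls
    back to the default specified in the config."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged
```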

ClusterLoader: Fix error propagation in measurements

Some errors in measurements should be logged, some should cause the measurement to fail, and some should cause the test to fail.

The listed measurements should have error propagation fixed before moving tests from /test/framework to clusterloader2:

  • APIResponsiveness
  • EtcdMetrics
  • MetricsForE2E
  • Profile
  • PodStartupLatency
  • ResourceUsageSummary
  • SchedulingMetrics
  • WaitForControlledPodsRunning
  • WaitForRunningPods

Timescale for Perf dashboards needs to be configurable

Yesterday, we were trying to determine if the golang upgrade had adversely affected performance. The Perf dashboard was not in any way useful for this, which was disappointing and effectively led to a 1-day delay in releasing the beta.

The primary problem is that the timescale for the perf dash is not configurable, so it's not possible to get useful information out of it to examine specific changes. For example, look at gce-100Nodes, E2E, DensityPodStartup. The lines are so dense that you can't read anything at all.

We should be able to choose start and end dates for the graph data, which would allow (for example) looking at the most recent 24 hours of 100nodes.
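The requested behavior amounts to filtering the graphed samples by a start/end window before rendering; a sketch, where the point shape (a dict with a `timestamp` field) is illustrative:

```python
def filter_window(points, start, end):
    """Keep only samples whose timestamp falls within [start, end],
    so the dashboard can graph, e.g., just the most recent 24 hours."""
    return [p for p in points if start <= p["timestamp"] <= end]
```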

netperf: result interpretation

Hello, I have run netperf with the following results.

MSS                                          , Maximum, 96, 160, 224, 288, 352, 416, 480, 544, 608, 672, 736, 800, 864, 928, 992, 1056, 1120, 1184, 1248, 1312, 1376, 1440,
1 iperf TCP. Same VM using Pod IP            ,19033.000000,15292,18193,18309,14759,16767,16466,19033,16214,15723,18003,15816,17526,17670,16205,15926,16363,16899,15639,16742,15341,16303,16823,
2 iperf TCP. Same VM using Virtual IP        ,15870.000000,14622,14634,13309,15622,13274,13683,14700,14963,13445,13520,15722,14016,14195,13502,13704,15870,13694,14258,14538,14285,13303,12895,
3 iperf TCP. Remote VM using Pod IP          ,899.000000,855,858,889,889,893,891,894,894,895,898,899,884,893,864,891,896,893,894,893,896,894,896,
4 iperf TCP. Remote VM using Virtual IP      ,903.000000,892,891,892,888,863,862,893,897,894,883,902,898,894,892,889,889,875,893,897,897,894,903,
5 iperf TCP. Hairpin Pod to own Virtual IP   ,15874.000000,15222,15131,14236,14574,13799,14207,14739,13283,14318,14943,13056,14949,15388,14740,14584,13997,14203,15874,15550,14164,15616,15016,
6 iperf UDP. Same VM using Pod IP            ,4838.000000,4838,
7 iperf UDP. Same VM using Virtual IP        ,3604.000000,3604,
8 iperf UDP. Remote VM using Pod IP          ,2934.000000,2934,
9 iperf UDP. Remote VM using Virtual IP      ,3989.000000,3989,
10 netperf. Same VM using Pod IP             ,5525.360000,5525.36,
11 netperf. Same VM using Virtual IP         ,0.000000,0.00,
12 netperf. Remote VM using Pod IP           ,897.600000,897.60,
13 netperf. Remote VM using Virtual IP       ,0.000000,0.00,

I struggle with interpreting the results and have a few questions:

  1. Why is UDP so much slower than TCP?
  2. How can UDP using a Virtual IP be faster from a remote VM than from the same VM? (lines 7 and 9)

Thank you for the help.

Missing some files

In dnsperf:

Change the completions and parallelism parameters in the dnsperf-job.yaml file to increase the number of test pods.

But no such file exists.

In netperf, the images directory is missing, so the graphics in the README don't show up.

Docker image ``girishkalele/netperf-latest`` has mssStepSize set to 1 not 64

Attempting to run the netperf tests, the image girishkalele/netperf-latest is pulled. Looking at the output, mssStepSize appears to be set to 1, not 64, which will make the tests take forever.

Received TCP output from worker netperf-w1 for test 2 iperf TCP. Same VM using Virtual IP from netperf-w1 to netperf-w2 MSS: 117
Connecting to host 100.70.29.66, port 5201
[  4] local 100.96.2.16 port 60848 connected to 100.70.29.66 port 5201
[  6] local 100.96.2.16 port 60850 connected to 100.70.29.66 port 5201
[  8] local 100.96.2.16 port 60852 connected to 100.70.29.66 port 5201
[ 10] local 100.96.2.16 port 60854 connected to 100.70.29.66 port 5201
[ 12] local 100.96.2.16 port 60856 connected to 100.70.29.66 port 5201
[ 14] local 100.96.2.16 port 60858 connected to 100.70.29.66 port 5201
[ 16] local 100.96.2.16 port 60860 connected to 100.70.29.66 port 5201
[ 18] local 100.96.2.16 port 60862 connected to 100.70.29.66 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec   12    166 KBytes       
[  6]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec    2    175 KBytes       
[  8]   0.00-10.00  sec  2.90 GBytes  2491 Mbits/sec    5    166 KBytes       
[ 10]   0.00-10.00  sec  2.90 GBytes  2490 Mbits/sec    6    227 KBytes       
[ 12]   0.00-10.00  sec  2.90 GBytes  2487 Mbits/sec    1    166 KBytes       
[ 14]   0.00-10.00  sec  2.89 GBytes  2486 Mbits/sec    6    227 KBytes       
[ 16]   0.00-10.00  sec  2.89 GBytes  2485 Mbits/sec    2    166 KBytes       
[ 18]   0.00-10.00  sec  2.89 GBytes  2484 Mbits/sec    5    227 KBytes       
[SUM]   0.00-10.00  sec  23.2 GBytes  19909 Mbits/sec   39             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec   12             sender
[  4]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec                  receiver
[  6]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec    2             sender
[  6]   0.00-10.00  sec  2.90 GBytes  2492 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  2.90 GBytes  2491 Mbits/sec    5             sender
[  8]   0.00-10.00  sec  2.90 GBytes  2491 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  2.90 GBytes  2490 Mbits/sec    6             sender
[ 10]   0.00-10.00  sec  2.90 GBytes  2490 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.90 GBytes  2488 Mbits/sec    1             sender
[ 12]   0.00-10.00  sec  2.90 GBytes  2488 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.89 GBytes  2486 Mbits/sec    6             sender
[ 14]   0.00-10.00  sec  2.89 GBytes  2486 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.89 GBytes  2485 Mbits/sec    2             sender
[ 16]   0.00-10.00  sec  2.89 GBytes  2485 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.89 GBytes  2484 Mbits/sec    5             sender
[ 18]   0.00-10.00  sec  2.89 GBytes  2484 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.2 GBytes  19909 Mbits/sec   39             sender
[SUM]   0.00-10.00  sec  23.2 GBytes  19909 Mbits/sec                  receiver

iperf Done.
Received TCP output from worker netperf-w1 for test 2 iperf TCP. Same VM using Virtual IP from netperf-w1 to netperf-w2 MSS: 118
Connecting to host 100.70.29.66, port 5201
[  4] local 100.96.2.16 port 60890 connected to 100.70.29.66 port 5201
[  6] local 100.96.2.16 port 60892 connected to 100.70.29.66 port 5201
[  8] local 100.96.2.16 port 60894 connected to 100.70.29.66 port 5201
[ 10] local 100.96.2.16 port 60896 connected to 100.70.29.66 port 5201
[ 12] local 100.96.2.16 port 60898 connected to 100.70.29.66 port 5201
[ 14] local 100.96.2.16 port 60900 connected to 100.70.29.66 port 5201
[ 16] local 100.96.2.16 port 60902 connected to 100.70.29.66 port 5201
[ 18] local 100.96.2.16 port 60904 connected to 100.70.29.66 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  2.98 GBytes  2560 Mbits/sec    5    166 KBytes       
[  6]   0.00-10.00  sec  2.98 GBytes  2558 Mbits/sec    6    166 KBytes       
[  8]   0.00-10.00  sec  2.98 GBytes  2557 Mbits/sec   11    166 KBytes       
[ 10]   0.00-10.00  sec  2.97 GBytes  2555 Mbits/sec   13    175 KBytes       
[ 12]   0.00-10.00  sec  2.98 GBytes  2555 Mbits/sec    4    166 KBytes       
[ 14]   0.00-10.00  sec  2.97 GBytes  2554 Mbits/sec   12    166 KBytes       
[ 16]   0.00-10.00  sec  2.97 GBytes  2551 Mbits/sec    1    227 KBytes       
[ 18]   0.00-10.00  sec  2.97 GBytes  2550 Mbits/sec   10    166 KBytes       
[SUM]   0.00-10.00  sec  23.8 GBytes  20439 Mbits/sec   62             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.98 GBytes  2560 Mbits/sec    5             sender
[  4]   0.00-10.00  sec  2.98 GBytes  2560 Mbits/sec                  receiver
[  6]   0.00-10.00  sec  2.98 GBytes  2558 Mbits/sec    6             sender
[  6]   0.00-10.00  sec  2.98 GBytes  2558 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  2.98 GBytes  2557 Mbits/sec   11             sender
[  8]   0.00-10.00  sec  2.98 GBytes  2557 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  2.97 GBytes  2555 Mbits/sec   13             sender
[ 10]   0.00-10.00  sec  2.97 GBytes  2555 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.98 GBytes  2555 Mbits/sec    4             sender
[ 12]   0.00-10.00  sec  2.98 GBytes  2555 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.97 GBytes  2554 Mbits/sec   12             sender
[ 14]   0.00-10.00  sec  2.97 GBytes  2554 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.97 GBytes  2551 Mbits/sec    1             sender
[ 16]   0.00-10.00  sec  2.97 GBytes  2551 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.97 GBytes  2550 Mbits/sec   10             sender
[ 18]   0.00-10.00  sec  2.97 GBytes  2550 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.8 GBytes  20439 Mbits/sec   62             sender
[SUM]   0.00-10.00  sec  23.8 GBytes  20439 Mbits/sec                  receiver

iperf Done.
Received TCP output from worker netperf-w1 for test 2 iperf TCP. Same VM using Virtual IP from netperf-w1 to netperf-w2 MSS: 119
Connecting to host 100.70.29.66, port 5201
[  4] local 100.96.2.16 port 60932 connected to 100.70.29.66 port 5201
[  6] local 100.96.2.16 port 60934 connected to 100.70.29.66 port 5201
[  8] local 100.96.2.16 port 60936 connected to 100.70.29.66 port 5201
[ 10] local 100.96.2.16 port 60938 connected to 100.70.29.66 port 5201
[ 12] local 100.96.2.16 port 60940 connected to 100.70.29.66 port 5201
[ 14] local 100.96.2.16 port 60942 connected to 100.70.29.66 port 5201
[ 16] local 100.96.2.16 port 60944 connected to 100.70.29.66 port 5201
[ 18] local 100.96.2.16 port 60946 connected to 100.70.29.66 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.01  sec  2.94 GBytes  2523 Mbits/sec    7    175 KBytes       
[  6]   0.00-10.01  sec  2.94 GBytes  2523 Mbits/sec    6    166 KBytes       
[  8]   0.00-10.01  sec  2.94 GBytes  2523 Mbits/sec    5    166 KBytes       
[ 10]   0.00-10.01  sec  2.94 GBytes  2521 Mbits/sec    1    227 KBytes       
[ 12]   0.00-10.01  sec  2.93 GBytes  2520 Mbits/sec    1    227 KBytes       
[ 14]   0.00-10.01  sec  2.93 GBytes  2517 Mbits/sec    4    227 KBytes       
[ 16]   0.00-10.01  sec  2.93 GBytes  2516 Mbits/sec   32    227 KBytes       
[ 18]   0.00-10.01  sec  2.93 GBytes  2514 Mbits/sec    1    175 KBytes       
[SUM]   0.00-10.01  sec  23.5 GBytes  20157 Mbits/sec   57             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
root@netperf-orch-m3cb3:/tmp# more output.txt  
Received TCP output from worker netperf-w1 for test 2 iperf TCP. Same VM using Virtual IP from netperf-w1 to netperf-w2 MSS: 96
Connecting to host 100.70.29.66, port 5201
[  4] local 100.96.2.16 port 59916 connected to 100.70.29.66 port 5201
[  6] local 100.96.2.16 port 59918 connected to 100.70.29.66 port 5201
[  8] local 100.96.2.16 port 59920 connected to 100.70.29.66 port 5201
[ 10] local 100.96.2.16 port 59922 connected to 100.70.29.66 port 5201
[ 12] local 100.96.2.16 port 59924 connected to 100.70.29.66 port 5201
[ 14] local 100.96.2.16 port 59926 connected to 100.70.29.66 port 5201
[ 16] local 100.96.2.16 port 59928 connected to 100.70.29.66 port 5201
[ 18] local 100.96.2.16 port 59930 connected to 100.70.29.66 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  2.95 GBytes  2532 Mbits/sec    1    192 KBytes       
[  6]   0.00-10.00  sec  2.95 GBytes  2532 Mbits/sec    6    166 KBytes       
[  8]   0.00-10.00  sec  2.94 GBytes  2529 Mbits/sec   10    166 KBytes       
[ 10]   0.00-10.00  sec  2.94 GBytes  2528 Mbits/sec    3    166 KBytes       
[ 12]   0.00-10.00  sec  2.94 GBytes  2527 Mbits/sec   18    166 KBytes       
[ 14]   0.00-10.00  sec  2.94 GBytes  2524 Mbits/sec    8    210 KBytes       
[ 16]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec    3    210 KBytes       
[ 18]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec    6    166 KBytes       
[SUM]   0.00-10.00  sec  23.5 GBytes  20216 Mbits/sec   55             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.95 GBytes  2532 Mbits/sec    1             sender
[  4]   0.00-10.00  sec  2.95 GBytes  2532 Mbits/sec                  receiver
[  6]   0.00-10.00  sec  2.95 GBytes  2532 Mbits/sec    6             sender
[  6]   0.00-10.00  sec  2.95 GBytes  2531 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  2.94 GBytes  2529 Mbits/sec   10             sender
[  8]   0.00-10.00  sec  2.94 GBytes  2529 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  2.94 GBytes  2528 Mbits/sec    3             sender
[ 10]   0.00-10.00  sec  2.94 GBytes  2528 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.94 GBytes  2527 Mbits/sec   18             sender
[ 12]   0.00-10.00  sec  2.94 GBytes  2526 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.94 GBytes  2524 Mbits/sec    8             sender
[ 14]   0.00-10.00  sec  2.94 GBytes  2523 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec    3             sender
[ 16]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec    6             sender
[ 18]   0.00-10.00  sec  2.94 GBytes  2522 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.5 GBytes  20216 Mbits/sec   55             sender
[SUM]   0.00-10.00  sec  23.5 GBytes  20212 Mbits/sec                  receiver

iperf Done.

Cluster Loader setting image fails

Hi,

I wanted to use the cluster loader to stress-test our cluster, but sadly I can't change the Docker image. I use the following config and just execute ./run_e2e.sh (notice that the docs here are wrong --> no glide is needed, only the ./run_e2e.sh command):

$ cat config/test.yaml 
ClusterLoader:
  delete: true
  projects:
    - num: 1
      basename: clusterproject
      tuning: default
      pods:
        - num: 50
          image: nobody-cares
          basename: pausepods
          file: pod-pause.json
  tuningsets:
    - name: default
      pods:
        stepping:
          stepsize: 10
          pause: 30s
        ratelimit:
          delay: 100ms

I would now expect Cluster Loader to use the nobody-cares image, but instead it uses the default image gcr.io/google_containers/pause-amd64:3.0; see:
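For reference, a minimal pod-pause.json of the kind Cluster Loader consumes might look like the sketch below (hypothetical; reconstructed from the `kubectl describe` output further down, not from the actual file). If the image is hardcoded in this template rather than substituted from the config, the `image:` value in test.yaml is silently ignored:

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "pausepods",
    "labels": { "purpose": "test" }
  },
  "spec": {
    "containers": [
      {
        "name": "pause-amd64",
        "image": "gcr.io/google_containers/pause-amd64:3.0",
        "ports": [ { "containerPort": 8080, "protocol": "TCP" } ]
      }
    ]
  }
}
```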

$ kubectl -n e2e-tests-clusterproject0-94xd9 describe po pausepods-pod-0
Name: pausepods-pod-0
Namespace: e2e-tests-clusterproject0-94xd9
Node: my-node/10.240.1.10
Start Time: Wed, 13 Sep 2017 16:01:20 +0200
Labels: purpose=test
Annotations: <none>
Status: Pending
IP: 
Containers:
  pause-amd64:
    Container ID: 
    Image: gcr.io/google_containers/pause-amd64:3.0
    Image ID: 
    Port: 8080/TCP
    State: Waiting
      Reason: ImagePullBackOff
    Ready: False
    Restart Count: 0
    Environment: <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vcn44 (ro)
Conditions:
  Type Status
  Initialized True 
  Ready False 
  PodScheduled True 
Volumes:
  default-token-vcn44:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-vcn44
    Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen  LastSeen  Count  From               SubObjectPath                 Type     Reason                 Message
  ---------  --------  -----  ----               -------------                 ----     ------                 -------
  11m        11m       1      default-scheduler                                Normal   Scheduled              Successfully assigned pausepods-pod-0 to my-node
  11m        11m       1      kubelet, my-node                                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-vcn44"
  9m         6m        2      kubelet, my-node   spec.containers{pause-amd64}  Warning  Failed                 Failed to pull image "gcr.io/google_containers/pause-amd64:3.0": rpc error: code = 2 desc = Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 74.125.133.82:443: i/o timeout"}
  11m        1m        6      kubelet, my-node   spec.containers{pause-amd64}  Normal   Pulling                pulling image "gcr.io/google_containers/pause-amd64:3.0"
  10m        56s       4      kubelet, my-node   spec.containers{pause-amd64}  Warning  Failed                 Failed to pull image "gcr.io/google_containers/pause-amd64:3.0": rpc error: code = 2 desc = Error response from daemon: {"message":"Get https://gcr.io/v1/_ping: dial tcp 66.102.1.82:443: i/o timeout"}
  10m        3s        31     kubelet, my-node                                 Warning  FailedSync             Error syncing pod
  10m        3s        25     kubelet, my-node   spec.containers{pause-amd64}  Normal   BackOff                Back-off pulling image "gcr.io/google_containers/pause-amd64:3.0"

I will take a look at how to fix this.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

netperf: MSS not passed to iperf3

How to reproduce

In 1st terminal:

cd network/benchmarks/netperf/
make runtests

In 2nd terminal, during the tests

kubectl  --namespace=netperf exec -it netperf-w1-nkbgj -- ps faux -ww
...
root       323 46.0  0.0   7408  2392 ?        S    16:28   0:00 /usr/bin/iperf3 -c 172.31.255.164 -N -i 30 -t 10 -f m -w 512M -Z -P 8 -M a
kubectl  --namespace=netperf exec -it netperf-w1-nkbgj -- ps faux -ww
...
root       345  0.0  0.0   7408  2456 ?        R    16:30   0:00 /usr/bin/iperf3 -c 172.31.255.164 -N -i 30 -t 10 -f m -w 512M -Z -P 8 -M e
kubectl  --namespace=netperf exec -it netperf-w1-nkbgj -- ps faux -ww
...
root       364  0.0  0.0   7408  2456 ?        R    16:30   0:00 /usr/bin/iperf3 -c 172.31.255.164 -N -i 30 -t 10 -f m -w 512M -Z -P 8 -M f

Note the last argument of iperf3, the -M option: a single letter (a, e, f) is being passed instead of a number. The value passed to -M is meant to be an unsigned integer. iperf3 unfortunately does not fail on the bad value, and silently falls back to the default MSS.

Expected behaviour

Argument to -M is passed as an integer.
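A minimal sketch of the fix, in Go since the netperf orchestrator is written in Go (the function name and argument layout are assumptions for illustration, not the actual code): the MSS must be rendered as a decimal string with strconv.Itoa before it is appended to the iperf3 argument list, rather than passed as a raw byte/character.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildIperfArgs is a hypothetical helper showing the intended behaviour:
// the -M value is formatted as a decimal integer (e.g. "96"), never as a
// single character like "a".
func buildIperfArgs(serverIP string, mss int) []string {
	return []string{
		"-c", serverIP,
		"-N", "-i", "30", "-t", "10",
		"-f", "m", "-w", "512M", "-Z", "-P", "8",
		"-M", strconv.Itoa(mss),
	}
}

func main() {
	args := buildIperfArgs("172.31.255.164", 96)
	// The last argument is now a well-formed unsigned integer.
	fmt.Println(args[len(args)-1])
}
```

With this, each sweep step passes the actual MSS value under test, so iperf3 applies it instead of silently using the default.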
