kubernetes-mesos's Issues

restrict pods to the resource constraints declared in their manifest

Currently pods can declare CPU and memory constraints, but we don't honor them. We should.

The current design maps a single mesos task to a k8s pod, which in reality represents multiple containers. One approach would be to place all pods managed by an executor into its container, growing and shrinking the resource constraints of the executor container on the fly to accommodate the pods that it's in charge of managing. That implies that we can actually control placement of the docker containers (well, their cgroup placement). Currently this is only possible by manually moving the processes from their default docker-determined cgroup to that of the executor.

There is a docker proposal on the table to allow for customizable cgroup placement: moby/moby#8551 (related prototype tooling here: https://github.com/ibuildthecloud/systemd-docker).

Assuming that we could convince docker to honor custom cgroup placement (at container launch time), then we'd also need to convince/hack the kubelet to apply custom placement rules.
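
As a point of reference, here is a minimal sketch of the manual workaround mentioned above: moving a container's processes out of their docker-assigned cgroup and into the executor's cgroup by writing their PIDs into that cgroup's cgroup.procs file. The hierarchy mount path and the executor cgroup name below are assumptions for illustration only, not the project's actual code.

    package sketch

    import (
        "os"
        "path/filepath"
        "strconv"
    )

    // reparentPIDs moves each process into the executor's cgroup by appending
    // its PID to that cgroup's cgroup.procs file; the kernel migrates the whole
    // thread group on write. The "/sys/fs/cgroup/memory" mount is an assumed path.
    func reparentPIDs(pids []int, executorCgroup string) error {
        procsFile := filepath.Join("/sys/fs/cgroup/memory", executorCgroup, "cgroup.procs")
        for _, pid := range pids {
            f, err := os.OpenFile(procsFile, os.O_WRONLY|os.O_APPEND, 0)
            if err != nil {
                return err
            }
            _, werr := f.WriteString(strconv.Itoa(pid))
            f.Close()
            if werr != nil {
                return werr
            }
        }
        return nil
    }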

TODO: Clean up go-binding, upgrade protocol, replace goprotobuf

Since the go-binding was written a long time ago, some of the protobuf messages are deprecated (e.g. LaunchTask with just one OfferID), so we should do a cleanup to bring it up to date.
We would also like to replace goprotobuf with gogoprotobuf for faster encoding/decoding.

Vagrant-based setup needs some love

  • Current box uses old versions of mesos, marathon, and docker.io
  • Need to add /etc/mesos-master/hostname (to point to the 10.141.141.X address)
  • All slaves get the same /etc/mesos-slave/hostname of 10.141.141.10
  • In /etc/hosts, mesos-X should resolve to its 10.141.141.X address
  • Would be nice to include packer sources to allow for rebuilding the box periodically

executor shutdown doesn't wait for pods to die before terminating

Upon shutdown killPodForTask is called for each running task, but since killPodForTask doesn't wait for confirmation that the "kill signal" has been processed, there's no guarantee that the pods are actually all shut down upon executor termination. The result is orphaned pods left running on a slave.
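
One possible shape for the fix, as a hedged sketch only: block shutdown until every kill has been confirmed or a deadline passes. killAndConfirm below is a hypothetical stand-in for a killPodForTask variant that returns only once the pod's containers are actually gone; it is not the executor's current API.

    package sketch

    import (
        "sync"
        "time"
    )

    // shutdownAndWait kills every known task and waits until each kill has been
    // confirmed, or until the deadline passes, so the executor does not exit
    // while pods are still running on the slave.
    func shutdownAndWait(taskIDs []string, killAndConfirm func(taskID string), timeout time.Duration) bool {
        var wg sync.WaitGroup
        for _, id := range taskIDs {
            wg.Add(1)
            go func(id string) {
                defer wg.Done()
                killAndConfirm(id) // returns only once the pod's containers are gone
            }(id)
        }

        done := make(chan struct{})
        go func() {
            wg.Wait()
            close(done)
        }()

        select {
        case <-done:
            return true // all pods confirmed dead; safe to terminate
        case <-time.After(timeout):
            return false // timed out; some pods may be left orphaned
        }
    }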

Container for executor 'KubeleteExecutorID' fails to start

I cannot start pods using kubernetes-on-mesos. Details below:

  • Package Versions:
    Mesos 0.20.1
    Docker 1.2.0
    etcd 0.4.6
    protobuf 2.5.0
    all running on Ubuntu 14.04
  • I started a Mesos cluster with one master and one slave and verified that the cluster can fire off docker containers via the Marathon framework, i.e. the mesos-slave runs docker containers successfully.
  • I then followed the Build instructions in the kubernetes-mesos README.md
  • Started the Kubernetes-on-Mesos framework on the mesos master node on port 9090. It registers with Mesos successfully, as verified on the Mesos Master Frameworks page.
  • Using src/github.com/mesosphere/kubernetes-mesos/examples/pod-nginx.json, I executed the following on the mesos-master:
    curl -L http://localhost:9090/api/v1beta1/pods -XPOST -d @pod-nginx.json
  • Error on the slave:
    E1015 11:17:09.202877 4707 slave.cpp:2485] Container 'adae622c-6777-47e0-b87b-ee9f0f657b0d' for executor 'KubeleteExecutorID' of framework '20141015-105719-222307338-5050-13825-0005' failed to start: Failed to fetch URIs for container 'adae622c-6777-47e0-b87b-ee9f0f657b0d': exit status 256
    E1015 11:17:09.203619 4710 slave.cpp:2580] Termination of executor 'KubeleteExecutorID' of framework '20141015-105719-222307338-5050-13825-0005' failed: No container found
    E1015 11:17:09.204197 4709 slave.cpp:2866] Failed to unmonitor container for executor KubeleteExecutorID of framework 20141015-105719-222307338-5050-13825-0005: Not monitored
    E1015 11:17:09.206140 4710 slave.cpp:2205] Failed to update resources for container adae622c-6777-47e0-b87b-ee9f0f657b0d of executor KubeleteExecutorID running task 79a552a6-5497-11e4-aea8-0050568e7d8b on status update for terminal task, destroying container: No container found
    W1015 11:17:18.872786 4709 slave.cpp:1421] Cannot shut down unknown framework 20141015-105719-222307338-5050-13825-0005
  • Error from the Framework:
    I1015 11:17:08.564495 14010 scheduler.go:568] About to try and schedule pod nginx-id-02
    I1015 11:17:08.565317 14010 scheduler.go:526] Try to schedule pod nginx-id-02
    I1015 11:17:09.095642 14010 scheduler.go:324] Received status update task_id:<value:"79a552a6-5497-11e4-aea8-0050568e7d8b" > state:TASK_LOST message:"Abnormal executor termination" slave_id:<value:"20141015-105719-222307338-5050-13825-0" > timestamp:1.413397029203833e+09
    E1015 11:17:09.095996 14010 scheduler.go:476] Task lost: 'task_id:<value:"79a552a6-5497-11e4-aea8-0050568e7d8b" > state:TASK_LOST message:"Abnormal executor termination" slave_id:<value:"20141015-105719-222307338-5050-13825-0" > timestamp:1.413397029203833e+09 '
  • I verified that the slave in question could run the nginx container with this command executed directly from the CLI:
    docker run -t -d -p 31000:80 -i dockerfile/nginx
    The container started up just fine, and I could get to the nginx homepage via port 31000

Failed to list services

While starting the kubernetes-mesos stack on an Ubuntu server, I am getting the following error.

E1016 19:30:16.882098 17212 endpoints_controller.go:51] Failed to list services: request [&http.Request{Method:"GET", URL:(*url.URL)(0xc210083460), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{}, Body:io.ReadCloser(nil), ContentLength:0, TransferEncoding:[]string(nil), Close:false, Host:"127.0.0.1:8080", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"", RequestURI:"", TLS:(*tls.ConnectionState)(nil)}] failed (500) 500 Internal Server Error: {"kind":"Status","creationTimestamp":null,"apiVersion":"v1beta1","status":"failure","message":"501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]","code":500}

The mesos cluster (both master and slave) is also running on this same server. I see that the kubernetes-mesos framework registers successfully in the mesos UI. However, due to the above error, I cannot launch a pod.

Here is the full error message.

I1016 19:30:16.869009 17212 sched.cpp:139] Version: 0.20.1
I1016 19:30:16.879240 17220 sched.cpp:235] New master detected at master@10...:5050
I1016 19:30:16.879422 17220 sched.cpp:243] No credentials provided. Attempting to register without authentication
I1016 19:30:16.880460 17220 sched.cpp:409] Framework registered with 20141016-190257-3979659530-5050-16694-0012
I1016 19:30:16.880915 17212 scheduler.go:269] Scheduler registered with the master: id:"20141016-190257-3979659530-5050-16694" ip:3979659530 port:5050 pid:"master@10...:5050" hostname:"10..." with frameworkId: value:"20141016-190257-3979659530-5050-16694-0012"
I1016 19:30:16.881249 17212 scheduler.go:287] Received offers
E1016 19:30:16.882098 17212 endpoints_controller.go:51] Failed to list services: request [&http.Request{Method:"GET", URL:(*url.URL)(0xc210083460), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{}, Body:io.ReadCloser(nil), ContentLength:0, TransferEncoding:[]string(nil), Close:false, Host:"127.0.0.1:8080", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"", RequestURI:"", TLS:(*tls.ConnectionState)(nil)}] failed (500) 500 Internal Server Error: {"kind":"Status","creationTimestamp":null,"apiVersion":"v1beta1","status":"failure","message":"501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]","code":500}

Deprecate Makefile, simplify build process

The current Makefile is a hack intended to help lock us to a specific version of Kubernetes and make it easy to generate new builds. We really shouldn't need something as complex as it is currently implemented. Instead of manipulating GOPATH and forcing k8s and all of its deps into third_party we should probably just godep restore all dependencies to the top-level GOPATH and then go build.

file PRs against k8s

[[EDITED]]

  • kubelet/server.go :: should permit parameterized "channel" sources (which feed pods to the kubelet); we want to use only a "mesos" channel and scrap the others (see the sketch after this list)
  • kubelet/server.go :: should allow more customizable handler selection (looks like they've started down this path already with enableDebugging for /container*)
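
To make the first item concrete, here is an illustrative sketch only; PodUpdate and mergeSources are hypothetical stand-ins for the kubelet's source plumbing, not the actual k8s API of that era. The idea is that the executor would register a single "mesos" source and the file/http/etcd sources would simply not be enabled.

    package sketch

    // PodUpdate is a hypothetical stand-in for a kubelet pod-source update,
    // tagged with the name of the channel that produced it.
    type PodUpdate struct {
        Source    string
        Manifests []string // pod manifests elided for brevity
    }

    // mergeSources forwards only updates from explicitly enabled sources, so a
    // kubelet embedded in the mesos executor could be fed solely from "mesos".
    func mergeSources(enabled map[string]bool, in <-chan PodUpdate) <-chan PodUpdate {
        out := make(chan PodUpdate)
        go func() {
            defer close(out)
            for u := range in {
                if enabled[u.Source] {
                    out <- u
                }
            }
        }()
        return out
    }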

Not buildable due to missing fsouza/go-dockerclient-copiedstructs

Hi,

kubernetes-mesos does not seem to be buildable at the moment, due to the missing go-dockerclient-copiedstructs package.

# cd .; git clone https://github.com/fsouza/go-dockerclient-copiedstructs /var/lib/go/src/github.com/fsouza/go-dockerclient-copiedstructs
Cloning into '/var/lib/go/src/github.com/fsouza/go-dockerclient-copiedstructs'...
remote: Repository not found.
fatal: repository 'https://github.com/fsouza/go-dockerclient-copiedstructs/' not found
package github.com/fsouza/go-dockerclient-copiedstructs: exit status 128
godep: restore: exit status 1

Checking on GitHub reveals that the repository github.com/fsouza/go-dockerclient-copiedstructs does not exist.

services should not require pods and pod-templates to declare explicit hostPorts

Since merging #54, services can now connect to pods, but only if those pods (or pod templates) are declared with a hostPort != 0. Not including a hostPort definition is the same as assigning a value of zero, which triggers the default k8s behavior of not publishing the containerPort to the host at all. Specifying a hostPort allows a service to attach to hostIP:hostPort, but this (a) requires that users explicitly maintain a list of available/consumed host ports, and (b) limits the number of instances of said pod/pod-template to 1 per host.

The enhancement here is to allow an end user to specify that their pod/pod-template wants a hostPort without explicitly naming that port. Mesos supports this by offering up a range of available ports to the k8s-mesos scheduler, but there's currently no way for the scheduler to know when a user wants a hostPort dynamically assigned from that available range.

I've previously suggested a magic hostPort of 1 to indicate to the scheduler that the end user wants dynamic hostPort assignment. Upon observing a value of 1, the scheduler would allocate a host port from the available offers and update the pod's desired state to reflect the newly allocated port.

This seems extremely hackish, and it limits the real-world use of hostPort 1 by this particular mesos scheduler. However, it is backwards compatible both with k8s' existing use of hostPort 0 and with the existing JSON API (no new fields are required).
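
For concreteness, a minimal sketch of how the scheduler might satisfy the hostPort=1 convention from an offer's port range; the types and function names are illustrative only, not the project's actual code, and collisions between dynamic and explicit ports are ignored for brevity.

    package sketch

    import "fmt"

    // portRange is an illustrative stand-in for the ports resource in a mesos offer.
    type portRange struct{ begin, end uint64 }

    // assignHostPorts maps each requested hostPort onto a concrete port: a value
    // of 1 means "pick one for me" and is satisfied from the offered range; any
    // other non-zero value must fall inside the range; zero is left untouched
    // (the default k8s "don't publish" behavior).
    func assignHostPorts(requested []uint64, offer portRange) ([]uint64, error) {
        next := offer.begin
        assigned := make([]uint64, len(requested))
        for i, p := range requested {
            switch {
            case p == 0:
                assigned[i] = 0
            case p == 1:
                if next > offer.end {
                    return nil, fmt.Errorf("offer exhausted: no ports left")
                }
                assigned[i] = next
                next++
            default:
                if p < offer.begin || p > offer.end {
                    return nil, fmt.Errorf("hostPort %d not in offered range [%d,%d]", p, offer.begin, offer.end)
                }
                assigned[i] = p
            }
        }
        return assigned, nil
    }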

An alternative approach is to require an end user to apply a magic label (something like "dynamic-host-port", perhaps with a boolean value of true/false) to obtain this behavior. Initially this seems cleaner than the magic hostPort=1 approach, but it starts us down a path of using labels to obtain specific scheduler behavior -- and I'm not sure that's the direction we want to go.

@adam-mesos @ConnorDoyle Thoughts?

rebase to newer k8s

Something like v0.4.3 or later would be nice (to pick up container garbage collection).

pods stuck indefinitely in Waiting status after much replicationController resizing

root@development-1823-209:~# grep -e 9764eff2-66d4-11e4-9c7f-04012f416701 master.log |grep -v Returning|grep -v -e 'Error synchronizing'
...
I1107 23:32:28.878234 17461 backoff.go:61] Backing off 8s for pod 9764eff2-66d4-11e4-9c7f-04012f416701
I1107 23:32:36.878665 17461 scheduler.go:685] Get pod '9764eff2-66d4-11e4-9c7f-04012f416701'
I1107 23:32:38.879434 17461 scheduler.go:699] Pending Pod '9764eff2-66d4-11e4-9c7f-04012f416701': &{{ 9764eff2-66d4-11e4-9c7f-04012f416701 2014-11-07 23:19:58 +0000 UTC  0 } map[name:nginx replicationController:nginxController] {{v1beta1 9764eff2-66d4-11e4-9c7f-04012f416701 9764eff2-66d4-11e4-9c7f-04012f416701 [] [{nginx dockerfile/nginx []  [{ 31001 80 TCP }] [] 0 0 [] <nil> <nil> false}] {0xd35a68 <nil> <nil>}} Running    map[]} {{   [] [] {<nil> <nil> <nil>}} Waiting    map[]}}
I1107 23:32:58.883119 17461 scheduler.go:161] Could not schedule pod 9764eff2-66d4-11e4-9c7f-04012f416701: 1 ports could not be allocated on slave 20141107-135407-4055729162-5050-2517-2
I1107 23:32:58.883396 17461 scheduler.go:161] Could not schedule pod 9764eff2-66d4-11e4-9c7f-04012f416701: 1 ports could not be allocated on slave 20141107-135407-4055729162-5050-2517-0
I1107 23:32:58.883517 17461 scheduler.go:161] Could not schedule pod 9764eff2-66d4-11e4-9c7f-04012f416701: 1 ports could not be allocated on slave 20141107-135407-4055729162-5050-2517-1
I1107 23:32:58.884015 17461 scheduler.go:602] About to try and schedule pod 9764eff2-66d4-11e4-9c7f-04012f416701
I1107 23:32:58.884023 17461 scheduler.go:562] Try to schedule pod 9764eff2-66d4-11e4-9c7f-04012f416701
E1107 23:33:08.884249 17461 scheduler.go:608] Error scheduling 9764eff2-66d4-11e4-9c7f-04012f416701: Schedule time out; retrying
I1107 23:33:08.888942 17461 backoff.go:61] Backing off 16s for pod 9764eff2-66d4-11e4-9c7f-04012f416701
I1107 23:33:24.889741 17461 scheduler.go:685] Get pod '9764eff2-66d4-11e4-9c7f-04012f416701'
I1107 23:33:28.891564 17461 scheduler.go:699] Pending Pod '9764eff2-66d4-11e4-9c7f-04012f416701': &{{ 9764eff2-66d4-11e4-9c7f-04012f416701 2014-11-07 23:19:58 +0000 UTC  0 } map[name:nginx replicationController:nginxController] {{v1beta1 9764eff2-66d4-11e4-9c7f-04012f416701 9764eff2-66d4-11e4-9c7f-04012f416701 [] [{nginx dockerfile/nginx []  [{ 31001 80 TCP }] [] 0 0 [] <nil> <nil> false}] {0xd35a68 <nil> <nil>}} Running    map[]} {{   [] [] {<nil> <nil> <nil>}} Waiting    map[]}}
W1107 23:33:58.899676 17461 scheduler.go:639] Scheduler detected pod no longer pending: 9764eff2-66d4-11e4-9c7f-04012f416701, will not re-queue
root@development-1823-209:~# bin/kubecfg list pods
ID                                     Image(s)            Host                            Labels                                             Status
----------                             ----------          ----------                      ----------                                         ----------
97650af0-66d4-11e4-9c7f-04012f416701   dockerfile/nginx    10.132.189.242/10.132.189.242   name=nginx,replicationController=nginxController   Running
9764eff2-66d4-11e4-9c7f-04012f416701   dockerfile/nginx    /                               name=nginx,replicationController=nginxController   Waiting
9764b5be-66d4-11e4-9c7f-04012f416701   dockerfile/nginx    /                               name=nginx,replicationController=nginxController   Waiting

multiple build errors

I'm attempting to run kubernetes-mesos on an Ubuntu 14.04 VM and get the following errors trying to build:

root@vagrant-ubuntu-trusty-64:~# go get github.com/mesosphere/kubernetes-mesos/kubernetes-mesos

github.com/mesosphere/kubernetes-mesos/kubernetes-mesos

go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:136: cannot use nil as type string in function argument
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:136: not enough arguments in call to client.New
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:164: undefined: runtime.DefaultCodec
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:165: undefined: runtime.DefaultResourceVersioner
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:194: unknown master.Config field 'EtcdServers' in struct literal
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:205: cannot use scheduler (type *"github.com/mesosphere/kubernetes-mesos/scheduler".KubernetesScheduler) as type pod.Registry in field value:
*"github.com/mesosphere/kubernetes-mesos/scheduler".KubernetesScheduler does not implement pod.Registry (missing ListPodsPredicate method)
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:206: cannot use etcdClient (type tools.EtcdClient) as type tools.EtcdHelper in function argument
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:206: not enough arguments in call to "github.com/GoogleCloudPlatform/kubernetes/pkg/registry/etcd".NewRegistry
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:207: cannot use etcdClient (type tools.EtcdClient) as type tools.EtcdHelper in function argument
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:207: not enough arguments in call to "github.com/GoogleCloudPlatform/kubernetes/pkg/registry/etcd".NewRegistry
go/src/github.com/mesosphere/kubernetes-mesos/kubernetes-mesos/main.go:207: too many errors

missing github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory

Does registry/memory exist somewhere? I see this:

go get github.com/mesosphere/kubernetes-mesos/kubernetes-mesos
package github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory
imports github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory
imports github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory: cannot find package "github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory" in any of:
/usr/lib64/golang/src/pkg/github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory (from $GOROOT)
/home/ec2-user/kubernetes-mesos/godep/src/github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory (from $GOPATH)
/home/ec2-user/kubernetes-mesos/godep/src/github.com/GoogleCloudPlatform/kubernetes/third_party/src/github.com/GoogleCloudPlatform/kubernetes/pkg/registry/memory

incorporate backoff function in scheduler error handler

When etcd is incorrectly configured, the scheduler spins into an infinite error-handling loop. There are two problems with this:

  • it would be nice to warn the user about an etcd misconfiguration, if we can detect it
  • we should implement backoff between retries of scheduling the same pod. k8s has implemented a backoff in their reference implementation -- we should follow suit (see the sketch after this list).
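
A minimal sketch of the kind of per-pod exponential backoff intended here, assuming an in-memory map keyed by pod ID; the names and constants are illustrative, not the eventual implementation.

    package sketch

    import "time"

    // podBackoff tracks a per-pod retry delay that doubles on each failed
    // scheduling attempt, up to a cap, and can be cleared once a pod schedules.
    type podBackoff struct {
        delays  map[string]time.Duration
        initial time.Duration
        max     time.Duration
    }

    func newPodBackoff() *podBackoff {
        return &podBackoff{
            delays:  make(map[string]time.Duration),
            initial: 1 * time.Second,
            max:     60 * time.Second,
        }
    }

    // next returns the delay to wait before retrying the given pod and doubles
    // the stored delay for the following attempt.
    func (b *podBackoff) next(podID string) time.Duration {
        d, ok := b.delays[podID]
        if !ok {
            d = b.initial
        }
        doubled := d * 2
        if doubled > b.max {
            doubled = b.max
        }
        b.delays[podID] = doubled
        return d
    }

    // reset forgets a pod's backoff once it has been scheduled successfully.
    func (b *podBackoff) reset(podID string) {
        delete(b.delays, podID)
    }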

Scheduler HA

We need to test and verify scheduler HA

  • preliminary scheduler lifecycle support
  • preliminary graceful failover support
  • we probably suffer from an issue similar to mesosphere/marathon#1063
    • potential solution: defer buildFrameworkInfo()/driver initialization until post-election
  • scheduler leader election
  • wire up leader election to graceful failover
  • executors commit suicide once they've "outlived their usefulness"
  • prevent multiple HA schedulers that will generate conflicting ExecutorInfo
    • hash potentially conflicting ExecutorInfo and prepend to the leader UID (see the sketch after this list)
    • validate that current master has matching UID prefix, otherwise terminate
    • what if all schedulers crash (there is no leader) and we then start a new scheduler with different params?
  • pass DNS parameters from scheduler to executor
    • pass Cluster{DNS,Domain} from scheduler to executor
  • document how to make use of HA mode, requirements, etc.
    • --ha and --km_path=hdfs:///... is the easiest way to get started
    • if starting a secondary scheduler on the same host, use a different --port
    • all --auth_path, --etcd_*, --executor_*, and --proxy_* options must be identical
  • failover cleanup (lots of TODOs)
  • sync to latest version of osext to pick up kardianos/osext@e61421b
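
A hedged sketch of the ExecutorInfo-hash idea from the checklist above: derive a short digest from the serialized executor configuration, prefix the leader UID with it, and refuse to follow a leader whose prefix doesn't match. All names and the digest length are assumptions for illustration, not the project's actual code.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // executorConfigDigest returns a short, stable digest of the serialized
    // ExecutorInfo (or whatever bytes uniquely describe the executor config).
    func executorConfigDigest(executorInfoBytes []byte) string {
        sum := sha256.Sum256(executorInfoBytes)
        return hex.EncodeToString(sum[:8])
    }

    // leaderUID builds a UID whose prefix encodes the executor configuration,
    // so schedulers started with different executor settings are detectable.
    func leaderUID(executorInfoBytes []byte, electionID string) string {
        return executorConfigDigest(executorInfoBytes) + "_" + electionID
    }

    // compatibleLeader reports whether an existing leader's UID was generated
    // from the same executor configuration; if not, the newcomer should
    // terminate rather than generate conflicting ExecutorInfo.
    func compatibleLeader(uid string, executorInfoBytes []byte) bool {
        return strings.HasPrefix(uid, executorConfigDigest(executorInfoBytes)+"_")
    }

    func main() {
        cfg := []byte("km executor: cpus=0.25 mem=64 uris=hdfs:///km")
        uid := leaderUID(cfg, "election-0001")
        fmt.Println(uid, compatibleLeader(uid, cfg))
    }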

Related upstream discussion
