nre-learning / antidote-core
Core Services that make up the Antidote Platform
License: Apache License 2.0
If you see failures, you can return an error via the API much sooner.
Not needed anymore. Was going to be useful if we used read-only, shared topologies, but we abandoned that approach.
Syringe holds the state of which lessons are provisioned in memory, so if Syringe restarts, it loses track of them. This isn't too bad: now that GC is in place and works purely off Kubernetes calls, orphaned resources will eventually get cleaned up, but it means everyone has to start new stuff from scratch. It would be nice if, when Syringe gets a request to create a resource, it detected whether one already exists and, if so, added it back to the in-memory map.
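A rough sketch of the re-adoption idea (everything here, including the names `EnsureKubeLab` and `namespaceExists`, is hypothetical and not actual Syringe code; the existence check is injected as a function so the sketch doesn't need a live cluster, whereas the real version would query the Kubernetes API):

```go
package main

import (
	"fmt"
	"sync"
)

// KubeLab is a stand-in for Syringe's in-memory record of a provisioned lesson.
type KubeLab struct {
	Namespace string
}

// Scheduler tracks provisioned lessons. namespaceExists is injected for the
// sketch; the real implementation would ask the Kubernetes API.
type Scheduler struct {
	mu              sync.Mutex
	kubeLabs        map[string]*KubeLab
	namespaceExists func(ns string) bool
}

// EnsureKubeLab returns the tracked KubeLab for uuid, re-adopting an existing
// namespace into the in-memory map if the scheduler lost track of it (for
// example, after a restart). The second return value reports re-adoption.
func (s *Scheduler) EnsureKubeLab(uuid string) (*KubeLab, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if kl, ok := s.kubeLabs[uuid]; ok {
		return kl, false // already tracked in memory
	}
	ns := fmt.Sprintf("%s-ns", uuid)
	if s.namespaceExists(ns) {
		kl := &KubeLab{Namespace: ns}
		s.kubeLabs[uuid] = kl // adopt the surviving resource back into memory
		return kl, true
	}
	return nil, false // nothing in the cluster either; caller provisions fresh
}

func main() {
	s := &Scheduler{
		kubeLabs:        map[string]*KubeLab{},
		namespaceExists: func(ns string) bool { return ns == "19-abc-ns" },
	}
	if kl, adopted := s.EnsureKubeLab("19-abc"); kl == nil || !adopted {
		panic("expected re-adoption of surviving namespace")
	}
}
```

The point is that a request for a resource that already exists becomes a cheap map update rather than a failed create.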
Considering a few changes:
I was working on the NAPALM lesson, which is the first to use Jupyter notebooks as the lab guide. It uses the notebook for the lab guide for stages 1, 2, and 4. For stage 3 it will revert to a markdown file.
EVERY ONCE IN A WHILE (and I mean very rarely) it will fail to load the markdown version. This was very difficult to reproduce. As a result, it's also not clear whether the problem is caused by starting at stage 1 and navigating to stage 3, or whether it can happen at any time. I only tried going directly to stage 3 a few times and it never happened that way, but given how infrequent the event is, that's still inconclusive.
I still haven't gotten to the root cause, but digging through the scheduler code made me realize how fragile the way I'm mapping kubelab to livelessons is, along with how kubelab state is handled, how state changes are applied between stages, etc. This whole thing needs to be revamped.
I hate discussing a solution before I've REALLY nailed down the problem, but after spending hours trying to reproduce it reliably, I'm leaning towards just rebuilding the scheduler properly and assuming that will take care of it.
For instance, there's a global var in scheduler.go:
kubeLabs = map[string]*KubeLab{}
I'm only writing to this map in one place, and in my testing I was only spinning up lessons for myself, so there shouldn't have been concurrent writes (though obviously I should still be using a Mutex or similar to guard against them). I'm also not sure of the implications of this being a global variable as opposed to a property of the scheduler.
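A minimal sketch of what the concurrency fix could look like (this is not the actual Syringe code; `kubeLabStore` and its methods are hypothetical): wrap the map in a small struct guarded by a `sync.RWMutex`, and hang it off the scheduler instead of a package-level var so ownership is explicit.

```go
package main

import "sync"

// KubeLab is a stand-in for the scheduler's per-lesson record.
type KubeLab struct{ ID string }

// kubeLabStore guards the formerly-global map so concurrent scheduler
// goroutines can't race on it.
type kubeLabStore struct {
	mu   sync.RWMutex
	labs map[string]*KubeLab
}

func newKubeLabStore() *kubeLabStore {
	return &kubeLabStore{labs: map[string]*KubeLab{}}
}

// Set records a kubelab under the write lock.
func (s *kubeLabStore) Set(id string, kl *KubeLab) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.labs[id] = kl
}

// Get reads under the (shared) read lock.
func (s *kubeLabStore) Get(id string) (*KubeLab, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	kl, ok := s.labs[id]
	return kl, ok
}

// Delete removes an entry under the write lock.
func (s *kubeLabStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.labs, id)
}

func main() {
	store := newKubeLabStore()
	store.Set("14-abc", &KubeLab{ID: "14-abc"})
	if _, ok := store.Get("14-abc"); !ok {
		panic("expected stored kubelab")
	}
}
```

This would then live as a field on `LessonScheduler` rather than a global.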
Need a check in Syringe for the case where there are no configs but there are devices.
Some config parameters point to something like /antidote, others to /antidote/lessons.
Should reposition the configuration input as a "curriculum". This will allow us to incorporate collection resources underneath it as well.
time="2018-10-11T08:00:15Z" level=info msg="New KubeLab for lesson 19 is of TopologyType custom"
time="2018-10-11T08:00:15Z" level=debug msg="Creating devices and connections"
time="2018-10-11T08:00:15Z" level=error msg="Problem creating network vqfx1-vqfx2-net: network-attachment-definitions.k8s.cni.cncf.io \"vqfx1-vqfx2-net\" is forbidden: unable to create new content in namespace 19-rapsjjo3ypp6m1bc-ns because it is being terminated"
time="2018-10-11T08:00:15Z" level=error msg="network-attachment-definitions.k8s.cni.cncf.io \"vqfx1-vqfx2-net\" is forbidden: unable to create new content in namespace 19-rapsjjo3ypp6m1bc-ns because it is being terminated"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xf56869]
goroutine 447350 [running]:
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).createKubeLab(0xc420409520, 0xc4218010c0, 0xc422fb37a0, 0x2, 0x2)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:347 +0xd79
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).handleRequest(0xc420409520, 0xc4218010c0)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:120 +0xbcd
created by github.com/nre-learning/syringe/scheduler.(*LessonScheduler).Start
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:110 +0xb5
might be solved by #5
Kind of a niche problem, but requests for lessons made immediately after startup, before the nuke has completed, return a 408 error. We should probably hold all incoming requests while nuking is in progress, and wait to process them until it's finished.
time="2019-01-13T22:40:27Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
10.47.0.0 - - [13/Jan/2019:22:40:31 +0000] "GET / HTTP/1.1" 200 2
time="2019-01-13T22:40:32Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
time="2019-01-13T22:40:37Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
10.47.0.0 - - [13/Jan/2019:22:40:41 +0000] "GET / HTTP/1.1" 200 2
time="2019-01-13T22:40:42Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
time="2019-01-13T22:40:47Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
10.47.0.0 - - [13/Jan/2019:22:40:51 +0000] "GET / HTTP/1.1" 200 2
time="2019-01-13T22:40:52Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
ERROR: 2019/01/13 22:40:53 grpc: server failed to encode response: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
10.40.0.3 - - [13/Jan/2019:22:40:56 +0000] "GET /exp/lessondef/all HTTP/1.1" 200 143850
10.40.0.3 - - [13/Jan/2019:22:40:56 +0000] "GET /exp/lessondef/15 HTTP/1.1" 200 27172
time="2019-01-13T22:40:57Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
10.47.0.0 - - [13/Jan/2019:22:41:01 +0000] "GET / HTTP/1.1" 200 2
time="2019-01-13T22:41:02Z" level=debug msg="Waiting for namespace 15-jjtigg867ghr3gye-ns to delete..."
10.40.0.3 - - [13/Jan/2019:22:40:57 +0000] "POST /exp/livelesson HTTP/1.1" 408 66
10.40.0.3 - - [13/Jan/2019:22:41:07 +0000] "GET /exp/syringeinfo HTTP/1.1" 200 114
Should also consider handling this error better in antidote-web. The behavior on that side is that the loading screen shows indefinitely, when in fact it broke on the initial error within seconds. Should go through all requests on that side and make sure nothing can fail silently like that.
10.38.0.38 - - [17/Feb/2019:18:22:41 +0000] "GET /exp/lessondef/17 HTTP/1.1" 200 4235
time="2019-02-17T18:22:41Z" level=debug msg="Scheduler received new request. Sending to handle function." Operation=3 Stage=2 Uuid=17-wxd1zqhqmej6qoiv
panic: runtime error: index out of range
goroutine 180539 [running]:
github.com/nre-learning/syringe/scheduler.(*KubeLab).ToLiveLesson(0xc420c6eae0, 0xc421310a20)
/go/src/github.com/nre-learning/syringe/scheduler/kubelab.go:105 +0xb7f
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).handleRequestMODIFY(0xc4201edc70, 0xc420b699c0)
/go/src/github.com/nre-learning/syringe/scheduler/requests.go:176 +0x146
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).(github.com/nre-learning/syringe/scheduler.handleRequestMODIFY)-fm(0xc420b699c0)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:93 +0x34
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).Start.func2(0xc4206d2810, 0xc420b699c0)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:107 +0x6d
created by github.com/nre-learning/syringe/scheduler.(*LessonScheduler).Start
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:106 +0x3ff
Something changed in the last release. Ever since last Monday (v0.2.0), the metrics have been askew:
This contradicts the current namespaces (the only session is me):
kubectl get ns
NAME STATUS AGE
14-jjtigg867ghr3gye-ns Active 46m
default Active 25d
kube-public Active 25d
kube-system Active 25d
prod Active 25d
ptr Active 25d
Previously, there was no generic iframe resource; it was specific to Jupyter notebooks. Thus, it was easy to load the Let's Encrypt cert into the Docker image because I controlled the build.
However, if we're planning to allow generic web resources to be embedded via iframe, the content needs to be served with a trusted cert in order to display. So we need to put some thought into how we're going to do this; I can't always slipstream a cert into every image that needs it.
panic: Post https://10.96.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: dial tcp 10.96.0.1:443: connect: connection refused
goroutine 22 [running]:
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).createNetworkCrd(0xc4204c12c0, 0x1386d40, 0xc420568360)
/go/src/github.com/nre-learning/syringe/scheduler/networks.go:33 +0x94
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).Start(0xc4204c12c0, 0x0, 0x0)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:83 +0x3d
main.main.func2(0xc4204c12c0, 0xc42036aec0)
/go/src/github.com/nre-learning/syringe/cmd/syringed/main.go:65 +0x2f
created by main.main
/go/src/github.com/nre-learning/syringe/cmd/syringed/main.go:64 +0x538
To prepare for the upcoming move to external state, we should build an abstraction for state management: support internal state as we do today, but behind an interface that could also be satisfied by an external driver like etcd.
Should continue to support both for different use cases, and be able to configure the plugin choice via env.
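A sketch of what that interface could look like (all names here, including the `SYRINGE_STATE_DRIVER` env var, are assumptions for illustration, not existing Syringe configuration): one `StateDriver` interface, an in-memory implementation matching today's behavior, and a constructor that picks the backend from the environment. An etcd driver would satisfy the same interface later.

```go
package main

import (
	"errors"
	"os"
	"sync"
)

// StateDriver is the abstraction both backends would satisfy: the in-memory
// store used today, and an external driver (etcd, etc.) later.
type StateDriver interface {
	Get(key string) (string, error)
	Set(key, value string) error
	Delete(key string) error
}

// memoryDriver keeps state in-process, matching current behavior.
type memoryDriver struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMemoryDriver() *memoryDriver { return &memoryDriver{data: map[string]string{}} }

func (m *memoryDriver) Get(key string) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	if !ok {
		return "", errors.New("not found")
	}
	return v, nil
}

func (m *memoryDriver) Set(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *memoryDriver) Delete(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, key)
	return nil
}

// NewStateDriver selects the backend from an environment variable, defaulting
// to in-memory. The etcd branch is deliberately unimplemented in this sketch.
func NewStateDriver() (StateDriver, error) {
	switch os.Getenv("SYRINGE_STATE_DRIVER") {
	case "", "memory":
		return newMemoryDriver(), nil
	case "etcd":
		return nil, errors.New("etcd driver not implemented in this sketch")
	default:
		return nil, errors.New("unknown state driver")
	}
}

func main() {
	d, err := NewStateDriver()
	if err != nil {
		panic(err)
	}
	_ = d.Set("livelesson/14-abc/stage", "2")
	if v, _ := d.Get("livelesson/14-abc/stage"); v != "2" {
		panic("unexpected value")
	}
}
```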
Config logs don't contain any context of what they're configuring.
We should also error out earlier than after 10 or so failures; something like 3 should be fine.
10.38.0.38 - - [11/Mar/2019:00:48:07 +0000] "GET /exp/lessondef HTTP/1.1" 200 181938
time="2019-03-11T00:48:09Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:09Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:48:11 +0000] "GET /exp/livelesson/21-qofps64daw5eqzf6 HTTP/1.1" 200 2778
10.38.0.38 - - [11/Mar/2019:00:48:11 +0000] "GET /exp/lessondef HTTP/1.1" 200 181938
10.38.0.0 - - [11/Mar/2019:00:48:13 +0000] "GET / HTTP/1.1" 200 2
time="2019-03-11T00:48:14Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:14Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:48:15 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
time="2019-03-11T00:48:19Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:19Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.0 - - [11/Mar/2019:00:48:23 +0000] "GET / HTTP/1.1" 200 2
10.38.0.38 - - [11/Mar/2019:00:48:23 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
time="2019-03-11T00:48:24Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:24Z" level=error msg="Problem configuring with config-vqfx2"
time="2019-03-11T00:48:29Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:29Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:48:32 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
10.38.0.0 - - [11/Mar/2019:00:48:33 +0000] "GET / HTTP/1.1" 200 2
time="2019-03-11T00:48:34Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:34Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:48:39 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
time="2019-03-11T00:48:39Z" level=info msg="Job Status" active=1 failed=2 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:39Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.0 - - [11/Mar/2019:00:48:43 +0000] "GET / HTTP/1.1" 200 2
time="2019-03-11T00:48:44Z" level=info msg="Job Status" active=1 failed=3 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:44Z" level=error msg="Problem configuring with config-vqfx2"
time="2019-03-11T00:48:47Z" level=debug msg="No old namespaces found. No need to GC."
10.38.0.38 - - [11/Mar/2019:00:48:48 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
time="2019-03-11T00:48:49Z" level=info msg="Job Status" active=1 failed=3 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:49Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.0 - - [11/Mar/2019:00:48:53 +0000] "GET / HTTP/1.1" 200 2
time="2019-03-11T00:48:54Z" level=info msg="Job Status" active=1 failed=3 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:54Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:48:58 +0000] "GET /exp/lessondef/16 HTTP/1.1" 200 23674
time="2019-03-11T00:48:58Z" level=debug msg="Scheduler received new request. Sending to handle function." Operation=4 Stage=0 Uuid=16-qofps64daw5eqzf6
time="2019-03-11T00:48:58Z" level=debug msg="Booping 16-qofps64daw5eqzf6-ns"
10.38.0.38 - - [11/Mar/2019:00:48:58 +0000] "POST /exp/livelesson HTTP/1.1" 200 28
10.38.0.38 - - [11/Mar/2019:00:48:58 +0000] "GET /exp/livelesson/16-qofps64daw5eqzf6 HTTP/1.1" 200 4451
10.38.0.38 - - [11/Mar/2019:00:48:58 +0000] "GET /exp/lessondef HTTP/1.1" 200 181938
time="2019-03-11T00:48:59Z" level=info msg="Job Status" active=1 failed=3 jobName=config-vqfx2 successful=0
time="2019-03-11T00:48:59Z" level=error msg="Problem configuring with config-vqfx2"
10.38.0.38 - - [11/Mar/2019:00:49:00 +0000] "GET /exp/livelesson/15-qofps64daw5eqzf6 HTTP/1.1" 200 7144
time="2019-03-11T00:49:02Z" level=debug msg="Recording periodic influxdb metrics"
time="2019-03-11T00:49:02Z" level=debug msg="Creating influxdb point: ID: 17 | NAME: Version Control with Git | ACT
If the configs are the same between stages, don't try reconfiguring.
Should also clean up the merge/overwrite approach.
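The "skip if identical" check could be as simple as comparing digests (hypothetical helper names, not actual Syringe code; storing a digest of the last-applied config per device avoids keeping the full rendered config around):

```go
package main

import "crypto/sha256"

// configDigest hashes a rendered stage config. The scheduler could store the
// digest of the last-applied config per device.
func configDigest(config []byte) [sha256.Size]byte {
	return sha256.Sum256(config)
}

// needsReconfigure reports whether the configs actually differ, so the (slow)
// configuration job can be skipped between stages with identical configs.
func needsReconfigure(current, next []byte) bool {
	return configDigest(current) != configDigest(next)
}

func main() {
	same := []byte("set system host-name vqfx1")
	if needsReconfigure(same, same) {
		panic("identical configs should not trigger reconfigure")
	}
}
```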
time="2019-01-15T06:49:08Z" level=debug msg="Connectivity testing endpoint selfservice via :0"
time="2019-01-15T06:49:08Z" level=debug msg="Connectivity testing endpoint selfservice via 10.107.156.241:5000"
It works right now, so be careful about removing anything important.
Should use the same functionality regardless of the resource type. Build it once, and iterate over the fields of whatever resource type is specified.
Should probably embed questions and choices into the data model somehow.
Should also update docs with a page on creating a new lesson/collection/etc that uses this.
time="2019-01-29T22:07:59Z" level=info msg="Deleted namespace 19-ne14ch9gqwobnrr4-ns"
time="2019-01-29T22:07:59Z" level=info msg="Deleted namespace 19-ne14ch9gqwobnrr4-ns"
time="2019-01-29T22:07:59Z" level=info msg="Deleted namespace 19-ne14ch9gqwobnrr4-ns"
time="2019-01-29T22:07:59Z" level=info msg="Deleted namespace 19-ne14ch9gqwobnrr4-ns"
time="2019-01-29T22:07:59Z" level=info msg="Deleted namespace 19-ne14ch9gqwobnrr4-ns"
time="2019-01-29T22:07:59Z" level=info msg="Finished garbage-collecting 5 old lessons"
time="2019-01-29T22:07:59Z" level=debug msg="Received result from scheduler." Operation=3
We need to be able to configure a syslog destination, or a file. Currently we just statically output to stdout.
Should also standardize the logging output itself: for every message, lesson ID and session ID should be implicitly provided so they're always there.
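A stdlib-only sketch of both ideas (Syringe actually uses logrus; these helper names are hypothetical, and a syslog target could be added via log/syslog): pick the writer from configuration, and funnel every message through a formatter that always attaches the lesson and session IDs.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// openLogDestination picks the writer from configuration: a file path if set,
// otherwise stdout.
func openLogDestination(filePath string) (io.Writer, error) {
	if filePath == "" {
		return os.Stdout, nil
	}
	return os.OpenFile(filePath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
}

// formatWithContext attaches lesson and session IDs to every message so
// they're always present, per the standardization idea above.
func formatWithContext(lessonID int, sessionID, msg string) string {
	return fmt.Sprintf("lesson=%d session=%s msg=%q", lessonID, sessionID, msg)
}

func main() {
	w, err := openLogDestination("") // stdout in this demo
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(w, formatWithContext(19, "ne14ch9gqwobnrr4", "Deleted namespace"))
}
```

With logrus, the equivalent of `formatWithContext` would be a shared `WithFields` entry carrying the two IDs.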
Need a new resource type: collection.
You can refer to these collections in the lesson definition (collections should be loaded first from YAML files, and then a check added to make sure the referenced collection ID exists).
The collection resource will have all of the metadata needed to construct a page describing that collection.
Follow up to nre-learning/antidote#43
Related to https://github.com/nre-learning/antidote/issues/128
Also add to grafana. Then write stress tests and watch this graph.
[mierdin@antidote-controller-lklr ~]$ kubectl exec -n=prod syringe-fbc65bdf5-zf4l4 syrctl wl remove m9eujuuzvqiq47b4
rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil
command terminated with exit code 1
10.47.0.0 - - [12/Feb/2019:02:40:52 +0000] "GET / HTTP/1.1" 200 2
10.47.0.0 - - [12/Feb/2019:02:41:02 +0000] "GET / HTTP/1.1" 200 2
10.47.0.0 - - [12/Feb/2019:02:41:12 +0000] "GET / HTTP/1.1" 200 2
10.47.0.0 - - [12/Feb/2019:02:41:22 +0000] "GET / HTTP/1.1" 200 2
10.47.0.0 - - [12/Feb/2019:02:41:32 +0000] "GET / HTTP/1.1" 200 2
time="2019-02-12T02:41:34Z" level=debug msg="No old namespaces found. No need to GC."
10.47.0.0 - - [12/Feb/2019:02:41:42 +0000] "GET / HTTP/1.1" 200 2
time="2019-02-12T02:41:52Z" level=debug msg="Recording periodic influxdb metrics"
time="2019-02-12T02:41:52Z" level=debug msg="Creating influxdb point: ID: 15 | NAME: Event-Driven Network Automation with StackStorm | ACTIVE: 1"
time="2019-02-12T02:41:52Z" level=debug msg="Creating influxdb point: ID: 30 | NAME: Network Automation with Salt | ACTIVE: 1"
10.47.0.0 - - [12/Feb/2019:02:41:52 +0000] "GET / HTTP/1.1" 200 2
10.47.0.0 - - [12/Feb/2019:02:42:02 +0000] "GET / HTTP/1.1" 200 2
10.40.2.110 - - [12/Feb/2019:02:42:03 +0000] "GET /exp/lessondef/30 HTTP/1.1" 200 8746
time="2019-02-12T02:42:03Z" level=debug msg="Scheduler received new request. Sending to handle function." Operation=4 Stage=0 Uuid=30-neiflnh3stq6q9yl
time="2019-02-12T02:42:03Z" level=debug msg="Booping 30-neiflnh3stq6q9yl-ns"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xfce8a1]
goroutine 2372 [running]:
github.com/nre-learning/syringe/api/exp.(*server).recordRequestTSDB(0xc4204b5bc0, 0xc42035e900, 0x0, 0x0)
/go/src/github.com/nre-learning/syringe/api/exp/influxdb.go:106 +0x321
github.com/nre-learning/syringe/api/exp.(*server).RequestLiveLesson(0xc4204b5bc0, 0x142f560, 0xc420820210, 0xc42035e8c0, 0xc4204b5bc0, 0xc420820150, 0x1199de0)
/go/src/github.com/nre-learning/syringe/api/exp/livelessons.go:86 +0x651
github.com/nre-learning/syringe/api/exp/generated._LiveLessonsService_RequestLiveLesson_Handler(0x12d9060, 0xc4204b5bc0, 0x142f560, 0xc420820210, 0xc420333220, 0x0, 0x0, 0x0, 0xc420c0cc80, 0x16)
/go/src/github.com/nre-learning/syringe/api/exp/generated/livelesson.pb.go:1016 +0x241
github.com/nre-learning/syringe/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc42045ad80, 0x143b720, 0xc420516d80, 0xc420474b00, 0xc4204b5ce0, 0x1d9dad8, 0x0, 0x0, 0x0)
/go/src/github.com/nre-learning/syringe/vendor/google.golang.org/grpc/server.go:966 +0x4bc
github.com/nre-learning/syringe/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42045ad80, 0x143b720, 0xc420516d80, 0xc420474b00, 0x0)
/go/src/github.com/nre-learning/syringe/vendor/google.golang.org/grpc/server.go:1245 +0xd69
github.com/nre-learning/syringe/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc4203de070, 0xc42045ad80, 0x143b720, 0xc420516d80, 0xc420474b00)
/go/src/github.com/nre-learning/syringe/vendor/google.golang.org/grpc/server.go:685 +0x9f
created by github.com/nre-learning/syringe/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/go/src/github.com/nre-learning/syringe/vendor/google.golang.org/grpc/server.go:683 +0xa1
Will need to lock down a lesson while it's being changed; for instance, you can send a request for stage 1 and then stage 2 in quick succession and cause a race condition.
LessonDir vs LessonsDir
Update antidote-web to take advantage of this.
time="2018-08-24T22:10:00Z" level=error msg="failed to create namespace, not creating kubelab"
time="2018-08-24T22:10:00Z" level=error msg="Error creating lesson: namespaces \"12-6viedvg5rctwdpcd-ns\" already exists"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xf0e5f9]
goroutine 6 [running]:
github.com/nre-learning/syringe/scheduler.(*KubeLab).ToLiveLesson(0x0, 0xc4201b9ef8)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:300 +0x79
github.com/nre-learning/syringe/scheduler.(*LessonScheduler).Start(0xc42033b4e0, 0x0, 0x0)
/go/src/github.com/nre-learning/syringe/scheduler/scheduler.go:98 +0x53e
main.main.func2(0xc42033b4e0, 0xc420367430)
/go/src/github.com/nre-learning/syringe/cmd/syringed/main.go:78 +0x2f
created by main.main
/go/src/github.com/nre-learning/syringe/cmd/syringed/main.go:77 +0x4c9
Should also add a message indicating that a lesson was skipped because of its tier.
Can't just kill the namespace with kubectl, since the state will still be in Syringe. Need to trigger a Syringe cleanup.
Endpoint names currently have no validation on ingest that forces compliance with Kubernetes naming standards. See the error below when trying to create an endpoint name with capital letters.
time="2018-11-06T07:41:26Z" level=error msg="Problem creating pod StackStorm: Pod \"StackStorm\" is invalid: [metadata.name: Invalid value: \"StackStorm\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.containers[0].name: Invalid value: \"StackStorm\": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]"
Validation should be added to all relevant fields in the db types that end up as a Kubernetes resource name, so that they conform to this spec.
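The check itself can reuse the exact DNS-1123 subdomain regex Kubernetes cites in the error above (the helper name `isValidK8sName` is hypothetical):

```go
package main

import "regexp"

// dns1123Subdomain is the regex Kubernetes itself uses for resource names,
// taken from the error message above.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidK8sName reports whether a proposed endpoint name can be used as a
// Kubernetes resource name. Running this at ingest lets Syringe reject names
// like "StackStorm" with a clear error instead of failing at pod creation.
func isValidK8sName(name string) bool {
	return len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	if isValidK8sName("StackStorm") {
		panic("uppercase names must be rejected at ingest")
	}
}
```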
Shouldn't be too tough to read the files from the expected location on request, and pass that data through the API itself.
Syringe isn't crashing, but the goroutine that periodically publishes usage data seems to be.
Should make sure we keep trying to connect, and should also improve logging.
Should support branching questions (with limits).
/cc @riw777 for his input on this
Should be easy to set, on a per-kubelab basis, the number of spares to keep ready at all times. Default to 0; tweak as demand rises and as costs allow. Will need to figure out how to deal with namespaces. Maybe create the pod ahead of time and then add the service into the namespace later? That would ensure DNS consistency with JIT-provisioned resources. We'll need an API for this, so that we can not only adjust these levels at runtime, but also get metrics on how long lesson spares have sat unused, so we can optimize the configuration over time.
The problem you'll have to deal with is that resources can't have their namespace changed. So if you're provisioning resources ahead of time, when they're "called up" to be used, you'll have to access them where they live. Garbage collection will also need to be updated to clean up resources created JIT or in advance.
Currently, there are two places you have to update the lesson repo in Syringe: Syringe must use an init container to clone its own copy of the lesson directory, and the environment variable is then used to create init containers for all pods and jobs spawned by Syringe.
We should use the environment variable as the single source of truth, and use something like https://github.com/src-d/go-git to handle the initial clone, rather than an init container for the Syringe pod(s). The pods/jobs spawned by Syringe can and should continue to use init containers, however, as these are all driven by the env variable.
This will also make selfmedicate simpler.
NOTE that #75 introduces the use of a local lesson directory for Syringe. When implementing the feature for this issue, be careful to honor that new functionality as well: only clone if the configuration permits; otherwise use the local directory.
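The "only clone if the configuration permits" branching could look like this sketch (the function name and parameters are hypothetical; the clone step is injected as a function here, and in the real implementation it would call go-git's `git.PlainClone`):

```go
package main

// prepareLessonDir decides between the local lesson directory (per #75) and a
// fresh clone of the repo named in the env var. The clone function is
// injected so the sketch needs no network access.
func prepareLessonDir(localDir, repoURL, dest string, clone func(dest, url string) error) (string, error) {
	if localDir != "" {
		// Configuration says to use an existing local checkout; don't clone.
		return localDir, nil
	}
	if err := clone(dest, repoURL); err != nil {
		return "", err
	}
	return dest, nil
}

func main() {
	dir, err := prepareLessonDir("/antidote/lessons", "", "",
		func(dest, url string) error { panic("should not clone when local dir is set") })
	if err != nil || dir != "/antidote/lessons" {
		panic("expected local directory to win")
	}
}
```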
This is so the dev logs don't get cluttered with connection failed messages
#4 is a temporary fix, but the long-term fix is to back Syringe with a proper database for the small amount of state it keeps.
This will allow us to create multiple instances of Syringe, as they'll be stateless. When they need to make a change, lock the value in etcd, change it, and unlock. Do things properly.
This will make #3 unnecessary: when Syringe crashes, the state will be kept elsewhere. It will also mean that we can truly have a proper load-balancing setup, as everything will scale out. What little state exists will live in a database built to keep it distributed and properly accessed/locked.
Increase syringe replicas to 3 once done.
Should also make sure that when a new request comes in, other requests are delayed or declined until the first is completed.
Many network images, such as the vqfx and vmx images currently supported, allow for the passing of configs at boot time.
This probably won't work with snapshots, but for the rare occasions where it's preferable to boot an image from scratch, providing a config file on first boot can help save precious seconds.