suse / scf
SUSE Cloud Foundry
License: Other
I'm working with CAP 1.2 (chart version 2.13.3) and looking forward to the new charts for 2.14.5. Maybe this issue is already solved in newer versions.
The CF API delivers v3 URLs with http:// instead of https://. We are using a load balancer in front of CF and don't want an HTTP-accessible API, so the load balancer redirects to HTTPS. The CF CLI doesn't handle that at all and produces errors like this:
> CF_TRACE=1 cf org system
Getting info for org system as admin...
[...]
REQUEST: [2018-11-29T13:48:51+01:00]
GET /v3/isolation_segments?organization_guids=679703cb-33c1-4848-b66a-974cf01f21c6 HTTP/1.1
[...]
RESPONSE: [2018-11-29T13:48:51+01:00]
HTTP/1.1 401 Unauthorized
[...]
Authentication error
FAILED
A direct API call to /v3/isolation_segments does work because it goes over HTTPS. For calls like the one above, the client first makes a GET request to /v3 to receive the v3 URLs/paths, which are then used for the subsequent requests:
> cf curl /v3
[...]
"isolation_segments": {
"href": "http://api.my.domain/v3/isolation_segments"
},
[...]
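To see where the 401 comes in, the request the CLI makes can be reproduced manually (a sketch; the domain is the placeholder from above and the token comes from a logged-in CLI session):
# Request the http:// URL returned by /v3 and inspect the load balancer's
# redirect; then repeat with -L to check whether the Authorization header
# is still sent on the redirected https:// request.
TOKEN="$(cf oauth-token)"
curl -v -H "Authorization: ${TOKEN}" "http://api.my.domain/v3/isolation_segments"
curl -v -L -H "Authorization: ${TOKEN}" "http://api.my.domain/v3/isolation_segments"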
Hi team:
When I use kube-ready-state-check.sh to test the k8s env, it shows:
Configuration problem detected: swapaccount enable
Verified: docker info should not show aufs
Verified: kube-dns should be running (show 4/4 ready)
Verified: tiller should be running (1/1 ready)
Verified: An ntp daemon or systemd-timesyncd must be installed and active
Verified: A storage class should exist in K8s
Configuration problem detected: Privileged must be enabled in 'kube-apiserver'
Configuration problem detected: Privileged must be enabled in 'kubelet'
Verified: TasksMax must be set to infinity
But I already configured kubelet and kube-apiserver with --allow-privileged=true:
root 102704 13.6 0.3 1639560 95088 ? Ssl 06:14 0:01 /usr/bin/kubelet --allow-privileged=true --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
root 38613 8.9 1.3 409428 346476 ? Ssl 05:29 3:51 kube-apiserver --authorization-mode=Node,RBAC --advertise-address=192.168.74.130 --allow-privileged=true --client-ca-file=/etc/kubernetes/pki/ca.crt --disable-admission-plugins=PersistentVolumeLabel --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0
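A quick way to double-check what the processes are actually running with (a sketch; the check script itself may detect the flag differently):
# If both of these print a match, the components are configured with
# --allow-privileged=true and the problem is likely in how the check
# script detects them (e.g. which node it runs on).
ps -ef | grep -v grep | grep kube-apiserver | grep -- '--allow-privileged=true'
ps -ef | grep -v grep | grep kubelet | grep -- '--allow-privileged=true'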
demo:~ # SECRET=$(kubectl get pods --namespace uaa -o jsonpath='{.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretK
eyRef.name}')
demo:~ # echo $SECRET
secrets-2.13.3-1 secrets-2.13.3-1 <<----- HERE
demo:~ # CA_CERT="$(kubectl get secret $SECRET --namespace uaa -o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"
demo:~ # echo $CA_CERT
demo:~ #
demo:~ # SECRET=secrets-2.13.3-1
demo:~ # kubectl get secret $SECRET --namespace uaa -o jsonpath="{.data['internal-ca-cert']}"
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUU4akNDQXRxZ0F3SUJBZ0lVUFhKczRlbzFERnpjY1NtcFRPeVd3QT
We have a use case whereby we provision static IPs on our IaaS and configure DNS to point to these, before deploying SCF. This allows users to install UAA and SCF one after the other without stopping to reconfigure DNS with whichever LoadBalancer IP the IaaS happened to choose.
Suggest that three properties are added to the respective Helm charts:
services.uaa_load_balancer_ip
services.router_load_balancer_ip
services.ssh_load_balancer_ip
And then the Helm charts (or whatever bit of Fissile generates them) are changed to contain the following logic:
{{- if .Values.services.loadbalanced }}
type: "LoadBalancer"
{{ if .Values.services.router_load_balancer_ip }}loadBalancerIP: {{ .Values.services.router_load_balancer_ip }} {{ end }}
{{- end }}
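An install could then pin the addresses like this (a sketch; the chart reference, namespace, and IPs are placeholders):
# Pass the pre-provisioned static IPs at install time.
helm install suse/cf --name scf --namespace scf \
  --values scf-config-values.yaml \
  --set services.loadbalanced=true \
  --set services.router_load_balancer_ip=203.0.113.10 \
  --set services.ssh_load_balancer_ip=203.0.113.11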
Hi team:
My config is:
env:
# Domain for SCF. DNS for *.DOMAIN must point to a kube node's (not master)
# external ip address.
DOMAIN: cf-dev.io
# UAA host/port that SCF will talk to. If you have a custom UAA
# provide its host and port here. If you are using the UAA that comes
# with the SCF distribution, simply use the two values below and
# substitute the cf-dev.io for your DOMAIN used above.
UAA_HOST: uaa.cf-dev.io
# UAA_PORT: 2793
kube:
# The IP address assigned to the kube node pointed to by the domain.
#### the external_ip setting changed to accept a list of IPs, and was
#### renamed to external_ips
external_ips:
- 192.168.7.22
storage_class:
# Make sure to change the value in here to whatever storage class you use
persistent: "persistent"
shared: "shared"
auth: rbac
secrets:
# Password for user 'admin' in the cluster
CLUSTER_ADMIN_PASSWORD: mypaas
# Password for SCF to authenticate with UAA
UAA_ADMIN_CLIENT_SECRET: uaa
When I use this to deploy SCF, tcp-router and some other components don't work.
Their status is:
NAME READY STATUS RESTARTS AGE
api-0 0/1 Running 1 16m
blobstore-0 1/1 Running 0 16m
cc-clock-77d84dd98-l6hkd 1/1 Running 0 16m
cc-uploader-57b668788f-6qxhq 1/1 Running 0 16m
cc-worker-74f76c677d-sjpmq 1/1 Running 0 16m
cf-usb-5b969887f9-29q9l 0/1 Running 0 16m
diego-access-5c5cb7b6d7-tdmg2 1/1 Running 0 16m
diego-api-6c8b96c597-sj5zx 1/1 Running 0 16m
diego-brain-6594dfdbc5-5rpkc 1/1 Running 0 16m
diego-cell-0 0/1 Running 0 16m
diego-locket-6756d56bb6-k8bht 1/1 Running 0 16m
doppler-0 1/1 Running 0 16m
loggregator-5698c4569f-pm7lt 1/1 Running 0 16m
mysql-0 1/1 Running 0 16m
mysql-proxy-59b486f8dc-phmp9 1/1 Running 0 16m
nats-0 1/1 Running 0 16m
nfs-broker-5555d9c585-v292p 1/1 Running 0 16m
post-deployment-setup-1-slgvn 1/1 Running 0 16m
router-5bb54cf844-cvg58 0/1 Running 0 16m
routing-api-0 0/1 Running 0 16m
secret-generation-1-78wcv 0/1 Completed 0 16m
syslog-adapter-6f5c4d9558-db7ls 1/1 Running 0 16m
syslog-rlp-65f997ccdd-r4nlg 1/1 Running 0 16m
syslog-scheduler-7c7788df59-bxrnf 1/1 Running 0 16m
tcp-router-86cb8cd89f-665l2 0/1 Running 0 16m
routing-api exception:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33m default-scheduler Successfully assigned scf/routing-api-0 to geekyzk-virtual-machine
Normal SandboxChanged 31m (x6 over 33m) kubelet, geekyzk-virtual-machine Pod sandbox changed, it will be killed and re-created.
Warning Failed 31m (x8 over 33m) kubelet, geekyzk-virtual-machine Error: secrets "secrets-2.8.0-1" not found
Normal Pulled 31m (x9 over 33m) kubelet, geekyzk-virtual-machine Container image "docker.io/splatform/scf-routing-api:73fa8ca792bb04a69d34be261bd240ce6ef2f705" already present on machine
Normal Created 31m kubelet, geekyzk-virtual-machine Created container
Normal Started 30m kubelet, geekyzk-virtual-machine Started container
Warning Unhealthy 3m (x165 over 30m) kubelet, geekyzk-virtual-machine Readiness probe failed: dial tcp 10.244.0.106:3000: connect: connection refused
tcp-router exception is:
Trying: curl --connect-timeout 5 --fail --header Accept: application/json https://scf.uaa.cf-dev.io:2793/info
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:20 --:--:-- 0
curl: (28) Resolving timed out after 5512 milliseconds
FAILED
Waiting 30s ...
Trying: curl --connect-timeout 5 --fail --header Accept: application/json https://scf.uaa.cf-dev.io:2793/info
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:20 --:--:-- 0
curl: (28) Resolving timed out after 5514 milliseconds
FAILED
Waiting 30s ...
I installed a bind9 DNS server and configured *.cf-dev.io -> 192.168.7.22, but it still fails. Can you help me?
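One thing worth checking (a sketch): whether the host names resolve from inside the cluster, since the readiness scripts above resolve through kube-dns rather than through the node's own resolver.
# Resolve the UAA host from inside the cluster, where the curl loop runs
# (busybox is just a convenient throwaway test pod).
kubectl run -i --tty --rm dns-test --image=busybox --restart=Never -- \
  nslookup scf.uaa.cf-dev.io
# Compare with resolution on the node itself:
nslookup scf.uaa.cf-dev.io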
Sorry @viovanov,
We need an issue to track our problem, so I am opening this consul issue for tracking. Thanks! :)
I have a question about Consul: I am not sure whether an HA Consul cluster, e.g. with 3 members, can start at all.
For example, if I set 3 consul instances in SCF, SCF renders the consul confab.json as:
"servers":{"lan":["consul-0.consul-set.cf.svc.cluster.local","consul-1.consul-set.cf.svc.cluster.local","consul-2.consul-set.cf.svc.cluster.local"]
consul-0 can start correctly, but consul-1 and consul-2 cannot, because they first need to communicate with each other. Since the consul service only exposes pods that are ready, a pod that is not ready cannot be reached via its service name, so I think this is a deadlock.
For example, I see this error in consul-1's consul_agent.stderr.log:
* Failed to resolve consul-2.consul-set.cf.svc.cluster.local: lookup consul-2.consul-set.cf.svc.cluster.local on 172.21.0.10:53: no such host)
So consul-1 cannot start up correctly, and then the consul-set service cannot discover it.
Later, consul-2 is provisioned and started (I am not sure why), searches for consul-1, and gets the same error.
So the two wait for each other to start up, but neither becomes ready: a deadlock.
Several hours later, for some reason, they somehow do discover each other and start up correctly, but I don't know why or how.
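A common way to break this kind of bootstrap deadlock in Kubernetes (a sketch, not necessarily how SCF should fix it) is to let the headless service publish addresses for pods that are not yet ready, so the peers can resolve each other while they bootstrap:
# publishNotReadyAddresses is available on Service since Kubernetes 1.9;
# older clusters use the service.alpha.kubernetes.io/tolerate-unready-endpoints
# annotation instead.
kubectl patch service consul-set --namespace cf \
  -p '{"spec":{"publishNotReadyAddresses":true}}'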
Hi,
I remember SCF brings Consul back for the new auto-scaling feature.
I have confirmed with the auto-scaling team that they don't need consul any more, and the consul-related config has been removed from auto-scaling.
Can you help remove the consul job if no other jobs need Consul?
cc @qibobo here, if you have any auto-scaling questions.
Thanks!
Hi,
Quick question regarding the routing-release submodule. It seems that the current latest SCF release is using cf-deployment 1.9, so the correct routing-release version should be something higher than 0.166.0, apparently 0.170.0. Any ideas on this?
Regards,
Enrique Encalada
Hi,
Right now, we are testing the upgrade function in SCF. The api role is a special case because it needs to run the ccdb migration at the beginning.
But api is a statefulset, and during upgrade it is updated in reverse order: api-1 is upgraded first, then api-0 is upgraded and runs the ccdb migration. I think that is incorrect.
I checked the api code:
<% if spec.bootstrap && p('cc.run_prestart_migrations') %>
perform_migration
seed_db
<% else %>
echo "Skipping DB migrations and seeds"
<% end %>
Every time, api-0 runs perform_migration and seed_db.
I also asked in the Kubernetes repo; there is currently no option to upgrade a statefulset in forward order:
kubernetes/kubernetes#51907 (comment)
So I am not sure if we could upgrade like this instead:
1. Set the api statefulset's update strategy to OnDelete.
2. Provide a script for the api upgrade: after make upgrade, use the script to delete api-0 -> api-1 -> etc. in order (see the sketch below).
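A rough sketch of what that script could look like (assuming the statefulset is named api and lives in the scf namespace; not a tested implementation):
# Switch the api statefulset to manual (OnDelete) updates, then recreate
# the pods in ascending order so api-0 runs the ccdb migration first.
kubectl patch statefulset api --namespace scf \
  -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
for i in 0 1; do   # extend the list for larger api counts
  kubectl delete pod "api-${i}" --namespace scf
  until kubectl get pod "api-${i}" --namespace scf \
        -o jsonpath='{.status.containerStatuses[0].ready}' | grep -q true; do
    sleep 10
  done
done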
Or do you have any plan for that?
Thanks a lot!
Hi folks,
CF Persi team here. We are attempting to patch our SCF+Eirini deployment to test changes we have made in order to support volume services (in Eirini).
We have a running SCF+Eirini (in cfcr, actually). We have made changes to cloud_controller and to eirini. Our thinking is to remake the capi-release and, separately, the eirini-release, push them to a docker registry, and update our helm charts to use these two new images.
However, when we run:
$ cd ~/workspace/scf
$ make capi-release
It fails with the following output:
Building a release from directory '/Users/pivotal/workspace/scf/src/capi-release/':
- Running prep scripts:
Running command: 'bash -x /Users/pivotal/workspace/scf/src/capi-release/packages/nginx_newrelic_plugin/pre_packaging', stdout: '~/.bosh/tmp/bosh-resource-archive633828037/nginx ~/.bosh/tmp/bosh-resource-archive633828037
newrelic_nginx_agent/
newrelic_nginx_agent/config/
newrelic_nginx_agent/Gemfile
newrelic_nginx_agent/LICENSE
newrelic_nginx_agent/newrelic_nginx_agent
newrelic_nginx_agent/newrelic_nginx_agent.daemon
newrelic_nginx_agent/README.txt
newrelic_nginx_agent/config/newrelic_plugin.yml.in
~/.bosh/tmp/bosh-resource-archive633828037
~/.bosh/tmp/bosh-resource-archive633828037/nginx/newrelic_nginx_agent ~/.bosh/tmp/bosh-resource-archive633828037
', stderr: '+ set -e -x
+ pushd /home/docker-user/.bosh/tmp/bosh-resource-archive633828037/nginx
+ tar zxvf newrelic_nginx_agent-1.2.1.tar.gz
+ popd
+ pushd /home/docker-user/.bosh/tmp/bosh-resource-archive633828037/nginx/newrelic_nginx_agent
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/nginx_newrelic_plugin/pre_packaging: line 8: bundle: command not found
':
exit status 127
- Running prep scripts:
Running command: 'bash -x /Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging', stdout: 'ERROR: Failed to run 'bundle package' after 3 attempts
', stderr: '+ set -e -x
+ cd /home/docker-user/.bosh/tmp/bosh-resource-archive363148158/cloud_controller_ng
+ rm -rf ./vendor/cache/
+ set +e
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ set -e
+ '[' 127 '!=' 0 ']'
+ echo 'ERROR: Failed to run '\''bundle package'\'' after 3 attempts'
+ exit 127
':
exit status 127
Exit code 1
OS is a Mac.
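The direct failure is that the bosh pre_packaging scripts cannot find bundle on their PATH. A possible workaround (a sketch; whether it helps depends on whether the prep scripts run on the host or inside the fissile build container) is to make bundler available before re-running the build:
# Install bundler into the Ruby used by the build, then confirm it is
# visible on the PATH that `make capi-release` will use.
gem install bundler
bundle --version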
Please include/generate a values-ha.yaml so we can choose between a test/eval environment and a prod environment.
I see SCF updated the secret name and modified the way to get it in make/run:
get_uaa_secret_name() {
kubectl get pod mysql-0 --namespace "${UAA_NAMESPACE}" \
-o jsonpath='{@.spec.containers[0].env[?(@.name=="MONIT_PASSWORD")].valueFrom.secretKeyRef.name}'
}
But make/run hardcodes mysql-0, and we are not using mysql in our env, so it fails here.
I think SCF could make this more generic, for example by getting the first pod name like this:
pod_name=`kubectl get pod --namespace "${UAA_NAMESPACE}" | grep "\-0" | awk '{ print $1 }' | head -1`
Then you don't need to hardcode a pod name here.
Please let me know if this is better; I can provide a PR if needed.
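Another option (a sketch, not tested against make/run) is to keep the jsonpath lookup but run it against whichever "-0" pod exists, so nothing about mysql is assumed:
get_uaa_secret_name() {
  # Pick the first "-0" pod in the namespace instead of assuming mysql-0,
  # then read the secret name from its MONIT_PASSWORD env reference.
  local pod_name
  pod_name=$(kubectl get pods --namespace "${UAA_NAMESPACE}" -o name | grep -- '-0$' | head -n 1)
  kubectl get "${pod_name}" --namespace "${UAA_NAMESPACE}" \
    -o jsonpath='{.spec.containers[0].env[?(@.name=="MONIT_PASSWORD")].valueFrom.secretKeyRef.name}'
}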
There is no jq installed in the stemcell of fissile-stemcell-ubuntu:trusty-7.ga393242-2.3
Sorry that I am just asking a question here, since I didn't find a better place to ask questions, like Slack. I am wondering when the Helm charts for CF/Diego will be officially released (GA); we would like to do some experiments or a pilot with these charts as early as possible. And will they support Ubuntu OS? Thanks.
Please document the need for Base Development (pattern) and libvirt-devel for the vagrant-libvirt plugin to install successfully.
I tried to follow the Deploying SCF on Vagrant guide to deploy CF on Vagrant, but ran into the following problem. Is there a way to install stampy properly to work around it? Thanks.
/home/vagrant/scf/make/docker-deps
42.2-6.ga651b2d-28.33: Pulling from splatform/fissile-stemcell-opensuse
Digest: sha256:b7d9c65ef8ea1064bd5d4797e1e6e837ce8fbb9eb812b4f9a2259c76126fb780
Status: Image is up to date for splatform/fissile-stemcell-opensuse:42.2-6.ga651b2d-28.33
/home/vagrant/scf/make/bosh-release src/diego-release
/home/vagrant/scf/bin/create-release.sh: line 18: stampy: command not found
Makefile:82: recipe for target 'diego-release' failed
make: *** [diego-release] Error 127
After playing around with the vagrant box for a while (doing some make stop and make run, stopping and starting the box, and other experiments), the /mnt partition inside the vagrant box gets 100% full:
vagrant@vagrant-kube:~/scf> df -h │
Filesystem Size Used Avail Use% Mounted on │
devtmpfs 4.9G 0 4.9G 0% /dev │
tmpfs 4.9G 0 4.9G 0% /dev/shm │
tmpfs 4.9G 4.9M 4.9G 1% /run │
tmpfs 4.9G 0 4.9G 0% /sys/fs/cgroup │
/dev/vda2 9.0G 3.4G 5.3G 39% / │
/dev/vda2 9.0G 3.4G 5.3G 39% /.snapshots │
/dev/sda1 99G 94G 140M 100% /mnt │
192.168.121.1:/home/dimitris/workspace/suse/scf/.fissile/.bosh 916G 131G 785G 15% /home/vagrant/.bosh │
192.168.121.1:/home/dimitris/workspace/suse/scf 916G 131G 785G 15% /home/vagrant/scf │
tmpfs 1000M 0 1000M 0% /run/user/1000
Is there a way to clean it up and release some space? I think even if I increase its size, it is just a matter of time until it fills up again.
Here is the output of: sudo find /mnt -type f -size +200M -exec ls -lh {} \;
-rw------- 1 vagrant 1002 513M Jun 21 08:33 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:33 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:21 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 457M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/2c/8e/2c8e7a97-41b5-4a03-ab1b-8a52baa728a5_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92 │
-rw------- 1 vagrant 1002 307M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/c1/9e/c19e43d3-a767-4d6a-9015-35210d7730c2_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250 │
-rw------- 1 vagrant 1002 324M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/6f/7c/6f7c30e5-27e5-439e-ad84-0fbca820b3a3_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82 │
-rw------- 1 vagrant 1002 341M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/c5/23/c523279e-2e52-4583-bebb-ffdf48980a7c_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:06 /mnt/hostpath/34118b40-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:05 /mnt/hostpath/34118b40-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 09:03 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:03 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:33 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 04:51 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:51 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:33 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 341M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/18/b6/18b6458a-761a-4683-b83f-172950e80c62_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec │
-rw------- 1 vagrant 1002 457M Jun 20 13:43 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/16/7a/167a8ea5-bb95-4597-80d9-65ad17fc0406_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92 │
-rw------- 1 vagrant 1002 307M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/9b/0f/9b0f902e-7efb-462c-8d1e-939bcd0d3a60_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250 │
-rw------- 1 vagrant 1002 324M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/7e/2f/7e2fb51b-6af1-461b-8ba3-b791358d3ac6_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82 │
-rw------- 1 vagrant 1002 513M Jun 21 08:18 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:18 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:53 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 20 08:50 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 08:55 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 08:47 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 307M Jun 21 08:31 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/3e/f0/3ef01065-6b8e-4764-83c7-402a769e5cb5_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250 │
-rw------- 1 vagrant 1002 341M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/b5/e0/b5e058cb-0013-4c18-9f75-52231f74703b_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec │
-rw------- 1 vagrant 1002 457M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/50/3a/503a0ad4-2d30-456e-829d-44dfe5277ddb_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92 │
-rw------- 1 vagrant 1002 324M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/f1/e0/f1e0a678-1680-402e-9b4f-39d49fedb2f2_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82 │
-rw------- 1 vagrant 1002 513M Jun 21 04:51 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:51 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:34 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 457M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/ed/8d/ed8de3b4-f114-45a4-a497-08d767887065_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92 │
-rw------- 1 vagrant 1002 307M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/7f/01/7f01b559-f137-489d-9fc4-0304ef11f29e_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250 │
-rw------- 1 vagrant 1002 341M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/0f/45/0f45a159-629a-42e3-a268-aee6d588a8fe_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec │
-rw------- 1 vagrant 1002 324M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/01/57/0157eab7-5512-4766-a796-21ad5022130a_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82 │
-rw------- 1 vagrant 1002 513M Jun 21 08:19 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:19 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:54 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 09:03 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:03 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:34 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 20 11:36 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:35 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 10:10 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 457M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/37/c7/37c70b7b-eb42-4652-a97f-1bae561d59ec_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92 │
-rw------- 1 vagrant 1002 307M Jun 21 03:38 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/df/7c/df7c08ba-f2b0-4631-b3a1-2f703a082a1f_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250 │
-rw------- 1 vagrant 1002 324M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/15/9e/159e0d53-d6db-43ae-b35d-ff3ebd159edf_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82 │
-rw------- 1 vagrant 1002 341M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/2f/5d/2f5d3302-52be-4ba0-9aaf-396d84ffd4e1_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec │
-rw------- 1 vagrant 1002 513M Jun 21 09:19 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:19 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:08 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 03:30 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:30 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:38 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 08:32 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:32 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:19 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 20 13:35 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:35 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 10:11 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/ib_logfile1 │
-rw------- 1 vagrant 1002 513M Jun 21 09:06 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/galera.cache │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:06 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0 │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:04 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1
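Most of the space is held by hostpath volumes (MySQL data and blobstore buildpacks) left over from earlier deployments. One way to reclaim it (a sketch; double-check which volumes are still in use before deleting anything) is to remove the hostpath directories of persistent volumes that are no longer bound:
# Volumes that are Released or Failed rather than Bound are left over
# from previous deployments; their hostpath directories can be removed.
kubectl get pv
kubectl delete pv <pv-name>              # <pv-name> is a placeholder
sudo rm -rf /mnt/hostpath/<volume-uid>   # <volume-uid> is a placeholder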
Hi, after talking with @viovanov:
Recently CF renamed the nats submodule from nats to go-nats, and the repo URL also changed:
https://github.com/cloudfoundry/diego-release/tree/20c48cc98c7b78aff2c65a5c3df2b5ff612e2d61/src/github.com/nats-io
The original nats repo is gone:
https://github.com/nats-io/nats/tree/c0ad3f079763c06c3ce94ad12fa3f17e78966d99
Manual workaround:
git submodule sync --recursive ; git submodule update --init
and change .gitmodules:
[submodule "src/github.com/nats-io/nats"]
path = src/github.com/nats-io/nats
url = https://github.com/nats-io/go-nats
branch = master
by perl -pi -e "s/https:\/\/github.com\/nats-io\/nats/https:\/\/github.com\/nats-io\/go-nats/g" .gitmodules
Opening this issue to track it while waiting for an official fix from the SCF team.
Do we have documentation on how to deploy SCF on GKE?
I am reading the docs for Kubernetes and AWS/EKS; what changes do I need to make to deploy on GKE?
After upgrading kube to 1.9, kubectl by default shows all pods, including terminated ones from completed jobs.
The run-to-completion jobs show up in the command output like this:
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.11-2+fa6873d3e386d7", GitCommit:"fa6873d3e386d7ead42923b24aea3b76e74395a3", GitTreeState:"clean", BuildDate:"2018-04-17T08:10:40Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
bjxzi$ kubectl get pods -n uaa
NAME READY STATUS RESTARTS AGE
postgres-0 1/1 Running 0 16h
secret-generation-1-lwn44 0/1 Completed 0 16h
uaa-77448b8688-9tk4k 1/1 Running 0 16h
uaa-77448b8688-cvq9g 1/1 Running 0 16h
bjxzi$ kubectl get pods -n uaa | awk '{ if (match($2, /^([0-9]+)\/([0-9]+)$/, c) && c[1] != c[2]) { print ; exit 1 } }'
secret-generation-1-lwn44 0/1 Completed 0 16h
The awk script is used in https://github.com/SUSE/scf/blob/develop/make/uaa-wait#L5
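A possible fix (a sketch, not a PR yet) is to make the filter ignore pods that have already run to completion, so only genuinely not-ready pods block the wait:
# Skip Completed pods, then flag any pod whose ready count does not
# match its container count (same gawk 3-argument match() as the original).
kubectl get pods -n uaa | awk '$3 == "Completed" { next }
  { if (match($2, /^([0-9]+)\/([0-9]+)$/, c) && c[1] != c[2]) { print ; exit 1 } }'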
While debugging another issue I came across TLS handshake error messages in various components, e.g. the gorouter:
router/1:/var/vcap/sys/log/gorouter# tail gorouter.err.log
[2018-12-13 09:30:12+0000] 2018/12/13 09:30:12 http: TLS handshake error from 172.16.6.0:36751: EOF
[2018-12-13 09:30:16+0000] 2018/12/13 09:30:16 http: TLS handshake error from 172.16.0.1:56141: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.6.0:60281: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.0.1:9568: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.6.0:48117: EOF
[2018-12-13 09:30:21+0000] 2018/12/13 09:30:21 http: TLS handshake error from 172.16.0.1:11520: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.10.0:45172: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.6.0:48755: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.6.0:34185: EOF
[2018-12-13 09:30:26+0000] 2018/12/13 09:30:26 http: TLS handshake error from 172.16.10.0:41303: EOF
or the diego-brain:
diego-brain/1:/var/vcap/sys/log/auctioneer# tail auctioneer.stderr.log
2018/12/13 09:36:14 http: TLS handshake error from 172.16.1.66:41364: EOF
2018/12/13 09:36:24 http: TLS handshake error from 172.16.1.66:41756: EOF
2018/12/13 09:36:34 http: TLS handshake error from 172.16.1.66:42066: EOF
2018/12/13 09:36:44 http: TLS handshake error from 172.16.1.66:42486: EOF
2018/12/13 09:36:54 http: TLS handshake error from 172.16.1.66:42840: EOF
2018/12/13 09:37:04 http: TLS handshake error from 172.16.1.66:43240: EOF
2018/12/13 09:37:14 http: TLS handshake error from 172.16.1.66:43630: EOF
2018/12/13 09:37:24 http: TLS handshake error from 172.16.1.66:43972: EOF
2018/12/13 09:37:34 http: TLS handshake error from 172.16.1.66:44304: EOF
2018/12/13 09:37:44 http: TLS handshake error from 172.16.1.66:44732: EOF
Interestingly enough, this happens on various SCF versions (seen in cf-2.13.3 and cf-2.14.5), running both on-prem on SUSE CaaSP v3 and on Azure AKS.
I've tried to trace back the IPs mentioned but didn't succeed. Does anyone see the same behaviour and maybe have an idea how to trace back the component?
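One way to start the tracing (a sketch; note that addresses like 172.16.0.1 may be a node or SNAT address rather than a pod IP):
# Which pod, if any, owns a given source IP?
kubectl get pods --all-namespaces -o wide | grep '172.16.1.66'
# If it is not a pod, check node and service addresses:
kubectl get nodes -o wide
kubectl get svc --all-namespaces | grep '172.16.'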
Cheers
I was changing my base DNS and redeploying, but inside router-0 I was seeing errors from bad x509 certs that contained the old domains. I guessed that deleting + reinstalling wasn't recreating the secrets?
I can confirm that deleting the helm chart installations does not delete the generated secrets.
$ eirini helm delete cf --purge
release "cf" deleted
$ eirini helm delete uaa --purge
release "uaa" deleted
$ kubectl get secrets --all-namespaces
NAMESPACE NAME TYPE DATA AGE
cf default-token-29hjx kubernetes.io/service-account-token 3 1h
cf secrets-2.14.0-1 Opaque 157 1h
...
uaa default-token-7bq9k kubernetes.io/service-account-token 3 2h
uaa secrets-2.14.0-1 Opaque 26 1h
Perhaps a corollary is that deleting the charts does not delete the namespaces, which would have deleted the secrets?
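Until that changes, a manual cleanup after helm delete --purge works (a sketch; only do this if nothing else lives in those namespaces):
# Remove the generated secrets explicitly...
kubectl delete secret secrets-2.14.0-1 --namespace cf
kubectl delete secret secrets-2.14.0-1 --namespace uaa
# ...or drop the namespaces entirely, which removes the secrets with them.
kubectl delete namespace cf uaa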
Hi all,
I am using the latest SCF for my deployment, but I think I found an issue: the HA function does not work correctly with the new secret-generator function.
The secret-generation job uses a property in the role-manifest to generate certs:
properties.scf.secrets.variables: '((DOMAIN))((KUBE_SIZING_API_COUNT))((KUBE_SIZING_CC_UPLOADER_COUNT))((KUBE_SIZING_DIEGO_API_COUNT))((KUBE_SIZING_DIEGO_BRAIN_COUNT))((KUBE_SIZING_DIEGO_CELL_COUNT))((KUBE_SIZING_ETCD_COUNT))'
But, for example, KUBE_SIZING_DIEGO_CELL_COUNT is taken from the count setting in values.yml.
If I set HA: true in values.yml but do not change the diego-cell count setting, SCF will still use the count setting.
For example, the initial size of diego-cell is 1; I enable HA mode, where the HA count is 6, but secret-generation still uses 1 as the diego-cell count.
Then only diego-cell-0 works fine and the others fail at auction:
{"timestamp":"1514257592.442109346","source":"auctioneer","message":"auctioneer.auction.failed-to-get-state","log_level":2,"data":{"cell-guid":"diego-cell-4","error":"Get https://diego-cell-4.diego-cell-set.cf.svc.cluster.local:1801/state: x509: certificate is valid for diego-cell, *.diego-cell, diego-cell.cf.svc, *.diego-cell.cf.svc, diego-cell.cf.svc.cluster.local, *.diego-cell.cf.svc.cluster.local, diego-cell-0.diego-cell-set, diego-cell-0.diego-cell-set.cf.svc, diego-cell-0.diego-cell-set.cf.svc.cluster.local, diego-cell.cf.svc.cluster.local, *.diego-cell.cf.svc.cluster.local, rep_server, not diego-cell-4.diego-cell-set.cf.svc.cluster.local","session":"629146"}}
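A workaround that appears to help (a sketch; the exact value paths depend on the chart version) is to set the per-role counts explicitly when enabling HA, so the generated certs include SANs for every cell:
# Make the sizing count match the HA count so secret-generation sees it.
helm upgrade scf suse/cf \
  --values scf-config-values.yaml \
  --set config.HA=true \
  --set sizing.diego_cell.count=6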
I am not sure if anyone else uses HA mode; having only an HA and a non-HA mode is not enough choice for end users, I think.
Please let me know if I have misunderstood something.
Thanks! :)
Hey there, @qu1queee and I investigated an issue that colleagues of ours had with a version of SCF we have deployed. In that cluster, nats is a statefulset with two nodes. The first one came up just fine; the second one failed to start with some odd error messages.
We debugged it far enough to think that configure-HA-hosts.sh relies on a structure of the Kubernetes API result JSON that changed between Kubernetes 1.8 and 1.9.
In 1.8, the JSON result has kubernetes.io/created-by:
{
"checksum/config": "[...]",
"kubernetes.io/created-by": "{[...],\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"StatefulSet\",\"name\":\"nats\", [...]}"
}
From what we can see, our 1.9 Kubernetes cluster API result does not have anything like that. There seems to be a new section with details about the name that could be used.
We wanted to check with you if you encountered a similar issue, too? Thanks in advance for your feedback.
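For reference, a lookup that works on 1.9 as well (a sketch, not a patch against configure-HA-hosts.sh) is to read the pod's ownerReferences instead of the removed created-by annotation:
# Name of the statefulset that owns a pod, e.g. "nats", without relying
# on the kubernetes.io/created-by annotation that 1.9 no longer sets.
kubectl get pod nats-1 --namespace cf \
  -o jsonpath='{.metadata.ownerReferences[?(@.kind=="StatefulSet")].name}'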
Could you please explain the syntax of ((^HCP_COMPONENT_INDEX)) and ((/HCP_COMPONENT_INDEX)) in role-manifest.yaml?
What is the value of "index" if HCP_COMPONENT_INDEX=0? Thanks!
index: ((HCP_COMPONENT_INDEX))((^HCP_COMPONENT_INDEX))0((/HCP_COMPONENT_INDEX))
When I try to run make kube as per the documentation here, I get errors like:
vagrant@vagrant-kube:~/scf> make kube
/home/vagrant/scf/make/uaa-kube
make[1]: Entering directory '/home/vagrant/scf/src/uaa-fissile-release'
/home/vagrant/scf/src/uaa-fissile-release/make/kube-configs
Loading defaults from env files
Secret 'MYSQL_ADMIN_PASSWORD' has no value
Makefile:16: recipe for target 'kube-configs' failed
make[1]: *** [kube-configs] Error 1
make[1]: Leaving directory '/home/vagrant/scf/src/uaa-fissile-release'
Makefile:206: recipe for target 'uaa-kube' failed
make: *** [uaa-kube] Error 2
I think the problem lies in the line where the defaults files are assembled: the files we use do not include this one:
https://github.com/SUSE/scf/blob/develop/bin/settings/settings.env
Changing the above line to
--defaults-file="$(echo env/*.env | tr ' ' ','),../../bin/settings/settings.env" \
makes it work, but I'm not sure this is the right way to do it.
Hi,
I heard from the community that Garden will switch to overlayfs for its storage driver, so Garden would no longer be supported on Kubernetes as a nested container (container in container):
garden is using (IIRC) aufs, and that seems to work (sometimes) over aufs, maybe even sometimes over overlay for reasons we cannot identify (it "shouldn't" work)
When garden switches to overlay, overlay checks to see, "am I nested" and pukes if isNested == true
So do you know about this change? And can SCF still work fine with the new Garden on Kube?
Thanks!
Hi @mook-as ,
I see you upgraded fissile to support --use-secrets-generator.
But when I use the official fissile-5.1.0+20.g253c0ab from fissile master, only fissile build helm supports the --use-secrets-generator parameter; fissile build kube doesn't.
Then, when running make vagrant-prep, it calls make images, and make images finally calls both make helm and make kube:
images: bosh-images uaa-images helm kube
So the build process fails at make kube:
make/uaa-kube
make[1]: Entering directory '/bcf-maker/bcf/src/scf/src/uaa-fissile-release'
make/kube
unknown flag: --use-secrets-generator
make[1]: *** [kube] Error 1
make[1]: Leaving directory '/bcf-maker/bcf/src/scf/src/uaa-fissile-release'
make: *** [uaa-kube] Error 2
When I execute fissile build kube --help:
Flags:
-D, --defaults-file string Env files that contain defaults for the parameters generated by kube
--output-dir string Kubernetes configuration files will be written to this directory (default ".")
--tag-extra string Additional information to use in computing the image tags
--use-memory-limits Include memory limits when generating kube configurations (default true)
I also don't see a --use-secrets-generator parameter there.
Can you help check? Thanks!
Hi,
The function that gets the secret fails when I provision multiple uaa pods. This command failed:
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Because the method that gets the secret name/key receives a duplicated value:
bjxzi$ kubectl get pods --namespace uaa
NAME READY STATUS RESTARTS AGE
postgres-0 1/1 Running 0 18h
secret-generation-1-tmbg5 0/1 Completed 0 18h
uaa-77448b8688-m89fb 1/1 Running 0 18h
uaa-77448b8688-nwzct 1/1 Running 0 18h
bjxzi$ kubectl get pods --namespace uaa -o 'jsonpath={.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}'
secrets-270.28.0-1 secrets-270.28.0-1
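A possible fix (a sketch against the snippet above) is to query a single uaa pod, or to deduplicate the jsonpath output before using it:
# With several uaa replicas, the wildcard jsonpath prints the same secret
# name once per pod; collapse the duplicates before calling kubectl get.
SECRET=$(kubectl get pods --namespace uaa \
  -o 'jsonpath={.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}' \
  | tr ' ' '\n' | sort -u | head -n 1)
kubectl get secret "$SECRET" --namespace uaa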
When you check pod status via kubectl proxy or the cadvisor API at http://localhost:4194/api/v1.3/docker/, you can view other pods' CPU and memory normally, but the diego-cell CPU and memory stats are all empty.
You can see this issue in two ways:
1. In vagrant, run docker ps to get the diego-cell pod container id, then curl http://localhost:4194/api/v1.3/docker/<Container_id>; all CPU and memory stats are zero.
2. View it from kubectl proxy.
Background
The api pod fetches a property from cc_uploader via BOSH links, like here: https://github.com/cloudfoundry/capi-release/blob/develop/jobs/cloud_controller_ng/templates/cloud_controller_ng.yml.erb#L360
Description
This works well during the first deployment, but when we inject a sidecar container into the api pod, it triggers a pod recreate, and sometimes an error like the one below occurs (it can't be reproduced 100% of the time).
+ bash /opt/fissile/startup/scripts/forward_logfiles.sh
+ configgin --jobs /opt/fissile/job_config.json --env2conf /opt/fissile/env2conf.yml
Resolving link cc_uploader via service {"role"=>"cc-uploader", "job"=>"cc_uploader"}
/usr/local/rvm/gems/ruby-2.3.5/gems/bosh-template-2.0.0/lib/bosh/template/evaluation_link.rb:30:in `p': Can't find property '["internal_hostname"]' (Bosh::Template::UnknownProperty)
from /var/vcap/jobs-src/cloud_controller_ng/templates/cloud_controller_api.yml.erb:445:in `block in get_binding'
from /usr/local/rvm/gems/ruby-2.3.5/gems/bosh-template-2.0.0/lib/bosh/template/evaluation_context.rb:152:in `if_link'
from /var/vcap/jobs-src/cloud_controller_ng/templates/cloud_controller_api.yml.erb:443:in `get_binding'
from /usr/local/rvm/rubies/ruby-2.3.5/lib/ruby/2.3.0/erb.rb:864:in `eval'
from /usr/local/rvm/rubies/ruby-2.3.5/lib/ruby/2.3.0/erb.rb:864:in `result'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/lib/job.rb:57:in `generate'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:79:in `block (2 levels) in <top (required)>'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:78:in `each'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:78:in `block in <top (required)>'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:75:in `each'
from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:75:in `<top (required)>'
from /usr/local/rvm/gems/ruby-2.3.5/bin/configgin:23:in `load'
from /usr/local/rvm/gems/ruby-2.3.5/bin/configgin:23:in `<main>'
from /usr/local/rvm/gems/ruby-2.3.5/bin/ruby_executable_hooks:15:in `eval'
from /usr/local/rvm/gems/ruby-2.3.5/bin/ruby_executable_hooks:15:in `<main>'
These files, job_config.json and env2conf.yml, are generated at build time by fissile, right?
So there should be no difference between the first deployment and the restart.
Is there some race condition here?
In some CF components, like nats or doppler, monit summary reports a Timestamp failed for rsyslog_file once the pod has been running for a while.
nats/1:/# monit summary
File 'rsyslog_file' Timestamp failed
The error is caused by /var/log/messages not having been updated for more than 65 minutes; according to the monit definition, if the timestamp is older than 65 minutes, an alert is raised.
/var/log/messages is used for general syslog messages, and a container does not produce as much of that as a VM does. This config, /etc/monit/monitrc.d/rsyslog, is baked into the stemcell:
check file rsyslog_file with path /var/log/messages
group rsyslogd
if timestamp > 65 minutes then alert
host-192-168-123-38:~/deploy # kubectl create --namespace=scf --filename="kube/cf/bosh-task/acceptance-tests.yaml"
Error from server (BadRequest): error when creating "kube/cf/bosh-task/acceptance-tests.yaml": Pod in version "v1" cannot be handled as a Pod: [pos 818]: json: expect char '"' but got char '2'
Problem
Pushing apps will fail while staging because of error 137 (a memory error, afaik).
Bad Solution
Setting DIEGO_CELL_MEMORY_CAPACITY_MB to a value like 4096 works.
Detail
This script https://github.com/SUSE/scf/blob/master/container-host-files/etc/scf/config/scripts/set-diego-cell-memory-limits.sh doesn't work.
Total memory is:
$ cat /proc/meminfo | awk '/MemTotal:/ { printf "%.0f\n", $2 * 1024 }'
16820776960
and:
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || bc <<< '2 ^ 63'
9223372036854771712
usually seems to be effectively unlimited (close to max int64).
This means that if DIEGO_CELL_MEMORY_CAPACITY_MB is set to auto, the value stays unset, and staging apps that need to be built usually fails.
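For reference, what the auto setting presumably intends is to take the smaller of the host memory and the cgroup limit, treating the huge sentinel value as "no limit"; a sketch of that calculation:
# Host memory in bytes, and the cgroup limit (which here is effectively
# the int64 "unlimited" sentinel and so never wins the comparison).
total=$(awk '/MemTotal:/ { printf "%.0f", $2 * 1024 }' /proc/meminfo)
cgroup=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || echo "$total")
if [ "$cgroup" -lt "$total" ]; then total="$cgroup"; fi
echo "DIEGO_CELL_MEMORY_CAPACITY_MB=$(( total / 1024 / 1024 ))"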
Environment
Release manifest: /home/vagrant/scf/src/grootfs-release/dev_releases/grootfs/grootfs-0.20.0+dev.1.yml
/home/vagrant/scf/make/compile
Please allow a long time for mariadb to compile
Compiling packages for dev releases:
diego (1.19.0+dev.1)
etcd (104+dev.1)
garden-runc (1.8.0+dev.1)
cf-mysql (35+dev.1)
cflinuxfs2 (1.133.0+dev.1)
cf-opensuse42 (0+dev.1)
routing (0.157.0+dev.1)
hcf (0+dev.1)
grootfs (0.20.0+dev.1)
capi (1.30.0+dev.1)
loggregator (89+dev.1)
nats (18+dev.1)
consul (157+dev.1)
binary-buildpack (1.0.11+dev.1)
go-buildpack (1.7.19+dev.1)
java-buildpack (3.14+dev.1)
nodejs-buildpack (1.5.30+dev.1)
php-buildpack (4.3.29+dev.1)
python-buildpack (1.5.16+dev.1)
ruby-buildpack (1.6.35+dev.1)
staticfile-buildpack (1.4.0+dev.1)
depdone: cf-mysql/cf-mysql-broker - mysqlclient
depdone: cf-mysql/cf-mysql-broker - ruby
compile: cf-mysql/cf-mysql-broker
compilation-cf-mysql-broker > Compiling to /var/vcap/packages/cf-mysql-broker
compilation-cf-mysql-broker > ./packaging: line 4: rsync: command not found
done: cf-mysql/cf-mysql-broker
result > failure: cf-mysql/cf-mysql-broker - Error - compilation for package cf-mysql-broker exited with code 127
Error compiling packages: Error - compilation for package cf-mysql-broker exited with code 127
Makefile:178: recipe for target 'compile' failed
make: *** [compile] Error 1
You should push the latest fissile-stemcell-opensuse image, which includes the rsync command, to docker.io.
Hi all,
I see you changed the name of diego-database to diego-api (although I don't know why :)), and you removed the volume. But diego-api without a volume changes from a statefulset to a deployment.
I am not sure if this is by design, but I think there is one error in the bbs.json config:
"advertise_url":"https://diego-api-184916.diego-api-set.cf.svc.cluster.local:8889"
From the name diego-api-set I think diego-api is expected to be a statefulset; I am not sure whether using the wrong advertise_url causes any errors.
But my real problem is:
Have you tried running multiple diego-api instances? (CF officially suggests >= 2 diego-api instances for HA.)
I have tried running two diego-api instances, using both a deployment and a statefulset; ONLY diego-api-0 can start up and work, because the readiness checks of the others fail (the bbs job does not start correctly and port 8889 is not listening).
I checked the bbs log and found that bbs hangs at startup trying to get the consul lock:
{"timestamp":"1511534594.314656019","source":"bbs","message":"bbs.consul-lock.acquiring-lock","log_level":1,"data":{"key":"v1/locks/bbs_lock","session":"2","value":"{\"id\":\"897f63fc-90dd-4c38-510a-f479e8a59b6d\",\"url\":\"https://diego-api-1.diego-api-set.cf.svc.cluster.local:8889\"}"}}
I think this is because diego-api-0 is holding the lock; I also found an issue about this:
cloudfoundry/diego-release#201
In our VM-based CF env we can have two BBS/diego-api instances, but I really don't know why we cannot have multiple diego-api instances on SCF.
Can you give me some suggestions? Thank you so much!
Hi @viovanov @manno @jandubois,
To integrate Istio with scf, we need to add support for Istio deployment in scf. As you may know, the deployment is independent and could be done anywhere, and some Kubernetes platforms may already integrate Istio as a service; in scf I think we can add one more step, make install-istio, to cover this part. What do you think?
There are also slightly different Istio deployment configurations for the different Kubernetes platforms (https://istio.io/docs/setup/kubernetes/), so I would like to know which Kubernetes platforms we need to support.
Firstly, in the Vagrant VM it is hyperkube, right?
Hello,
For weird reasons, we wanted to bump the diego-release in SCF by a couple of versions and ran into an issue with the readiness check done by the diego-access pod. A future SCF version could hit the same issue, so while it is not of immediate concern, we thought we would share our observations already.
The file-server package was bumped to a newer version. This comes with an API change, which manifests itself by returning a 401 when requesting a directory. Since the readiness check requests /v1/static/ (/v1/static is 302-redirected to /v1/static/), which is a directory, an offending 401 is returned. This causes Kubernetes to mark the container of that pod unhealthy.
We are currently testing an alternative; it seems the readiness check should request a specific file rather than a directory, or a completely different readiness check should be used.
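As an illustration of the first alternative (a sketch; the host, port, and file path are placeholders, since the exact asset served by the file-server depends on the release):
# Probe a concrete asset instead of the directory listing, so the newer
# file-server's 401-on-directory behaviour no longer fails the check.
curl --fail --silent --output /dev/null \
  "http://file-server.example.internal:8080/v1/static/some-known-file.tgz"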
I tried to follow this guide:
https://software.opensuse.org/download.html?project=Virtualization%3Acontainers&package=kubernetes
But it does not work.
I tried to update manually and I can upgrade, but I don't know why your SUSE kubernetes 1.7 requires an old docker version (docker_1_12_6); it reports a runc error during container creation.
So I am not sure whether I followed the correct steps.
Or do you have any guide?
I also tried to use packer to create a new virtualbox ISO, but that also fails because the packages:
kubernetes-addons-kubedns-1.5.3
kubernetes-node-cni-1.5.3
don't exist.
Please help take a look, thanks a lot! :)
Hello guys,
at the moment fissile sets the same domain for the apps as for the system:
properties.app_domains: '["((DOMAIN))"]'
properties.domain: '"((DOMAIN))"'
That means that when the api pods get recreated, they automatically re-register the domain as a shared domain.
If you have a look at the cloud_controller source, it would make sense to separate the app and system domains.
Thanks for your feedback.
I see SCF uses helm list to get/remove some releases.
But there is a limitation in the latest helm 2.8.1:
If you have many helm releases, for example more than 100, then just running helm list gives an error:
root@0013473c434b:/# helm list
Error: grpc: received message larger than max (9677174 vs. 4194304)
Then the build process fails and gets nothing.
I have talked with the helm team but there is no fix right now; there is an issue:
helm/helm#3322
For example your code will fail at:
# Determine the kube revision of the chart controlling the specified namespace.
release_version() { helm list --namespace "$1" | awk '{ print $2 }' | tail -n 1 ; }
I use some additional parameters to limit the list size like this:
release_version() { helm list -d -r --max 20 --namespace "$1" | awk '{ print $2 }' | head -n 1 ; }
We can get the latest release by using -d -r together with a --max size.
Let me know if you need a PR for it.
Thanks!
While trying to change the acceptance tests, I've encountered the following issue:
In a script, we create an app from a docker image in a repository (in my case I've also used a tcp route, but I do not think this influences the problem).
cf push ${HSM_SERVICE_INSTANCE} -o docker-registry.helion.space:443/rohcf/sidecar-acctests:latest -d tcp-${CF_DOMAIN} --random-route --no-start
I've also tried with other images with the same results.
when I try to start the app:
cf start ${HSM_SERVICE_INSTANCE}
Sometimes I receive the error: ERR Failed to talk to docker registry: ...
After one or two more tries, the app starts without any more errors.
This might be related to vmware-archive/pcfdev#22
Hi,
I would like to add the smoke tests to my build.
I found that I need to generate the Kubernetes configuration first and then use make smoke or make cats to run the tests.
But SCF prefers a helm-based deployment, so I have to re-generate the kube config just for the tests...
I am not sure whether SCF could generate the smoke/cats tests the way it does post-deployment-setup, so I could use them directly.
I have a bosh release containing a property broker.catalog whose value is a mapping. When I build my bosh release, it throws an error that no bosh release uses "broker.catalog.services". "services" is just a sub-element of catalog, and catalog is used by my bosh release. Is this a bug in scf?
catalog:
services:
- id: autoscaler-guid
name: autoscaler
description: Automatically increase or decrease the number of application instances based on a policy you define.
bindable: true
plans:
- id: autoscaler-free-plan-id
name: autoscaler-free-plan
description: This is the free service plan for the Auto-Scaling service.
Could we get kubectl logs & kail output to include the logs from the CF components' /var/vcap/sys/log files? Or is there a way to do this without kubectl exec?
There is some problem with the repositories under src/.
After running
git submodule sync --recursive
git submodule update --init --recursive
some repositories are empty, for instance src/diego-release, and I get the following error:
Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:suse/testbrain.git' into submodule path '/home/vagrant/scf/src/hcf-release/src/github.com/suse/testbrain' failed
Failed to clone 'src/hcf-release/src/github.com/suse/testbrain' a second time, aborting
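That clone uses an SSH URL, which requires GitHub SSH access from the box. A workaround (a sketch) is to rewrite SSH URLs to HTTPS before updating the submodules:
# Fetch submodules declared with git@github.com: URLs over HTTPS instead,
# so no SSH key is needed inside the VM.
git config --global url."https://github.com/".insteadOf "git@github.com:"
git submodule sync --recursive
git submodule update --init --recursive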
Following the Eirini instructions, when I helm install cf, the api-0 pod cannot pull its image:
$ kubectl describe pod -n cf api-0
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 99s default-scheduler Successfully assigned cf/api-0 to gke-knative-default-pool-147e217d-twk3
Warning Failed 28s kubelet, gke-knative-default-pool-147e217d-twk3 Failed to pull image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67": rpc error: code = Canceled desc = context canceled
Warning Failed 28s kubelet, gke-knative-default-pool-147e217d-twk3 Error: ErrImagePull
Normal BackOff 27s kubelet, gke-knative-default-pool-147e217d-twk3 Back-off pulling image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67"
Warning Failed 27s kubelet, gke-knative-default-pool-147e217d-twk3 Error: ImagePullBackOff
Normal Pulling 15s (x2 over 98s) kubelet, gke-knative-default-pool-147e217d-twk3 pulling image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67"
hi scf team,
We successfully deployed scf 2.7.0 on a kubespray kubernetes cluster.
As the host/node OS, we currently use Ubuntu on AWS with overlay2 as the docker storage driver.
While deploying apps, we noticed "Disk quota exceeded" and similar exceptions all over the place (e.g. from npm and other package managers).
We increased the maximum disk space for apps from 2 GB to 10 GB (using the MAX_APP_DISK_IN_MB helm value -> cc.maximum_app_disk_in_mb).
If we now push the app while specifying the disk size with e.g. -k 10G, it magically works.
The disk usage reported by "cf app {app-name}" seems correct, ~100 MB.
We assume the quota calculation on the diego-cell (fissile?) wrongly sums up both the app's stack size (opensuse / cflinuxfs2) and the app size, and is therefore more or less always over quota.
Let me know if I can provide any information to help fix this issue.
best regards, adrian
Quick question: do the release notes contain any information about the cf-deployment version used in that SCF release? If not, can we add this?
Thanks.
Enrique Encalada
IBM