scf's Issues

requests to v3 api over http instead of https

I'm working with CAP 1.2 (chart version 2.13.3) and looking forward to the new charts for 2.14.5. Maybe this issue is already solved in newer versions.

The CF API returns v3 URLs with http:// instead of https://. We run a load balancer in front of CF and don't want the API reachable over HTTP, so the load balancer redirects to HTTPS. The cf CLI doesn't handle that and fails with errors like this:

> CF_TRACE=1 cf org system
Getting info for org system as admin...  
[...]  
REQUEST: [2018-11-29T13:48:51+01:00]  
GET /v3/isolation_segments?organization_guids=679703cb-33c1-4848-b66a-974cf01f21c6 HTTP/1.1  
[...]  
RESPONSE: [2018-11-29T13:48:51+01:00]  
HTTP/1.1 401 Unauthorized  
[...]  
Authentication error  
FAILED  

A direct API call to /v3/isolation_segments works because it goes over https. For commands like the one above, the client first makes a GET request on /v3 to retrieve the v3 URLs/paths, which it then seems to use for the follow-up requests:

> cf curl /v3  
[...]  
      "isolation_segments": {  
         "href": "http://api.my.domain/v3/isolation_segments"  
      },  
[...]  
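
To see how many of the advertised links are affected, the /v3 root response can be filtered for plain-http hrefs. This is a minimal sketch; the function name insecure_hrefs is made up for illustration, and it assumes the response is formatted like the excerpt above (an "href" key followed by a quoted URL).

```shell
# List every href in a /v3 root response that uses plain http.
# Reads the JSON document on stdin and prints one URL per line.
insecure_hrefs() {
  grep -o '"href": *"http://[^"]*"' | sed 's/^"href": *"//; s/"$//'
}
```

Usage would be something like `cf curl /v3 | insecure_hrefs`; empty output means every advertised link already uses https.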

Privileged must be enabled in kube-apiserver and kubelet

Hi team:
When I use kube-ready-state-check.sh to test my k8s environment, it shows:

Configuration problem detected: swapaccount enable
Verified: docker info should not show aufs
Verified: kube-dns should be running (show 4/4 ready)
Verified: tiller should be running (1/1 ready)
Verified: An ntp daemon or systemd-timesyncd must be installed and active
Verified: A storage class should exist in K8s
Configuration problem detected: Privileged must be enabled in 'kube-apiserver'
Configuration problem detected: Privileged must be enabled in 'kubelet'
Verified: TasksMax must be set to infinity

But I have already configured kubelet and kube-apiserver with --allow-privileged=true:

root     102704 13.6  0.3 1639560 95088 ?       Ssl  06:14   0:01 /usr/bin/kubelet --allow-privileged=true --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni


root      38613  8.9  1.3 409428 346476 ?       Ssl  05:29   3:51 kube-apiserver --authorization-mode=Node,RBAC --advertise-address=192.168.74.130 --allow-privileged=true --client-ca-file=/etc/kubernetes/pki/ca.crt --disable-admission-plugins=PersistentVolumeLabel --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0
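
For cross-checking by hand, the flag can be looked up on the running processes directly. This is only a sketch and not the logic used by kube-ready-state-check.sh, which is not shown here; the function name has_allow_privileged and the PS_CMD override are made up for illustration.

```shell
# Succeed if a process whose executable is named "$1" was started
# with --allow-privileged=true on its command line.
# PS_CMD can be overridden so the check can be tried without a cluster.
has_allow_privileged() {
  ${PS_CMD:-ps -A -o args} | awk -v name="$1" '
    { n = split($1, parts, "/") }                                # basename of the executable
    parts[n] == name && / --allow-privileged=true( |$)/ { found = 1 }
    END { exit !found }'
}
```

Running `has_allow_privileged kubelet` and `has_allow_privileged kube-apiserver` on the node shows whether the flag is visible in the process list. If both succeed while the readiness script still complains, the script is probably looking somewhere else (for example a config file or a differently named process).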

secret name is gathered incorrectly

demo:~ # SECRET=$(kubectl get pods --namespace uaa -o jsonpath='{.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}')
demo:~ # echo $SECRET
secrets-2.13.3-1 secrets-2.13.3-1 <<----- HERE
demo:~ # CA_CERT="$(kubectl get secret $SECRET --namespace uaa -o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"
demo:~ # echo $CA_CERT

demo:~ #

 

demo:~ # SECRET=secrets-2.13.3-1
demo:~ # kubectl get secret $SECRET --namespace uaa -o jsonpath="{.data['internal-ca-cert']}"
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUU4akNDQXRxZ0F3SUJBZ0lVUFhKczRlbzFERnpjY1NtcFRPeVd3QT
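
The query returns the secret name twice, presumably because it matches more than one pod, and `kubectl get secret` is then handed two names instead of one. A minimal workaround sketch is to keep only the first name; the helper name first_name is made up for illustration.

```shell
# Collapse a whitespace-separated list of names to its first entry.
first_name() {
  # Unquoted on purpose: let the shell split the argument on whitespace.
  set -- $1
  printf '%s\n' "$1"
}
```

With this, `SECRET=$(first_name "$SECRET")` leaves a single secret name for the second command. An alternative is to query `.items[0]` instead of `.items[*]` in the jsonpath expression, which works as long as the first pod in the list has a uaa container.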

Allow well-known IPs to be used by key load balancers

We have a use case whereby we provision static IPs on our IaaS and configure DNS to point to these, before deploying SCF. This allows users to install UAA and SCF one after the other without stopping to reconfigure DNS with whichever LoadBalancer IP the IaaS happened to choose.

Suggest that three properties are added to the respective Helm charts:

services.uaa_load_balancer_ip
services.router_load_balancer_ip
services.ssh_load_balancer_ip

And then the Helm charts (or whatever bit of Fissile generates them) are changed to contain the following logic:

    {{- if .Values.services.loadbalanced }}
    type: "LoadBalancer"
    {{ if .Values.services.router_load_balancer_ip }}loadBalancerIP: {{ .Values.services.router_load_balancer_ip }} {{ end }}
    {{- end }}

How to configure UAA_HOST? tcp-router fails to connect to UAA.

Hi team:
My config is:

env:
    # Domain for SCF. DNS for *.DOMAIN must point to a kube node's (not master)
    # external ip address.
    DOMAIN: cf-dev.io

    # UAA host/port that SCF will talk to. If you have a custom UAA
    # provide its host and port here. If you are using the UAA that comes
    # with the SCF distribution, simply use the two values below and
    # substitute the cf-dev.io for your DOMAIN used above.
    UAA_HOST: uaa.cf-dev.io
    # UAA_PORT: 2793

kube:
    # The IP address assigned to the kube node pointed to by the domain.
    #### the external_ip setting changed to accept a list of IPs, and was 
    #### renamed to external_ips 
    external_ips:
    - 192.168.7.22
    storage_class:
        # Make sure to change the value in here to whatever storage class you use
        persistent: "persistent"
        shared: "shared"
    auth: rbac

secrets:
    # Password for user 'admin' in the cluster
    CLUSTER_ADMIN_PASSWORD: mypaas

    # Password for SCF to authenticate with UAA
    UAA_ADMIN_CLIENT_SECRET: uaa

When I deploy SCF with this config, tcp-router and other components do not become ready.
The pod status is:

NAME                                READY     STATUS      RESTARTS   AGE
api-0                               0/1       Running     1          16m
blobstore-0                         1/1       Running     0          16m
cc-clock-77d84dd98-l6hkd            1/1       Running     0          16m
cc-uploader-57b668788f-6qxhq        1/1       Running     0          16m
cc-worker-74f76c677d-sjpmq          1/1       Running     0          16m
cf-usb-5b969887f9-29q9l             0/1       Running     0          16m
diego-access-5c5cb7b6d7-tdmg2       1/1       Running     0          16m
diego-api-6c8b96c597-sj5zx          1/1       Running     0          16m
diego-brain-6594dfdbc5-5rpkc        1/1       Running     0          16m
diego-cell-0                        0/1       Running     0          16m
diego-locket-6756d56bb6-k8bht       1/1       Running     0          16m
doppler-0                           1/1       Running     0          16m
loggregator-5698c4569f-pm7lt        1/1       Running     0          16m
mysql-0                             1/1       Running     0          16m
mysql-proxy-59b486f8dc-phmp9        1/1       Running     0          16m
nats-0                              1/1       Running     0          16m
nfs-broker-5555d9c585-v292p         1/1       Running     0          16m
post-deployment-setup-1-slgvn       1/1       Running     0          16m
router-5bb54cf844-cvg58             0/1       Running     0          16m
routing-api-0                       0/1       Running     0          16m
secret-generation-1-78wcv           0/1       Completed   0          16m
syslog-adapter-6f5c4d9558-db7ls     1/1       Running     0          16m
syslog-rlp-65f997ccdd-r4nlg         1/1       Running     0          16m
syslog-scheduler-7c7788df59-bxrnf   1/1       Running     0          16m
tcp-router-86cb8cd89f-665l2         0/1       Running     0          16m

routing-api events:

Events:
  Type     Reason          Age                 From                              Message
  ----     ------          ----                ----                              -------
  Normal   Scheduled       33m                 default-scheduler                 Successfully assigned scf/routing-api-0 to geekyzk-virtual-machine
  Normal   SandboxChanged  31m (x6 over 33m)   kubelet, geekyzk-virtual-machine  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          31m (x8 over 33m)   kubelet, geekyzk-virtual-machine  Error: secrets "secrets-2.8.0-1" not found
  Normal   Pulled          31m (x9 over 33m)   kubelet, geekyzk-virtual-machine  Container image "docker.io/splatform/scf-routing-api:73fa8ca792bb04a69d34be261bd240ce6ef2f705" already present on machine
  Normal   Created         31m                 kubelet, geekyzk-virtual-machine  Created container
  Normal   Started         30m                 kubelet, geekyzk-virtual-machine  Started container
  Warning  Unhealthy       3m (x165 over 30m)  kubelet, geekyzk-virtual-machine  Readiness probe failed: dial tcp 10.244.0.106:3000: connect: connection refused

tcp-router exception:

Trying: curl --connect-timeout 5 --fail --header Accept: application/json https://scf.uaa.cf-dev.io:2793/info
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
curl: (28) Resolving timed out after 5512 milliseconds

  FAILED

Waiting 30s ...
Trying: curl --connect-timeout 5 --fail --header Accept: application/json https://scf.uaa.cf-dev.io:2793/info
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
curl: (28) Resolving timed out after 5514 milliseconds

  FAILED

Waiting 30s ...

I installed a bind9 DNS server and configured *.cf-dev.io -> 192.168.7.22, but it still fails. Can you help me?

Consul cluster members cannot start normally

Sorry @viovanov ,

We need an issue to track our problem, so I am opening this Consul issue for tracking. Thanks! :)

I have a question about Consul: can an HA Consul cluster, for example with 3 members, be started?
If I set 3 consuls in SCF, SCF renders the consul confab.json as:

"servers":{"lan":["consul-0.consul-set.cf.svc.cluster.local","consul-1.consul-set.cf.svc.cluster.local","consul-2.consul-set.cf.svc.cluster.local"]

consul-0 starts correctly, but consul-1 and consul-2 do not, because they first need to communicate with each other. With the consul service, a pod that is not ready cannot be reached through its per-pod service name, so I think this is a deadlock.
For example, I see this error in consul-1's consul_agent.stderr.log:

* Failed to resolve consul-2.consul-set.cf.svc.cluster.local: lookup consul-2.consul-set.cf.svc.cluster.local on 172.21.0.10:53: no such host)

So consul-1 cannot start up correctly, and the consul-set service cannot discover it.

Later, for a reason I don't understand, consul-2 is provisioned and started; it then looks up consul-1 and gets the same error.

So the two wait for each other to start, neither becomes ready, and they deadlock.

Several hours later, for some reason, they may find each other and then start up correctly. I don't know why or how that happens.

Remove consul from SCF

Hi,

I remember SCF brought Consul back for the new auto-scaling feature.

I have confirmed with the auto-scaling team that they no longer need Consul and have removed the Consul-related config from auto-scaling.

Can you help remove the consul job if no other jobs need Consul?

@qibobo is here if you have any auto-scaling questions.

Thanks!

Routing submodule should be updated

Hi,

Quick question regarding the routing-release submodule. It seems that the current latest SCF release uses cf-deployment 1.9, so the correct routing-release version should be something higher than 0.166.0, apparently 0.170.0. Any ideas on this?

Regards,
Enrique Encalada

ccdb migration in statefulset api upgrade

Hi,

Right now we are testing the upgrade function in SCF. The api role is a special case because it needs to run the ccdb migration at the beginning.

But api is a StatefulSet, so during an upgrade its pods are updated in reverse order: api-1 is upgraded first, and only then is api-0 upgraded and the ccdb migration run. I think this is incorrect.

I checked the api code:

 <% if spec.bootstrap && p('cc.run_prestart_migrations') %>
  perform_migration
  seed_db
  <% else %>
  echo "Skipping DB migrations and seeds"
  <% end %>

Every time, api-0 runs perform_migration and seed_db.

I also asked in the Kubernetes repository; there is currently no option to control the upgrade order for a StatefulSet:
kubernetes/kubernetes#51907 (comment)

So I am not sure if we can upgrade like this:
1. Set the api StatefulSet update strategy to OnDelete.
2. Write a script for the api upgrade: after running the upgrade, use the script to delete api-0, then api-1, and so on, in order.
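
The script from step 2 could look roughly like the sketch below. It assumes the StatefulSet already uses the OnDelete strategy, so a deleted pod is recreated from the updated template. The function name restart_in_order and the KUBECTL override are made up for illustration; the namespace, StatefulSet name and replica count are passed in as arguments.

```shell
# Delete StatefulSet pods one at a time in ascending ordinal order,
# waiting for each replacement to become Ready before moving on.
# Usage: restart_in_order <namespace> <statefulset-name> <replicas>
restart_in_order() {
  ns=$1 name=$2 replicas=$3
  kubectl=${KUBECTL:-kubectl}   # overridable, e.g. KUBECTL=echo for a dry run
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${name}-${i}"
    $kubectl delete pod "$pod" --namespace "$ns"
    # The replacement pod may not exist yet right after the delete, so retry.
    until $kubectl wait --for=condition=Ready "pod/$pod" --namespace "$ns" --timeout=60s; do
      sleep 5
    done
    i=$((i + 1))
  done
}
```

For example, `restart_in_order scf api 2` would restart api-0 first, so the migration runs before api-1 comes up on the new version. Note that the retry loop has no overall time limit; a real script would probably want one.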

Or do you have any plan for that?

Thanks a lot!

Attempting to build capi-release with `make capi-release` fails

Hi folks,

CF Persi team here. We are attempting to patch our SCF+Eirini deployment to test changes we have made in order to support volume services (in Eirini).

We have a running SCF+Eirini (in cfcr actually). We have made changes to cloud_controller and to eirini. Our thinking is to remake the capi-release and separately eirini-release, push them to a docker registry and update our helm charts to use these two new images.

However, when we run:

$ cd ~/workspace/scf
$ make capi-release

It fails with the following output:

Building a release from directory '/Users/pivotal/workspace/scf/src/capi-release/':
  - Running prep scripts:
      Running command: 'bash -x /Users/pivotal/workspace/scf/src/capi-release/packages/nginx_newrelic_plugin/pre_packaging', stdout: '~/.bosh/tmp/bosh-resource-archive633828037/nginx ~/.bosh/tmp/bosh-resource-archive633828037
newrelic_nginx_agent/
newrelic_nginx_agent/config/
newrelic_nginx_agent/Gemfile
newrelic_nginx_agent/LICENSE
newrelic_nginx_agent/newrelic_nginx_agent
newrelic_nginx_agent/newrelic_nginx_agent.daemon
newrelic_nginx_agent/README.txt
newrelic_nginx_agent/config/newrelic_plugin.yml.in
~/.bosh/tmp/bosh-resource-archive633828037
~/.bosh/tmp/bosh-resource-archive633828037/nginx/newrelic_nginx_agent ~/.bosh/tmp/bosh-resource-archive633828037
', stderr: '+ set -e -x
+ pushd /home/docker-user/.bosh/tmp/bosh-resource-archive633828037/nginx
+ tar zxvf newrelic_nginx_agent-1.2.1.tar.gz
+ popd
+ pushd /home/docker-user/.bosh/tmp/bosh-resource-archive633828037/nginx/newrelic_nginx_agent
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/nginx_newrelic_plugin/pre_packaging: line 8: bundle: command not found
':
        exit status 127
  - Running prep scripts:
      Running command: 'bash -x /Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging', stdout: 'ERROR: Failed to run 'bundle package' after 3 attempts
', stderr: '+ set -e -x
+ cd /home/docker-user/.bosh/tmp/bosh-resource-archive363148158/cloud_controller_ng
+ rm -rf ./vendor/cache/
+ set +e
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ for i in '{1..3}'
+ bundle config --local specific_platform true
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 9: bundle: command not found
+ BUNDLE_WITHOUT=development:test
+ bundle package --all --all-platforms --no-install --path ./vendor/cache
/Users/pivotal/workspace/scf/src/capi-release/packages/cloud_controller_ng/pre_packaging: line 10: bundle: command not found
+ exit_code=127
+ '[' 127 == 0 ']'
+ sleep 1
+ set -e
+ '[' 127 '!=' 0 ']'
+ echo 'ERROR: Failed to run '\''bundle package'\'' after 3 attempts'
+ exit 127
':
        exit status 127
Exit code 1

OS is a Mac.

@davewalter @julian-hj

Include HA Manifest

Please include/generate a values-ha.yaml so we can choose between a test/eval environment and a production environment.

Hardcoding mysql-0 in make/run is not good for common usage

I see that SCF updated the secret name and changed the way make/run gets it:

get_uaa_secret_name() {
    kubectl get pod mysql-0 --namespace "${UAA_NAMESPACE}" \
            -o jsonpath='{@.spec.containers[0].env[?(@.name=="MONIT_PASSWORD")].valueFrom.secretKeyRef.name}'
}

But it hardcodes mysql-0 in make/run, and we are not using mysql in our environment, so it fails here.

I think SCF could make this more general, for example by getting the first pod name like this:

pod_name=`kubectl get pod --namespace "${UAA_NAMESPACE}" | grep "\-0" | awk '{ print $1 }' | head -1`

Then you don't need to hardcode a pod name here.
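
A slightly stricter variant of the same idea, as a sketch: the `grep "\-0"` above also matches names that merely contain "-0" somewhere, so this version only accepts names that end in "-0". The function name first_ordinal_zero_pod is made up for illustration.

```shell
# Pick the first pod whose name ends in the StatefulSet ordinal "-0".
# Reads pod names on stdin, one per line, with or without a "pod/" prefix.
first_ordinal_zero_pod() {
  sed 's|^pod/||' | grep -- '-0$' | head -n 1
}
```

It could be fed from `kubectl get pods --namespace "${UAA_NAMESPACE}" -o name`. The chosen pod still has to define the env var that the jsonpath query in get_uaa_secret_name looks up, otherwise the secret name comes back empty.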

Please let me know if this is better; I can provide a PR if needed.

When will Helm charts for CF/Diego be officially released?

Sorry for just asking a question here; I didn't find a better place, such as Slack, to ask. I am wondering when the Helm charts for CF/Diego will be officially released (GA). We would like to run an experiment or pilot with these charts as early as possible. Will they also support Ubuntu? Thanks.

vagrant env build

Please document the need for Base Development (pattern) and libvirt-devel for the vagrant-libvirt plugin to install successfully.

stampy command not found during deploying SCF on Vagrant

I tried to follow the Deploying SCF on Vagrant guide to deploy CF on Vagrant, but ran into the following problem. Is there a way to install stampy properly to work around it? Thanks.

/home/vagrant/scf/make/docker-deps
42.2-6.ga651b2d-28.33: Pulling from splatform/fissile-stemcell-opensuse
Digest: sha256:b7d9c65ef8ea1064bd5d4797e1e6e837ce8fbb9eb812b4f9a2259c76126fb780
Status: Image is up to date for splatform/fissile-stemcell-opensuse:42.2-6.ga651b2d-28.33
/home/vagrant/scf/make/bosh-release src/diego-release
/home/vagrant/scf/bin/create-release.sh: line 18: stampy: command not found
Makefile:82: recipe for target 'diego-release' failed
make: *** [diego-release] Error 127
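
The "command not found" at create-release.sh line 18 means stampy is not on the PATH of the shell running make. A quick pre-flight check along these lines can confirm that before a long build starts. It is only a sketch, and the function name missing_tools is made up for illustration.

```shell
# Report which of the named tools are missing from PATH.
# Prints one missing tool per line and returns non-zero if any are missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Inside the vagrant box, `missing_tools stampy` prints "stampy" while the tool is missing and prints nothing once it is installed and on the PATH.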

/mnt partition becomes full

After playing around with the vagrant box for a while (running make stop and make run, stopping and starting the box, and other experiments), the /mnt partition inside the vagrant box became 100% full:

vagrant@vagrant-kube:~/scf> df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
devtmpfs                                                        4.9G     0  4.9G   0% /dev
tmpfs                                                           4.9G     0  4.9G   0% /dev/shm
tmpfs                                                           4.9G  4.9M  4.9G   1% /run
tmpfs                                                           4.9G     0  4.9G   0% /sys/fs/cgroup
/dev/vda2                                                       9.0G  3.4G  5.3G  39% /
/dev/vda2                                                       9.0G  3.4G  5.3G  39% /.snapshots
/dev/sda1                                                        99G   94G  140M 100% /mnt
192.168.121.1:/home/dimitris/workspace/suse/scf/.fissile/.bosh  916G  131G  785G  15% /home/vagrant/.bosh
192.168.121.1:/home/dimitris/workspace/suse/scf                 916G  131G  785G  15% /home/vagrant/scf
tmpfs                                                          1000M     0 1000M   0% /run/user/1000

Is there a way to clean it up and release some space? Even if I increase its size, I think it is just a matter of time until it fills up again.

Here is the output of: sudo find /mnt -type f -size +200M -exec ls -lh {} \;

ant 1002 513M Jun 21 08:33 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:33 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:21 /mnt/hostpath/f300ddf6-567b-11e7-80d5-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 457M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/2c/8e/2c8e7a97-41b5-4a03-ab1b-8a52baa728a5_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92
-rw------- 1 vagrant 1002 307M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/c1/9e/c19e43d3-a767-4d6a-9015-35210d7730c2_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250
-rw------- 1 vagrant 1002 324M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/6f/7c/6f7c30e5-27e5-439e-ad84-0fbca820b3a3_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82
-rw------- 1 vagrant 1002 341M Jun 20 10:16 /mnt/hostpath/3a3e158e-55c2-11e7-9781-5254004205b9/shared/cc-buildpacks/c5/23/c523279e-2e52-4583-bebb-ffdf48980a7c_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:06 /mnt/hostpath/34118b40-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:05 /mnt/hostpath/34118b40-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 513M Jun 21 09:03 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:03 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:33 /mnt/hostpath/c52db049-567d-11e7-80d5-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 513M Jun 21 04:51 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:51 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:33 /mnt/hostpath/da973bbf-5653-11e7-9820-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 341M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/18/b6/18b6458a-761a-4683-b83f-172950e80c62_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec
-rw------- 1 vagrant 1002 457M Jun 20 13:43 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/16/7a/167a8ea5-bb95-4597-80d9-65ad17fc0406_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92
-rw------- 1 vagrant 1002 307M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/9b/0f/9b0f902e-7efb-462c-8d1e-939bcd0d3a60_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250
-rw------- 1 vagrant 1002 324M Jun 20 13:42 /mnt/hostpath/098658a6-55df-11e7-99a4-5254004205b9/shared/cc-buildpacks/7e/2f/7e2fb51b-6af1-461b-8ba3-b791358d3ac6_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82
-rw------- 1 vagrant 1002 513M Jun 21 08:18 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:18 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:53 /mnt/hostpath/158bd52b-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 513M Jun 20 08:50 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 08:55 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 08:47 /mnt/hostpath/9317d8a8-55b6-11e7-b030-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 307M Jun 21 08:31 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/3e/f0/3ef01065-6b8e-4764-83c7-402a769e5cb5_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250
-rw------- 1 vagrant 1002 341M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/b5/e0/b5e058cb-0013-4c18-9f75-52231f74703b_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec
-rw------- 1 vagrant 1002 457M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/50/3a/503a0ad4-2d30-456e-829d-44dfe5277ddb_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92
-rw------- 1 vagrant 1002 324M Jun 21 08:32 /mnt/hostpath/f0b76635-567b-11e7-80d5-5254004205b9/shared/cc-buildpacks/f1/e0/f1e0a678-1680-402e-9b4f-39d49fedb2f2_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82
-rw------- 1 vagrant 1002 513M Jun 21 04:51 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:51 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:34 /mnt/hostpath/00d0e3f3-5654-11e7-9820-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 457M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/ed/8d/ed8de3b4-f114-45a4-a497-08d767887065_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92
-rw------- 1 vagrant 1002 307M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/7f/01/7f01b559-f137-489d-9fc4-0304ef11f29e_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250
-rw------- 1 vagrant 1002 341M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/0f/45/0f45a159-629a-42e3-a268-aee6d588a8fe_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec
-rw------- 1 vagrant 1002 324M Jun 21 08:41 /mnt/hostpath/ee25f0e1-567d-11e7-80d5-5254004205b9/shared/cc-buildpacks/01/57/0157eab7-5512-4766-a796-21ad5022130a_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82
-rw------- 1 vagrant 1002 513M Jun 21 08:19 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:19 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 04:54 /mnt/hostpath/16eaf771-565f-11e7-b8f7-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 513M Jun 21 09:03 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:03 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:34 /mnt/hostpath/f0f45588-567d-11e7-80d5-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 513M Jun 20 11:36 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/galera.cache
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:35 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/ib_logfile0
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 10:10 /mnt/hostpath/39a1b894-55c2-11e7-9781-5254004205b9/mysql/ib_logfile1
-rw------- 1 vagrant 1002 457M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/37/c7/37c70b7b-eb42-4652-a97f-1bae561d59ec_cf51b1b45088509623779c9856ee376658f120953798aeabd15b0d6b7f6a8f92
-rw------- 1 vagrant 1002 307M Jun 21 03:38 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/df/7c/df7c08ba-f2b0-4631-b3a1-2f703a082a1f_92115bd12a5157e9a30bbb82f0760cd1bc067908445207438cca1aec5595f250
-rw------- 1 vagrant 1002 324M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/15/9e/159e0d53-d6db-43ae-b35d-ff3ebd159edf_af401080e0ea1c4f2b009ba86ffbe3ab9d5fef38d888cbfa2e6bd919d0601c82
-rw------- 1 vagrant 1002 341M Jun 21 03:39 /mnt/hostpath/fe4027a7-5653-11e7-9820-5254004205b9/shared/cc-buildpacks/2f/5d/2f5d3302-52be-4ba0-9aaf-396d84ffd4e1_d843b7cf8463951ebf591bd0efc5e4e213a2dc4ee28994e452dfbc782cdc71ec
-rw------- 1 vagrant 1002 513M Jun 21 09:19 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/galera.cache                                                                                                                             │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:19 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0                                                                                                                              │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:08 /mnt/hostpath/a17fbef9-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1                                                                                                                              │
-rw------- 1 vagrant 1002 513M Jun 21 03:30 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/galera.cache                                                                                                                             │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 03:30 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/ib_logfile0                                                                                                                              │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:38 /mnt/hostpath/0a3675e8-55df-11e7-99a4-5254004205b9/mysql/ib_logfile1                                                                                                                              │
-rw------- 1 vagrant 1002 513M Jun 21 08:32 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/galera.cache                                                                                                                             │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:32 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/ib_logfile0                                                                                                                              │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 08:19 /mnt/hostpath/c7309559-567b-11e7-80d5-5254004205b9/mysql/ib_logfile1                                                                                                                              │
-rw------- 1 vagrant 1002 513M Jun 20 13:35 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/galera.cache                                                                                                                             │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 13:35 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/ib_logfile0                                                                                                                              │
-rw-rw---- 1 vagrant 1002 1.0G Jun 20 10:11 /mnt/hostpath/3ad44abc-55c2-11e7-9781-5254004205b9/mysql/ib_logfile1                                                                                                                              │
-rw------- 1 vagrant 1002 513M Jun 21 09:06 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/galera.cache                                                                                                                             │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:06 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/ib_logfile0                                                                                                                              │
-rw-rw---- 1 vagrant 1002 1.0G Jun 21 09:04 /mnt/hostpath/2f7507b8-5682-11e7-80d5-5254004205b9/mysql/ib_logfile1 

diego-release submodule nats is out of date, which makes the build process fail

Hi, after talking with @viovanov:

Recently, Cloud Foundry renamed nats to go-nats, and the repo URL changed as well:
https://github.com/cloudfoundry/diego-release/tree/20c48cc98c7b78aff2c65a5c3df2b5ff612e2d61/src/github.com/nats-io

The original nats repository is missing:
https://github.com/nats-io/nats/tree/c0ad3f079763c06c3ce94ad12fa3f17e78966d99

Manual workaround:
git submodule sync --recursive ; git submodule update --init
and change .gitmodules:

[submodule "src/github.com/nats-io/nats"]
        path = src/github.com/nats-io/nats
        url = https://github.com/nats-io/go-nats
        branch = master

by perl -pi -e "s/https:\/\/github.com\/nats-io\/nats/https:\/\/github.com\/nats-io\/go-nats/g" .gitmodules

I am opening this issue to wait for an official fix from the SCF team.
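
The perl one-liner above can also be sketched in Python. This is a minimal sketch, assuming the only change needed is the `url` entry of the nats-io/nats submodule; the function name `retarget_nats_submodule` is made up for illustration.

```python
import re

OLD_URL = "https://github.com/nats-io/nats"
NEW_URL = "https://github.com/nats-io/go-nats"

# Match only `url = <OLD_URL>` lines (optionally ending in .git), so the
# submodule path src/github.com/nats-io/nats is left alone.
URL_LINE = re.compile(
    r"^([ \t]*url[ \t]*=[ \t]*)" + re.escape(OLD_URL) + r"(?:\.git)?[ \t]*$",
    re.MULTILINE,
)


def retarget_nats_submodule(gitmodules_text):
    """Return .gitmodules content with the nats URL pointing at go-nats."""
    return URL_LINE.sub(lambda m: m.group(1) + NEW_URL, gitmodules_text)
```

Unlike the global substitution, this leaves an already corrected file unchanged when it is run a second time.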

Checking unready pods failed due to completed jobs in uaa-wait

After the Kubernetes upgrade to 1.9, kubectl by default shows all pods, including terminated ones, according to group discussions.

The run-to-completion jobs then show up in the command output like this:

kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-27T00:13:02Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.11-2+fa6873d3e386d7", GitCommit:"fa6873d3e386d7ead42923b24aea3b76e74395a3", GitTreeState:"clean", BuildDate:"2018-04-17T08:10:40Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

bjxzi$ kubectl get pods -n uaa
NAME                        READY     STATUS      RESTARTS   AGE
postgres-0                  1/1       Running     0          16h
secret-generation-1-lwn44   0/1       Completed   0          16h
uaa-77448b8688-9tk4k        1/1       Running     0          16h
uaa-77448b8688-cvq9g        1/1       Running     0          16h

bjxzi$ kubectl get pods -n uaa | awk '{ if (match($2, /^([0-9]+)\/([0-9]+)$/, c)  && c[1] != c[2]) { print ; exit 1 } }'
secret-generation-1-lwn44   0/1       Completed   0          16h

The awk script is used in https://github.com/SUSE/scf/blob/develop/make/uaa-wait#L5
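
One possible way to avoid the false positive is to ignore pods whose STATUS is Completed. The following is a minimal Python sketch of that idea, not the logic of the uaa-wait script; it assumes the column layout of the `kubectl get pods` output shown above, and `unready_pods` is a hypothetical helper name.

```python
import re

READY_RE = re.compile(r"^(\d+)/(\d+)$")


def unready_pods(kubectl_output):
    """Return the names of pods that are not fully ready.

    Pods in the Completed state (finished run-to-completion jobs) are
    skipped, because 0/1 is their expected final READY value.
    """
    unready = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        match = READY_RE.match(ready)
        if match is None:
            continue  # header line or unexpected format
        if status == "Completed":
            continue
        if match.group(1) != match.group(2):
            unready.append(name)
    return unready
```

With the listing above as input, the function returns an empty list, because the only 0/1 pod is the completed secret-generation job.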

Not all pods come up automatically when doing a `vagrant reload`

Given a state where all pods are running fine, if I do a vagrant halt followed by vagrant up, or simply a vagrant reload, some of the pods do not come up:

[screenshot: clipboard]

They stay down even after half an hour. It works if I do a make stop and then make run. I can't find anything useful in the logs of the failing pods.

TLS handshake error in various components

While debugging some other issue, I came across TLS handshake error messages in various components,
e.g. the gorouter:

router/1:/var/vcap/sys/log/gorouter# tail gorouter.err.log
[2018-12-13 09:30:12+0000] 2018/12/13 09:30:12 http: TLS handshake error from 172.16.6.0:36751: EOF
[2018-12-13 09:30:16+0000] 2018/12/13 09:30:16 http: TLS handshake error from 172.16.0.1:56141: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.6.0:60281: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.0.1:9568: EOF
[2018-12-13 09:30:17+0000] 2018/12/13 09:30:17 http: TLS handshake error from 172.16.6.0:48117: EOF
[2018-12-13 09:30:21+0000] 2018/12/13 09:30:21 http: TLS handshake error from 172.16.0.1:11520: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.10.0:45172: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.6.0:48755: EOF
[2018-12-13 09:30:22+0000] 2018/12/13 09:30:22 http: TLS handshake error from 172.16.6.0:34185: EOF
[2018-12-13 09:30:26+0000] 2018/12/13 09:30:26 http: TLS handshake error from 172.16.10.0:41303: EOF

or diego-brain:

diego-brain/1:/var/vcap/sys/log/auctioneer# tail auctioneer.stderr.log
2018/12/13 09:36:14 http: TLS handshake error from 172.16.1.66:41364: EOF
2018/12/13 09:36:24 http: TLS handshake error from 172.16.1.66:41756: EOF
2018/12/13 09:36:34 http: TLS handshake error from 172.16.1.66:42066: EOF
2018/12/13 09:36:44 http: TLS handshake error from 172.16.1.66:42486: EOF
2018/12/13 09:36:54 http: TLS handshake error from 172.16.1.66:42840: EOF
2018/12/13 09:37:04 http: TLS handshake error from 172.16.1.66:43240: EOF
2018/12/13 09:37:14 http: TLS handshake error from 172.16.1.66:43630: EOF
2018/12/13 09:37:24 http: TLS handshake error from 172.16.1.66:43972: EOF
2018/12/13 09:37:34 http: TLS handshake error from 172.16.1.66:44304: EOF
2018/12/13 09:37:44 http: TLS handshake error from 172.16.1.66:44732: EOF

Interestingly, this occurs in various SCF versions (seen in cf-2.13.3 and cf-2.14.5), running both on-prem on SUSE CaaSP v3 and on Azure AKS.

I've tried to trace back the IPs mentioned but didn't succeed. Does anyone see the same behaviour and maybe have an idea how to trace back the component?

Cheers
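
To narrow down the callers, it may help to count the errors per source address first. This is a minimal sketch that only assumes the log line format shown above; `handshake_sources` is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures the source IP of lines such as
# "http: TLS handshake error from 172.16.6.0:36751: EOF"
HANDSHAKE_RE = re.compile(
    r"TLS handshake error from (\d+\.\d+\.\d+\.\d+):\d+"
)


def handshake_sources(log_text):
    """Count TLS handshake errors per source IP address."""
    return Counter(HANDSHAKE_RE.findall(log_text))
```

The resulting addresses can then be compared with the IP column of `kubectl get pods --all-namespaces -o wide`. Addresses that match no pod may belong to nodes or to a load balancer probing the port, but that is only a guess.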

helm delete does not remove generated secrets

I was changing my base DNS and redeploying but inside router-0 I was seeing errors from bad x509 certs that contained the old domains. I guessed deleting + reinstalling wasn't recreating the secrets?

I can confirm that deleting the helm chart installations does not delete the generated secrets.

$ eirini helm delete cf --purge
release "cf" deleted
$ eirini helm delete uaa --purge
release "uaa" deleted
$ kubectl get secrets --all-namespaces
NAMESPACE     NAME                                             TYPE                                  DATA   AGE
cf            default-token-29hjx                              kubernetes.io/service-account-token   3      1h
cf            secrets-2.14.0-1                                 Opaque                                157    1h
...
uaa           default-token-7bq9k                              kubernetes.io/service-account-token   3      2h
uaa           secrets-2.14.0-1                                 Opaque                                26     1h

Perhaps a corollary is that deleting the charts does not delete the namespaces, which would have deleted the secrets?
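
To check for leftovers after a delete, the secret listing can be filtered. This is a minimal sketch, assuming that the generated secrets are the Opaque ones in the cf and uaa namespaces as in the output above; `leftover_generated_secrets` is a hypothetical helper name.

```python
def leftover_generated_secrets(listing, namespaces=("cf", "uaa")):
    """Return (namespace, name) pairs of Opaque secrets still present.

    `listing` is the text printed by
    `kubectl get secrets --all-namespaces`.
    """
    leftovers = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAMESPACE":
            continue  # header, blank or elided line
        namespace, name, secret_type = fields[0], fields[1], fields[2]
        if namespace in namespaces and secret_type == "Opaque":
            leftovers.append((namespace, name))
    return leftovers
```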

SCF HA mode is not supported for new Secret-Generator function

Hi all,

I am using the latest SCF for my deployment, and I think I found an issue: HA mode does not work correctly with the new secret-generator function.

The secret-generation job uses a property in the role manifest to generate certs:
properties.scf.secrets.variables: '((DOMAIN))((KUBE_SIZING_API_COUNT))((KUBE_SIZING_CC_UPLOADER_COUNT))((KUBE_SIZING_DIEGO_API_COUNT))((KUBE_SIZING_DIEGO_BRAIN_COUNT))((KUBE_SIZING_DIEGO_CELL_COUNT))((KUBE_SIZING_ETCD_COUNT))'

However, KUBE_SIZING_DIEGO_CELL_COUNT, for example, is taken from the count setting in values.yml.

If I set HA: true in values.yml but do not change the count setting of diego-cell, SCF still uses the count setting.

For example, the initial size of diego-cell is 1. I enable HA mode with an HA count of 6, but secret-generation still uses 1 as the diego-cell count.

Then only diego-cell-0 works fine and the others fail at auction:

{"timestamp":"1514257592.442109346","source":"auctioneer","message":"auctioneer.auction.failed-to-get-state","log_level":2,"data":{"cell-guid":"diego-cell-4","error":"Get https://diego-cell-4.diego-cell-set.cf.svc.cluster.local:1801/state: x509: certificate is valid for diego-cell, *.diego-cell, diego-cell.cf.svc, *.diego-cell.cf.svc, diego-cell.cf.svc.cluster.local, *.diego-cell.cf.svc.cluster.local, diego-cell-0.diego-cell-set, diego-cell-0.diego-cell-set.cf.svc, diego-cell-0.diego-cell-set.cf.svc.cluster.local, diego-cell.cf.svc.cluster.local, *.diego-cell.cf.svc.cluster.local, rep_server, not diego-cell-4.diego-cell-set.cf.svc.cluster.local","session":"629146"}}

I am not sure whether anyone uses HA mode. I also think that offering only HA and non-HA modes does not give end users enough choice.

Please let me know if I have misunderstood.

Thanks! :)
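
The mismatch can be made visible by listing the per-replica hostnames a certificate would need for a given count. This is a minimal sketch that follows the naming pattern visible in the error message above (`<role>-<index>.<role>-set...`); the function names are hypothetical, and this is not how the secret generator itself works.

```python
def statefulset_hostnames(role, count, namespace="cf",
                          cluster_domain="cluster.local"):
    """List the per-replica DNS names for `count` replicas of `role`."""
    names = []
    for index in range(count):
        base = f"{role}-{index}.{role}-set"
        names.append(base)
        names.append(f"{base}.{namespace}.svc")
        names.append(f"{base}.{namespace}.svc.{cluster_domain}")
    return names


def missing_sans(cert_sans, role, count):
    """Return the expected replica hostnames the certificate lacks."""
    present = set(cert_sans)
    return [name for name in statefulset_hostnames(role, count)
            if name not in present]
```

For a certificate generated with a count of 1 and a deployment of 6 cells, `missing_sans` reports every name for diego-cell-1 through diego-cell-5, which matches the x509 error for diego-cell-4.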

Statefulset nodes fail to start on Kubernetes 1.9

Hey there, @qu1queee and I investigated an issue that colleagues of ours had with a version of SCF we have deployed. In that cluster, nats is a statefulset with two nodes. The first one came up just fine; the second one failed to start with some weird error messages.

We debugged it far enough to think that configure-HA-hosts.sh relies on a certain structure of the Kubernetes API result JSON that changed between Kubernetes 1.8 and 1.9.

In 1.8, the JSON result has kubernetes.io/created-by:

{
  "checksum/config": "[...]",
  "kubernetes.io/created-by": "{[...],\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"StatefulSet\",\"name\":\"nats\", [...]}"
}

From what we can see, our 1.9 Kubernetes cluster API result does not have anything like that. There seems to be a new section with details about the name that could be used.

We wanted to check with you if you encountered a similar issue, too? Thanks in advance for your feedback.

@zhangtbj @zhanggbj @ScarletTanager FYI
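
A lookup that works on both versions could prefer `metadata.ownerReferences` and fall back to the old annotation. This is a minimal sketch, assuming that the "new section" mentioned above is `metadata.ownerReferences`; it is not the logic of configure-HA-hosts.sh, and `controlling_owner` is a hypothetical helper name.

```python
import json


def controlling_owner(pod):
    """Return (kind, name) of the controller that owns the pod, or None.

    `pod` is the parsed JSON of a pod object from the Kubernetes API.
    """
    metadata = pod.get("metadata") or {}
    # Preferred: the structured owner references.
    for ref in metadata.get("ownerReferences") or []:
        if ref.get("controller"):
            return ref["kind"], ref["name"]
    # Fallback: the JSON-encoded annotation seen on the 1.8 cluster.
    annotations = metadata.get("annotations") or {}
    created_by = annotations.get("kubernetes.io/created-by")
    if created_by:
        reference = json.loads(created_by).get("reference", {})
        if "kind" in reference and "name" in reference:
            return reference["kind"], reference["name"]
    return None
```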

`make kube` fails because of missing env variables

When I try to run make kube as per the documentation here
I get errors like:

vagrant@vagrant-kube:~/scf> make kube
/home/vagrant/scf/make/uaa-kube
make[1]: Entering directory '/home/vagrant/scf/src/uaa-fissile-release'
/home/vagrant/scf/src/uaa-fissile-release/make/kube-configs
Loading defaults from env files
Secret 'MYSQL_ADMIN_PASSWORD' has no value
Makefile:16: recipe for target 'kube-configs' failed
make[1]: *** [kube-configs] Error 1
make[1]: Leaving directory '/home/vagrant/scf/src/uaa-fissile-release'
Makefile:206: recipe for target 'uaa-kube' failed
make: *** [uaa-kube] Error 2

I think the problem lies in this line:

https://github.com/SUSE/uaa-fissile-release/blob/001f793a0890d0a240eb3f1447eefcef81895f6a/make/kube-configs#L15

where the files we use do not include this one:
https://github.com/SUSE/scf/blob/develop/bin/settings/settings.env

Changing the above line to --defaults-file="$(echo env/*.env | tr ' ' ','),../../bin/settings/settings.env" \ makes it work, but I'm not sure this is the right way to do it.

Does new Garden still support nested container when using SCF?

Hi,

I heard from the community that Garden will use overlayFS2 as its storage driver, which means Garden would no longer be supported on Kubernetes as a nested container (container on container):

garden is using (IIRC) aufs, and that seems to work (sometimes) over aufs, maybe even sometimes over overlay for reasons we cannot identify (it "shouldn't" work)

When garden switches to overlay, overlay checks to see, "am I nested" and pukes if isNested == true

Are you aware of this change? And can SCF still work fine with the new Garden on Kube?

Thanks!

Latest fissile doesn't support --use-secrets-generator for build kube

Hi @mook-as ,

I see you upgraded fissile to support --use-secrets-generator.

But when I use the official fissile-5.1.0+20.g253c0ab from fissile master, I see that only fissile build helm supports the --use-secrets-generator parameter; fissile build kube does not.
When I run make vagrant-prep, it calls make images, and make images finally calls make helm and make kube:
images: bosh-images uaa-images helm kube

The build process then fails at make kube:
make/uaa-kube
make[1]: Entering directory `/bcf-maker/bcf/src/scf/src/uaa-fissile-release'
make/kube
unknown flag: --use-secrets-generator
make[1]: *** [kube] Error 1
make[1]: Leaving directory `/bcf-maker/bcf/src/scf/src/uaa-fissile-release'
make: *** [uaa-kube] Error 2

When I execute fissile build kube --help:

Flags:
  -D, --defaults-file string   Env files that contain defaults for the parameters generated by kube
      --output-dir string      Kubernetes configuration files will be written to this directory (default ".")
      --tag-extra string       Additional information to use in computing the image tags
      --use-memory-limits      Include memory limits when generating kube configurations (default true)

the --use-secrets-generator parameter is not listed either.

Can you help check? Thanks!

Failed to get secret for multi uaa pods

Hi,

Fetching the secret failed when I provisioned multiple uaa pods. The command failed with:

Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found
Error from server (NotFound): secrets "secrets-270.28.0-1 secrets-270.28.0-1" not found

This happens because the lookup of the secret name returns a duplicate value when there is more than one uaa pod:

bjxzi$ kubectl get pods --namespace uaa
NAME                        READY     STATUS      RESTARTS   AGE
postgres-0                  1/1       Running     0          18h
secret-generation-1-tmbg5   0/1       Completed   0          18h
uaa-77448b8688-m89fb        1/1       Running     0          18h
uaa-77448b8688-nwzct        1/1       Running     0          18h

bjxzi$ kubectl get pods --namespace uaa -o 'jsonpath={.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}'
secrets-270.28.0-1 secrets-270.28.0-1
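
One way to handle this is to deduplicate the names before using them. This is a minimal sketch, assuming the jsonpath output is a whitespace-separated list as shown above; `unique_secret_names` is a hypothetical helper name.

```python
def unique_secret_names(jsonpath_output):
    """Split the jsonpath output and drop duplicates, keeping order."""
    return list(dict.fromkeys(jsonpath_output.split()))
```

For the output above this yields a single name, which can then be passed to `kubectl get secret`.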

diego-cell pod stats cpu memory are all zero

When you check the pod stats through kubectl proxy or the kube API at http://localhost:4194/api/v1.3/docker/, other pods' CPU and memory are shown normally. Only the diego-cell CPU and memory stats are all empty.

You can see this issue in two ways:

1. In vagrant, run docker ps to get the diego-cell pod container ID, then curl http://localhost:4194/api/v1.3/docker/<Container_id>; all CPU and memory stats are zero.

2. View it through `kubectl proxy`:

[screenshot: 2018-02-12 9 37 05]

Failed to fetch properties through links during pod restart

Background
The api pod fetches a property of cc_uploader through BOSH links, like here: https://github.com/cloudfoundry/capi-release/blob/develop/jobs/cloud_controller_ng/templates/cloud_controller_ng.yml.erb#L360

Description
This works well during the first deployment, but when we inject a sidecar container into the api pod, the pod is recreated, and sometimes an error like the one below occurs (it can't be reproduced 100% of the time).

+ bash /opt/fissile/startup/scripts/forward_logfiles.sh
+ configgin --jobs /opt/fissile/job_config.json --env2conf /opt/fissile/env2conf.yml
Resolving link cc_uploader via service {"role"=>"cc-uploader", "job"=>"cc_uploader"}
/usr/local/rvm/gems/ruby-2.3.5/gems/bosh-template-2.0.0/lib/bosh/template/evaluation_link.rb:30:in `p': Can't find property '["internal_hostname"]' (Bosh::Template::UnknownProperty)
	from /var/vcap/jobs-src/cloud_controller_ng/templates/cloud_controller_api.yml.erb:445:in `block in get_binding'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/bosh-template-2.0.0/lib/bosh/template/evaluation_context.rb:152:in `if_link'
	from /var/vcap/jobs-src/cloud_controller_ng/templates/cloud_controller_api.yml.erb:443:in `get_binding'
	from /usr/local/rvm/rubies/ruby-2.3.5/lib/ruby/2.3.0/erb.rb:864:in `eval'
	from /usr/local/rvm/rubies/ruby-2.3.5/lib/ruby/2.3.0/erb.rb:864:in `result'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/lib/job.rb:57:in `generate'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:79:in `block (2 levels) in <top (required)>'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:78:in `each'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:78:in `block in <top (required)>'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:75:in `each'
	from /usr/local/rvm/gems/ruby-2.3.5/gems/configgin-0.14.1/bin/configgin:75:in `<top (required)>'
	from /usr/local/rvm/gems/ruby-2.3.5/bin/configgin:23:in `load'
	from /usr/local/rvm/gems/ruby-2.3.5/bin/configgin:23:in `<main>'
	from /usr/local/rvm/gems/ruby-2.3.5/bin/ruby_executable_hooks:15:in `eval'
	from /usr/local/rvm/gems/ruby-2.3.5/bin/ruby_executable_hooks:15:in `<main>'

These files, job_config.json and env2conf.yml, are generated at build time by fissile, right?
So there should be no difference between the first deployment and the restart.

Is there a race condition here?

Timestamp failed of rsyslog_file in some component

In some CF components, such as nats or doppler, monit summary reports Timestamp failed for rsyslog_file once the pod has been running for a while.

nats/1:/# monit summary
File 'rsyslog_file'                 Timestamp failed

The error occurs because the file /var/log/messages has not been updated for more than 65 minutes; according to the monit definition, an alert is raised if the timestamp is older than 65 minutes.
/var/log/messages is used for general syslog messages, and a container logs far less there than a VM does. The config /etc/monit/monitrc.d/rsyslog is baked into the stemcell.

check file rsyslog_file with path /var/log/messages
   group rsyslogd
   if timestamp > 65 minutes then alert
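
The effect of this rule can be reproduced outside monit. This is a minimal sketch that approximates the monit timestamp test with the file's modification time, which is an assumption about monit's behaviour; `timestamp_failed` is a hypothetical helper name.

```python
import os
import time


def timestamp_failed(path, max_age_minutes=65, now=None):
    """Return True if `path` was last modified more than
    `max_age_minutes` ago."""
    if now is None:
        now = time.time()
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > max_age_minutes * 60
```

Calling `timestamp_failed("/var/log/messages")` inside an affected container should return True whenever monit shows the alert, provided the assumption above holds.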

acceptance-tests on CaaSP v2

host-192-168-123-38:~/deploy # kubectl create --namespace=scf --filename="kube/cf/bosh-task/acceptance-tests.yaml"

Error from server (BadRequest): error when creating "kube/cf/bosh-task/acceptance-tests.yaml": Pod in version "v1" cannot be handled as a Pod: [pos 818]: json: expect char '"' but got char '2'

Diego cell memory limits not set

Problem
Pushing apps will fail while staging because of error 137 (a memory error, afaik).

Bad Solution
Setting DIEGO_CELL_MEMORY_CAPACITY_MB to a value like 4096 works.

Detail
This script https://github.com/SUSE/scf/blob/master/container-host-files/etc/scf/config/scripts/set-diego-cell-memory-limits.sh doesn't work.

total memory is:

$ cat /proc/meminfo | awk '/MemTotal:/ { printf "%.0f\n", $2 * 1024 }'
16820776960

and:

$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || bc <<< '2 ^ 63'
9223372036854771712

This seems to be the largest possible value (the int64 maximum rounded down to the page size), i.e. there is no effective cgroup limit.

This means that if DIEGO_CELL_MEMORY_CAPACITY_MB is set to auto, the value stays unset, and staging apps that need to be built usually fails.

Environment

  • AKS cluster on Azure
  • and probably most others as well
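
One way an automatic value could be derived from the two numbers above is to take the smaller one and convert it to megabytes. This is a minimal sketch of that idea, not the logic of set-diego-cell-memory-limits.sh; `auto_memory_capacity_mb` is a hypothetical helper name.

```python
def auto_memory_capacity_mb(mem_total_bytes, cgroup_limit_bytes):
    """Pick the effective memory limit and return it in whole MB.

    When no cgroup limit is set, the kernel reports a huge value
    (9223372036854771712 above), so the host total is the smaller one.
    """
    effective_bytes = min(mem_total_bytes, cgroup_limit_bytes)
    return effective_bytes // (1024 * 1024)
```

With a real cgroup limit, for example 4 GiB, the limit becomes the smaller value and the function returns 4096.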

rsync could not be found in stemcell when compiling cf-mysql-broker package

Release manifest: /home/vagrant/scf/src/grootfs-release/dev_releases/grootfs/grootfs-0.20.0+dev.1.yml
/home/vagrant/scf/make/compile
Please allow a long time for mariadb to compile
Compiling packages for dev releases:
         diego (1.19.0+dev.1)
         etcd (104+dev.1)
         garden-runc (1.8.0+dev.1)
         cf-mysql (35+dev.1)
         cflinuxfs2 (1.133.0+dev.1)
         cf-opensuse42 (0+dev.1)
         routing (0.157.0+dev.1)
         hcf (0+dev.1)
         grootfs (0.20.0+dev.1)
         capi (1.30.0+dev.1)
         loggregator (89+dev.1)
         nats (18+dev.1)
         consul (157+dev.1)
         binary-buildpack (1.0.11+dev.1)
         go-buildpack (1.7.19+dev.1)
         java-buildpack (3.14+dev.1)
         nodejs-buildpack (1.5.30+dev.1)
         php-buildpack (4.3.29+dev.1)
         python-buildpack (1.5.16+dev.1)
         ruby-buildpack (1.6.35+dev.1)
         staticfile-buildpack (1.4.0+dev.1)
depdone: cf-mysql/cf-mysql-broker - mysqlclient
depdone: cf-mysql/cf-mysql-broker - ruby
compile: cf-mysql/cf-mysql-broker
compilation-cf-mysql-broker > Compiling to /var/vcap/packages/cf-mysql-broker
compilation-cf-mysql-broker > ./packaging: line 4: rsync: command not found
done:    cf-mysql/cf-mysql-broker
result   > failure: cf-mysql/cf-mysql-broker - Error - compilation for package cf-mysql-broker exited with code 127
Error compiling packages: Error - compilation for package cf-mysql-broker exited with code 127
Makefile:178: recipe for target 'compile' failed
make: *** [compile] Error 1

Please push the latest fissile-stemcell-opensuse image, which includes the rsync command, to docker.io.

Should diego-api be statefulset or deployment and have multiple ones?

Hi all,

I see you changed the name of diego-database to diego-api (although I don't know why :)) and removed the volume. Without a volume, diego-api changes from a statefulset to a deployment.

I am not sure whether this is by design, but I think there is an error in the bbs.json config:

"advertise_url":"https://diego-api-184916.diego-api-set.cf.svc.cluster.local:8889"

Judging from the name diego-api-set, I think diego-api should be a statefulset. I am not sure whether a wrong advertise_url causes any errors.

But my real problem is:
Have you tried running multiple diego-api instances? (CF officially suggests having >=2 diego-api instances for HA.)

I have tried running two diego-api instances, using both a deployment and a statefulset. ONLY diego-api-0 starts up and works; the readiness checks of the others fail (the bbs job does not start correctly and port 8889 is not open).

I checked the bbs log and found that bbs startup hangs at the beginning while acquiring the consul lock:

{"timestamp":"1511534594.314656019","source":"bbs","message":"bbs.consul-lock.acquiring-lock","log_level":1,"data":{"key":"v1/locks/bbs_lock","session":"2","value":"{\"id\":\"897f63fc-90dd-4c38-510a-f479e8a59b6d\",\"url\":\"https://diego-api-1.diego-api-set.cf.svc.cluster.local:8889\"}"}}

I think this is because diego-api-0 is holding the lock. I also found an issue about this:
cloudfoundry/diego-release#201

In our CF env on VMs, we can have two BBS/diego-api instances, but I really don't know why we cannot have multiple diego-api instances on SCF.

Can you give me some suggestions? Thank you so much!

[Istio Integration] Support Istio deployment in scf

Hi @viovanov @manno @jandubois,

To integrate Istio with scf, we need to add support for Istio deployment in scf. As you may know, the deployment is independent and could be done anywhere, and some Kubernetes platforms may already have integrated Istio as a service. In scf, I think we can add one more step, make install-istio, to cover this part. What do you think?
The Istio deployment configuration also differs slightly between Kubernetes platforms (https://istio.io/docs/setup/kubernetes/), so I would like to know which Kubernetes platforms need to be supported.
First of all, the Vagrant VM uses hyperkube, right?

Failing readiness check for diego-access

Hello,

For weird reasons, we wanted to bump the diego-release in SCF by a couple of versions and ran into an issue with the readiness check done by the diego-access pod. A future SCF version could run into the same issue.

So it is not of immediate concern, but we thought we would share our observations already. The file-server package was bumped to a newer version. This comes with an API change, which manifests itself by returning a 401 when requesting a directory. Since the readiness check requests /v1/static/ (/v1/static will be 302-redirected to /v1/static/), which is a directory, a 401 is returned. This causes Kubernetes to mark the container of that pod unhealthy.

We are currently testing an alternative, and it seems the readiness check should request a specific file rather than a directory. Alternatively, a completely different readiness check could be used.

How can I upgrade the Kubernetes version to 1.7 in Vagrant?

I tried to follow this guide:
https://software.opensuse.org/download.html?project=Virtualization%3Acontainers&package=kubernetes

But it does not work.

I tried to update manually, and the upgrade itself succeeds, but I don't know why your SUSE Kubernetes 1.7 requires an old docker version (docker_1_12_6), so it reports a runc error when creating containers.

So I am not sure whether I followed the correct steps.

Or do you have a guide?

I also tried to use packer to create a new virtualbox iso, but that failed too because the packages:
kubernetes-addons-kubedns-1.5.3
kubernetes-node-cni-1.5.3
don't exist.

Please help take a look, thanks a lot! :)

Make App-Domains Configurable

Hello guys,

at the moment fissile sets the same domain for the apps as for the system:

properties.app_domains: '["((DOMAIN))"]'
properties.domain: '"((DOMAIN))"'

That means that when the api pods get recreated, they automatically re-register the domain as a shared domain.
If you look at the source of the cloud_controller, it would make sense to separate the app and system domains.

Thanks for your feedback.

`helm list` error: maximum size exceeded

I see SCF uses helm list to get/remove some releases.

But there is a limitation in the latest helm 2.8.1:

If you have many helm releases, for example more than 100 releases, when you just use helm list you will get an error:
root@0013473c434b:/# helm list
Error: grpc: received message larger than max (9677174 vs. 4194304)

The build process then fails without any result.

I have talked with the helm team, but there is no fix right now. There is an issue for it:
helm/helm#3322

For example your code will fail at:

# Determine the kube revision of the chart controlling the specified namespace.
release_version() { helm list --namespace "$1" | awk '{ print $2 }' | tail -n 1 ; }

I use some additional parameters to limit the list size like this:

release_version() { helm list -d -r --max 20 --namespace "$1" | awk '{ print $2 }' | head -n 1 ; }

Using -d -r together with a maximum size returns the latest release first.

Let me know if you need a PR for it.

Thanks!

starting an app pushed as a container sometimes fails

While trying to change the acceptance tests, I've encountered the following issue:
In a script, we create an app from a docker image in a repository (in my case I've also used a tcp route, but I do not think this influences the problem).
cf push ${HSM_SERVICE_INSTANCE} -o docker-registry.helion.space:443/rohcf/sidecar-acctests:latest -d tcp-${CF_DOMAIN} --random-route --no-start
I've also tried other images, with the same results.
When I try to start the app:
cf start ${HSM_SERVICE_INSTANCE}
I sometimes receive the error: ERR Failed to talk to docker registry: ...

After one or two more tries, the app starts without any more errors.

This might be related to vmware-archive/pcfdev#22

Cannot generate smoke and CATs Test by using helm?

Hi,

I would like to add the smoke tests to my build.

But I found that I need to generate the Kubernetes configuration first and then use make smoke or make cats to run the tests.

But SCF prefers deploying with helm, so I have to regenerate the kube config just for the tests...

I am not sure whether SCF can generate the smoke/cats tests the same way as post-deployment-setup, so that I can use them directly.

error when value of a bosh release property is a mapping

I have a BOSH release with a property, broker.catalog, whose value is a mapping. When I built my BOSH release, it threw an error saying that no BOSH release uses "broker.catalog.services". "services" is just a sub-element of catalog, and catalog itself is used by my BOSH release. Is this a bug in SCF?
catalog:
  services:
  - id: autoscaler-guid
    name: autoscaler
    description: Automatically increase or decrease the number of application instances based on a policy you define.
    bindable: true
    plans:
    - id: autoscaler-free-plan-id
      name: autoscaler-free-plan
      description: This is the free service plan for the Auto-Scaling service.

git submodule problem

There is a problem with the repositories under src/.
After

git submodule sync --recursive
git submodule update --init  --recursive

some repositories are empty, for instance src/diego-release, and I get the following error:

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:suse/testbrain.git' into submodule path '/home/vagrant/scf/src/hcf-release/src/github.com/suse/testbrain' failed
Failed to clone 'src/hcf-release/src/github.com/suse/testbrain' a second time, aborting

Cannot find docker.io/splatform/scf-api image during eirini deployment

Following the Eirini instructions, when I helm install cf, the api-0 pod cannot pull its image:

$ kubectl describe pod -n cf api-0
  Type     Reason     Age                From                                             Message
  ----     ------     ----               ----                                             -------
  Normal   Scheduled  99s                default-scheduler                                Successfully assigned cf/api-0 to gke-knative-default-pool-147e217d-twk3
  Warning  Failed     28s                kubelet, gke-knative-default-pool-147e217d-twk3  Failed to pull image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67": rpc error: code = Canceled desc = context canceled
  Warning  Failed     28s                kubelet, gke-knative-default-pool-147e217d-twk3  Error: ErrImagePull
  Normal   BackOff    27s                kubelet, gke-knative-default-pool-147e217d-twk3  Back-off pulling image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67"
  Warning  Failed     27s                kubelet, gke-knative-default-pool-147e217d-twk3  Error: ImagePullBackOff
  Normal   Pulling    15s (x2 over 98s)  kubelet, gke-knative-default-pool-147e217d-twk3  pulling image "docker.io/splatform/scf-api:3d90fb8724c0e125853bedd1665065447da3aa67"

"disk quota exceeded" on overlay2 and ubuntu

Hi SCF team,

We successfully deployed SCF 2.7.0 on a Kubespray Kubernetes cluster.
As the host/node OS we currently use Ubuntu on AWS, with overlay2 as the Docker storage driver.

While deploying apps, we noticed "Disk quota exceeded" and similar errors in many places
(for example from npm and other package managers).

We increased the maximum disk space for apps from 2 GB to 10 GB
(using the MAX_APP_DISK_IN_MB helm value -> cc.maximum_app_disk_in_mb).
If we now push the app and specify the disk size explicitly, e.g. with -k 10G, it works.
The disk usage reported by "cf app {app-name}" seems correct, at about 100 MB.

We assume that the quota calculation on the diego-cell (fissile?) wrongly adds the size of the app's stack (opensuse / cflinuxfs2) to the app size, so the app is more or less always over its quota.

Let me know if I can provide any information to help fix this issue.

Best regards, Adrian
