
Comments (78)

saiharshitachava avatar saiharshitachava commented on July 28, 2024 1

Thanks everyone @myechuri @justnoise for the quick code change. Please let me know once this is merged.

I will try it out :)

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024 1

@myechuri I will likely test this tomorrow (IST) and get back.


ldx avatar ldx commented on July 28, 2024 1

@saiharshitachava Oh, I see what you mean. I think I know what the issue is; let me fix it and get back to you as soon as there's a new build.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

I did try giving the VPC ID in my provider.yaml, but it didn't work out.
Any help here, @myechuri @justnoise?


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : can you please clarify the following?

  1. How did you deploy kip?

a) Did you provision a new cluster using https://github.com/elotl/kip/tree/master/deploy/terraform-aws by following documentation at https://github.com/elotl/kip#installation-option-1-create-a-minimal-k8s-cluster?

Or

b) Did you deploy kip onto an existing cluster using https://github.com/elotl/kip#installation-option-2-using-an-existing-cluster ? If you used this option, did you use AWS Credentials Option 1 - Configuring AWS API keys or AWS Credentials Option 2 - Instance Profile Credentials?

  2. Can you please share the contents of your provider.yaml after redacting sensitive information? Thanks!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri I deployed kip on an existing cluster using the files in base.
I used the API keys option.

kube-proxy came up fine with no errors.

apiVersion: v1
cloud:
  aws:
    region: "us-east-1"
    accessKeyID: ""
    secretAccessKey: ""
    vpcID: "Gave my internal VPC ID found in AWS VPC section"
etcd:
  internal:
    dataDir: /opt/kip/data
cells:
  bootImageSpec:
    owners: 68949425850dd1
    filters: name=elotl-kip-*
  defaultInstanceType: "t3.small"
  defaultVolumeSize: "2G"
  nametag: kip
  itzo:
    url: https://itzo-kip-download.s3.amazonaws.com
    version: latest
kubelet:
  cpu: "2"
  memory: "2Gi"
  pods: "2"


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava. Can you please clarify whether you filled in accessKeyID and secretAccessKey in your provider.yaml, or left them empty?

accessKeyID: ""
secretAccessKey: ""


saiharshitachava avatar saiharshitachava commented on July 28, 2024

No, I did fill them in; I just stripped them out before posting here, @myechuri.


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava. A few more questions:

  1. When you say I have a private AWS setup, do you mean you have a private VPC or something else?

  2. In your provider.yaml, can you please confirm that the vpcID listed below does contain the cluster that you are trying to deploy kip to?

vpcID: "Gave my internal VPC ID found in AWS VPC section"

  3. Is the cluster you are deploying kip to an EKS cluster, or is it provisioned via kops or some other provisioner?


saiharshitachava avatar saiharshitachava commented on July 28, 2024

When you say I have a private AWS setup, do you mean you have a private VPC or something else? --- Yes, we have a private VPC.

In your provider.yaml, can you please confirm that the vpcID listed below does contain the cluster that you are trying to deploy kip to? --- I have an existing cluster on-prem. I'm trying to establish a virtual kubelet with an AWS provider in the on-prem cluster.

Is the cluster you are deploying kip to an EKS cluster or provisioned via kops or some other provisioner? --- I'm trying from an on-prem cluster.


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava , super helpful.

Do you have a VPN Gateway set up for the target VPC?

We have successfully deployed kip on an existing on-prem cluster, shipping pods to AWS through a VPN Gateway, using https://github.com/elotl/kip/tree/master/deploy/terraform-vpn#vpn-setup . Hence I'm curious to learn whether you have a VPN Gateway for the VPC.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

I'm sorry, I'm quite new to AWS as well :)

So you are saying that unless I create a Site-to-Site VPN connection, the AWS VPC cannot communicate with on-prem?

And a kip deployment from on-prem to AWS definitely has to go through a VPN connection to work?

I don't see any VPN connections for my VPC.


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava .

VPN is not necessary - I asked because I wanted to find out what your current setup involved.

  1. Can you please add a subnetID key/value pair in provider.yaml? https://github.com/elotl/kip/blob/master/deploy/manifests/kip/overlays/minikube/provider.yaml#L9
    Where did you get your provider.yaml template from?

  2. Can you please share more about your on-prem control plane? Are you running a VMware k8s control plane, OpenShift control plane, minikube, or some other control plane?
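For reference, a sketch of where the subnetID key would sit in provider.yaml, next to the other cloud.aws keys shown earlier in this thread (the IDs below are placeholders, not real values):

```yaml
cloud:
  aws:
    region: "us-east-1"
    vpcID: "vpc-0123456789abcdef0"
    subnetID: "subnet-0123456789abcdef0"
```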


saiharshitachava avatar saiharshitachava commented on July 28, 2024

I got the provider.yaml from base. I changed a few parameters in the base directory itself and applied all the files in base. Is anything needed besides the files in base? I take it base has all the files needed:
https://github.com/elotl/kip/blob/master/deploy/manifests/kip/base

We run IBM Cloud Private on VMware nodes.

I will also try adding the subnet details and let you know the results.

Also, the main thing I noticed here is that kip might be trying to reach the public cloud (I guess). Is there a way I can pass the private AWS URL which we use to log in to AWS clusters? Do you think this could be the issue here? @myechuri
caused by: Post https://ec2.us-east-1.amazonaws.com/: x509: certificate signed by unknown authority


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava .

Look forward to hearing back on what you saw after subnetID is added.

Kip uses the aws-sdk-go client to set up an EC2 session. Can you please share a few more details on the private AWS URL that you use to connect to the AWS SDK? Is it a custom endpoint URL? I am wondering if your URL is similar to a custom endpoint, hence asking.


justnoise avatar justnoise commented on July 28, 2024

Hi @saiharshitachava, thank you for giving Kip a try! I'm sorry to hear that you experienced problems with your setup. If I understand it correctly, you are attempting to connect to AWS from your on premise kubernetes cluster. The problem you are encountering is that your company connects to the AWS API using a custom endpoint, not the default public endpoint for ec2 in us-east-1: https://ec2.us-east-1.amazonaws.com/. In other words, if you were to connect to AWS using the AWS CLI, you would be specifying --endpoint-url on the command line.

Kip does not currently support customizing the AWS service endpoint, but it should be a small, quick code change to make the endpoint customizable. I'll add another option in provider.yaml, cloud.aws.endpointURL, that will be passed through into the AWS session config in Kip.


myechuri avatar myechuri commented on July 28, 2024

Thanks, @justnoise !

@saiharshitachava : we will ping you shortly once #143 is merged.


justnoise avatar justnoise commented on July 28, 2024

@saiharshitachava, the change has been merged and we cut a new release. You'll want to update kip/deploy/manifests/kip/base/provider.yaml, add endpointURL: "https://your-private-aws-url.com" to the cloud.aws section, and then re-apply the manifests with: kustomize build base | kubectl apply -f -
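Hedging on exact placement (this assumes endpointURL sits alongside the other cloud.aws keys, per the comment above; the URL is a placeholder), the relevant section of provider.yaml would look something like:

```yaml
cloud:
  aws:
    region: "us-east-1"
    endpointURL: "https://your-private-aws-url.com"
```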


saiharshitachava avatar saiharshitachava commented on July 28, 2024

Thanks for the fix @justnoise @myechuri

Looks like we have our console integrated with SSO, and it uses a SAML profile for auth. Does this change anything with the parameters we are giving?
Does this config need any extra parameters in provider.yaml?


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : What is the output of authentication using the SAML profile? Is there a way to get an access_key_id, secret_access_key and (possibly) a session token? Can you create an AWS shared credentials file? Based on the output of the SSO authentication workflow, we can figure out whether the existing parameters in provider.yaml are sufficient. Thanks!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri Yes, I have a credentials file which has the ID, key and session token in it.


myechuri avatar myechuri commented on July 28, 2024

Thanks for clarifying, @saiharshitachava. We will scope out adding a session token to provider.yaml and share an update by end of day Friday. Thank you for your patience!


justnoise avatar justnoise commented on July 28, 2024

I was a bit slow to realize that, while we can add a session token to provider.yaml, it is possible to specify the session token with an environment variable: AWS_SESSION_TOKEN. That value can be added to kip's container definition (in statefulset.yaml):

      containers:
      - name: kip
        image: elotl/kip:latest
        env:
        - name: AWS_SESSION_TOKEN
          value: <session token value>

It might not be necessary for this use case, but it's good to note that the credentials values in provider.yaml (accessKeyID, secretAccessKey and sessionToken) can all be passed as environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN

While those values can be hard-coded into a manifest, they can also be stored as kubernetes secrets, which adds an extra bit of security if your cluster has enabled encryption of secret data at rest:

---
apiVersion: v1
kind: Secret
metadata:
  name: kip-aws-secrets
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64 encoded accessKeyID>
  AWS_SECRET_ACCESS_KEY: <base64 encoded secretAccessKey>
  AWS_SESSION_TOKEN: <base64 encoded sessionToken>
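The base64 values in the data section can be produced on the command line; a small sketch with a made-up placeholder value (use printf rather than echo so no trailing newline sneaks into the encoding):

```shell
# Encode a placeholder credential for the Secret's data section.
# printf avoids appending a newline, which would change the encoded value.
printf '%s' 'AKIAEXAMPLE' | base64
# → QUtJQUVYQU1QTEU=
```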

The Kip pod in the Kip StatefulSet can be extended to set environment variables from the secret:

      containers:
      - name: kip
        image: elotl/kip:latest
        envFrom:
        - secretRef:
            name: kip-aws-secrets

If this configuration is used, the environment variables will override the corresponding values in provider.yaml (I would also suggest removing the values from provider.yaml to prevent confusion).

So to summarize: The quickest way to pass in a session token is to add an additional environment variable AWS_SESSION_TOKEN in the Kip StatefulSet manifest.


myechuri avatar myechuri commented on July 28, 2024

Thanks, @justnoise .

@saiharshitachava : please let us know if passing AWS_SESSION_TOKEN via environment variable unblocks you. Thanks!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri From the logs it looks like it took the creds from the env variables.

But the problem is still there. I even built a new image adding our internal proxy (I thought the error below was something to do with a firewall between environments), but that didn't help either.

Is there a way you know of to disable the cert check (ssl-verify=false or something) so that we can bypass this error?

I0703 15:11:01.788467 1 aws.go:94] Checking for credential errors
I0703 15:11:01.788541 1 aws.go:99] Using credentials from EnvConfigCredentials
I0703 15:11:01.788575 1 aws.go:105] Validating read access
Error: error initializing provider kip: error configuring cloud client: Error setting up cloud client: Could not configure AWS cloud client authorization: Error validationg connection to AWS: RequestError: send request failed
caused by: Post https://ec2.us-east-1.amazonaws.com/: x509: certificate signed by unknown authority


myechuri avatar myechuri commented on July 28, 2024

Thanks, @saiharshitachava . We'll look into possible ways to disable cert check and get back shortly.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri Could you let me know where kip actually checks for the trusted root certs? Assuming this is Go on a Debian flavour, will it look under /etc/ssl/certs/ca-certificates.crt?

If that's the case, can I replace this cert with my internal trusted certs? Would it cause any issues with startup?


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : since replacing default certs with your internal trusted certs is a better option than disabling cert check, we will explore both paths and report back on the quickest path to unblock you.


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava: @justnoise confirmed that you are correct. The certs are located at /etc/ssl/certs/ca-certificates.crt . Replacing or appending your internal certificate onto /etc/ssl/certs/ca-certificates.crt in the kip container is a good way to go.

We've also made a release with another parameter in provider.yaml: insecureTLSSkipVerify. That can be used to turn off checking TLS certs. As the name of the parameter suggests, this flag should only be used for testing.

cloud:
  aws:
    endpointURL: <internal endpoint value>
    insecureTLSSkipVerify: true

@justnoise has done his best to test this, but since we cannot reproduce your exact setup in our environment, he tested with a MITM proxy - that setup proved to be problematic with AWS's authentication system. Please let us know if there are issues using insecureTLSSkipVerify in your setup. Thanks!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri @justnoise Good news: the AWS checks passed - yay!! I replaced ca-certificates.crt with my internal root cert.
But now I'm getting x509 cert errors while accessing the Kubernetes cluster, and I believe there is something wrong with the API certs. I tried giving the MASTER_URI and the cluster DNS manually, but my understanding is these should be picked up by default.

When I gave these two manually, it went on checking for the network-agent, which it failed to find even though it's there in the cluster. So I believe something is going wrong when connecting to the Kubernetes cluster here.

Are there any custom certs you are generating to connect to the kube API server? How is authentication to the API server done from the kip container?

E0705 10:32:18.551874 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Secret: Gethttps://8001/api/v1/secrets?limit=500&resourceVersion=0: x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, , not
I0705 10:32:18.557903 1 config.go:245] Validated access to AWS
I0705 10:32:18.639260 1 network.go:147] Current vpc: 10.205.246.128/25
I0705 10:32:18.639283 1 network.go:154] Getting subnets and availability zones for VPC vpc-
I0705 10:32:18.828488 1 network.go:154] Getting subnets and availability zones for VPC vpc-
I0705 10:32:19.017534 1 aws.go:172] cells will run in a private subnet (no route to internet gateway)
I0705 10:32:19.017697 1 config.go:444] controller will connect to nodes via private IPs
I0705 10:32:19.185258 1 security_groups.go:73] security group name kip-stvmugsmfvgwpmnxzdq2lge23m-cellsecuritygroup [sg-0f416f2248be24ac3]
I0705 10:32:19.185282 1 server.go:152] ensuring region has not changed
I0705 10:32:19.369980 1 helpers.go:49] host nameservers [server ips] searches [private domain name]
Error: error initializing provider kip: creating DNS configurer: missing or misconfigured kube-dns service


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : good to hear that replacing the default certs with your trusted root cert worked!

Let me get back to you on the missing kube-dns service issue shortly. @justnoise is currently on the road and will be back online on Tuesday. @ldx and I will look into the issue and share an update by Monday at the latest. Sorry for the delay!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri @ldx @justnoise I was actually trying to check with curl calls using the existing certs the init-cert container is generating. However, I think there should be a public-cert parameter to give the CA cert, same as the API cert and key which we give in env vars (--client-ca-file).

Also, the certs generated by init-cert somehow are not getting authenticated even with the CA cert. So I thought of using my existing manager.crt and key, which would have cluster-scope access for CRDs and all.

kube-system-data-kip-provider-0-pvc-b27ea69d-bb8e-11ea-9f5e-005056ab3b00/kubelet-pki # curl --cacert /etc/cfc/kubelet/ca.crt --key kip-provider-0.key --cert kip-provider-0.crt https://29.91.0.1:443/api/v1/secrets
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
I replaced all the API-related parameters in the statefulset, used the manager certs in place of the ones init-cert generates, and gave a custom kubeconfig path.

But somehow I guess it needs the public CA cert to authenticate. I still see these errors. Might a new env variable need to be added in kip to take this cert as input?

1 server.go:423] Could not create new Cell Controller: Error ensuring k8s cell CRD exists in controller: Get https://29.91.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/cells.kip.elotl.co?timeout=30s: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs


ldx avatar ldx commented on July 28, 2024

I was actually trying to check with curl calls using the existing certs the init-cert container is generating. However, I think there should be a public-cert parameter to give the CA cert, same as the API cert and key which we give in env vars (--client-ca-file).

The certificate/key generated by init-cert is used for port 10250 in Kip, not as a client cert.

To connect to the Kubernetes API server, Kip will use its service account token at /var/run/secrets/kubernetes.io/serviceaccount (so Kip uses regular in-pod authentication via InClusterConfig()).

1 server.go:423] Could not create new Cell Controller: Error ensuring k8s cell CRD exists in controller: Get https://29.91.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/cells.kip.elotl.co?timeout=30s: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs

You're getting that error because the API server certificate does not have 29.91.0.1 as a SAN. With curl, you can check the names in the server certificate via curl -v "https://29.91.0.1:443/" (look for subjectAltName:). Try using one of those names as your Kubernetes API server hostname.
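A quick way to see exactly which SANs a certificate carries is openssl. This sketch generates a throwaway self-signed cert with an IP SAN (the names and /tmp paths below are illustrative, not from this thread) and then prints its subjectAltName extension; running the same x509 command against your API server's certificate shows the names it is actually valid for:

```shell
# Create a throwaway self-signed cert that includes an IP SAN
# (requires OpenSSL 1.1.1+ for -addext).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/kip-demo.key -out /tmp/kip-demo.crt -days 1 \
  -subj "/CN=kubernetes" \
  -addext "subjectAltName=DNS:kubernetes.default.svc,IP:29.91.0.1" 2>/dev/null

# Print the SAN entries; x509 validation succeeds only for these names.
openssl x509 -in /tmp/kip-demo.crt -noout -ext subjectAltName
```

Against a live server you would fetch the certificate first, e.g. with openssl s_client -connect host:443, then pipe it through the same x509 command.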


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : please let us know if the subjectAltName suggestion in the above comment does not help get past the error. Thanks!


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : checking in to see if you were able to make progress? Let us know if you are still blocked!


myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava : any luck with getting past the error? Thanks!


saiharshitachava avatar saiharshitachava commented on July 28, 2024

Sorry for the huge delay.
I'm going to start working on this again today,
and will keep the thread posted.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

I'm facing issues with init-cert, and I'm pretty sure it's something in my env, as this worked earlier.

The CERT_DATA variable for init-cert is getting the kubectl version paths added to it; not sure where these are coming from, though.

These appear when the create-csr script starts. They get added to the CERT_DATA variable along with the cert at the end, and the base64 decode fails.

  • envsubst
    /opt/kubectl/1.14
    /opt/kubectl/1.15
    /opt/kubectl/1.16
    /opt/kubectl/1.17
    /opt/kubectl/1.18
    using /opt/kubectl/1.14/kubectl
    certificatesigningrequest.certificates.k8s.io/kip-1594912492 created

@myechuri Have you seen this before? I know it's something specific to my env, but just cross-checking whether you have come across it.


ldx avatar ldx commented on July 28, 2024

@saiharshitachava maybe the CSR approval fails or it takes a long time and times out? Can you provide logs from the init-cert container?

Can you check the CSR via kubectl, e.g. if init-cert created kip-1594912492 then kubectl get -oyaml csr kip-1594912492 too?


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx Everything is created fine, but when we look at the cert_data variable in get-cert.sh, we see that the envsubst values mentioned above are added to the variable along with the cert.

As a result, the base64 decode is failing. I'm clueless right now as to where these kubectl versions are getting attached to the variable from.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

This variable, to be specific: https://github.com/elotl/kip/blob/master/scripts/init-cert/csr/get-cert.sh#L53

  • echo '/opt/kubectl/1.14
    /opt/kubectl/1.15
    /opt/kubectl/1.16
    /opt/kubectl/1.17
    /opt/kubectl/1.18
    LS0tLS1CRUdJTiBDRVJUSUZJQ0FU+++++++++cert+++++++'

  • base64 -d
    base64: invalid input
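To illustrate why the decode chokes, here is a minimal reproduction with a made-up payload (the string aGVsbG8K is just base64 for "hello", not the real cert data):

```shell
# Reproduce the symptom: the variable has stray non-base64 lines
# (the kubectl paths) prepended before the actual base64 payload.
CERT_DATA='/opt/kubectl/1.14
aGVsbG8K'

# Decoding the whole variable fails because of the invalid characters
# (the "." in the path is not a base64 character):
if ! printf '%s\n' "$CERT_DATA" | base64 -d >/dev/null 2>&1; then
  echo "decode failed"
fi
# → decode failed

# Keeping only the final line (the real payload) decodes cleanly:
printf '%s\n' "$CERT_DATA" | tail -n 1 | base64 -d
# → hello
```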


saiharshitachava avatar saiharshitachava commented on July 28, 2024

OK, I pulled the latest image of init-cert, which was built minutes ago, and that issue is resolved. Thanks @ldx!

Back to my original issue: when I give different alt names that are used by the current existing nodes in the cluster, I get the same error. @ldx @myechuri

E0716 18:42:33.445178 1 server.go:423] Could not create new Cell Controller: Error ensuring k8s cell CRD exists in controller: Get https://29.91.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/cells.kip.elotl.co?timeout=30s: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs

and this 29.91.0.1 is the default kubernetes service IP. While trying combinations of alt names, I came across the alt names in this message which are OK with this cert - one of them is the default service name, but that seems to throw an error as well:

x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, test.icp, not icp-test


ldx avatar ldx commented on July 28, 2024

OK, I pulled the latest image of init-cert, which was built minutes ago, and that issue is resolved. Thanks @ldx!

Great, happy to hear!

Back to my original issue: when I give different alt names that are used by the current existing nodes in the cluster, I get the same error. @ldx @myechuri

E0716 18:42:33.445178 1 server.go:423] Could not create new Cell Controller: Error ensuring k8s cell CRD exists in controller: Get https://29.91.0.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/cells.kip.elotl.co?timeout=30s: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs

and this 29.91.0.1 is the default kubernetes service IP. While trying combinations of alt names, I came across the alt names in this message which are OK with this cert - one of them is the default service name, but that seems to throw an error as well:

x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, test.icp, not icp-test

When your cluster was created, the API server IP address should have been added to its certificate as a SAN - I'm wondering why it's not in there. As a workaround for now, you can try overriding the following environment variable for the kip container:

    - name: KUBERNETES_SERVICE_HOST
      value: kubernetes.default.svc.cluster.local


saiharshitachava avatar saiharshitachava commented on July 28, 2024

Should I still be giving the master URI along with the above parameter?


ldx avatar ldx commented on July 28, 2024

Should I still be giving the master URI along with the above parameter?

Are you setting MASTER_URI for the kip container? If yes, what is it set to?


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx We have a custom console URL which we use to access the API on port 8001. I was giving that and got SAN errors.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

Also, there is something wrong with the API server authentication itself. I guess this isn't happening the way it's expected, and we are seeing errors from there on. @ldx @myechuri @justnoise Could someone help me understand how the pod communicates with the API server and gets these details (pods, services, etc.)? I know you said it uses the service account token; I see that generated, but the pod is still failing to communicate with my API server.

Also, @ldx mentioned we are not using the certs generated by init-cert for any sort of auth, but then why do we have them in these variables?

    - name: APISERVER_CERT_LOCATION
      value: /opt/kip/data/kubelet-pki/$(VKUBELET_NODE_NAME).crt
    - name: APISERVER_KEY_LOCATION
      value: /opt/kip/data/kubelet-pki/$(VKUBELET_NODE_NAME).key

Also, is there a list of variables I can go through for kip-to-Kubernetes communication? Maybe I can take a closer look at whether there are any other custom parameters that need to be set for my env (variables like KUBERNETES_SERVICE_HOST, etc.).

E0717 08:50:45.682646 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Unauthorized
E0717 08:50:45.683329 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ConfigMap: Unauthorized
E0717 08:50:45.684625 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Pod: Unauthorized
E0717 08:50:45.685396 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Secret: Unauthorized
I0717 08:50:45.862161 1 aws.go:172] cells will run in a private subnet (no route to internet gateway)
I0717 08:50:45.862185 1 config.go:444] controller will connect to nodes via private IPs
I0717 08:50:46.047316 1 security_groups.go:73] security group name kip-stvmugsmfvgwpmnxzdq2lge23m-cellsecuritygroup [sg-0f416f2248be24ac3]
I0717 08:50:46.047344 1 server.go:152] ensuring region has not changed
Error: error initializing provider kip: creating DNS configurer: missing or misconfigured kube-dns service
Usage:


saiharshitachava avatar saiharshitachava commented on July 28, 2024

I tried giving KUBERNETES_SERVICE_HOST as kubernetes.default.svc, with no kubeconfig and no master URI.

I see these errors. I want to try giving parameters like the ones below; it's just that I don't know the keys which KIP can understand.

  api_server: https://kubernetes.default.svc.cluster.local
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true

W0717 09:27:35.327513 1 serverurl.go:72] building config from kubeconfig: stat /root/.kube/config: no such file or directory, continuing with alternative config methods
E0717 09:27:35.549506 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Secret: Get https://kubernetes.default.svc:443/api/v1/secrets?limit=500&resourceVersion=0: x509: certificate signed by unknown authority
E0717 09:27:35.550768 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Get https://kubernetes.default.svc:443/api/v1/services?limit=500&resourceVersion=0: x509: certificate signed by unknown authority
E0717 09:27:35.550769 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Pod: Get https://kubernetes.default.svc:443/api/v1/pods?fieldSelector=spec.nodeName%3Dkip-provider-0&limit=500&resourceVersion=0: x509: certificate signed by unknown authority
E0717 09:27:35.564054 1 serverurl.go:44] trying to determine API server URL: Get https://kubernetes.default.svc:443/api/v1/namespaces/default/endpoints/kubernetes: x509: certificate signed by unknown authority
F0717 09:27:35.564081 1 main.go:100] can't determine API server URL, please set --kubeconfig or MASTER_URI


ldx avatar ldx commented on July 28, 2024

I tried giving KUBERNETES_SERVICE_HOST as kubernetes.default.svc, with no kubeconfig and no master URI.

I see these errors. I want to try giving parameters like the ones below; it's just that I don't know the keys which KIP can understand.

  api_server: https://kubernetes.default.svc.cluster.local
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true

By default, Kip uses the pod's service account for authenticating to the API server. Is the above config snippet from a kubeconfig file? If so, how do you provide the kubeconfig file to pods in the cluster?

We do have some code that allows Kip to use a kubeconfig file, but I'll have to test it to see if it works everywhere Kip needs to talk to the API server.


saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx I see an option in kip with the flag --kubeconfig to provide the file.
And the above isn't a snippet from a kubeconfig (I was just trying to give a picture of what I wanted to pass).

When I give the API server details in the config and pass the file via --kubeconfig, I don't see the API calls going to the server which I gave; by default they go to the Kubernetes service IP, like it's hard-coded somewhere.

apiVersion: v1
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://apiserver:port
  name: default-cluster
users:
- name: default-auth
  user:
    client-certificate: kip-provider-0.crt
    client-key: kip-provider-0.key

I gave a different server in the config file, but I see this in the logs. If this works, I think I can pass the CA cert along with authentication for the API calls, and from there kip could pick up the DNS and other configuration - this is what I could understand till now.

2020-07-17 15:24:54.982537 I | etcdserver/api: enabled capabilities for version 3.3
E0717 15:24:55.180447 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ConfigMap: Get https://29.91.0.1:443/api/v1/configmaps?limit=500&resourceVersion=0: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs


ldx avatar ldx commented on July 28, 2024

@ldx I see an option in kip with the flag --kubeconfig to provide the file.
And the above isn't a snippet from a kubeconfig (I was just trying to give a picture of what I wanted to pass).

When I give the API server details in the config and pass the file via --kubeconfig, I don't see the API calls going to the server which I gave; by default they go to the Kubernetes service IP, like it's hard-coded somewhere.

apiVersion: v1
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://apiserver:port
  name: default-cluster
users:
- name: default-auth
  user:
    client-certificate: kip-provider-0.crt
    client-key: kip-provider-0.key

That certificate and key (kip-provider-0.crt and kip-provider-0.key) are used by Kip for its listener port; they are not a client certificate.

Do you have a kubeconfig file and either a token or a client certificate you can use for authenticating to the API server?

I gave a different server in the config file, but I see this in the logs. If this works, I think I can pass the CA cert along with authentication for the API calls, and from there kip could pick up the DNS and other configuration - this is what I could understand till now.

2020-07-17 15:24:54.982537 I | etcdserver/api: enabled capabilities for version 3.3
E0717 15:24:55.180447 1 reflector.go:156] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ConfigMap: Get https://29.91.0.1:443/api/v1/configmaps?limit=500&resourceVersion=0: x509: cannot validate certificate for 29.91.0.1 because it doesn't contain any IP SANs

Yes, there's a direct call to InClusterConfig() when the cell controller is set up. I can look into it if you need a kubeconfig file to authenticate against the API server.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

Yes, I do have a kubeconfig file I can use for authenticating to the API server. When I pass it to kip, it doesn't use the API server I give, which makes me think the config from the flag isn't being picked up.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

I'm thinking the kubeconfig way is better here, because I have all these specific CA certs and related credentials to pass while authenticating in my environment.

from kip.

ldx avatar ldx commented on July 28, 2024

I'm thinking the kubeconfig way is better here, because I have all these specific CA certs and related credentials to pass while authenticating in my environment.

Let me look into this. As I mentioned, we have some support for using a kubeconfig file, but it's incomplete.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

Sure, thank you. I want to try giving that configuration in a file, since it has a lot more features for how we can connect and authenticate to a cluster.

So once authentication is done, all the DNS and other configuration should work by themselves? Including the other errors I gave above?

from kip.

ldx avatar ldx commented on July 28, 2024

So once authentication is done, all the DNS and other configuration should work by themselves? Including the other errors I gave above?

Once Kip is able to access the API server, it should be able to figure everything else out, if it has the right permissions.
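
As a rough illustration of what "the right permissions" means in practice, access is granted via RBAC. The sketch below is hypothetical; the authoritative rule list is in kip's own deployment manifests, so treat this only as the general shape, not the exact set:

```yaml
# Hypothetical ClusterRole sketch; the real rules live in kip's deploy manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kip-provider                 # hypothetical name
rules:
- apiGroups: [""]
  resources: ["pods", "pods/status", "nodes", "nodes/status",
              "configmaps", "secrets", "services", "events"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```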

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

Great, please let me know once the kubeconfig flag is complete.

from kip.

ldx avatar ldx commented on July 28, 2024

Great, please let me know once the kubeconfig flag is complete.

I just merged this, documentation is here: https://github.com/elotl/kip/blob/master/docs/kubeconfig.md
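
For anyone following along, wiring this in means mounting the kubeconfig into the kip pod and adding the flag to its arguments. A hedged sketch of the relevant statefulset fragment follows; the paths, volume name, and secret name are illustrative, so check the linked documentation for the canonical form:

```yaml
# Illustrative excerpt of the kip statefulset container spec.
containers:
- name: kip
  args:
  - --provider-config=/etc/kip/provider.yaml
  - --kubeconfig=/etc/kip/kubeconfig/kubeconfig   # illustrative path
  volumeMounts:
  - name: kubeconfig                              # illustrative volume name
    mountPath: /etc/kip/kubeconfig
volumes:
- name: kubeconfig
  secret:
    secretName: kip-kubeconfig                    # illustrative secret name
```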

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx @myechuri @justnoise The first step of testing this was a success. I was able to successfully spin up the agent, and when I do get nodes I can see the kip agent in the node list. Thanks for all the help; the kubeconfig change did work :)

kip-provider-0 Ready agent 4m48s

So I guess I can start putting load on the server, which will create a default instance in AWS for me.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

OK, I did try putting pods on the server, but for some reason no new AWS instances are spinning up, and I don't see anything about it in the logs.
It just keeps checking availability zones and subnets, but never spins up an instance in the AWS VPC.

Events:
  Type     Reason                 Age                 From                           Message
  ----     ------                 ----                ----                           -------
  Warning  FailedScheduling       10m (x81 over 71m)  default-scheduler              0/16 nodes are available: 1 Insufficient pods, 15 node(s) didn't match node selector, 3 Insufficient cpu.
  Normal   ProviderCreateSuccess  10m                 kip-provider-0/pod-controller  Create pod in provider successfully

cdldvcldvkb0001:/home/chavas/kip # kubectl get pods | grep -i pending
audit-logging-fluentd-ds-gmcmf 0/1 Pending 0 71m
calico-node-kslvd 0/1 Pending 0 71m
image-manager-init-certs-g2kf6 0/1 Pending 0 71m
k8s-proxy-ngrwz 0/1 Pending 0 71m
cdldvcldvkb0001:/home/chavas/kip #

my provider.yaml

apiVersion: v1
cloud:
  aws:
    region: "us-east-1"
    vpcID: "myvpcid"
    subnetID: "my subnet id"
etcd:
  internal:
    dataDir: /opt/kip/data
cells:
  bootImageSpec:
    owners: owner id
    imageIDs: ami-name
  defaultInstanceType: "t3.nano"
  defaultVolumeSize: "2G"
  nametag: kip
  itzo:
    url: https://itzo-kip-download.s3.amazonaws.com
    version: latest
kubelet:
  cpu: "20"
  memory: "20Gi"
  pods: "10"

I0718 08:09:32.249489 1 server.go:141] configuring k8s client from kubeconfig
I0718 08:09:32.762595 1 controller_manager.go:111] Starting controllers
I0718 08:09:32.762615 1 controller_manager.go:116] Starting NodeController
I0718 08:09:32.762624 1 controller_manager.go:116] Starting GarbageController
I0718 08:09:32.762628 1 controller_manager.go:116] Starting MetricsController
I0718 08:09:32.762632 1 controller_manager.go:116] Starting CellController
I0718 08:09:32.762637 1 controller_manager.go:116] Starting NodeStatusController
I0718 08:09:32.762641 1 controller_manager.go:116] Starting PodController
I0718 08:09:32.762646 1 controller_manager.go:119] Finished starting controllers
I0718 08:09:32.763378 1 node_controller.go:445] Resume waiting on healty from 0 instances
I0718 08:09:32.765586 1 pod_controller.go:168] starting pod controller
I0718 08:09:32.919391 1 root.go:232] Initialized watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.919568 1 podcontroller.go:257] Pod cache in-sync watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.920860 1 podcontroller.go:299] starting workers watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.920948 1 podcontroller.go:332] started workers watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.939747 1 instanceselector.go:400] chose instance t3.nano
I0718 08:09:32.940426 1 instanceselector.go:400] chose instance t3.nano
I0718 08:09:32.942510 1 opencensus.go:154] Updated pod in provider pod=image-manager-init-certs-g2kf6 namespace=kube-system
I0718 08:09:32.942805 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"image-manager-init-certs-g2kf6", UID:"836107ac-c8c5-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"79823815", FieldPath:""}): type: 'Normal' reason: 'ProviderUpdateSuccess' Update pod in provider successfully watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.946451 1 opencensus.go:154] Updated pod in provider pod=calico-node-kslvd namespace=kube-system
I0718 08:09:32.946555 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"calico-node-kslvd", UID:"83610716-c8c5-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"79823817", FieldPath:""}): type: 'Normal' reason: 'ProviderUpdateSuccess' Update pod in provider successfully watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:32.953923 1 opencensus.go:154] Created pod in provider pod=k8s-proxy-ngrwz namespace=kube-system
I0718 08:09:32.954011 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-proxy-ngrwz", UID:"83669be6-c8c5-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"79844708", FieldPath:""}): type: 'Normal' reason: 'ProviderCreateSuccess' Create pod in provider successfully watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:33.049163 1 controller.go:106] node status changed ready: true network unavailabe: false
I0718 08:09:33.136888 1 instanceselector.go:400] chose instance t3.nano
I0718 08:09:33.138037 1 opencensus.go:154] Created pod in provider pod=audit-logging-fluentd-ds-gmcmf namespace=kube-system
I0718 08:09:33.138204 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"audit-logging-fluentd-ds-gmcmf", UID:"83803e18-c8c5-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"79844731", FieldPath:""}): type: 'Normal' reason: 'ProviderCreateSuccess' Create pod in provider successfully watchedNamespace= provider=kip operatingSystem=Linux node=kip-provider-0
I0718 08:09:39.918665 1 node_controller.go:735] latest image for spec map[imageIDs:ami-name owners:owner id]: {ami details /dev/sda1 2019-06-27 18:56:56 +0000 UTC}
I0718 08:12:01.924366 1 network.go:154] Getting subnets and availability zones for VPC vpc-name

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

Or do I need to follow these steps for using my own custom AMI?

https://github.com/elotl/kip/blob/master/docs/cells.md#bring-your-own-image

from kip.

myechuri avatar myechuri commented on July 28, 2024

do I need to follow these steps for using my own custom ami?

@saiharshitachava : you do not need to specify custom ami unless you want to override the default ami.

I0718 08:12:01.924366 1 network.go:154] Getting subnets and availability zones for VPC vpc-name

Can you please share log lines after this line?

CC @justnoise @ldx

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@myechuri I want to use a custom AMI, hence I mentioned it in provider.yaml.

There is no log after that line; it continuously tries to get zones, and that's the last line I see in the logs until the auth token expires.

from kip.

myechuri avatar myechuri commented on July 28, 2024

I want to use a custom AMI, hence I mentioned it in provider.yaml.

@saiharshitachava : ah ok, thanks for clarifying.

There is no log after that line; it continuously tries to get zones, and that's the last line I see in the logs until the auth token expires.

Thanks for sharing. Let me look into it and get back once @ldx and @justnoise are back online on Monday. Sorry for the delay!

from kip.

ldx avatar ldx commented on July 28, 2024

or do I need to follow this steps for using my own custom ami?

https://github.com/elotl/kip/blob/master/docs/cells.md#bring-your-own-image

You need an AMI that has the cell agent on it, or have the instance download and start it at boot time via cloud-init.

Another thing you could try is increasing log verbosity, via adding --v=2 or even --v=5 to the argument list of Kip.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx So I essentially add my custom AMI as below and update the cloud-init yaml with the configuration mentioned, so that the instance fetches the agent at boot time.

Would that change or affect the base AMI I'm using in any way, like modifying the AMI itself?

bootImageSpec:
  cloudInitFile: /etc/kip/cloudinit.yaml
  owners: "my id"
  imageID: "ami-name"

cloudinit.yaml: |
  runcmd:
  - apt-get update && apt-get install -y ipset iproute2 iptables
  - wget -O /usr/local/bin/itzo https://itzo-kip-download.s3.amazonaws.com/itzo-latest
  - wget -O /usr/local/bin/kube-router https://milpa-builds.s3.amazonaws.com/kube-router
  - wget -O /usr/local/bin/tosi https://tosi.s3.amazonaws.com/tosi
  - chmod 755 /usr/local/bin/*
  - mount --make-rprivate /
  - /usr/local/bin/itzo -v=2 > /var/log/itzo.log 2>&1 &

from kip.

ldx avatar ldx commented on July 28, 2024

@ldx So I essentially add my custom AMI as below and update the cloud-init yaml with the configuration mentioned, so that the instance fetches the agent at boot time.

Would that change or affect the base AMI I'm using in any way, like modifying the AMI itself?

bootImageSpec:
  cloudInitFile: /etc/kip/cloudinit.yaml
  owners: "my id"
  imageID: "ami-name"

cloudinit.yaml: |
  runcmd:

That should work for a Debian or Ubuntu based image.

Did you try increasing Kip log verbosity?

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx I haven't yet; my AMI image doesn't have the cell agent, so I'm trying to follow the steps above.

I just wanted to confirm there would be no change to my base image if I add a cloudinit.yaml. I'm assuming it would just download the binaries and required configs onto the guests at the time of spinning up an instance, and wouldn't change anything in the base AMI. Please confirm.

I will be increasing the verbosity now.

Also, please confirm whether providing the below in provider.yaml is the right way to do this. My base image is SLES, so I used zypper.

cells:
  bootImageSpec:
    cloudInitFile: /etc/kip/cloudinit.yaml
    owners: owner id
    imageIDs: ami-name
  defaultInstanceType: "t3.nano"
  defaultVolumeSize: "2G"
  nametag: kip
kubelet:
  cpu: "4"
  memory: "6Gi"
  pods: "6"
cloud-init:
  cloudinit.yaml: |
    runcmd:
    - zypper update && zypper install -y ipset iproute2 iptables
    - wget -O /usr/local/bin/itzo https://itzo-kip-download.s3.amazonaws.com/itzo-latest
    - wget -O /usr/local/bin/kube-router https://milpa-builds.s3.amazonaws.com/kube-router
    - wget -O /usr/local/bin/tosi https://tosi.s3.amazonaws.com/tosi
    - chmod 755 /usr/local/bin/*
    - mount --make-rprivate /
    - /usr/local/bin/itzo -v=2 > /var/log/itzo.log 2>&1 &

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

OK, I created a separate config map for cloudinit.yaml and attached it to the pod as a volume by mentioning it in the statefulset.

Giving the log below. I still don't see an instance created, and the pods are in Pending state.

kubectl exec -it kip-provider-0 -c kip cat /etc/kip/cloud/cloudinit.yaml
runcmd:

My provider.yaml

apiVersion: v1
cloud:
  aws:
    region: "us-east-1"
    vpcID: "vpc-name"
    subnetID: "subnet-id"
etcd:
  internal:
    dataDir: /opt/kip/data
cells:
  cloudInitFile: /etc/kip/cloud/cloudinit.yaml
  bootImageSpec:
    owners: id
    imageIDs: ami-name
  defaultInstanceType: "t3.nano"
  defaultVolumeSize: "2G"
  nametag: kip
kubelet:
  cpu: "4"
  memory: "6Gi"
  pods: "6"

I0718 18:17:55.970817 1 opencensus.go:138] updated node status in api server node.Status.Conditions=[{Ready True 2020-07-18 18:17:55 +0000 UTC 2020-07-18 18:17:25 +0000 UTC KubeletReady kubelet is ready} {NetworkUnavailable False 2020-07-18 18:17:55 +0000 UTC 2020-07-18 18:17:25 +0000 UTC RouteCreated RouteController created a route} {OutOfDisk False 2020-07-18 18:17:55 +0000 UTC 2020-07-18 18:17:25 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2020-07-18 18:17:55 +0000 UTC 2020-07-18 18:17:25 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2020-07-18 18:17:55 +0000 UTC 2020-07-18 18:17:25 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure}]
I0718 18:17:55.970849 1 node.go:294] Successful node ping provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=
I0718 18:18:05.004507 1 server.go:142] 404 request not found uri=/stats/summary/ vars=map[]
I0718 18:18:05.017972 1 server.go:142] 404 request not found uri=/stats/summary/ vars=map[]
I0718 18:18:05.099419 1 server.go:142] 404 request not found uri=/stats/summary/ vars=map[]
I0718 18:18:05.971030 1 server.go:794] received node ping
I0718 18:18:05.971052 1 controller.go:160] node ping
I0718 18:18:05.971063 1 controller.go:168] node pong
I0718 18:18:05.975603 1 opencensus.go:138] got node from api server method=UpdateNodeStatus

That's the last log entry: getting node status from the API server.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

I even tried with the elotl AMI image, and I still see nothing. When I enable debug logs at level 5, it just shows me the node status updates I posted above.

I0718 18:43:55.805344 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"image-manager-init-certs-hxjgx", UID:"0cb2aa1d-c926-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"80054014", FieldPath:""}): type: 'Normal' reason: 'ProviderUpdateSuccess' Update pod in provider successfully provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=
I0718 18:43:55.806856 1 opencensus.go:154] Updated pod in provider pod=calico-node-n4ff8 namespace=kube-system
I0718 18:43:55.807025 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"calico-node-n4ff8", UID:"0cabb71c-c926-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"80054005", FieldPath:""}): type: 'Normal' reason: 'ProviderUpdateSuccess' Update pod in provider successfully provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=
I0718 18:44:00.786536 1 controller.go:106] node status changed ready: true network unavailabe: false
I0718 18:44:02.683744 1 node_controller.go:735] latest image for spec map[filters:name=elotl-kip-* owners:689494258501]: {ami-0264146701c835631 elotl-kip-556-20200317-233835 xvda 2020-03-18 00:35:46 +0000 UTC}
I0718 18:46:24.863829 1 network.go:154] Getting subnets and availability zones for VPC vpc-0077d8af04b1e78ff
I0718 18:46:25.211581 1 network.go:154] Getting subnets and availability zones for VPC vpc-0077d8af04b1e78ff

from kip.

ldx avatar ldx commented on July 28, 2024

Two more things to check:

  • Try increasing verbosity for kip temporarily to --v=5. Try to create a new pod after you restart it with --v=5, and see if there's anything in the logs on Kip starting a new instance for the pod, etc.
  • Check the status of Kip cells via kubectl get cells.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

I see something like this

I0718 19:58:25.399636 1 podcontroller.go:299] starting workers provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=

kubectl get cells --all-namespaces
No resources found.

from kip.

ldx avatar ldx commented on July 28, 2024

Do you have any pods assigned to kip-provider-0? What does kubectl get pods --field-selector=spec.nodeName=kip-provider-0 -A show?

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

Yes, there are. All the pods are daemonsets except the one which says test. Is there anything I can check in the AWS console before the node spins up?

NAME READY STATUS RESTARTS AGE
audit-logging-fluentd-ds-zl7cv 0/1 Pending 0 9m31s
calico-node-8txlk 0/1 Pending 0 9m31s
test-577948b979-tzzhr 0/1 Pending 0 50s
image-manager-init-certs-jj2hv 0/1 Pending 0 9m31s
k8s-proxy-69sw2 0/1 Pending 0 9m31s

Also, the node shows as agent in get nodes. Is that expected? And it says starting kube-proxy, but I guess it never started?

kip-provider-0 Ready agent 11m

Namespace    Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
---------    ----                                                ------------  ----------  ---------------  -------------  ---
kube-system  audit-logging-fluentd-ds-zl7cv                      300m (7%)     300m (7%)   512Mi (8%)       512Mi (8%)     11m
kube-system  calico-node-8txlk                                   250m (6%)     0 (0%)      100Mi (1%)       0 (0%)         11m
kube-system  ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr  100m (2%)     400m (10%)  128Mi (2%)       500Mi (8%)     3m14s
kube-system  image-manager-init-certs-jj2hv                      10m (0%)      0 (0%)      64Mi (1%)        0 (0%)         11m
kube-system  k8s-proxy-69sw2                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         11m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                660m (16%)   700m (17%)
  memory             804Mi (13%)  1012Mi (16%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:
  Type    Reason    Age  From                        Message
  ----    ------    ---  ----                        -------
  Normal  Starting  11m  kube-proxy, kip-provider-0  Starting kube-proxy.

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx This is the kip log. The thing that concerns me is the subnet ID: it's not picking up the subnet ID I gave in provider.yaml, but is implicitly using a different one. Is that expected? I mostly posted the startup log; the rest is all node status ping and pong.

I0720 13:02:10.222657 1 reflector.go:153] Starting reflector *v1.Pod (1m0s) from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.222800 1 reflector.go:188] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223229 1 reflector.go:153] Starting reflector *v1.Service (1m0s) from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223243 1 reflector.go:188] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223399 1 reflector.go:153] Starting reflector *v1.Secret (1m0s) from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223417 1 reflector.go:188] Listing and watching *v1.Secret from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223519 1 reflector.go:153] Starting reflector *v1.ConfigMap (1m0s) from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.223531 1 reflector.go:188] Listing and watching *v1.ConfigMap from k8s.io/client-go/informers/factory.go:135
I0720 13:02:10.231274 1 decoder.go:225] decoding stream as YAML
I0720 13:02:10.231837 1 server.go:119] starting internal etcd
I0720 13:02:10.231896 1 etcd.go:105] Setting etcd compaction mode to periodic
I0720 13:02:10.231904 1 etcd.go:111] Setting etcd compaction interval to 1 hour
2020-07-20 13:02:10.287604 I | etcdserver: name = default
2020-07-20 13:02:10.287627 I | etcdserver: data dir = /opt/kip/data
2020-07-20 13:02:10.287633 I | etcdserver: member dir = /opt/kip/data/member
2020-07-20 13:02:10.287636 I | etcdserver: heartbeat = 100ms
2020-07-20 13:02:10.287639 I | etcdserver: election = 1000ms
2020-07-20 13:02:10.287643 I | etcdserver: snapshot count = 100000
2020-07-20 13:02:10.287656 I | etcdserver: advertise client URLs = http://localhost:2379
2020-07-20 13:02:10.318468 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 6976
2020-07-20 13:02:10.318904 I | raft: 8e9e05c52164694d became follower at term 864
2020-07-20 13:02:10.318931 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 864, commit: 6976, applied: 0, lastindex: 6976, lastterm: 864]
2020-07-20 13:02:10.329177 W | auth: simple token is not cryptographically signed
2020-07-20 13:02:10.330434 I | etcdserver: starting server... [version: 3.3.18, cluster version: to_be_decided]
2020-07-20 13:02:10.330991 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2020-07-20 13:02:10.331099 N | etcdserver/membership: set the initial cluster version to 3.3
2020-07-20 13:02:10.331146 I | etcdserver/api: enabled capabilities for version 3.3
2020-07-20 13:02:11.520966 I | raft: 8e9e05c52164694d is starting a new election at term 864
2020-07-20 13:02:11.521024 I | raft: 8e9e05c52164694d became candidate at term 865
2020-07-20 13:02:11.521044 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 865
2020-07-20 13:02:11.521060 I | raft: 8e9e05c52164694d became leader at term 865
2020-07-20 13:02:11.521314 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 865
2020-07-20 13:02:11.522250 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
I0720 13:02:11.522282 1 etcd.go:129] Etcd server is ready to serve requests
I0720 13:02:11.522326 1 server.go:100] validating write access to etcd (will block until we can connect)
I0720 13:02:11.523290 1 server.go:110] write to etcd successful
I0720 13:02:11.523347 1 server.go:221] ControllerID: stvmugsmfvgwpmnxzdq2lge23m
I0720 13:02:11.523355 1 server.go:223] creating cert factory
I0720 13:02:11.532904 1 server.go:229] configuring cloud client
I0720 13:02:11.533004 1 config.go:228] using AWS region "us-east-1"
I0720 13:02:11.533226 1 config.go:248] Validating connection to AWS
I0720 13:02:11.533566 1 aws.go:123] Checking for credential errors
I0720 13:02:11.533592 1 aws.go:128] Using credentials from EnvConfigCredentials
I0720 13:02:11.533606 1 aws.go:134] Validating read access
I0720 13:02:12.478214 1 config.go:252] Validated access to AWS
I0720 13:02:12.634746 1 network.go:147] Current vpc: vpc-name 1st CIDR block
I0720 13:02:12.634775 1 network.go:154] Getting subnets and availability zones for VPC vpc-name
I0720 13:02:13.118498 1 network.go:274] Assuming implicit use of main routing table rtb-name for subnet-name
I0720 13:02:13.399280 1 aws.go:202] cells will run in a private subnet (no route to internet gateway)
I0720 13:02:13.399291 1 config.go:453] controller will connect to nodes via private IPs
I0720 13:02:13.716683 1 security_groups.go:73] security group name kip-name-cellsecuritygroup [sg-name]
I0720 13:02:13.716731 1 server.go:235] ensuring cloud region is unchanged
I0720 13:02:13.716739 1 server.go:161] ensuring region has not changed
I0720 13:02:13.716835 1 server.go:242] creating internal client certificate
I0720 13:02:13.717250 1 server.go:247] starting cloud status keeper
I0720 13:02:13.717307 1 server.go:256] setting up instance selector
I0720 13:02:13.723858 1 server.go:268] validating default instance type
I0720 13:02:13.723874 1 server.go:274] setting up events
I0720 13:02:13.723966 1 server.go:277] setting up registry
I0720 13:02:13.727220 1 server.go:295] creating DNS configurer
I0720 13:02:14.031290 1 helpers.go:47] host nameservers [IP IP] searches [domain_name]
I0720 13:02:14.031558 1 server.go:302] determining connectivity to cells
I0720 13:02:14.031596 1 server.go:308] setting up health checks
I0720 13:02:14.031610 1 server.go:327] creating pod controller
I0720 13:02:14.031648 1 server.go:348] creating image ID cache
I0720 13:02:14.031659 1 server.go:351] checking cloud-init file
I0720 13:02:14.031674 1 server.go:358] creating node controller
I0720 13:02:14.031684 1 server.go:389] creating garbage controller
I0720 13:02:14.031699 1 server.go:401] creating metrics controller
I0720 13:02:14.031712 1 server.go:141] configuring k8s client from kubeconfig
I0720 13:02:14.031944 1 server.go:415] creating cell controller
I0720 13:02:14.031971 1 decoder.go:225] decoding stream as YAML
I0720 13:02:14.546491 1 server.go:430] creating node status controller
I0720 13:02:14.546536 1 server.go:470] registering internal event handlers
I0720 13:02:14.546551 1 server.go:476] starting controller manager
I0720 13:02:14.546590 1 controller_manager.go:111] Starting controllers
I0720 13:02:14.546611 1 controller_manager.go:116] Starting NodeStatusController
I0720 13:02:14.546621 1 controller_manager.go:116] Starting PodController
I0720 13:02:14.546626 1 controller_manager.go:116] Starting NodeController
I0720 13:02:14.546636 1 controller_manager.go:116] Starting GarbageController
I0720 13:02:14.546642 1 controller_manager.go:116] Starting MetricsController
I0720 13:02:14.546650 1 controller_manager.go:116] Starting CellController
I0720 13:02:14.546667 1 controller_manager.go:119] Finished starting controllers
I0720 13:02:14.546911 1 server.go:496] validating boot image spec
I0720 13:02:14.547054 1 node_controller.go:445] Resume waiting on healty from 0 instances
I0720 13:02:14.551335 1 pod_controller.go:168] starting pod controller
W0720 13:02:14.672182 1 controller.go:113] UpdateNode() has not been called
I0720 13:02:14.827489 1 server.go:503] done creating instance provider
I0720 13:02:14.827569 1 server.go:781] ConfigureNode
I0720 13:02:14.827578 1 controller.go:180] setting pod CIDRs to [VPC CIDR]
I0720 13:02:14.830020 1 root.go:232] Initialized provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=
I0720 13:02:14.830052 1 server.go:800] registering node status callback
I0720 13:02:14.830059 1 controller.go:174] registered node status callback
I0720 13:02:14.830127 1 shared_informer.go:227] caches populated
I0720 13:02:14.830148 1 podcontroller.go:257] Pod cache in-sync provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=
I0720 13:02:14.830199 1 server.go:753] GetPods
I0720 13:02:14.831936 1 convert.go:508] skipping PackagePath "/etc/resolv.conf"
I0720 13:02:14.831956 1 convert.go:508] skipping PackagePath "/etc/hosts"
I0720 13:02:14.832014 1 convert.go:508] skipping PackagePath "/etc/resolv.conf"
I0720 13:02:14.832020 1 convert.go:508] skipping PackagePath "/etc/hosts"
I0720 13:02:14.832041 1 convert.go:508] skipping PackagePath "/etc/resolv.conf"
I0720 13:02:14.832050 1 convert.go:508] skipping PackagePath "/etc/hosts"
I0720 13:02:14.832069 1 convert.go:508] skipping PackagePath "/etc/resolv.conf"
I0720 13:02:14.832076 1 convert.go:508] skipping PackagePath "/etc/hosts"
I0720 13:10:55.854751 1 opencensus.go:138] sync handled key=kube-system/ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr
I0720 13:10:55.855428 1 server.go:713] GetPod "ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr"
I0720 13:10:55.855497 1 server.go:648] CreatePod "ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr"
I0720 13:10:55.856169 1 instanceselector.go:400] chose instance t3.nano
I0720 13:10:55.858024 1 opencensus.go:154] Created pod in provider namespace=kube-system pod=ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr
I0720 13:10:55.858102 1 opencensus.go:138] Processed queue item method=handleQueueItem
I0720 13:10:55.859024 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"ecm-pdfextract-connector-dit-cdl4-577948b979-tzzhr", UID:"72310049-ca8a-11ea-9f5e-005056ab3b00", APIVersion:"v1", ResourceVersion:"80897177", FieldPath:""}): type: 'Normal' reason: 'ProviderCreateSuccess' Create pod in provider successfully provider=kip operatingSystem=Linux node=kip-provider-0 watchedNamespace=

from kip.

saiharshitachava avatar saiharshitachava commented on July 28, 2024

@ldx @myechuri @justnoise Is there any debug mode or manual check I could do inside the kip container to check whether kip can bring up the AWS instance?

At this point I don't see much in the kip logs about why no instance is coming up. If we can manually trigger the call to AWS from within the container, maybe we'll see an error?

Please let me know if this is possible.

Also, I do have an IAM role which I could use to bring up instances. How do I provide that variable or name? Does the kip-provider pod essentially need to run on an AWS instance to use an IAM role?

from kip.

justnoise avatar justnoise commented on July 28, 2024

Hi @saiharshitachava, thanks for your patience with this process.

I've read through the logs you posted (those were helpful!), traced through the code to double check some assumptions and I'm very curious why instances are not being created for your pods. I see that the pods are being created in K8s, and then in the kip-provider. At that point, kip should see that there are pending pods and create VMs for those pods. The log line I would expect to see is Starting instance for node: {<bunch of node info>}. Unfortunately I don't see that message in any of the logs. This is an interesting situation where some information isn't bubbling up to the logs or kubernetes.

Any debug mode or manual check which I could do inside kip container to check if kip can bring up the aws instance?

There is, and it could be very helpful! If you start the kip container with the additional argument --debug-server, kip starts a listener that can be used to interact with and query kip's internal state using kipctl, which is installed at the root of the kip image. To use that:

  1. Add --debug-server to the list of arguments passed to kip:

    kubectl -nkube-system edit statefulset kip-provider

  2. Exec into the newly launched kip pod:

    kubectl -nkube-system exec -it kip-provider-0 -c kip -- /bin/bash

  3. Query for the state of pods in kip:

    /kipctl get pods

This query is to see if there is a difference between the state of pods in kip and the state of pods in k8s. The pending pods in k8s should be shown as "Waiting" in kip. I'm curious whether kipctl shows (a) no pods or (b) pods in a state that isn't "Waiting".

  4. Query for a list of events in kip (we should really send these back to k8s but haven't had the chance to implement this).

    /kipctl get events

This query helps show whether something is happening to the pods after they are created in kip. Are they immediately terminated/deleted for some reason without us logging it? The list of kip events should show if a pod is getting created and then terminated. If we see pods being terminated immediately after creation, then we're onto something. If the pods are sitting in kip in the "Waiting" state then... well... I'm going to have to think about that one a bit more.

from kip.

myechuri avatar myechuri commented on July 28, 2024

@saiharshitachava: would you be open to joining our Slack channel to shorten response time while we triage this issue? If so, can you please send me your email address? I am at [email protected]. Thanks!

from kip.
